[Libguestfs] Libguestfs Failure on latest Ubuntu 22.04 LTS
Laszlo Ersek
lersek at redhat.com
Tue Mar 21 06:35:35 UTC 2023
Hi Justin,
On 3/20/23 16:47, Justin Churchey wrote:
> Hello Laszlo,
>
> Thank you for the rundown. I enabled the
> additional LIBGUESTFS_BACKEND_SETTINGS, and I have attached a follow up
> to the libguestfs-test-tool output.
Your computer has faulty RAM.
Your libguestfs-test-tool log file contains the following line (read it
very carefully):
LIBGUESTFS_BANKEND_SETTINGS=force_tcg
I was staring my eyes out at your log, not understanding why the
"force_tcg" setting wouldn't take effect -- and it didn't: the log
file confirms that the repeated test run still used KVM.
That was when I copied and pasted the above line (before the equal sign)
into a git-grep, and then a "git log -S". It turns out that whatever
variable name was captured in the libguestfs-test-tool log, libguestfs
never checks that variable; worse, libguestfs has *never* checked it
over its entire history.
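For reference, the two searches can be sketched roughly as below. The
helper name search_identifier and the repository path argument are made
up for illustration; only "git grep" and "git log -S" themselves are
what I actually ran. Note that "git log" without "--all" only walks the
current branch.

```shell
# Hypothetical helper: report whether NAME occurs in the checked-out tree
# of REPO, or was ever added/removed in the history of the current branch.
search_identifier() {
    repo=$1
    name=$2
    # Tracked files in the current checkout that contain NAME (fixed string).
    files=$(git -C "$repo" grep -l -F -e "$name" || true)
    # Commits that changed the number of occurrences of NAME ("pickaxe").
    commits=$(git -C "$repo" log --oneline -S"$name" || true)
    if [ -z "$files" ] && [ -z "$commits" ]; then
        echo "never used: $name"
    else
        echo "used: $name"
    fi
}

# Example call (the checkout location is an assumption):
#   search_identifier ~/src/libguestfs LIBGUESTFS_BANKEND_SETTINGS
```

An empty result from both commands is what "never checked over its
entire history" means here.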
So then I thought, "aha, Justin must have typed the variable name from
memory, instead of using the clipboard". But that's not possible: even
if you mistyped the variable name when setting the environment,
libguestfs-test-tool would not look for that (misnamed) variable, and
log it!
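If you want to double-check the spelling on your side anyway, here is a
minimal sketch. The function name check_backend_settings is only an
illustration; the one thing taken from the documentation is the
variable name LIBGUESTFS_BACKEND_SETTINGS.

```shell
# Hypothetical helper: show what is really exported, then test for the
# one spelling that libguestfs documents.
check_backend_settings() {
    # List every exported variable whose name starts with LIBGUESTFS_.
    env | grep '^LIBGUESTFS_' || echo '(no LIBGUESTFS_* variables exported)'
    # ${VAR+x} expands to "x" only if VAR is set, even if it is empty.
    if [ -n "${LIBGUESTFS_BACKEND_SETTINGS+x}" ]; then
        echo "ok: LIBGUESTFS_BACKEND_SETTINGS=$LIBGUESTFS_BACKEND_SETTINGS"
    else
        echo "missing: LIBGUESTFS_BACKEND_SETTINGS"
    fi
}

check_backend_settings
```

Run it in the same shell session that you start libguestfs-test-tool
from, so that both see the same environment.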
So the only explanation is that your RAM is faulty; a single character
in the variable name got corrupted in this instance (C -> N):
  LIBGUESTFS_BACKEND_SETTINGS
  LIBGUESTFS_BANKEND_SETTINGS
               ^
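To look at the two bytes at the bit level, a small sketch follows. The
helper name char_xor is made up for illustration; it prints the ASCII
codes of two characters, their XOR, and the number of differing bits.

```shell
# Hypothetical helper: compare two single characters bit by bit.
char_xor() {
    # A leading quote makes printf print the character's numeric code.
    a=$(printf '%d' "'$1")
    b=$(printf '%d' "'$2")
    x=$(( a ^ b ))
    # Count the set bits in the XOR, i.e. the bits that differ.
    n=0
    t=$x
    while [ "$t" -gt 0 ]; do
        n=$(( n + (t & 1) ))
        t=$(( t >> 1 ))
    done
    printf '0x%02x 0x%02x xor=0x%02x bits=%d\n' "$a" "$b" "$x" "$n"
}

char_xor C N   # prints: 0x43 0x4e xor=0x0d bits=3
```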
With faulty RAM, there's nothing more to investigate here; the guest
kernel crash (page fault) can be trivially explained by a pointer
getting corrupted and pointing into outer space.
I suggest running MemTest86 or MemTest86+.
(NB, faulty RAM is not as infrequent as one would think. In my life, if
I count right, this is actually the third occasion that I've diagnosed
faulty RAM for a user -- not necessarily via the same program /
misbehavior, of course. Also I think a faulty disk is much less likely:
non-ECC RAM exists, but disks without redundancy checks don't /
shouldn't exist, as far as I know.)
Laszlo
>
> I also checked out my CPU settings (cat /proc/cpuinfo output attached),
> and the host does appear to support PCLMULQDQ (AMD Ryzen 7 5700X). I
> also checked the cpuinfo in one of the guests I have created (Ubuntu
> 18.04, unstable due to intermittent kernel panics), and the cpuinfo
> indicates that this feature seems to be passed down to my guest as well.
>
> I noticed that the libguestfs-test-tool didn't seem to like the qemu
> settings it tried to boot with. So, I went back to basics and built a
> disk using qemu-img (qcow2) and utilized qemu-system-x86_64 to do the
> base install (Ubuntu 18.04). The resulting image boots and I import the
> resulting image with virt-install. However, the GUI/console seems to
> want to lock up shortly after boot if I am using virt-tools. The guest
> seems more stable when I boot it directly with `qemu-system`, and this
> may be my workaround for now.
>
> In virt-tools, I can consistently get a panic on the guest by trying to
> enable the qemu-guest-agent: `systemctl enable qemu-guest-agent`.
> Unfortunately, I cannot get the full output from that panic (attached).
> It would seem that this problem is more than just libguestfs-tools. Is
> there a KVM listserv that this might be more appropriate for?
>
> Sincerely,
>
> On Mon, Mar 20, 2023 at 1:31 AM Laszlo Ersek <lersek at redhat.com> wrote:
>
> On 3/17/23 16:10, Justin Churchey wrote:
> > Hello Everyone,
> >
> > I was having some difficulties converting OVA images yesterday. At
> > first, I thought it may have been a compatibility issue with
> > VirtualBox 7.0. However, when I went to run libguestfs-test-tool, it
> > began failing with the exact same error as the conversions, which
> > leads me to believe the issue may lie with libguestfs and not the
> > images themselves.
> >
> > To test further, I created a fresh install of Ubuntu 22.04, and the
> > libguestfs-test-tool seems to fail with the same error, even on a
> > fresh install. I am attaching the libguestfs-test-tool output for
> > reference.
> >
> > Ubuntu 22.04 is running libguestfs-tools 1.46.2-10ubuntu3
> >
> > If anybody has any insight into the issue, or if you feel a bug report
> > needs to be filed, please let me know.
>
> Your appliance kernel crashes.
>
> Here's my theory on why this might happen, based on your log.
>
> The guestfish appliance runs with KVM acceleration.
>
> The crash happens after/while inserting the modules crc32-pclmul.ko and
> crct10dif-pclmul.ko.
>
> The "pclmul" in the names of those modules indicates that these modules
> calculate various (crc32) checksums with the PCLMULQDQ instruction. I
> believe that PCLMULQDQ is an advanced / accelerated instruction and not
> all CPUs may support it.
>
> Your appliance guest is started with "-cpu max" on the QEMU command line
> (from libguestfs commit 30f74f38bd6e, "appliance: Use -cpu max.",
> 2021-01-28). This is probably why the appliance kernel thinks PCLMULQDQ
> is available.
>
> I think the PCLMULQDQ instruction may cause an issue here. I don't know
> why it misbehaves under KVM, but that's my suspicion anyway.
>
> Note that the kernel crash log provides the following instruction
> (assembly binary) dump:
>
> 46 70 48 8b 56 68 48 03 97 90 01 00 00 48 c1 e0 06 48 03 46 20 48 89 97
> 08 02 00 00 48 be ab aa aa aa aa aa aa aa 48 8b 48 10 <48> 89 0a 48 8b
> 50 20 48 8b 8f 08 02 00 00 48 89 d0 48 f7 e6 48 c1
>
> with the instruction starting at <48> causing the page fault, as the
> direct symptom. Now, we can disassemble this:
>
> printf \
> '%b' \
> '\x46\x70\x48\x8b\x56\x68\x48\x03\x97\x90\x01\x00\x00\x48\xc1\xe0\x06\x48\x03\x46\x20\x48\x89\x97\x08\x02\x00\x00\x48\xbe\xab\xaa\xaa\xaa\xaa\xaa\xaa\xaa\x48\x8b\x48\x10\x48\x89\x0a\x48\x8b\x50\x20\x48\x8b\x8f\x08\x02\x00\x00\x48\x89\xd0\x48\xf7\xe6\x48\xc1' \
> > bin
>
> $ ndisasm -b64 bin
>
> 00000000 467048 jo 0x4b
> 00000003 8B5668 mov edx,[rsi+0x68]
> 00000006 48039790010000 add rdx,[rdi+0x190]
> 0000000D 48C1E006 shl rax,byte 0x6
> 00000011 48034620 add rax,[rsi+0x20]
> 00000015 48899708020000 mov [rdi+0x208],rdx
> 0000001C 48BEABAAAAAAAAAA mov rsi,0xaaaaaaaaaaaaaaab
> -AAAA
> 00000026 488B4810 mov rcx,[rax+0x10]
> 0000002A 48890A mov [rdx],rcx <----------- crash
> 0000002D 488B5020 mov rdx,[rax+0x20]
> 00000031 488B8F08020000 mov rcx,[rdi+0x208]
> 00000038 4889D0 mov rax,rdx
> 0000003B 48F7E6 mul rsi
> 0000003E 48 rex.w
> 0000003F C1 db 0xc1
>
> Note the constant 0xaaaaaaaaaaaaaaab; that seems very special. We can
> search the kernel tree for it (I'm not bothering about checking out the
> particular ubuntu kernel version for now):
>
> $ git grep -i aaaaaaaaaaaaaaab
> arch/x86/math-emu/poly_atan.c:/* 0xaaaaaaaaaaaaaaabLL, transferred to fixedpterm[] */
> arch/x86/math-emu/poly_sin.c: 0xaaaaaaaaaaaaaaabLL,
> arch/x86/math-emu/poly_tan.c:static const unsigned long long twothirds = 0xaaaaaaaaaaaaaaabLL;
>
> In particular, the last file (poly_tan.c) contains a snippet like
>
> mul64_Xsig(&accum, &twothirds);
>
> which seems vaguely related to
>
> 0000001C 48BEABAAAAAAAAAA mov rsi,0xaaaaaaaaaaaaaaab
> -AAAA
> ...
> 0000003B 48F7E6 mul rsi
>
> Now this does not seem connected to PCLMULQDQ, but it does somehow look
> connected to multiplication.
>
> I don't really know where to go with this, except for asking KVM
> experts.
>
> For now, can you try:
>
> export LIBGUESTFS_BACKEND_SETTINGS=force_tcg
>
> from <https://libguestfs.org/guestfs.3.html#backend-settings>, and see
> if that makes a difference?
>
> Laszlo
>