[libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)

Kashyap Chamarthy kchamart at redhat.com
Wed Feb 7 15:31:08 UTC 2018


[Cc: KVM upstream list.]

On Tue, Feb 06, 2018 at 04:11:46PM +0100, Florian Haas wrote:
> Hi everyone,
> 
> I hope this is the correct list to discuss this issue; please feel
> free to redirect me otherwise.
> 
> I have a nested virtualization setup that looks as follows:
> 
> - Host: Ubuntu 16.04, kernel 4.4.0 (an OpenStack Nova compute node)
> - L0 guest: openSUSE Leap 42.3, kernel 4.4.104-39-default
> - Nested guest: SLES 12, kernel 3.12.28-4-default
> 
> The nested guest is configured with "<type arch='x86_64'
> machine='pc-i440fx-1.4'>hvm</type>".
> 
> This is working just beautifully, except when the L0 guest wakes up
> from managed save (openstack server resume in OpenStack parlance).
> Then, in the L0 guest we immediately see this:

[...] # Snip the call trace from Florian.  It is here:
https://www.redhat.com/archives/libvirt-users/2018-February/msg00014.html
 
> What does fix things, of course, is to switch the nested guest
> from KVM to QEMU — but that also makes things significantly slower.
> 
> So I'm wondering: is there someone reading this who does run nested
> KVM and has managed to successfully live-migrate or managed-save? If
> so, would you be able to share a working host kernel / L0 guest kernel
> / nested guest kernel combination, or any other hints for tuning the
> L0 guest to support managed save and live migration?
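A side note on the KVM-to-QEMU fallback mentioned above: in libvirt
terms that is the 'type' attribute on the <domain> element; 'kvm' uses
hardware acceleration, while 'qemu' falls back to pure TCG emulation,
hence the slowdown.  A rough sketch of the two variants (everything
else elided):

    <!-- hardware-accelerated nested guest -->
    <domain type='kvm'>
      <os>
        <type arch='x86_64' machine='pc-i440fx-1.4'>hvm</type>
      </os>
      ...
    </domain>

    <!-- same guest, pure TCG emulation: slower, but no nested KVM -->
    <domain type='qemu'>
      ...
    </domain>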
 
Following up on our IRC discussion (on #kvm, Freenode), I'm re-posting
my comment here:

So I just did a test of 'managedsave' (libvirt parlance for "save the
state of the running VM to a file") of L1 _while_ L2 was running, and I
seem to have reproduced your case (see the call trace attached).

    # Ensure L2 (the nested guest) is running on L1.  Then, from L0, do
    # the following:
    [L0] $ virsh managedsave L1
    [L0] $ virsh start L1 --console

Result: see the call trace attached to this message.  L1 goes on to
start "fine", and L2 keeps running, too, but then things start to get
weird.  For example, when I try to safely (read-only) mount the L2 disk
image via libguestfs (setting `export LIBGUESTFS_BACKEND=direct`, which
makes libguestfs use QEMU directly instead of going through libvirt)
with `guestfish --ro -a ./cirros.qcow2 -i`, the call trace shows up
again on the L1 serial console, and the `guestfish` command just sits
there forever.
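For reference, the exact sequence was roughly this, run on L1 (the
image path is just where the local CirrOS copy lives):

    # Use the QEMU binary directly, bypassing libvirt:
    [L1] $ export LIBGUESTFS_BACKEND=direct
    # Read-only inspect/mount the L2 disk image:
    [L1] $ guestfish --ro -a ./cirros.qcow2 -i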


  - L0 (bare metal) kernel:       4.13.13-300.fc27.x86_64+debug
  - L1 (guest hypervisor) kernel: 4.11.10-300.fc26.x86_64
  - L2 (nested guest):            a CirrOS 3.5 image
   
I was able to reproduce this at least 3 times with the above versions.

I'm using libvirt's 'host-passthrough' CPU mode (meaning '-cpu host' in
QEMU parlance) for both L1 and L2.
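In domain XML terms, that is roughly this fragment, present in both the
L1 and the L2 guest definitions:

    <cpu mode='host-passthrough'/>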

My L0 CPU is:  Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz.
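And nested virtualization is of course enabled on L0; a quick sanity
check for that (assuming the Intel 'kvm_intel' module, as in this
case):

    # On L0: should print "Y" (or "1") when nested is enabled
    [L0] $ cat /sys/module/kvm_intel/parameters/nested
    # On L1: VMX must also be visible to the guest hypervisor
    [L1] $ grep -c -w vmx /proc/cpuinfo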

Thoughts?

---

[/me wonders if I'll be asked to reproduce this with the newest
upstream kernels.]

[...]

-- 
/kashyap
-------------- next part --------------
$> virsh start f26-devstack --console
Domain f26-devstack started
Connected to domain f26-devstack
Escape character is ^]
[ 1323.605321] ------------[ cut here ]------------
[ 1323.608653] kernel BUG at arch/x86/kvm/x86.c:336!                                           
[ 1323.611661] invalid opcode: 0000 [#1] SMP                                                   
[ 1323.614221] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables sb_edac edac_core kvm_intel openvswitch nf_conntrack_ipv6 kvm nf_nat_ipv6 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack irqbypass crct10dif_pclmul sunrpc crc32_pclmul ppdev ghash_clmulni_intel parport_pc joydev virtio_net virtio_balloon parport tpm_tis i2c_piix4 tpm_tis_core tpm xfs libcrc32c virtio_blk virtio_console virtio_rng crc32c_intel serio_raw virtio_pci ata_generic virtio_ring virtio pata_acpi qemu_fw_cfg
[ 1323.645674] CPU: 0 PID: 18587 Comm: CPU 0/KVM Not tainted 4.11.10-300.fc26.x86_64 #1
[ 1323.649592] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-1.fc27 04/01/2014
[ 1323.653935] task: ffff8b5be13ca580 task.stack: ffffa8b78147c000
[ 1323.656783] RIP: 0010:kvm_spurious_fault+0x9/0x10 [kvm]
[ 1323.659317] RSP: 0018:ffffa8b78147fc78 EFLAGS: 00010246
[ 1323.661808] RAX: 0000000000000000 RBX: ffff8b5be13c0000 RCX: 0000000000000000
[ 1323.665077] RDX: 0000000000006820 RSI: 0000000000000292 RDI: ffff8b5be13c0000
[ 1323.668287] RBP: ffffa8b78147fc78 R08: ffff8b5be13c0090 R09: 0000000000000000
[ 1323.671515] R10: ffffa8b78147fbf8 R11: 0000000000000000 R12: ffff8b5be13c0088
[ 1323.674598] R13: 0000000000000001 R14: 00000131e2372ee6 R15: ffff8b5be1360040
[ 1323.677643] FS:  00007fd602aff700(0000) GS:ffff8b5bffc00000(0000) knlGS:0000000000000000
[ 1323.681130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1323.683628] CR2: 000055d650532c20 CR3: 0000000221260000 CR4: 00000000001426f0
[ 1323.686697] Call Trace:
[ 1323.687817]  intel_pmu_get_msr+0xd23/0x3f44 [kvm_intel]
[ 1323.690151]  ? vmx_interrupt_allowed+0x19/0x40 [kvm_intel]
[ 1323.692583]  kvm_arch_vcpu_runnable+0xa5/0xe0 [kvm]
[ 1323.694767]  kvm_vcpu_check_block+0x12/0x50 [kvm]
[ 1323.696858]  kvm_vcpu_block+0xa3/0x2f0 [kvm]
[ 1323.698762]  kvm_arch_vcpu_ioctl_run+0x165/0x16a0 [kvm]
[ 1323.701079]  ? kvm_arch_vcpu_load+0x6d/0x290 [kvm]
[ 1323.703175]  ? __check_object_size+0xbb/0x1b3
[ 1323.705109]  kvm_vcpu_ioctl+0x2a6/0x620 [kvm]
[ 1323.707021]  ? kvm_vcpu_ioctl+0x2a6/0x620 [kvm]
[ 1323.709006]  do_vfs_ioctl+0xa5/0x600
[ 1323.710570]  SyS_ioctl+0x79/0x90
[ 1323.712011]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 1323.714033] RIP: 0033:0x7fd610fb35e7
[ 1323.715601] RSP: 002b:00007fd602afe7c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1323.718869] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd610fb35e7
[ 1323.721972] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000013
[ 1323.725044] RBP: 0000563dab190300 R08: 0000563dab1ab7d0 R09: 01fc2de3f821e99c
[ 1323.728124] R10: 000000003b9aca00 R11: 0000000000000246 R12: 0000563dadce20a6
[ 1323.731195] R13: 0000000000000000 R14: 00007fd61a84c000 R15: 0000563dadce2000
[ 1323.734268] Code: 8d 00 00 01 c7 05 1c e6 05 00 01 00 00 00 41 bd 01 00 00 00 44 8b 25 2f e6 05 00 e9 db fe ff ff 66 90 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 89 ff 48 89 e5 41 54 53
[ 1323.742385] RIP: kvm_spurious_fault+0x9/0x10 [kvm] RSP: ffffa8b78147fc78
[ 1323.745438] ---[ end trace 92fa23c974db8b7e ]---

