[vfio-users] Kernel panic at vfio_intx_handler leads to low performance in guest VM

Thu May 25 15:05:06 UTC 2017

On Thu, 25 May 2017 10:53:29 +0000
Hu Zhifeng <zhifeng.hu at hotmail.com> wrote:

> Dear all,
> 
> I am running a fresh Fedora 23 and want to use kvm/qemu to run a windows VM with GPU passthrough.
> 
> My setup is as follows:
> Host OS: Fedora 23 (Workstation x86_64)
> Kernel: 4.2.3-300.fc23.x86_64
> QEMU version: qemu-2.4.0.1-1.fc23
> Guest VM: Windows 7
> CPU: Intel i7-6700K
> Motherboard: Gigabyte B150-HD3
> IGD: Intel® HD Graphics 530 (used by the host)
> Graphics Card: GT710 (used by the VM)
> 
> First, enable IOMMU by appending the `intel_iommu=on` parameter to GRUB.
> Next, prevent the kernel modules i915, nouveau and snd_hda_intel from being loaded for both initramfs and system.
> Then, load vfio-pci with ids (modprobe vfio-pci ids=10de:128b,10de:0e0f)
> Last, run qemu like this:
> qemu-system-x86_64 -enable-kvm -m 4G -cpu host,kvm=off -smp 4,sockets=1,cores=2,threads=2 -hda ~/win7.img -usbdevice host:093a:2510 -usbdevice host:0c45:7603 -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 -vga none

You really want to avoid x-vga=on, especially with IGD host graphics.
I'm also not sure why you're preventing i915 from loading if you
intend to use IGD for the host graphics.

> Everything looks good and the dedicated GPU is detected by the guest VM (N.B. GPU driver `378.92-desktop-win8-win7-64bit-international-whql.exe` was ready),
> but the guest VM runs very slowly, and I observed a kernel panic generated by vfio_pci.
> 
> Here's the log from dmesg:
> [  737.317946] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
> [  737.356996] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
> [  737.367606] vfio_pci: add [10de:128b[ffff:ffff]] class 0x000000/00000000
> [  737.378437] vfio_pci: add [10de:0e0f[ffff:ffff]] class 0x000000/00000000
> [  738.233680] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
> [  739.755715] kvm: zapping shadow pages for mmio generation wraparound
> [  739.874265] irq 16: nobody cared (try booting with the "irqpoll" option)
> [  739.874269] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.3-300.fc23.x86_64 #1
> [  739.874270] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./B150-HD3-CF, BIOS F5 03/11/2016
> [  739.874271]  0000000000000000 e5300c14e6af3df1 ffff880470c03e28 ffffffff81771fca
> [  739.874272]  0000000000000000 ffff88045b2844a4 ffff880470c03e58 ffffffff810f88a5
> [  739.874273]  ffff880081f42e50 ffff88045b284400 0000000000000000 0000000000000010
> [  739.874275] Call Trace:
> [  739.874276]  <IRQ>  [<ffffffff81771fca>] dump_stack+0x45/0x57
> [  739.874281]  [<ffffffff810f88a5>] __report_bad_irq+0x35/0xd0
> [  739.874282]  [<ffffffff810f8c44>] note_interrupt+0x244/0x290
> [  739.874284]  [<ffffffff810f607c>] handle_irq_event_percpu+0x11c/0x180
> [  739.874285]  [<ffffffff810f6110>] handle_irq_event+0x30/0x60
> [  739.874286]  [<ffffffff810f91f4>] handle_fasteoi_irq+0x84/0x150
> [  739.874287]  [<ffffffff81016e42>] handle_irq+0x72/0x120
> [  739.874289]  [<ffffffff810bd66a>] ? atomic_notifier_call_chain+0x1a/0x20
> [  739.874291]  [<ffffffff8177b5df>] do_IRQ+0x4f/0xe0
> [  739.874292]  [<ffffffff817794eb>] common_interrupt+0x6b/0x6b
> [  739.874292]  <EOI>  [<ffffffff81108a4f>] ? hrtimer_start_range_ns+0x1bf/0x3b0
> [  739.874296]  [<ffffffff816160c0>] ? cpuidle_enter_state+0x130/0x270
> [  739.874297]  [<ffffffff8161609b>] ? cpuidle_enter_state+0x10b/0x270
> [  739.874298]  [<ffffffff81616237>] cpuidle_enter+0x17/0x20
> [  739.874300]  [<ffffffff810dfcc2>] call_cpuidle+0x32/0x60
> [  739.874301]  [<ffffffff81616213>] ? cpuidle_select+0x13/0x20
> [  739.874302]  [<ffffffff810dff58>] cpu_startup_entry+0x268/0x320
> [  739.874304]  [<ffffffff8176870c>] rest_init+0x7c/0x80
> [  739.874305]  [<ffffffff81d5702d>] start_kernel+0x49d/0x4be
> [  739.874307]  [<ffffffff81d56120>] ? early_idt_handler_array+0x120/0x120
> [  739.874308]  [<ffffffff81d56339>] x86_64_start_reservations+0x2a/0x2c
> [  739.874309]  [<ffffffff81d56485>] x86_64_start_kernel+0x14a/0x16d
> [  739.874309] handlers:
> [  739.874313] [<ffffffffa05172d0>] vfio_intx_handler [vfio_pci]
> [  739.874313] Disabling IRQ #16

What's happening here is that the spurious interrupt handling code is
noting that there are too many unhandled interrupts on this IRQ and
disabling it, which switches to polling-mode behavior, and yes,
performance will be terrible.  My write-up on making Windows use MSI
covers some of the background for this:

http://vfio.blogspot.com/2014/09/vfio-interrupts-and-how-to-coax-windows.html
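
As a rough illustration of the accounting described above, here is a
simplified Python model of spurious-interrupt detection.  The window of
100,000 interrupts and the limit of 99,900 unhandled ones are believed
to match the kernel's note_interrupt(), but treat them, and every name
in the sketch, as assumptions rather than a copy of the kernel code.
The real code also ages the unhandled count over time, which is omitted
here.

```python
from dataclasses import dataclass

# Assumed thresholds (see the note above): evaluate every WINDOW
# interrupts and give up if more than UNHANDLED_LIMIT went unclaimed.
WINDOW = 100_000
UNHANDLED_LIMIT = 99_900


@dataclass
class IrqLine:
    """Toy model of the per-IRQ bookkeeping; not the kernel's struct."""
    irq_count: int = 0
    irqs_unhandled: int = 0
    disabled: bool = False

    def note_interrupt(self, handled: bool) -> None:
        """Record one interrupt and whether any handler claimed it."""
        if self.disabled:
            return
        if not handled:
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count < WINDOW:
            return
        # End of the window: too many unclaimed interrupts means
        # "nobody cared", so the line is disabled.
        if self.irqs_unhandled > UNHANDLED_LIMIT:
            self.disabled = True
        self.irq_count = 0
        self.irqs_unhandled = 0
```

In this model, a handler that never claims the interrupt gets the line
disabled after one full window, which corresponds to the "Disabling IRQ
#16" message in the log.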

In summary, we rely on the device to tell us when an interrupt is
pending in order to claim the interrupt; if it doesn't, we assume it's
another device sharing the interrupt and let it go.  If it's actually
our device interrupting without indicating so, or there's another
device shouting on the same interrupt line, you can hit this problem.
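
For reference, the "interrupt pending" indication for INTx lives in the
standard PCI status register, and the INTx disable bit in the command
register.  The sketch below decodes both from the first bytes of a
device's config space.  The function name is hypothetical, and sampling
these bits from userspace is inherently racy, so this is only a way to
see which bits are involved, not a diagnostic tool.

```python
import struct

PCI_COMMAND = 0x04                  # offset of the 16-bit command register
PCI_COMMAND_INTX_DISABLE = 0x0400   # command bit 10: INTx assertion disabled
PCI_STATUS_INTERRUPT = 0x0008       # status bit 3: INTx currently pending


def intx_state(config: bytes) -> dict:
    """Decode the INTx disable and pending bits from PCI config space."""
    if len(config) < 8:
        raise ValueError("need at least the first 8 bytes of config space")
    # The command (0x04) and status (0x06) registers are adjacent
    # little-endian 16-bit values.
    command, status = struct.unpack_from("<HH", config, PCI_COMMAND)
    return {
        "intx_disabled": bool(command & PCI_COMMAND_INTX_DISABLE),
        "intx_pending": bool(status & PCI_STATUS_INTERRUPT),
    }

# Usage on the host, assuming the GPU is at 0000:01:00.0 as in the
# original post:
#   with open("/sys/bus/pci/devices/0000:01:00.0/config", "rb") as f:
#       print(intx_state(f.read(64)))
```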

> What I've tried so far:
> 1. Different graphics card (GTX750Ti), with same results

My question would be whether the problem interrupt is the GPU or the
audio.  You could remove the audio assignment and see if it still
occurs.  If it is the audio device, then follow the guide above as
GeForce audio interrupts are only marginally functional anyway.

> 2. Different host OS (Fedora 24: Kernel 4.5.5-300.fc24.x86_64 + qemu-2.6.2-8.fc24), without any issues

That's interesting; I don't know what would be different.  But why
are you running the original FC23 kernel when I know there are FC23
updates that bring it up to a 4.8 kernel?  If you don't keep your
software up to date, bugs are to be expected.

> 3. Load vfio-pci with `nointxmask=1`, without any issues

With this option we get an exclusive interrupt for the device and then
we handle each interrupt under the assumption that it's for our
device.  If there's really something else pulling this interrupt, that
might mean we're injecting additional (spurious) interrupts into the
guest.  Generally this is ok so long as we don't hit a rate sufficient
to trigger a similar spurious interrupt shutdown in the guest.
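
To confirm whether the option actually took effect on a running host,
the module parameter can be read back from sysfs.  A minimal sketch,
assuming the parameter is exposed at the path below (this may differ
between kernel versions); the helper name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the vfio-pci module parameter.
NOINTXMASK_PARAM = Path("/sys/module/vfio_pci/parameters/nointxmask")


def read_bool_param(path: Path = NOINTXMASK_PARAM) -> Optional[bool]:
    """Return True/False for a boolean module parameter, or None if the
    file is missing, unreadable, or holds an unexpected value."""
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    if text in ("Y", "1"):
        return True
    if text in ("N", "0"):
        return False
    return None
```

A result of None most likely means vfio-pci is not loaded or the
parameter is not exposed at that path.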

> 4. Remove `-hda ~/win7.img` from QEMU command (seabios only), still get the same crash

So you don't even have real guest drivers loaded...  Look
in /proc/interrupts with the new kernel: are there multiple devices on
the interrupt line with that kernel?
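
One way to answer that question quickly is to list the IRQ lines that
have more than one handler registered.  The sketch below is a heuristic
parser for /proc/interrupts: it assumes handler names are separated by
", " and contain no spaces, and it can misreport a line that has no
handler at all.  The function name is hypothetical.

```python
from typing import Dict, List


def shared_irqs(text: str) -> Dict[int, List[str]]:
    """Map each numeric IRQ with more than one handler to its handlers."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    shared: Dict[int, List[str]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue                      # skip NMI:, LOC:, and similar rows
        fields = rest.split(None, ncpu)   # per-CPU counts, then description
        if len(fields) <= ncpu:
            continue
        parts = [p.strip() for p in fields[ncpu].split(", ")]
        head = parts[0].split()           # chip, hwirq-type, first handler
        if not head:
            continue
        handlers = [head[-1]] + parts[1:]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared

# Usage on the host:
#   with open("/proc/interrupts") as f:
#       for irq, names in sorted(shared_irqs(f.read()).items()):
#           print(irq, names)
```

If IRQ 16 shows another handler next to the vfio-pci entry, that device
is a candidate for the one shouting on the shared line.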

> So I have some questions now:
> 1. Is this a known issue? What is the root cause?

Not a known issue; the root cause is covered above.  It's certainly
something that may be fixed in updated kernels, or maybe updated kernels
just shut down, or have a driver for, the device sharing the interrupt.

> 2. Why does Fedora 24 not have this issue? Is it related to the kernel, qemu or other components?

You could try updating one or the other.

> 3. Is `nointxmask=1` the right way to avoid the crash?

This is a valid workaround, but it means that vfio-pci will always
require an exclusive INTx interrupt for any assigned device, which
often makes it difficult to achieve a working configuration.  As above,
if the additional interrupts are not generated by the GPU/audio, then
we're potentially injecting spurious interrupts into the guest.  Thanks,

Alex