[vfio-users] Re: Kernel panic at vfio_intx_handler leads to low performance in guest VM

Zhifeng Hu zhifeng.hu at hotmail.com
Wed May 31 10:15:59 UTC 2017


Thank you Alex,

I took a look at the link you provided, followed the guide, and enabled MSI; the performance of the guest VM improved significantly.
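For anyone else who hits this: what the guide boils down to on my Windows 7 guest is setting MSISupported to 1 under the GPU's device node in the registry (a sketch; the exact device key under Enum\PCI varies from machine to machine):

HKLM\System\CurrentControlSet\Enum\PCI\<GPU device key>\Device Parameters\Interrupt Management\MessageSignaledInterruptProperties
    MSISupported (REG_DWORD) = 1

After rebooting the guest, the switch can be verified from the host:

lspci -vs 01:00.0 | grep MSI

With the guest running, this should report "MSI: Enable+" instead of the "Enable-" shown in the lspci listing below.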

Regarding the answers & questions in your previous mail, here are my updates:
> You really want to avoid x-vga=on, especially with IGD host graphics.
Why did you say that? I did not see any other parameter that could replace x-vga=on.

> I'm also not sure why you're preventing i915 from loading if you
> intend to use IGD for the host graphics.
I disabled the i915 driver because I don't want to apply either the i915 VGA arbiter patch or the ACS override patch.
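Concretely, I keep it (and nouveau, snd_hda_intel, as in my original mail) out of both the initramfs and the running system, roughly like this (a sketch; the file name is my own choice):

/etc/modprobe.d/blacklist-passthrough.conf:
    blacklist i915
    blacklist nouveau
    blacklist snd_hda_intel

plus rd.driver.blacklist=i915,nouveau,snd_hda_intel on the kernel command line, followed by `dracut -f` to regenerate the initramfs.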

> My question would be whether the problem interrupt is the GPU or the
> audio.  You could remove the audio assignment and see if it still
> occurs.  If it is the audio device, then follow the guide above as
> GeForce audio interrupts are only marginally functional anyway.
The problem interrupt is the GPU. Since the GPU (01:00.0) and the audio (01:00.1) are together in IOMMU group 5,
I usually assign both of them at the same time to avoid "vfio: error, group 5 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver."
I also tried removing the audio assignment but got the same problem.
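For reference, group membership can be read straight from sysfs; a generic sketch, nothing specific to my board:

for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=${d%/devices/*}
    echo "group ${g##*/}: ${d##*/}"   # prints e.g. "group 5: 0000:01:00.0"
done

This lists every device with its group number, so it's easy to see which devices have to be bound to vfio-pci together.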

> So you don't even have real guest drivers loaded... look
> in /proc/interrupts with the new kernel, are there multiple devices on
> the interrupt line with that kernel?
Yes, you are right: there are six devices sharing interrupt 16.
lspci -v
...
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
	Capabilities: [a0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [220] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	Memory behind bridge: df100000-df1fffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
	Capabilities: [a0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [220] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd Device a182
	Flags: fast devsel, IRQ 16
	Memory at df240000 (64-bit, non-prefetchable) [size=16K]
	Memory at df220000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [50] Power Management version 3
	Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+
	Kernel modules: snd_hda_intel

00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd Device 5001
	Flags: medium devsel, IRQ 16
	Memory at df24a000 (64-bit, non-prefetchable) [size=256]
	I/O ports at f040 [size=32]
	Kernel modules: i2c_i801

01:00.0 VGA compatible controller: NVIDIA Corporation Device 128b (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8c93
	Flags: fast devsel, IRQ 16
	Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=128M]
	Memory at d8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at df000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau

03:00.0 Non-Volatile memory controller: Intel Corporation Device f1a5 (rev 03) (prog-if 02 [NVM Express])
	Subsystem: Intel Corporation Device 390a
	Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
	Memory at df100000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [158] #19
	Capabilities: [178] Latency Tolerance Reporting
	Capabilities: [180] L1 PM Substates
	Kernel driver in use: nvme
	Kernel modules: nvme
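
For completeness, the same sharing is visible in /proc/interrupts, where every registered handler is listed at the end of the row:

grep '^ *16:' /proc/interrupts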

> Not a known issue, root cause covered above, certainly something that
> may be fixed in updated kernels, or maybe updated kernels just shut
> down, or have a driver for, the device sharing the interrupt.
That is what I want to figure out.

> You could try updating one or the other.
I tried upgrading the kernel and QEMU to the corresponding Fedora 24 versions; the problem still exists.

> This is a valid workaround, but it means that vfio-pci will always
> require an exclusive INTx interrupt for any assigned device, which
> often makes it difficult to achieve a working configuration.  As above,
> if the additional interrupts are not generated by the GPU/audio, then
> we're potentially injecting spurious interrupts into the guest.
In my testing, `nointxmask=1` can cause a new error, "vfio: Error: Failed to setup INTx fd: Device or resource busy",
when assigning the GPU and the onboard audio together. This error was mentioned in https://www.redhat.com/archives/vfio-users/2016-March/msg00035.html

modprobe vfio-pci ids=10de:128b,10de:0e0f,8086:a170 nointxmask=1
qemu-system-x86_64 -enable-kvm -m 4G -cpu host,kvm=off -smp 4,sockets=1,cores=2,threads=2 -hda ~/win7.img -usbdevice host:093a:2510 -usbdevice host:0c45:7603 -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 -vga none -device vfio-pci,host=00:1f.3
qemu-system-x86_64: -device vfio-pci,host=00:1f.3: vfio: Error: Failed to setup INTx fd: Device or resource busy
qemu-system-x86_64: -device vfio-pci,host=00:1f.3: Device initialization failed

dmesg:
[   77.750742] VFIO - User Level meta-driver version: 0.3
[   77.754872] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
[   77.765719] vfio_pci: add [10de:128b[ffff:ffff]] class 0x000000/00000000
[   77.776720] vfio_pci: add [10de:0e0f[ffff:ffff]] class 0x000000/00000000
[   77.787714] vfio_pci: add [8086:a170[ffff:ffff]] class 0x000000/00000000
[   83.681186] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[   83.705664] genirq: Flags mismatch irq 16. 00000000 (vfio-intx(0000:00:1f.3)) vs. 00000000 (vfio-intx(0000:01:00.0))
[   83.705666] CPU: 2 PID: 1953 Comm: qemu-system-x86 Not tainted 4.5.5-300.fc24.x86_64 #1
[   83.705667] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./B150-HD3-CF, BIOS F5 03/11/2016
[   83.705668]  0000000000000086 00000000842ff643 ffff88043eb87ca8 ffffffff813d35af
[   83.705670]  ffff88045d97f000 00000000fffffff0 ffff88043eb87d00 ffffffff811011ae
[   83.705671]  0000000000000246 ffff88045d97f09c ffff88044e5db248 00000000842ff643
[   83.705673] Call Trace:
[   83.705676]  [<ffffffff813d35af>] dump_stack+0x63/0x84
[   83.705678]  [<ffffffff811011ae>] __setup_irq+0x5ee/0x640
[   83.705682]  [<ffffffffa050f2d0>] ? vfio_intx_disable+0x60/0x60 [vfio_pci]
[   83.705683]  [<ffffffff81101388>] request_threaded_irq+0xf8/0x1a0
[   83.705685]  [<ffffffffa050f0c5>] vfio_intx_set_signal+0x105/0x1d0 [vfio_pci]
[   83.705686]  [<ffffffffa050f437>] vfio_pci_set_intx_trigger+0xc7/0x160 [vfio_pci]
[   83.705687]  [<ffffffffa050f9bf>] vfio_pci_set_irqs_ioctl+0x3f/0xa0 [vfio_pci]
[   83.705689]  [<ffffffffa050dd8e>] vfio_pci_ioctl+0x2fe/0x9c0 [vfio_pci]
[   83.705690]  [<ffffffff8128ed94>] ? eventfd_write+0x94/0x210
[   83.705692]  [<ffffffff810d0220>] ? wake_up_q+0x70/0x70
[   83.705694]  [<ffffffffa0461183>] vfio_device_fops_unl_ioctl+0x23/0x30 [vfio]
[   83.705696]  [<ffffffff81256183>] do_vfs_ioctl+0xa3/0x5d0
[   83.705697]  [<ffffffff81256729>] SyS_ioctl+0x79/0x90
[   83.705699]  [<ffffffff817cecee>] entry_SYSCALL_64_fastpath+0x12/0x6d

BRs
Zhifeng
________________________________________
From: Alex Williamson <alex.williamson at redhat.com>
Sent: May 25, 2017 23:05
To: Hu Zhifeng
Cc: vfio-users at redhat.com
Subject: Re: [vfio-users] Kernel panic at vfio_intx_handler leads to low performance in guest VM

On Thu, 25 May 2017 10:53:29 +0000
Hu Zhifeng <zhifeng.hu at hotmail.com> wrote:

> Dear all,
>
> I am running a fresh Fedora 23 and want to use kvm/qemu to run a Windows VM with GPU passthrough.
>
> My setup is as follows:
> Host OS: Fedora 23 (Workstation x86_64)
> Kernel: 4.2.3-300.fc23.x86_64
> QEMU version: qemu-2.4.0.1-1.fc23
> Guest VM: Windows 7
> CPU: Intel i7-6700K
> Motherboard: Gigabyte B150-HD3
> IGD: Intel® HD Graphics 530 (used by the host)
> Graphics Card: GT710 (used by the VM)
>
> First, enable IOMMU by appending the `intel_iommu=on` parameter to the kernel command line in GRUB.
> Next, prevent the kernel modules i915, nouveau and snd_hda_intel from being loaded for both initramfs and system.
> Then, load vfio-pci with ids (modprobe vfio-pci ids=10de:128b,10de:0e0f)
> Last, run qemu like this:
> qemu-system-x86_64 -enable-kvm -m 4G -cpu host,kvm=off -smp 4,sockets=1,cores=2,threads=2 -hda ~/win7.img -usbdevice host:093a:2510 -usbdevice host:0c45:7603 -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 -vga none

You really want to avoid x-vga=on, especially with IGD host graphics.
I'm also not sure why you're preventing i915 from loading if you
intend to use IGD for the host graphics.

> Everything looks good and the dedicated GPU is detected by the guest VM (N.B. GPU driver `378.92-desktop-win8-win7-64bit-international-whql.exe` was ready),
> but the guest VM is running very slowly, and I observed a kernel panic generated by vfio_pci.
>
> Here's the log from dmesg:
> [  737.317946] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
> [  737.356996] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
> [  737.367606] vfio_pci: add [10de:128b[ffff:ffff]] class 0x000000/00000000
> [  737.378437] vfio_pci: add [10de:0e0f[ffff:ffff]] class 0x000000/00000000
> [  738.233680] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
> [  739.755715] kvm: zapping shadow pages for mmio generation wraparound
> [  739.874265] irq 16: nobody cared (try booting with the "irqpoll" option)
> [  739.874269] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.3-300.fc23.x86_64 #1
> [  739.874270] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./B150-HD3-CF, BIOS F5 03/11/2016
> [  739.874271]  0000000000000000 e5300c14e6af3df1 ffff880470c03e28 ffffffff81771fca
> [  739.874272]  0000000000000000 ffff88045b2844a4 ffff880470c03e58 ffffffff810f88a5
> [  739.874273]  ffff880081f42e50 ffff88045b284400 0000000000000000 0000000000000010
> [  739.874275] Call Trace:
> [  739.874276]  <IRQ>  [<ffffffff81771fca>] dump_stack+0x45/0x57
> [  739.874281]  [<ffffffff810f88a5>] __report_bad_irq+0x35/0xd0
> [  739.874282]  [<ffffffff810f8c44>] note_interrupt+0x244/0x290
> [  739.874284]  [<ffffffff810f607c>] handle_irq_event_percpu+0x11c/0x180
> [  739.874285]  [<ffffffff810f6110>] handle_irq_event+0x30/0x60
> [  739.874286]  [<ffffffff810f91f4>] handle_fasteoi_irq+0x84/0x150
> [  739.874287]  [<ffffffff81016e42>] handle_irq+0x72/0x120
> [  739.874289]  [<ffffffff810bd66a>] ? atomic_notifier_call_chain+0x1a/0x20
> [  739.874291]  [<ffffffff8177b5df>] do_IRQ+0x4f/0xe0
> [  739.874292]  [<ffffffff817794eb>] common_interrupt+0x6b/0x6b
> [  739.874292]  <EOI>  [<ffffffff81108a4f>] ? hrtimer_start_range_ns+0x1bf/0x3b0
> [  739.874296]  [<ffffffff816160c0>] ? cpuidle_enter_state+0x130/0x270
> [  739.874297]  [<ffffffff8161609b>] ? cpuidle_enter_state+0x10b/0x270
> [  739.874298]  [<ffffffff81616237>] cpuidle_enter+0x17/0x20
> [  739.874300]  [<ffffffff810dfcc2>] call_cpuidle+0x32/0x60
> [  739.874301]  [<ffffffff81616213>] ? cpuidle_select+0x13/0x20
> [  739.874302]  [<ffffffff810dff58>] cpu_startup_entry+0x268/0x320
> [  739.874304]  [<ffffffff8176870c>] rest_init+0x7c/0x80
> [  739.874305]  [<ffffffff81d5702d>] start_kernel+0x49d/0x4be
> [  739.874307]  [<ffffffff81d56120>] ? early_idt_handler_array+0x120/0x120
> [  739.874308]  [<ffffffff81d56339>] x86_64_start_reservations+0x2a/0x2c
> [  739.874309]  [<ffffffff81d56485>] x86_64_start_kernel+0x14a/0x16d
> [  739.874309] handlers:
> [  739.874313] [<ffffffffa05172d0>] vfio_intx_handler [vfio_pci]
> [  739.874313] Disabling IRQ #16

What's happening here is that the spurious interrupt handling code is
noting that there are too many unhandled interrupts on this IRQ and
disabling it, which switches to polling-mode behavior, and yes,
performance will be terrible.  My write-up on making Windows use MSI
covers some of the background for this:

http://vfio.blogspot.com/2014/09/vfio-interrupts-and-how-to-coax-windows.html

In summary, we rely on the device to tell us when an interrupt is
pending before we claim the interrupt; if it doesn't, we assume it's
another device sharing the interrupt and let it go.  If it's actually
our device interrupting without indicating so, or there's another
device shouting on the same interrupt line, you can hit this problem.

> What I've tried so far:
> 1. Different graphics card (GTX750Ti), with same results

My question would be whether the problem interrupt is the GPU or the
audio.  You could remove the audio assignment and see if it still
occurs.  If it is the audio device, then follow the guide above as
GeForce audio interrupts are only marginally functional anyway.

> 2. Different host OS (Fedora 24: Kernel 4.5.5-300.fc24.x86_64 + qemu-2.6.2-8.fc24), without any issues

That's interesting; I don't know what would be different.  But why are
you running the original FC23 kernel when I know there are FC23
updates that bring it up to a 4.8 kernel?  If you don't keep your
software up to date, bugs are to be expected.

> 3. Load vfio-pci with `nointxmask=1`, without any issues

With this option we get an exclusive interrupt for the device and then
we handle each interrupt under the assumption that it's for our
device.  If there's really something else pulling this interrupt, that
might mean we're injecting additional (spurious) interrupts into the
guest.  Generally this is ok so long as we don't hit a rate sufficient
to trigger a similar spurious interrupt shutdown in the guest.

> 4. Remove `-hda ~/win7.img` from QEMU command (seabios only), still get the same crash

So you don't even have real guest drivers loaded... look
in /proc/interrupts with the new kernel, are there multiple devices on
the interrupt line with that kernel?

> So I have some questions now:
> 1. Is this a known issue? What is the root cause?

Not a known issue, root cause covered above, certainly something that
may be fixed in updated kernels, or maybe updated kernels just shut
down, or have a driver for, the device sharing the interrupt.

> 2. Why does Fedora 24 not have this issue? Is it related to the kernel, qemu or other components?

You could try updating one or the other.

> 3. Is `nointxmask=1` the right way to avoid crash?

This is a valid workaround, but it means that vfio-pci will always
require an exclusive INTx interrupt for any assigned device, which
often makes it difficult to achieve a working configuration.  As above,
if the additional interrupts are not generated by the GPU/audio, then
we're potentially injecting spurious interrupts into the guest.  Thanks,

Alex



