[vfio-users] Kernel panic at vfio_intx_handler leads to low performance in guest VM

Alex Williamson alex.williamson at redhat.com
Wed May 31 16:26:16 UTC 2017


On Wed, 31 May 2017 10:15:59 +0000
Zhifeng Hu <zhifeng.hu at hotmail.com> wrote:

> Thank you Alex,
> 
> I took a look at the link you provided, followed the guide, and enabled MSI; the performance of the guest VM improved significantly.
> 
> Regarding the answers and questions in your previous mail, here are my updates:
> > You really want to avoid x-vga=on, especially with IGD host graphics.  
> Why did you say that? And I did not see any other parameters that can be used to replace x-vga=on.

Because...

http://vfio.blogspot.com/2014/08/whats-deal-with-vga-arbitration.html
http://vfio.blogspot.com/2014/08/primary-graphics-assignment-without-vga.html
https://www.youtube.com/watch?v=NhZ9eIpg2nM

The better solution is to use a GPU that supports UEFI in the ROM and
use OVMF for the guest firmware.

> > I'm also not sure why you're preventing i915 from loading if you
> > intend to use IGD for the host graphics.  
> I disabled the i915 driver because I don't want to apply either the i915 VGA arbiter patch or the ACS override patch.

Disabling i915 is not enough; the device is still claiming transactions
to the VGA space, so the VM BIOS initialization is actually writing to
the IGD, not the assigned device.  You need to do one of a) avoid VGA
by using UEFI/OVMF, b) disable IGD in the host BIOS and use a different
GPU for the primary graphics, c) patch i915.

> > My question would be whether the problem interrupt is the GPU or the
> > audio.  You could remove the audio assignment and see if it still
> > occurs.  If it is the audio device, then follow the guide above as
> > GeForce audio interrupts are only marginally functional anyway.  
> The problem interrupt is the GPU. Because the GPU (01:00.0) and the audio (01:00.1) are together in IOMMU group 5,
> I usually assign both of them at the same time to avoid "vfio: error, group 5 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver."
> I also tried removing the audio assignment but got the same problem.

You can bind both devices to vfio-pci without assigning both devices to
the guest to avoid this issue.  Or simply 'virsh nodedev-detach
pci_0000_01_00_1' before you run with only the GPU assigned.

> > So you don't even have real guest drivers loaded... look
> > in /proc/interrupts with the new kernel, are there multiple devices on
> > the interrupt line with that kernel?  
> Yes, you are right; there are up to 6 devices sharing interrupt 16:
> lspci -v
> ...
> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1) (prog-if 00 [Normal decode])
> 	Flags: bus master, fast devsel, latency 0, IRQ 16
> 	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
> 	Capabilities: [40] Express Root Port (Slot+), MSI 00
> 	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
> 	Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
> 	Capabilities: [a0] Power Management version 3
> 	Capabilities: [100] Advanced Error Reporting
> 	Capabilities: [220] #19
> 	Kernel driver in use: pcieport
> 	Kernel modules: shpchp
> 
> 00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1) (prog-if 00 [Normal decode])
> 	Flags: bus master, fast devsel, latency 0, IRQ 16
> 	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
> 	Memory behind bridge: df100000-df1fffff
> 	Capabilities: [40] Express Root Port (Slot+), MSI 00
> 	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
> 	Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
> 	Capabilities: [a0] Power Management version 3
> 	Capabilities: [100] Advanced Error Reporting
> 	Capabilities: [220] #19
> 	Kernel driver in use: pcieport
> 	Kernel modules: shpchp
> 
> 00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
> 	Subsystem: Gigabyte Technology Co., Ltd Device a182
> 	Flags: fast devsel, IRQ 16
> 	Memory at df240000 (64-bit, non-prefetchable) [size=16K]
> 	Memory at df220000 (64-bit, non-prefetchable) [size=64K]
> 	Capabilities: [50] Power Management version 3
> 	Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+
> 	Kernel modules: snd_hda_intel
> 
> 00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
> 	Subsystem: Gigabyte Technology Co., Ltd Device 5001
> 	Flags: medium devsel, IRQ 16
> 	Memory at df24a000 (64-bit, non-prefetchable) [size=256]
> 	I/O ports at f040 [size=32]
> 	Kernel modules: i2c_i801
> 
> 01:00.0 VGA compatible controller: NVIDIA Corporation Device 128b (rev a1) (prog-if 00 [VGA controller])
> 	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8c93
> 	Flags: fast devsel, IRQ 16
> 	Memory at de000000 (32-bit, non-prefetchable) [size=16M]
> 	Memory at d0000000 (64-bit, prefetchable) [size=128M]
> 	Memory at d8000000 (64-bit, prefetchable) [size=32M]
> 	I/O ports at e000 [size=128]
> 	Expansion ROM at df000000 [disabled] [size=512K]
> 	Capabilities: [60] Power Management version 3
> 	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> 	Capabilities: [78] Express Legacy Endpoint, MSI 00
> 	Capabilities: [100] Virtual Channel
> 	Capabilities: [128] Power Budgeting <?>
> 	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> 	Kernel driver in use: vfio-pci
> 	Kernel modules: nouveau
> 
> 03:00.0 Non-Volatile memory controller: Intel Corporation Device f1a5 (rev 03) (prog-if 02 [NVM Express])
> 	Subsystem: Intel Corporation Device 390a
> 	Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
> 	Memory at df100000 (64-bit, non-prefetchable) [size=16K]
> 	Capabilities: [40] Power Management version 3
> 	Capabilities: [70] Express Endpoint, MSI 00
> 	Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
> 	Capabilities: [100] Advanced Error Reporting
> 	Capabilities: [158] #19
> 	Capabilities: [178] Latency Tolerance Reporting
> 	Capabilities: [180] L1 PM Substates
> 	Kernel driver in use: nvme
> 	Kernel modules: nvme
> 
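
The lspci output above shows which devices are routed to IRQ 16, but
/proc/interrupts shows which drivers actually have a handler registered on
the line. A minimal sketch for listing them follows. The function names are
invented here, and the parsing assumes the layout used by kernels of roughly
this vintage: IRQ label, one counter per CPU, chip name, a "hwirq-trigger"
field, then a comma-separated list of handler names.

```shell
#!/bin/sh
# Print one handler name per line for the given IRQ.
# $1 = IRQ number, $2 = file to read (defaults to /proc/interrupts).
irq_handlers() {
    irq=$1
    file=${2:-/proc/interrupts}
    awk -v irq="$irq" '
        $1 == (irq ":") {
            # Skip the per-CPU counters (purely numeric fields).
            i = 2
            while (i <= NF && $i ~ /^[0-9]+$/) i++
            # Assumption: exactly two fields (chip name, hwirq-trigger)
            # sit between the counters and the handler names.
            i += 2
            line = ""
            for (; i <= NF; i++) line = line (line == "" ? "" : " ") $i
            n = split(line, names, /, */)
            for (j = 1; j <= n; j++) if (names[j] != "") print names[j]
        }' "$file"
}

# Succeeds only if at most one handler is registered on the IRQ.
irq_is_exclusive() {
    irq_handlers "$1" "$2" | awk 'END { exit (NR > 1) }'
}

# Example:
#   irq_handlers 16
#   irq_is_exclusive 16 || echo "IRQ 16 is shared"
```

Devices whose driver is not loaded, or that have switched to MSI/MSI-X, do
not appear in this list even though lspci still reports the legacy IRQ.
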
> > Not a known issue, root cause covered above, certainly something that
> > may be fixed in updated kernels, or maybe updated kernels just shut down
> > or have a driver for the device sharing the interrupt  
> That is what I want to figure out.
> 
> > You could try updating one or the other.  
> I tried upgrading the kernel and QEMU to the corresponding Fedora 24 versions; the problem still exists.

So the problem does not occur on fc24, but does occur on fc23 even if
using the same kernel and qemu versions as fc24?  I'm skeptical.  Also,
why not just use fc24 if it works?

> > This is a valid workaround, but it means that vfio-pci will always
> > require an exclusive INTx interrupt for any assigned device, which
> > often makes it difficult to achieve a working configuration.  As above,
> > if the additional interrupts are not generated by the GPU/audio, then
> > we're potentially injecting spurious interrupts into the guest.  
> In my testing, `nointxmask=1` can cause a new error, "vfio: Error: Failed to setup INTx fd: Device or resource busy", when the GPU and onboard audio are assigned together. This error was mentioned in https://www.redhat.com/archives/vfio-users/2016-March/msg00035.html

Not surprising with all those other devices on the same interrupt.
nointxmask requires that the interrupt is exclusive on the host.  The
log below just shows that vfio-pci cannot register both 00:1f.3 and
01:00.0 with nointxmask at the same time because they share an
interrupt.  Thanks,

Alex

> modprobe vfio-pci ids=10de:128b,10de:0e0f,8086:a170 nointxmask=1
> qemu-system-x86_64 -enable-kvm -m 4G -cpu host,kvm=off -smp 4,sockets=1,cores=2,threads=2 -hda ~/win7.img -usbdevice host:093a:2510 -usbdevice host:0c45:7603 -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 -vga none -device vfio-pci,host=00:1f.3
> qemu-system-x86_64: -device vfio-pci,host=00:1f.3: vfio: Error: Failed to setup INTx fd: Device or resource busy
> qemu-system-x86_64: -device vfio-pci,host=00:1f.3: Device initialization failed
> 
> dmesg:
> [   77.750742] VFIO - User Level meta-driver version: 0.3
> [   77.754872] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
> [   77.765719] vfio_pci: add [10de:128b[ffff:ffff]] class 0x000000/00000000
> [   77.776720] vfio_pci: add [10de:0e0f[ffff:ffff]] class 0x000000/00000000
> [   77.787714] vfio_pci: add [8086:a170[ffff:ffff]] class 0x000000/00000000
> [   83.681186] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
> [   83.705664] genirq: Flags mismatch irq 16. 00000000 (vfio-intx(0000:00:1f.3)) vs. 00000000 (vfio-intx(0000:01:00.0))
> [   83.705666] CPU: 2 PID: 1953 Comm: qemu-system-x86 Not tainted 4.5.5-300.fc24.x86_64 #1
> [   83.705667] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./B150-HD3-CF, BIOS F5 03/11/2016
> [   83.705668]  0000000000000086 00000000842ff643 ffff88043eb87ca8 ffffffff813d35af
> [   83.705670]  ffff88045d97f000 00000000fffffff0 ffff88043eb87d00 ffffffff811011ae
> [   83.705671]  0000000000000246 ffff88045d97f09c ffff88044e5db248 00000000842ff643
> [   83.705673] Call Trace:
> [   83.705676]  [<ffffffff813d35af>] dump_stack+0x63/0x84
> [   83.705678]  [<ffffffff811011ae>] __setup_irq+0x5ee/0x640
> [   83.705682]  [<ffffffffa050f2d0>] ? vfio_intx_disable+0x60/0x60 [vfio_pci]
> [   83.705683]  [<ffffffff81101388>] request_threaded_irq+0xf8/0x1a0
> [   83.705685]  [<ffffffffa050f0c5>] vfio_intx_set_signal+0x105/0x1d0 [vfio_pci]
> [   83.705686]  [<ffffffffa050f437>] vfio_pci_set_intx_trigger+0xc7/0x160 [vfio_pci]
> [   83.705687]  [<ffffffffa050f9bf>] vfio_pci_set_irqs_ioctl+0x3f/0xa0 [vfio_pci]
> [   83.705689]  [<ffffffffa050dd8e>] vfio_pci_ioctl+0x2fe/0x9c0 [vfio_pci]
> [   83.705690]  [<ffffffff8128ed94>] ? eventfd_write+0x94/0x210
> [   83.705692]  [<ffffffff810d0220>] ? wake_up_q+0x70/0x70
> [   83.705694]  [<ffffffffa0461183>] vfio_device_fops_unl_ioctl+0x23/0x30 [vfio]
> [   83.705696]  [<ffffffff81256183>] do_vfs_ioctl+0xa3/0x5d0
> [   83.705697]  [<ffffffff81256729>] SyS_ioctl+0x79/0x90
> [   83.705699]  [<ffffffff817cecee>] entry_SYSCALL_64_fastpath+0x12/0x6d
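
The genirq line is the useful part of the log above: it names the interrupt
and the two handlers that could not coexist on it. A small filter, offered
only as a sketch, pulls those fields out of dmesg output. The function name
and the output labels are invented here, and the labels assume the kernel
prints the new request first and the already registered handler second.

```shell
#!/bin/sh
# Read kernel log lines on stdin and summarize "Flags mismatch" reports.
genirq_conflicts() {
    # Capture the IRQ number and the two handler names; the names themselves
    # contain parentheses, e.g. vfio-intx(0000:01:00.0).
    sed -n 's/.*genirq: Flags mismatch irq \([0-9][0-9]*\)\. [0-9a-f]* (\(.*\)) vs\. [0-9a-f]* (\(.*\)) *$/irq=\1 requested_by=\2 held_by=\3/p'
}

# Example:
#   dmesg | genirq_conflicts
```

For the log in this thread the filter would report IRQ 16 with
vfio-intx(0000:00:1f.3) as the request that was refused.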



