[vfio-users] GPU driver crashes when running a second VM if either VM has a virtual disk stored on physical media other than the root disk. Tested on three X58 chipset MBs

Brian Yglesias brian at atlanticdigitalsolutions.com
Tue Nov 14 08:10:41 UTC 2017


To put it another way, running concurrent VMs when at least one VM has an assigned GPU will always result in a GPU driver crash, unless all VMs and all their attached media reside on the root disk.  I've been able to replicate this consistently across three motherboards, all with the X58 chipset (I don't have anything else on hand to test with, and at this point I suspect it's a problem with the chipset).

Example:
-The OS is on /dev/sda
-VM1's root disk is also on /dev/sda, and is the only disk
-VM2's root disk is also on /dev/sda, and is the only disk

*This works.

-Now, add a second physical drive to the server - /dev/sdb
-Attach a virtual disk to VM2 which is stored on /dev/sdb

*Any and all VMs with assigned GPUs will now eventually crash.


If the VMs are near idle, it may take a while.  If their GPUs are under even moderate load, it may take only seconds.  The only workaround I've found is to stop all other VMs and run each VM with an assigned GPU by itself.

This has been the case since early 2016 when I began testing.  I've tried various invocations of kvm, as well as countless disk configurations, and I've upgraded the kernel, kvm, and the rest of the OS several times.  I'm not sure that this is a vfio problem, although the fact that the problem only occurs when an assigned GPU is involved is suggestive.  In any case, I also reported this to the qemu bugtracker (last year), but have not heard back.

I currently start the VMs as follows:


/usr/bin/kvm \
-id 110 \
-chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait' \
-mon 'chardev=qmp,mode=control' \
-pidfile /var/run/qemu-server/110.pid \
-daemonize \
-smbios 'type=1,uuid=a4419ef3-5aef-4978-8849-d9d010e26e27' \
-drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/kvm/OVMF_CODE-pure-efi.fd' \
-drive 'if=pflash,unit=1,format=raw,file=/root/sbin/110-ovmf.fd' \
-name Brian-PC \
-smp '12,sockets=1,cores=12,maxcpus=12' \
-nodefaults \
-boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
-vga none \
-nographic \
-no-hpet \
-cpu 'host,hv_vendor_id=Nvidia43FIX,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off' \
-m 8192 \
-object 'memory-backend-ram,id=ram-node0,size=8192M' \
-numa 'node,nodeid=0,cpus=0-11,memdev=ram-node0' \
-k en-us \
-readconfig /usr/share/qemu-server/pve-q35.cfg \
-device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
-device vfio-pci,host=04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0 \
-device vfio-pci,host=04:00.1,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0 \
-device 'usb-host,hostbus=9,hostport=1.1,id=usb0' \
-device 'usb-host,hostbus=9,hostport=1.2,id=usb1' \
-device 'usb-host,hostbus=9,hostport=1.3,id=usb2' \
-device 'usb-host,hostbus=9,hostport=1.4,id=usb3' \
-device 'usb-host,hostbus=9,hostport=1.5,id=usb4' \
-device 'usb-host,hostbus=9,hostport=1.1.1,id=usb5' \
-device 'usb-host,hostbus=9,hostport=1.1.2,id=usb6' \
-device 'usb-host,hostbus=9,hostport=1.1.3,id=usb7' \
-device 'usb-host,hostbus=9,hostport=1.1.4,id=usb8' \
-device 'usb-host,hostbus=9,hostport=1.1.5,id=usb9' \
-device 'usb-host,hostbus=9,hostport=1.2.1,id=usb10' \
-device 'usb-host,hostbus=9,hostport=1.2.2,id=usb11' \
-device 'usb-host,hostbus=9,hostport=1.2.3,id=usb12' \
-device 'usb-host,hostbus=9,hostport=1.2.4,id=usb13' \
-device 'usb-host,hostbus=9,hostport=1.2.5,id=usb14' \
-device 'usb-host,hostbus=9,hostport=1.3.1,id=usb15' \
-device 'usb-host,hostbus=9,hostport=1.3.2,id=usb16' \
-device 'usb-host,hostbus=9,hostport=1.3.3,id=usb17' \
-device 'usb-host,hostbus=9,hostport=1.3.4,id=usb19' \
-device 'usb-host,hostbus=9,hostport=1.4.1,id=usb21' \
-device 'usb-host,hostbus=9,hostport=1.4.2,id=usb22' \
-device 'usb-host,hostbus=9,hostport=1.4.3,id=usb23' \
-device 'usb-host,hostbus=9,hostport=1.4.4,id=usb24' \
-device 'usb-host,hostbus=9,hostport=1.4.5,id=usb25' \
-device 'usb-host,hostbus=9,hostport=1.5.1,id=usb26' \
-device 'usb-host,hostbus=9,hostport=1.5.2,id=usb27' \
-device 'usb-host,hostbus=9,hostport=1.5.3,id=usb28' \
-device 'usb-host,hostbus=9,hostport=1.5.4,id=usb29' \
-device 'usb-host,hostbus=9,hostport=1.5.5,id=usb30' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:209b855ce18e' \
-drive 'file=/dev/zvol/fastpool/vm-110-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' \
-drive 'file=/dev/zvol/rpool/data/vm-110-disk-1,if=none,id=drive-virtio1,cache=writeback,format=raw,aio=threads,detect-zeroes=on' \
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb' \
-netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=92:7F:88:0F:73:8D,netdev=net0,bus=pci.0,addr=0x12,id=net0' \
-rtc 'driftfix=slew,base=localtime' \
-machine 'type=q35' \
-global 'kvm-pit.lost_tick_policy=discard'


As you can see, this VM has two virtual disks (virtio0 on fastpool and virtio1 on rpool), which are stored on separate physical disks.  As such, this VM will crash unrecoverably if another VM is started on the same machine.  To avoid a crash, every VM must have all of its disks stored on the OS root disk.  Obviously this is hardly an ideal setup.
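
For anyone trying to reproduce this, it may help to confirm which storage pools a VM's disks actually come from.  What follows is only a minimal sketch under stated assumptions: zvol_pools is a hypothetical helper name, and it assumes the disks are ZFS zvols under /dev/zvol/ as in the command line above.  It only parses text and does not touch the VM.

```shell
#!/bin/sh
# zvol_pools (hypothetical helper): read a QEMU command line on stdin and
# print the distinct ZFS pool names found in file=/dev/zvol/<pool>/... paths.
zvol_pools() {
    grep -o 'file=/dev/zvol/[^/,]*' | sed 's|^file=/dev/zvol/||' | sort -u
}

# Example using shortened copies of the two -drive arguments shown above.
printf '%s\n' \
    "-drive 'file=/dev/zvol/fastpool/vm-110-disk-1,if=none,id=drive-virtio0'" \
    "-drive 'file=/dev/zvol/rpool/data/vm-110-disk-1,if=none,id=drive-virtio1'" \
    | zvol_pools
```

For the two drives above this prints fastpool and rpool.  On a live host the function could presumably be fed from the running process instead, for example with tr '\0' ' ' < /proc/$(cat /var/run/qemu-server/110.pid)/cmdline, and zpool status for each pool name would then list the physical devices behind it.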

I wanted to give this one more shot before I gave up on the platform (which is unfortunate because I bought two of them), and I was hoping someone could help me out.

Thanks,
Brian




# kvm --version
QEMU emulator version 2.9.0 pve-qemu-kvm_2.9.0-5

# uname -a
Linux proxmox-1 4.10.17-3-pve #1 SMP PVE 4.10.17-21 (Thu, 31 Aug 2017 14:57:17 +0200) x86_64 GNU/Linux



