[vfio-users] Occasional failure to start VM with GPU attached

Thomas Lindroth thomas.lindroth at gmail.com
Sat Jul 23 15:25:30 UTC 2016


I think I've found the cause of this problem. Long story short, it's caused by
vfio's use of the D3hot PCI power state.

By comparing traces of failures with successful boots I noticed the PCI config
space looked different. I started saving dumps of the config space for later
comparison. Every time the host rebooted I got a new config space state. Some
of these states didn't work well and caused the VM hangs. Vfio resets the
device both before and after use, but that sometimes doesn't help, so I guess
the rumours about reset bugs with nvidia hardware are true.
http://sprunge.us/KMPS is what the config space looks like just after an
unsuccessful boot (and vfio reset attempt). The BARs got an address of 0 in the
hexdump and lspci displays them as "virtual". That seems to be the main
difference between a successful boot and a failure.
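
In case it helps anyone reproduce the comparison, here is a rough sketch of one
way to pull the BAR values out of sysfs for diffing between boots. The
0000:01:00.0 address is just a placeholder, not my actual device, and this
isn't the exact script I used:

    #!/usr/bin/env python3
    # Sketch: read the BARs of a PCI device from sysfs for comparison.
    # Substitute the passed-through GPU's address for the placeholder below.
    import struct

    DEVICE = "0000:01:00.0"  # placeholder PCI address

    def read_config(dev):
        # The first 64 bytes of config space (which contain the six BARs at
        # offsets 0x10-0x27 in a type 0 header) are readable without root.
        with open(f"/sys/bus/pci/devices/{dev}/config", "rb") as f:
            return f.read(64)

    def bars(cfg):
        # Six 32-bit BARs starting at offset 0x10.
        return [struct.unpack_from("<I", cfg, 0x10 + 4 * i)[0] for i in range(6)]

    for i, bar in enumerate(bars(read_config(DEVICE))):
        print(f"BAR{i}: {bar:#010x}")

After a bad boot the dump linked above shows these reading back as 0, which is
what lspci reports as "virtual".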

I tried to understand why the PCI config space was different on each boot and
remembered I had to use vfio-pci.disable_idle_d3=1 on my old Radeon HD 6770
because its state was corrupted after entering D3. This new nvidia card isn't
corrupted as badly and sometimes works even with D3. After enabling
vfio-pci.disable_idle_d3=1, the PCI config space is the same on every boot and
the VM always starts like it should.
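
For anyone trying the same workaround, a quick sketch for checking that the
parameter actually took effect (this assumes vfio-pci is loaded as the
vfio_pci module, so its parameters show up under /sys/module in the usual
place):

    #!/usr/bin/env python3
    # Sketch: confirm disable_idle_d3 is enabled for vfio-pci.
    from pathlib import Path

    param = Path("/sys/module/vfio_pci/parameters/disable_idle_d3")
    cmdline = Path("/proc/cmdline").read_text()

    if param.exists():
        print("disable_idle_d3 =", param.read_text().strip())  # expect "Y"
    else:
        print("vfio_pci not loaded (or parameter missing on this kernel)")

    print("set on kernel cmdline:", "vfio-pci.disable_idle_d3=1" in cmdline)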

I find it odd that no one else has noticed this problem before. Perhaps my
motherboard has problems with the D3 state?



