[vfio-users] VFIO and random host crashes

Zycorax Tokoroa zycorax at phoxden.xyz
Thu May 19 08:37:03 UTC 2016


> I’ve been running a dual gaming VM rig (2x dedicated GPU) for a little
> bit now, and everything works perfectly except when both VMs are under
> load, after an hour or so I get a hard crash and/or reboot. It will
> either reboot itself, or will hang so bad the physical ‘reset’ button on
> the box doesn’t work.
>
> There is 0 evidence in the linux logs about the crash, I literally just
> see one of a few standard cron jobs as the syslog, then the next line is
> the kernel boot/start-up. The only real evidence I get is that, rarely,
> I can hear windows crash first. Or windows will crash and I’ll get maybe
> another second or 2 of ’top’ before the whole system goes down. I find
> it extremely odd that there’s some sort of (albeit fast) degradation,
> but absolutely nothing interesting in the logs.
This seems very similar to the issues I have. The system freezes with no
log entry for the problem, and there's hardly any warning sign before it
happens. Every device, including wifi, USB, and SATA disks, behaves as if
it were powered off or severed from the machine. My only solution is to
use the reset button, which causes a rather long reboot.

> So, I’m pretty sure it’s something hardware related- either PSU or my
> mobo is crap and is underpowered somewhere. During load, there are about
> 5 drives, 2 GTX GPUs, and GBe (~200mbps) all under constant load, so it
> seems likely it could be something chipset related.
My setup is similar, with 6 drives and two GTX GPUs. I mainly use two of
the drives for the VM: a rotational HD and an SSD for caching, with the
bcache device passed as a disk to the VM. I'm sure it's not a power
issue, as removing one GPU from the motherboard, and thus not using
passthrough for graphics, still gets me the issue. The load that causes
me to crash is usually heavy I/O on the VM disk.

> *So my question is really: is there ANY kind of kernel/vfio software
> level issue that could cause this crash? Or does this just sound like
> hardware?* I’ve tried several different power configurations at this
> point, I just want to be as sure as possible it’s hardware before I
> start replacing more things =\
I'm starting to think it's motherboard related, as it doesn't make sense
otherwise that only a few people get these issues. Perhaps correlating
onboard components across affected systems could pin it down to
something more specific.

> This is an up to date Ubuntu Xenial, not really running anything
> special. I’ve gotten away with running my VMs almost as pure as
> possible, no funny workarounds or anything. OVMF, Windows 10, hyper-v
> flags. Skylake i7 @ z170M.
Both Xenial and Wily have this issue for me. I'm using an X99-Deluxe from
ASUS with an i7-5930K. No kernel patching, default libvirt and qemu-kvm
packages, default setup.

FWIW I used to isolate 6 of the 12 logical cores of my processor and pin
the vCPUs to them. I haven't seen the host choke without that tuning; it
just gets slightly worse performance on the VM.
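
In case it helps anyone compare setups, here is a rough sketch of the
kind of isolation and pinning I mean. It is not my actual configuration:
the sibling layout (0/6, 1/7, ...) is an assumption about how a 6-core,
12-thread CPU is typically numbered, and the helper names are made up
for illustration. It only prints an isolcpus= value for the kernel
command line and a libvirt <cputune> fragment.

```python
import glob


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-2,6' into [0, 1, 2, 6]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def format_cpu_list(cpus):
    """Render CPU numbers as a sorted comma-separated list."""
    return ",".join(str(c) for c in sorted(cpus))


def read_sibling_groups():
    """Read hyperthread sibling groups from sysfs (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    groups = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            groups.append(parse_cpu_list(handle.read()))
    return groups


def plan_pinning(sibling_groups, guest_cores):
    """Reserve the last `guest_cores` physical cores, with all their
    threads, for the guest. Returns (guest_cpus, host_cpus)."""
    # Each CPU reports the same sibling group, so deduplicate them.
    groups = sorted({tuple(sorted(g)) for g in sibling_groups})
    if not 0 < guest_cores < len(groups):
        raise ValueError("the host must keep at least one physical core")
    guest_cpus = [c for g in groups[-guest_cores:] for c in g]
    host_cpus = [c for g in groups[:-guest_cores] for c in g]
    return guest_cpus, host_cpus


def vcpupin_xml(guest_cpus):
    """Build a libvirt <cputune> fragment pinning vCPU n to guest_cpus[n]."""
    lines = ["<cputune>"]
    for vcpu, cpu in enumerate(guest_cpus):
        lines.append(f"  <vcpupin vcpu='{vcpu}' cpuset='{cpu}'/>")
    lines.append("</cputune>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed topology: core n has threads n and n + 6.
    example = [[n, n + 6] for n in range(6)]
    guest, host = plan_pinning(example, guest_cores=3)
    print("isolcpus=" + format_cpu_list(guest))
    print(vcpupin_xml(guest))
```

On a real host, read_sibling_groups() can replace the hard-coded example
list, since the numbering differs between machines. Keeping both threads
of a core on the same side avoids the host and the guest sharing a
physical core.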

  Zycorax Tokoroa



