[vfio-users] VFIO and random host crashes

Colin Godsey crgodsey at gmail.com
Wed May 18 15:35:27 UTC 2016


I’ve been running as much monitoring as possible during these last few crashes.
Thankfully the SSH sessions freeze along with the host, so I can still see the
last stats on screen.

top: looks totally normal when it crashes, maybe 60% CPU utilization;
swap/cache/sys all look normal.
context switches: seem mostly normal, a total of maybe ~4k voluntary and ~300
non-voluntary.
disk usage: swings up and down constantly. I use ZFS for the VMs, which I’m
not entirely ruling out yet, but I think if anything it may contribute to
power fluctuations via the disks (4 magnetic total). The VM host itself is
on its own regular ext4 drive though, so I’m hoping that helps rule out ZFS
kernel/software issues.
interrupts: normal
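
For anyone wanting to capture the same kind of numbers without relying on a frozen SSH session, here is a minimal sketch of a logger that writes each sample to disk and fsyncs it, so the last line should survive a hard hang. It assumes a Linux /proc filesystem; the log path and the function names are hypothetical, and this is not the tooling used for the stats above. Note that it records system-wide context-switch and interrupt rates from /proc/stat, not the per-process voluntary/non-voluntary counts.

```python
import os
import time


def parse_proc_stat(text):
    """Pull the total context-switch and interrupt counters out of /proc/stat text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ctxt":
            stats["ctxt"] = int(fields[1])
        elif fields[0] == "intr":
            # The first number on the intr line is the total across all IRQs.
            stats["intr"] = int(fields[1])
    return stats


def rates(prev, curr, seconds):
    """Per-second rate of each counter between two snapshots."""
    return {k: (curr[k] - prev[k]) / seconds for k in curr if k in prev}


def log_forever(path="/var/log/crashwatch.log", interval=1.0):
    """Append one sample per interval; fsync each line so it reaches the disk."""
    prev = None
    with open(path, "a") as log:
        while True:
            with open("/proc/stat") as f:
                curr = parse_proc_stat(f.read())
            with open("/proc/loadavg") as f:
                load = f.read().split()[0]  # 1-minute load average
            if prev is not None:
                r = rates(prev, curr, interval)
                log.write(
                    f"{time.time():.0f} load={load} "
                    f"ctxt/s={r['ctxt']:.0f} intr/s={r['intr']:.0f}\n"
                )
                log.flush()
                os.fsync(log.fileno())
            prev = curr
            time.sleep(interval)
```

Calling `log_forever()` as a user with write access to the log path starts the loop. The log should live on the plain ext4 host drive rather than on the ZFS pool, so a storage-side problem is less likely to take the evidence with it.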


On Wed, May 18, 2016 at 9:24 AM Brett Peckinpaugh <bp10 at erylflynn.com>
wrote:

> Are you monitoring processor utilization? Two systems like you describe
> could tax a host. Maybe it is CPU starvation?
>
> On May 18, 2016 7:47:11 AM PDT, Colin Godsey <crgodsey at gmail.com> wrote:
>
>> I’ve been running a dual gaming VM rig (2x dedicated GPU) for a little
>> bit now, and everything works perfectly except when both VMs are under
>> load: after an hour or so I get a hard crash and/or reboot. It will either
>> reboot itself, or hang so badly that the physical ‘reset’ button on the box
>> doesn’t work.
>>
>> There is zero evidence in the Linux logs about the crash. I literally just
>> see one of a few standard cron jobs as the last syslog entry, then the next
>> line is the kernel boot/start-up. The only real evidence I get is that,
>> rarely, I can hear Windows crash first. Or Windows will crash and I’ll get
>> maybe another second or two of ‘top’ before the whole system goes down. I
>> find it extremely odd that there’s some sort of (albeit fast) degradation,
>> but absolutely nothing interesting in the logs.
>>
>> So, I’m pretty sure it’s something hardware related: either the PSU, or my
>> mobo is crap and is underpowered somewhere. During load, there are about 5
>> drives, 2 GTX GPUs, and GbE (~200 Mbps) all under constant load, so it seems
>> likely it could be something chipset related.
>>
>> *So my question is really: is there ANY kind of kernel/vfio software
>> level issue that could cause this crash? Or does this just sound like
>> hardware?* I’ve tried several different power configurations at this
>> point; I just want to be as sure as possible it’s hardware before I start
>> replacing more things =\
>>
>> This is an up-to-date Ubuntu Xenial, not really running anything special.
>> I’ve gotten away with running my VMs almost as pure as possible, no funny
>> workarounds or anything. OVMF, Windows 10, Hyper-V flags. Skylake i7 @
>> Z170M.
>>