[vfio-users] Host hard lockups

Thomas Lindroth thomas.lindroth at gmail.com
Sun Oct 16 19:52:31 UTC 2016


> It took longer than expected, but a definite crash happened yesterday.
> Sadly, it seems that MSI was not a fix for the in-use crashes.
> 
> At this point I'm worried that it's some sort of weird hardware-specific
> interaction that is unlikely to be fixed. If anybody experiences similar
> symptoms or can suggest any debugging techniques, I'd greatly appreciate
> any suggestions.

I do experience something similar and have since June. I get about 1-2
crashes per month and the symptoms are very similar to yours. After the last
crash I went ahead and set up netconsole logging. That way all kernel messages
are sent to another machine and are saved after the crash.

https://www.kernel.org/doc/Documentation/networking/netconsole.txt
It's easy to set up using the "Dynamic reconfiguration" solution but you'll
need another machine to log the messages.
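
For the receiving machine, a minimal sketch of a listener in Python is below.
It assumes the crashing host sends to netconsole's default target UDP port
6666; the log file name netconsole.log and the function names are made up for
illustration, and any UDP listener (netcat, a syslog daemon) would do the same
job.

```python
import socket
import time


def format_record(data: bytes, sender: str, now: float) -> str:
    """Turn one netconsole datagram into a timestamped log line."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    # Kernel messages are plain text; replace undecodable bytes instead of failing.
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    return f"{stamp} {sender} {text}\n"


def serve(log_path: str = "netconsole.log", port: int = 6666) -> None:
    """Receive netconsole datagrams forever and append them to log_path."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    with open(log_path, "a", encoding="utf-8") as log:
        while True:
            data, (host, _src_port) = sock.recvfrom(65535)
            log.write(format_record(data, host, time.time()))
            # Flush after every message so the last lines before a lockup are on disk.
            log.flush()


if __name__ == "__main__":
    serve()
```

Start it on the logging machine before pointing netconsole at that machine's
IP address, then leave it running until the next crash.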

Today I finally got another crash and it looks identical to this:
https://lkml.org/lkml/2016/9/14/527

It's a problem with fuse that's only triggered under memory pressure. I
always assumed the crashes were related to kvm because they usually happen
soon after starting a VM, but perhaps the VM only introduced the memory
pressure needed to trigger the fuse crash. Do you also use fuse?
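
A quick way to check whether fuse is in use is to look for fuse file system
types in /proc/mounts. The Python sketch below does that; it assumes fuse
mounts appear with the type "fuse", "fuseblk" or "fuse.<name>" (mergerfs
should show up as fuse.mergerfs), and the function name is hypothetical.

```python
from typing import List, Tuple


def fuse_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (mount_point, fs_type) for each fuse mount in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # "fusectl" is only the control file system, so it is deliberately not matched.
        if fs_type in ("fuse", "fuseblk") or fs_type.startswith("fuse."):
            found.append((mount_point, fs_type))
    return found


if __name__ == "__main__":
    with open("/proc/mounts", encoding="utf-8") as f:
        for mount_point, fs_type in fuse_mounts(f.read()):
            print(f"{fs_type} mounted at {mount_point}")
```

No output means no fuse file system is currently mounted.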

The patch to fix it is marked <stable at vger.kernel.org> [3.15+] but so far
only 4.8.0 and above have the fix. I upgraded to 4.8.2 and hopefully that'll
fix the crashes for me.
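
To check whether a machine is already on a kernel with the fix, a rough
Python sketch is below. It assumes 4.8.0 is the first fixed release, as stated
above, and only compares version numbers, so it will report False for an older
kernel even if a stable or distribution backport has since added the patch.
The function name is hypothetical.

```python
import platform
import re


def kernel_at_least(release: str, minimum=(4, 8, 0)) -> bool:
    """Return True if a release string such as '4.8.2-gentoo' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '4.8') is treated as 0.
    version = tuple(int(part) if part else 0 for part in match.groups())
    return version >= tuple(minimum)


if __name__ == "__main__":
    release = platform.release()
    print(release, "has the fix" if kernel_at_least(release) else "may lack the fix")
```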

After some googling I even found this:
https://github.com/trapexit/mergerfs#mergerfs-under-heavy-load-and-memory-preasure-leads-to-kernel-panic
mergerfs is what I use fuse for.



