[vfio-users] Host hard lockups
Thomas Lindroth
thomas.lindroth at gmail.com
Sun Oct 16 19:52:31 UTC 2016
> It took longer than expected, but a definite crash happened yesterday.
> Sadly, it seems that MSI was not a fix for the in-use crashes.
>
> At this point I'm worried that it's some sort of weird hardware-specific
> interaction that is unlikely to be fixed. If anybody experiences similar
> symptoms or can suggest any debugging techniques, I'd greatly appreciate
> any suggestions.
I'm experiencing something similar and have been since June. I get about 1-2
crashes per month, and the symptoms are very similar to yours. After the last
crash I set up netconsole logging. That way all kernel messages are sent to
another machine, so they are preserved even when this one crashes.
https://www.kernel.org/doc/Documentation/networking/netconsole.txt
It's easy to set up using the "Dynamic reconfiguration" method, but you'll
need another machine to receive and log the messages.
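For anyone wanting to try this, here is a minimal sketch in Python, assuming netconsole is loaded as a module and configfs is mounted at /sys/kernel/config (the path netconsole.txt uses). configure_target() is a hypothetical helper that does the same mkdir/echo steps the doc describes, and receive() is a hypothetical UDP listener for the logging machine. The interface name, IPs and MAC are placeholders, and 6666 is netconsole's default remote port.

```python
import os
import socket

# configfs location from netconsole.txt (assumes configfs is mounted there)
CONFIGFS_NETCONSOLE = "/sys/kernel/config/netconsole"


def configure_target(name, dev_name, local_ip, remote_ip, remote_mac,
                     remote_port=6666, base=CONFIGFS_NETCONSOLE):
    """Hypothetical helper: create and enable a netconsole target (run as root)."""
    target = os.path.join(base, name)
    # mkdir in configfs creates a new, initially disabled, target
    os.makedirs(target, exist_ok=True)
    settings = {
        "dev_name": dev_name,        # e.g. "eth0" (placeholder)
        "local_ip": local_ip,
        "remote_ip": remote_ip,
        "remote_mac": remote_mac,    # MAC of the logger or the gateway
        "remote_port": str(remote_port),
    }
    for attr, value in settings.items():
        # Equivalent to: echo value > /sys/kernel/config/netconsole/<name>/<attr>
        with open(os.path.join(target, attr), "w") as f:
            f.write(value + "\n")
    # Parameters can only be changed while the target is disabled,
    # so enable it last.
    with open(os.path.join(target, "enabled"), "w") as f:
        f.write("1\n")
    return target


def receive(sock, log_path, max_packets=None):
    """Hypothetical receiver: append each UDP datagram to log_path."""
    count = 0
    with open(log_path, "ab") as log:
        while max_packets is None or count < max_packets:
            data, _addr = sock.recvfrom(65535)
            log.write(data)
            log.flush()  # write each message out immediately
            count += 1
    return count
```

On the crashing host you would call something like configure_target("target1", "eth0", "192.168.0.2", "192.168.0.1", "00:11:22:33:44:55"). On the other machine you would bind a UDP socket to port 6666 and call receive(sock, "netconsole.log"). Any UDP listener works just as well; the point is only that the log lives on a machine that doesn't go down.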
Today I finally got another crash and it looks identical to this:
https://lkml.org/lkml/2016/9/14/527
It's a problem in fuse that's only triggered under memory pressure. I had
always assumed the crashes were related to KVM because they usually happen soon
after starting a VM, but perhaps the VM only created the memory pressure
needed to trigger the fuse bug. Do you also use fuse?
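If you're not sure whether anything on your host uses fuse, one quick check is to look for FUSE filesystem types in /proc/mounts. Below is a small sketch, assuming the usual fstype naming: plain "fuse", "fuseblk", or "fuse.<name>" such as "fuse.mergerfs". fuse_mounts() is a hypothetical helper. It skips "fusectl", the control filesystem that is present whenever the fuse module is loaded, and it does not decode octal escapes like \040 in mountpoints.

```python
def fuse_mounts(mounts_text):
    """Hypothetical helper: return (mountpoint, fstype) for FUSE mounts."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        # "fusectl" is only the control fs, not a user filesystem
        if fstype in ("fuse", "fuseblk") or fstype.startswith("fuse."):
            result.append((mountpoint, fstype))
    return result


def read_fuse_mounts(path="/proc/mounts"):
    """Read the live mount table and filter it for FUSE mounts."""
    with open(path) as f:
        return fuse_mounts(f.read())
```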
The fix is marked for <stable at vger.kernel.org> [3.15+], but so far only
4.8.0 and later have it. I upgraded to 4.8.2, and hopefully that'll fix the
crashes for me.
After some googling I even found this:
https://github.com/trapexit/mergerfs#mergerfs-under-heavy-load-and-memory-preasure-leads-to-kernel-panic
mergerfs is what I use fuse for.