[vfio-users] 1 GB hugepages cause host crash on guest shutdown with some GPUs

Hristo Iliev hristo at hiliev.eu
Tue Dec 8 08:01:55 UTC 2015


Hi Okky,

On Tue, 8 Dec 2015 07:08:13 Okky Hendriansyah <okky at nostratech.com> wrote:

> On December 5, 2015 at 20:38:17, Hristo Iliev (hristo at hiliev.eu) wrote:
>> Hi Okky, 
>> 
>> Just to add another data point. My system is also Haswell-E (i7-5820K on
>> an X99 motherboard) and my Win10 VM does not make it past the OVMF splash
>> screen with the symptomatic 100% CPU usage when the host is running
>> linux-lts, but boots flawlessly on linux-vfio-lts.
>
> Hi Hristo,
> 
> Actually the 100% CPU usage issue stems from changes in the Linux kernel
> starting with 4.2.x and from the OVMF implementation. Since the latest
> linux-vfio is based on linux 4.2.x, the issue is also present in that kernel.
> There’s a patch [1] for linux 4.3-mainline if you want to upgrade your
> kernel. I’ve tried it out myself on a Z87 platform, but I haven’t found a
> patch for linux 4.2.x. If you want stability, linux-lts-4.1.13 seems to be
> the latest stable release right now.
> 

My linux-lts is 4.1.13-1 from the core repo. I took some time over the
weekend to examine the situation more carefully and discovered that it is
actually the complete reverse of what I've written in my last email.

When I shut down the host completely and then boot it with linux-lts, the
VM boots up fine. I can reboot or shut down the VM and then boot it up again
multiple times without any issues. But if I reboot the host instead of
powering it off completely, the VM reliably hangs on the OVMF splash screen.
It's really perplexing. One thing I haven't tried yet is disabling the huge
pages guest memory backing to see if it has any effect.
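
Before turning huge pages off, it may help to see what the host has actually
reserved. The following is only a minimal Python sketch, assuming a Linux host
that exposes /proc/meminfo; the function name hugepage_summary is made up for
illustration and is not part of any tool mentioned in this thread.

```python
from pathlib import Path


def hugepage_summary(meminfo_text: str) -> dict:
    """Extract huge page counters from /proc/meminfo-style text."""
    wanted = {"HugePages_Total", "HugePages_Free", "Hugepagesize"}
    summary = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.strip() in wanted:
            # The first field after the colon is the number;
            # Hugepagesize additionally carries a trailing "kB" unit.
            summary[key.strip()] = int(rest.split()[0])
    return summary


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux hosts
        print(hugepage_summary(meminfo.read_text()))
```

A Hugepagesize of 1048576 (kB) would indicate that the default huge page size
is 1 GB, and a HugePages_Total of 0 would mean nothing is reserved.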

Perhaps I should simply replace the reboot action of my display manager
with shutdown and declare it a solution :)

> I just upgraded my rig to a Haswell-E platform (Intel Core i7-5820K also,
> plus ASUS X99-A) last weekend, and I found no issues running pure
> linux-lts-4.1.13 (even the config is from lts). Though, I recompiled the
> kernel again with ABS and native CFLAGS hoping to have even better
> performance. Are you sure you meant linux-lts not linux-vfio (4.2.x)?
>

Pretty sure I meant linux-lts.

>> Oddly enough, sometimes the VM seems to be able to boot on linux-lts
>> too, but only once it was successfully booted on the vfio-patched kernel.
>> Could have something to do with some sort of initialisation of the GPU
>> I'm passing through (GTX 970), which is able to survive host reboots. I'm
>> using OVMF from the RPMs linked on the Arch Wiki and updating it
>> regularly.
>

Again, I've got this in reverse. The VM actually boots on linux-lts after a
complete shutdown of the host. It doesn't boot after the host is rebooted
with linux-lts, no matter what kernel ran before the reboot.

> Hmm, that’s weird. Haswell-E platform should not need any PCIe
> ACS workaround patch. Its IOMMU groups are separated nicely for each
> device. Can you try using linux-vfio-lts’ config and recompile linux-lts
> using the config with ABS and try booting the guest again?

I can probably do that. But, as I understand from this thread, the ACS
patch has to be activated explicitly via some kernel parameters that are not
present in my case.
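
One way to confirm that no ACS override is active is to inspect the kernel
command line. The following is a minimal Python sketch, assuming the override
patch in question is the commonly circulated one that is switched on through a
pcie_acs_override= parameter (that parameter name comes from the patch, not
from this thread); acs_override_value is a hypothetical helper name.

```python
from pathlib import Path


def acs_override_value(cmdline: str):
    """Return the value of pcie_acs_override= in a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "pcie_acs_override" and sep:
            return value
    return None


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():  # only present on Linux hosts
        value = acs_override_value(cmdline_file.read_text())
        print("ACS override:", value if value is not None else "not set")
```

If this reports "not set", the running kernel was booted without the override,
whether or not the patch is compiled in.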

> 
>> I would really prefer to use linux-lts instead of waiting for the newest 
>> linux-vfio-lts to finish compiling each time it gets updated, but that 
>> doesn't seem currently possible.
>
> Recompiling on your machine probably just took around 10 minutes
> actually. :D
>

I'm actually taking issue with the AUR and the fact that I always have to
abort yaourt after the vfio kernel is compiled once. It produces three
packages at once (kernel, headers, docs), but once those are installed, it
proceeds to recompile the kernel in order to produce the headers package,
which was already produced in the previous step... Anyway, Arch is pretty
new to me, so I guess I will find a fix (or the proper way to handle it)
soon. Maybe Dan could shed some light here.

Hristo 

> 
> 
> [1] http://www.spinics.net/lists/kvm/msg123325.html
> 
> -- 
> Okky Hendriansyah




