[vfio-users] GPU passthrough errors with linux 5.1 and newer

Zoltán Kővágó dirty.ice.hu at gmail.com
Wed Jul 31 19:08:23 UTC 2019


On 2019-07-31 15:41, José Ramón Muñoz Pekkarinen wrote:
> On Sun, 21 Jul 2019 at 21:59, Zoltán Kővágó <dirty.ice.hu at gmail.com> wrote:
>>
>> Hi,
>>
>> Recently my previously perfectly working GPU passthrough setup (with a
>> win8.1 x64 guest with OVMF) started to malfunction in various ways:
>> screen randomly turned off for a few seconds, BSOD with
>> VIDEO_TDR_FAILURE, 3d apps randomly crashing, not drawing the windows'
>> content, and graphical glitches (for example in furmark the OSD text
>> flickers).
>>
>> After fiddling around with various qemu versions, nvidia driver versions
>> on the guest, I figured out that with a linux 5.0 kernel it works fine,
>> but with 5.1 it randomly fails. I bisected it and it looks like the
>> culprit is the commit 4e103134b862 "KVM: x86/mmu: Zap only the relevant
>> pages when removing a memslot"[1]. I tried to revert it on top of 5.2.1
>> but too many things changed in the meantime. Anyway, if I replace the
>> body of kvm_mmu_invalidate_zap_pages_in_memslot with
>> kvm_mmu_zap_all(kvm); it works again (probably with horrible performance
>> degradation).
>>
>> Did anyone experience anything like this? I'm using Alex's ACS override
>> patch, maybe it violates some assumption that the new code has?
> 
>      Hi,
> 
>      I noticed some changes that made 5.0 misbehave when detecting
> the screen's speakers over HDMI, but I never saw anything like this.
> My problem went away with 5.1.15 (the one I currently use), and
> nothing else has shown up since. I never needed the ACS override
> patch in my setup. What happens if you try without it? Do your
> groups come out wrong in any way?
> 
>      Best regards.
> 
>      José.
> 

Hi,

Unfortunately, without pcie_acs_override=downstream my IOMMU groups look
like this (i.e. both video cards and their PCI bridges end up in one
group, group 1). I have never had a problem with the override in the
last ~4.5 years.

# ls /sys/kernel/iommu_groups/*/devices
/sys/kernel/iommu_groups/0/devices:
0000:00:00.0

/sys/kernel/iommu_groups/10/devices:
0000:00:1c.3

/sys/kernel/iommu_groups/11/devices:
0000:00:1d.0

/sys/kernel/iommu_groups/12/devices:
0000:00:1f.0  0000:00:1f.2  0000:00:1f.3

/sys/kernel/iommu_groups/1/devices:
0000:00:01.0  0000:00:01.1  0000:01:00.0  0000:01:00.1  0000:02:00.0  0000:02:00.1

/sys/kernel/iommu_groups/2/devices:
0000:00:02.0

/sys/kernel/iommu_groups/3/devices:
0000:00:03.0

/sys/kernel/iommu_groups/4/devices:
0000:00:14.0

/sys/kernel/iommu_groups/5/devices:
0000:00:16.0

/sys/kernel/iommu_groups/6/devices:
0000:00:19.0

/sys/kernel/iommu_groups/7/devices:
0000:00:1a.0

/sys/kernel/iommu_groups/8/devices:
0000:00:1b.0

/sys/kernel/iommu_groups/9/devices:
0000:00:1c.0
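
For anyone who wants to compare their own grouping, the listing above can
be reproduced in numeric order with a short script. This is only a minimal
sketch: it assumes the sysfs layout shown above
(/sys/kernel/iommu_groups/<N>/devices/<PCI address>), and the function
names iommu_groups and shared_groups are made up for this illustration.

```python
#!/usr/bin/env python3
"""List IOMMU groups in numeric order, one group per line."""
import os
import sys


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: sorted list of device names} read from root."""
    groups = {}
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        # Skip anything that is not a numbered group directory.
        if not name.isdigit() or not os.path.isdir(devices_dir):
            continue
        groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def shared_groups(groups):
    """Return only the groups that hold more than one device."""
    return {n: devs for n, devs in groups.items() if len(devs) > 1}


if __name__ == "__main__":
    # Optional first argument: an alternative root directory (hypothetical
    # convenience for testing against a copied directory tree).
    root = sys.argv[1] if len(sys.argv) > 1 else "/sys/kernel/iommu_groups"
    for number, devices in sorted(iommu_groups(root).items()):
        # Mark groups with more than one device, since every device in
        # such a group has to be handed to the guest together.
        marker = "*" if len(devices) > 1 else " "
        print(f"{marker} group {number:2d}: {' '.join(devices)}")
```

Run without arguments on the host, it prints one line per group and flags
the groups that contain more than one device, which in the listing above
would be groups 1 and 12.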

Regards,
Zoltan



