[vfio-users] Intel IGD passthrough and OpRegion

Alex Williamson alex.williamson at redhat.com
Tue Jun 27 21:52:57 UTC 2017


On Fri, 23 Jun 2017 21:01:19 +1000
Aa Aa <jimbothom at yandex.com> wrote:

> I have run into a few problems with IGD passthrough with a Linux guest. I am not running in legacy mode, so I guess that this might not be supported.

Theoretically UPT mode is the one Intel supports, but issues abound
since there's no stable hardware spec and the "universal"-ness of UPT
isn't shaping up to be what was expected.

> The first thing that I noticed was that on some Intel machines, when the VFIO IOMMU module was loaded from QEMU, I was getting a whole lot of DMAR faults. The faulting address I found was the same as the one being set by the kernel here:
>  
> Jun 23 10:21:35 phys kernel: DMAR: Setting RMRR:
> Jun 23 10:21:35 phys kernel: DMAR: Setting identity map for device 0000:00:02.0 [0xcb000000 - 0xcf1fffff]
>  
> This happened on two machines at different memory locations. I was able to fix this by hardcoding an entry thus:
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index a8a079ba9477..3c0f134c1669 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1270,6 +1270,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>                         goto out_domain;
>         }
>  
> +       ret = iommu_map(domain->domain, 0xcb000000u,0xcb000000u , 0x4200000u,  IOMMU_READ | IOMMU_WRITE);
>  
> and the machine became usable again. As this fixed my problem, I didn't bother checking what RMRR does, but should this be handled, or should entries that don't appear in the PCI configuration space not be removed from the DMAR?

In general, RMRRs exclude devices from being eligible for user
assignment, see justification here:

https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf

IGD and USB are special cases, though, as the RMRR-defined
regions are for use by the device rather than for platform monitoring
and back channel communications to access the device.  These cases are
therefore allowed, but the RMRR is not honored and no attempt is made
by anyone to map these regions.  In the case of IGD, the RMRR is
typically describing the stolen memory of the device, and the code
above is identity mapping that massive region into the user address
space, regardless of the VM memory layout.  Not good. You've avoided
some DMAR faults, likely at the expense of a stable VM.
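
To put a number on "massive", here is a minimal C sketch of the
arithmetic behind the range in the DMAR log and the size in the patch.
The helper names are made up for illustration, and whether the range
actually collides with guest RAM depends on the VM memory layout, which
this sketch does not know:

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of an inclusive [start, end] range, as printed in the
 * "Setting identity map" message. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
    return end - start + 1;
}

/* True if two inclusive ranges share at least one address. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                           uint64_t b_start, uint64_t b_end)
{
    return a_start <= b_end && b_start <= a_end;
}
```

range_size(0xcb000000, 0xcf1fffff) gives 0x4200000, i.e. 66 MiB, which
matches the size hardcoded in the iommu_map() call.  For a purely
hypothetical guest with RAM from 0x0 through 0xdfffffff,
ranges_overlap() would report a collision with that identity map.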

Originally we were told that UPT mode devices don't need stolen memory
(legacy mode uses a partially successful hack to re-allocate stolen
memory from the guest address space), but Intel seems to be going back
on that as UPT becomes less and less universal.

> Anyhow, I tried to get the OpRegion working, adding an x-igd-opregion and overriding the VGA check

There's no VGA check in x-igd-opregion.

> in the vfio kernel module. But I noticed in the VM the OpRegion isn't mapped:
> 
> On the guest, register 0xFC is zero, whereas on the host it contains a 32-bit address (guest dump first, then host):
> lspci -xxxx -s 00:2
> 00:02.0 Display controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
> 00: 86 80 12 04 07 04 90 00 06 00 80 03 00 00 80 00
> ....
> f0: 00 00 00 00 00 00 00 00 00 00 06 00 00 00 00 00
>  
>  lspci -xxxx -s 00:2
> 00:02.0 Display controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
> 00: 86 80 12 04 07 04 90 00 06 00 80 03 00 00 00 00
> ...
> f0: 00 00 00 00 00 00 00 00 00 00 06 00 18 c0 d5 c8
>  
> How is the OpRegion mapped in the guest?
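
For reference, the value in the host dump can be decoded by reading the
four bytes at offset 0xFC as a little-endian dword.  A minimal sketch in
C; the function name is made up for illustration, and IGD_ASLS is the
name QEMU's IGD quirk code uses for this register rather than anything
shown in the dumps:

```c
#include <stddef.h>
#include <stdint.h>

#define IGD_ASLS 0xFC  /* config-space offset holding the OpRegion address */

/* Read a little-endian 32-bit value from a raw config-space buffer. */
static uint32_t cfg_read_le32(const uint8_t *cfg, size_t off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}
```

With the host bytes 18 c0 d5 c8 at 0xFC, cfg_read_le32(cfg, IGD_ASLS)
yields 0xc8d5c018; the guest dump yields 0.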

Unfortunately you don't say what kernel you're using; QEMU's
x-igd-opregion depends on a version of vfio-pci that exposes this for
IGD devices.  This was added in v4.6.  You also don't indicate which VM
BIOS you're using, but only SeaBIOS supports the necessary fw_cfg
interfaces for reserving memory for the OpRegion.  The way this is
supposed to work is that QEMU reads the OpRegion from a vfio region on
the vfio device file descriptor and stores that into fw_cfg for
SeaBIOS.  SeaBIOS finds the fw_cfg tag, allocates the necessary memory,
creating a reserved memory area in the VM, copies the OpRegion data
into that reserved memory, then writes the address to the 0xFC register
on the device.  Since you're not getting that, clearly something is
broken in this chain.  Thanks,

Alex
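
A side note on debugging the chain above: one way to narrow down where
it breaks is to check whether the data at each stage looks like an
OpRegion at all.  A rough sketch in C, assuming the layout that the
kernel's vfio-pci IGD code checks for (a 16-byte "IntelGraphicsMem"
signature followed by a little-endian size in KiB); the function name
is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPREGION_SIGNATURE "IntelGraphicsMem"  /* 16 bytes compared */

/* Return the OpRegion size in bytes, or 0 if buf does not start with a
 * plausible OpRegion header. */
static uint64_t opregion_size_bytes(const uint8_t *buf, size_t len)
{
    uint32_t size_kib;

    if (len < 20 || memcmp(buf, OPREGION_SIGNATURE, 16) != 0)
        return 0;

    /* Size field at offset 16, little-endian, in units of KiB. */
    size_kib = (uint32_t)buf[16] |
               ((uint32_t)buf[17] << 8) |
               ((uint32_t)buf[18] << 16) |
               ((uint32_t)buf[19] << 24);
    return (uint64_t)size_kib * 1024;
}
```

A return value of 0 for a buffer read from the address in 0xFC would
suggest that the copy step never happened or landed somewhere else.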



