[vfio-users] IOMMU group is not a symlink - SR-IOV on 32Bit MMIO platform?

Alex Williamson alex.williamson at redhat.com
Fri Jul 7 04:32:08 UTC 2017


On Thu, 6 Jul 2017 23:15:11 -0400
"Taiidan at gmx.com" <Taiidan at gmx.com> wrote:

> On 07/05/2017 02:11 PM, Alex Williamson wrote:
> 
> > On Wed, 5 Jul 2017 02:33:52 -0400
> > "Taiidan at gmx.com" <Taiidan at gmx.com> wrote:
> >  
> >> error: internal error: Process exited prior to exec: libvirt:  error :
> >> internal error: Invalid device 0000:09:00.1 iommu_group file
> >> /sys/bus/pci/devices/0000:09:00.1/iommu_group is not a symlink
> >>
> >> (intel 82576 quad port nic, assigning itself works fine)
> >>
> >> I am trying to get sr-iov working on a platform with 32bit MMIO space
> >> (is that even possible)
> >>
> >> So I have added pci-realloc and pci-assign-buses to the kernel command
> >> line, without that I get a "Not enough MMIO resources for SR-IOV" error
> >> even when adding just one VF via sysfs.  
> > This is not really a 32-bit vs 64-bit issue, it just means your BIOS
> > didn't allocate enough resources on the bus to enable SR-IOV.  The
> > 82576 dual-port cards typically get away with working on non-SR-IOV
> > aware systems because the additional resources they need for SR-IOV
> > fit within the minimum bridge apertures anyway.  This is not true for
> > the quad-port card or various other SR-IOV devices.  
> I was told that coreboot's 32bit MMIO space means that you have issues 
> with too many devices.

Well sure, if you run out of 32bit MMIO and can't support 64bit, you're
in trouble.  More devices make that more likely to happen, but 82576
VFs don't take many resources.

> Do you have any firmware implementation guides for SR-IOV that you can 
> share? My chipset supports it and ARI.

(Re ARI: no it doesn't)

How about just implementing the SR-IOV spec?  SR-IOV devices have
additional BAR and bus number resource requirements; the BIOS needs to
parse the SR-IOV capability of devices to find and account for those.
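
As a rough sketch of the BAR side of that accounting: each VF BAR in the
SR-IOV capability needs an aperture of (VF BAR size) x (Total VFs).  The
function names and the 16 KiB BAR size below are assumptions for
illustration only, not values taken from any firmware or from this
thread's logs.

```python
def vf_bar_aperture(vf_bar_size: int, total_vfs: int) -> int:
    """MMIO bytes one VF BAR consumes once every possible VF is counted."""
    return vf_bar_size * total_vfs


def sriov_mmio_need(vf_bar_sizes, total_vfs: int) -> int:
    """Sum of the apertures for all implemented VF BARs of one PF."""
    return sum(vf_bar_aperture(size, total_vfs) for size in vf_bar_sizes)


# Assumed example: two VF BARs of 16 KiB each, Total VFs = 8.
need = sriov_mmio_need([16 * 1024, 16 * 1024], total_vfs=8)
print(hex(need))  # 0x40000
```

This extra space has to fit inside the aperture of the bridge above the
PF, which is what firmware unaware of SR-IOV fails to reserve.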
 
> Does this also apply to newer devices like the i350?

There are hardly any differences from the 82576, if any, other than branding.

> > pci-realloc should help with this, whether pci-assign-buses is
> > necessary would depend on the SR-IOV config of the device (82576
> > typically doesn't need an additional bus number).  
> I get an error saying invalid bus if I don't do assign buses.

Aha, there was an ARI problem on these quad port cards.  Several
problems linked together here...

> > Hmm, I don't understand why they wouldn't have an iommu group
> > associated with them.  The quad-port 82576 cards had some special kind
> > of brokenness about them though that I can't recall, perhaps something
> > about ARI.  Having no iommu group would imply that the devices don't
> > live downstream of an iommu.  Is this an Intel system?  The DMAR ACPI
> > table on Intel has path structures designed to take bus re-numbering
> > into account, but maybe you're not on Intel or maybe the BIOS has done
> > something particularly awful to negate this.  
> It is an AMD Opteron G34 Bulldozer, KGPE-D16 board running a libre 
> coreboot.
> I can forward other devices including GPUs and the card's PFs.
> 
> The PFs have a group, but the VFs are not assigned to any group at all.
> 
> Is ARI support advertised on the PCI-e chipset like it is advertised on 
> the NIC? or is it just ARIFwd+ in DevCap?

No, it's a PCIe capability, would look like:

        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0

Not only is it not supported on your root port (therefore a dual port
card would also have problems), but Intel implemented these quad port
cards with a PCIe switch that doesn't support ARI.  This leads to:

	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable+ Migration- Interrupt- MSE+ ARIHierarchy-
		IOVSta:	Migration-
		Initial VFs: 8, Total VFs: 8, Number of VFs: 1, Function Dependency Link: 00
		VF offset: 384, stride: 2, Device ID: 10ca
  --->                     ^^^
		Supported Page Size: 00000553, System Page Size: 00000001
		Region 0: Memory at 00000000e8bc4000 (64-bit, non-prefetchable)
		Region 3: Memory at 00000000e8be4000 (64-bit, non-prefetchable)
		VF Migration: offset: 00000000, BIR: 0

Normally with an 82576 you'd see a VF offset of 128 which would mean
that for a PF at 0a:00.0, the first VF would be at 0a:10.0.  In that
case the BIOS doesn't need to allocate an additional bus number under
the bridge for the VFs.  But, since the downstream switch port here
doesn't support ARI we do require an additional bus number to host the
VFs.
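
The arithmetic above can be sketched as follows.  A VF's routing ID is
the PF's routing ID plus the VF offset plus (index x stride), where the
routing ID is bus << 8 | device << 3 | function.  The helper names are
made up for illustration, and the PF address 0a:00.0 is the example
address used above.

```python
def parse_bdf(bdf: str) -> int:
    """Turn 'bb:dd.f' (optionally prefixed with a domain) into a 16-bit routing ID."""
    *_, bus, devfn = bdf.split(":")
    dev, fn = devfn.split(".")
    return (int(bus, 16) << 8) | (int(dev, 16) << 3) | int(fn, 16)


def format_rid(rid: int) -> str:
    """Turn a 16-bit routing ID back into 'bb:dd.f'."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1f:02x}.{rid & 0x7:x}"


def vf_address(pf_bdf: str, offset: int, stride: int, index: int) -> str:
    """Address of VF number `index` (0-based) for the given PF."""
    return format_rid(parse_bdf(pf_bdf) + offset + index * stride)


def needs_extra_bus(pf_bdf: str, offset: int, stride: int, num_vfs: int) -> bool:
    """True if the last VF lands on a higher bus number than the PF."""
    pf_rid = parse_bdf(pf_bdf)
    last_vf_rid = pf_rid + offset + (num_vfs - 1) * stride
    return (last_vf_rid >> 8) != (pf_rid >> 8)


print(vf_address("0a:00.0", 128, 2, 0))       # 0a:10.0
print(vf_address("0a:00.0", 384, 2, 0))       # 0b:10.0
print(needs_extra_bus("0a:00.0", 128, 2, 8))  # False
print(needs_extra_bus("0a:00.0", 384, 2, 8))  # True
```

With an offset of 384 the first VF already spills onto the next bus
number, so the bridge above the PF needs that bus in its range.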

> >> Would disabling devices in the BIOS help?  
> > Probably not.  Logs please.  dmesg, sudo lspci -vvv, /tmp/DMAR.dsl
> > after running:
> >
> > # iasl -d -p /tmp/DMAR /sys/firmware/acpi/tables/DMAR
> >
> > (assuming an Intel system).  Also:
> >
> > # find /sys/class/iommu/*/ -type l
> >
> > And
> >
> > # find /sys/kernel/iommu_groups/ -type l
> >
> > Thanks,
> >
> > Alex  
> Stuff attached! (in off list email as I don't know if the list supports it)
> Opteron AMD-Vi so no DMAR.

So in that case it's the IVRS table, but I can see from the iommu
groups you listed that the VFs are not included there.  The IVRS table
can list specific devices as being translated by the IOMMU, or ranges
of devices, including bus numbers.  My suspicion is that your BIOS is
being conservative in reporting the range of devices translated, then
we go and renumber the buses and the VFs fall outside that range.  So
it seems to boil down to your BIOS being junk wrt SR-IOV.  Sorry.  Thanks,

Alex
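
A quick way to see which devices ended up without a group, as a minimal
sketch: it checks for the same iommu_group symlink under
/sys/bus/pci/devices that the libvirt error at the top of the thread
complains about.  The function name is made up for illustration; only
the sysfs path comes from that error message.

```python
import os


def devices_without_iommu_group(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses whose iommu_group entry is missing or not a symlink."""
    if not os.path.isdir(sysfs_root):
        return []  # not a Linux system, or sysfs is not mounted
    missing = []
    for bdf in sorted(os.listdir(sysfs_root)):
        link = os.path.join(sysfs_root, bdf, "iommu_group")
        if not os.path.islink(link):
            missing.append(bdf)
    return missing


if __name__ == "__main__":
    for bdf in devices_without_iommu_group():
        print(bdf)
```

On a system behaving like the one described here, the VFs would be
expected to show up in this list while the PFs would not.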