[libvirt] [PATCH v2 0/4] PCI hostdev partial assignment support

Alex Williamson alex.williamson at redhat.com
Thu Nov 21 23:08:44 UTC 2019

On Thu, 21 Nov 2019 19:19:13 -0300
Daniel Henrique Barboza <danielhb413 at gmail.com> wrote:

> Changes from previous version [1], all of them result of
> feedback from Alex Williamson and Abdulla Bubshait:
> - use <address type='none'/> instead of creating a new subsys
> attribute;
> - expand the change to all PCI hostdevs. Former patch 01 was
> discarded because we don't need the PCI Multifunction helpers
> for now; 
> - series changed name to reflect what it's being done
> - new patch 04: add documentation to formatdomain.html.in
> To avoid a huge wall of text please refer to [1] for context
> about the road up to here. Commit msgs of the first 3 patches
> tells the story as well.
> [1] https://www.redhat.com/archives/libvir-list/2019-October/msg00298.html  
> What I want to discuss here instead is a caveat that I've found
> while testing this work, since its first version. This test was
> done in a Power 8 system with a Broadcom BCM5719 PCIe Multifunction
> card, with 4 virtual functions. This series enables Libvirt to
> declare PCI hostdevs that will not be visible to the guest using
> address type='none'. During the tests I faced a scenario that I
> expected to fail, but it didn't. This is the relevant XML except:
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0001' bus='0x09' slot='0x00' function='0x0'/>
>       </source>
>       <address type='none'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0001' bus='0x09' slot='0x00' function='0x1'/>
>       </source>
>       <address type='none'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0001' bus='0x09' slot='0x00' function='0x2'/>
>       </source>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x2'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0001' bus='0x09' slot='0x00' function='0x3'/>
>       </source>
>       <address type='none'/>
>     </hostdev>
> I'm declaring all the BCM5719 functions in the XML, but I am making
> functions 0, 1 and 3 unassignable by the guest using address type='none'.
> This test was meant to fail, but it didn't. To my surprise the guest
> booted and the device is functional:
> $ lspci
> 0000:00:01.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
> 0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
> 0000:00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
> 0001:00:01.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> $ 
> I've talked with Michael Roth (QEMU PPC64 developer that worked with
> the PCI multifunction hotplug/unplug support in the PPC64 machine)
> about this. He mentioned that this is intended. I'll quote here what he
> had to say about it:
> "The current logic is that we only emit the hotplug event when function
> 0 is attached, but if some other function is attached at boot-time the
> guest will still see it on the bus, and whether that works or not I
> think is up to the device/driver"
> This explains why this test didn't fail as I expected. At least for
> the PPC64 machine, depending on the device support, this setup is
> allowed. PPC64 machine uses function 0 hotplug as a signal of
> 'plug all the queue functions and function 0', but function 0
> isn't required at boot time.  I would like to hear other opinions
> in this because I can't say whether this is allowed in x86.

I would expect that on x86 a Linux guest would need to be booted with
the pci=pcie_scan_all kernel option to find this device.  The PCI-core
will only scan for devfn 00.0 behind a downstream port and then only
scan non-zero functions if that device indicate multifunction support.
I'd expect more archs to behave this way than what you see on PPC where
it "just works".

> I am mentioning all this now because this had a direct impact on the
> design of this work since the previous version, and I failed
> to bring it up back then. I am *not* checking for the assignment of
> function 0 at guest boot time in Libvirt, leaving the user free to
> decide what to do. I am aware that this will be inconsistent to the
> logic of the PCI multifunction hotplug/unplug support, where
> function 0 is required. This also puts a lot of faith in the user,
> relying that the user is fully aware of the capabilities of the
> hardware.
> My question is: should Libvirt force function 0 to be present in
> boot time as well, regardless of whether the PPC64 guest or some
> cards are able to boot without it?

In my reading of the PCI 3.0 spec, indicates that
multifunction devices are required to implement function 0 and that
configuration software is under no obligation to scan for higher number
functions if function 0 is not present and does not have the
multifunction bit set in the header type register.  With PCIe, where
link information, payload size, ARI, VC, etc are exposed in config
space, many of these are only valid or have specific means only for
function 0.

What you're seeing on PPC is, I think, more typical of paravirtualized
PCI enumeration, where you're only seeing a view of the bus as allowed
by a hypervisor.  I can't say whether the pcie_scan_all boot option was
added to allow discovery of devices as in your configuration or simply
to correct cases where function 0 forgot to implement the multifunction
bit.  Whether libvirt wants to prevent this is more of a support
question, it seems spec compliant hardware should never do this, but
not all hardware is spec compliant.  Libvirt should never generate such
a configuration on it's own, but I wouldn't necessarily object if it
allows a user to shoot themselves in the foot.  Thanks,


More information about the libvir-list mailing list