[libvirt] Proposal PCI/PCIe device placement on PAPR guests

Thu Jan 12 06:19:40 UTC 2017

On 12/01/17 14:52, David Gibson wrote:
> On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:
>> On Thu, 5 Jan 2017 16:46:18 +1100
>> David Gibson <david at gibson.dropbear.id.au> wrote:
>>
>>> There was a discussion back in November on the qemu list which spilled
>>> onto the libvirt list about how to add support for PCIe devices to
>>> POWER VMs, specifically 'pseries' machine type PAPR guests.
>>>
>>> Here's a more concrete proposal for how to handle part of this in
>>> future from the libvirt side.  Strictly speaking what I'm suggesting
>>> here isn't intrinsically linked to PCIe, but it will make adding PCIe
>>> support sanely much easier, and it has a number of advantages for
>>> both PCIe and plain-PCI devices on PAPR guests.
>>>
>>> Background:
>>>
>>>  * Currently the pseries machine type only supports vanilla PCI
>>>    buses.
>>>     * This is a qemu limitation, not something inherent - PAPR guests
>>>       running under PowerVM (the IBM hypervisor) can use passthrough
>>>       PCIe devices (PowerVM doesn't emulate devices though).
>>>     * In fact the way PCI access is para-virtualized in PAPR makes the
>>>       usual distinctions between PCI and PCIe largely disappear
>>>  * Presentation of PCIe devices to PAPR guests is unusual
>>>     * Unlike x86 and other "bare metal" platforms, root ports are
>>>       not made visible to the guest, i.e. all devices (typically)
>>>       appear as though they were integrated devices on x86
>>>     * In terms of topology all devices will appear in a way similar to
>>>       a vanilla PCI bus, even PCIe devices
>>>        * However PCIe extended config space is accessible
>>>     * This means libvirt's usual placement of PCIe devices is not
>>>       suitable for PAPR guests
>>>  * PAPR has its own hotplug mechanism
>>>     * This is used instead of standard PCIe hotplug
>>>     * This mechanism works for both PCIe and vanilla-PCI devices
>>>     * This can hotplug/unplug devices even without a root port P2P
>>>       bridge between it and the root "bus"
>>>  * Multiple independent host bridges are routine on PAPR
>>>     * Unlike PC (where all host bridges have multiplexed access to
>>>       configuration space) PCI host bridges (PHBs) are truly
>>>       independent for PAPR guests (disjoint MMIO regions in system
>>>       address space)
>>>     * PowerVM typically presents a separate PHB to the guest for each
>>>       host slot passed through
>>>
>>> The Proposal:
>>>
>>> I suggest that libvirt implement a new default algorithm for placing
>>> (i.e. assigning addresses to) both PCI and PCIe devices for (only)
>>> PAPR guests.
>>>
>>> The short summary is that by default it should assign each device to a
>>> separate vPHB, creating vPHBs as necessary.
>>>
>>>   * For passthrough sometimes a group of host devices can't be safely
>>>     isolated from each other - this is known as a (host) Partitionable
>>>     Endpoint (PE).  In this case, if any device in the PE is passed
>>>     through to a guest, the whole PE must be passed through to the
>>>     same vPHB in the guest.  From the guest POV, each vPHB has exactly
>>>     one (guest) PE.
>>>   * To allow for hotplugged devices, libvirt should also add a number
>>>     of additional, empty vPHBs (the PAPR spec allows for hotplug of
>>>     PHBs, but this is not yet implemented in qemu).  When hotplugging
>>>     a new device (or PE) libvirt should locate a vPHB which doesn't
>>>     currently contain anything.
>>>   * libvirt should only (automatically) add PHBs - never root ports or
>>>     other PCI to PCI bridges
>>>
>>> In order to handle migration, the vPHBs will need to be represented in
>>> the domain XML, which will also allow the user to override this
>>> topology if they want.
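
To make the placement rule above concrete, here is a minimal sketch in
Python. It is not libvirt code: the names place_devices, hotplug and
spare_phbs are hypothetical, and the device list format (name plus an
optional host PE identifier) is an assumption made for illustration only.

```python
def place_devices(devices, spare_phbs=2):
    """Assign each device (or each host PE) to its own vPHB.

    devices: list of (device_name, host_pe) tuples; host_pe is None for
             devices that do not belong to a shared host PE (assumption).
    Returns a list of vPHBs, each a list of device names. The trailing
    empty lists are spare vPHBs reserved for hotplug.
    """
    groups = {}
    order = []
    for name, pe in devices:
        # Devices sharing a host PE must land on the same vPHB;
        # everything else gets a vPHB of its own.
        key = ("pe", pe) if pe is not None else ("dev", name)
        if key not in groups:
            groups[key] = []
            order.append(key)
        groups[key].append(name)
    phbs = [groups[key] for key in order]
    phbs.extend([] for _ in range(spare_phbs))
    return phbs


def hotplug(phbs, group):
    """Place a hotplugged device (or PE) on the first empty vPHB."""
    for index, phb in enumerate(phbs):
        if not phb:
            phb.extend(group)
            return index
    raise RuntimeError("no empty vPHB available for hotplug")
```

Note that the sketch only ever creates vPHBs, never root ports or other
bridges, and that hotplug fails once the spare vPHBs are used up, since
PHB hotplug itself is not assumed to be available.
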
>>>
>>> Advantages:
>>>
>>> There are still some details I need to figure out w.r.t. handling PCIe
>>> devices (on both the qemu and libvirt sides).  However the fact that
>>
>> One such detail may be that PCIe devices should have the
>> "ibm,pci-config-space-type" property set to 1 in the DT,
>> for the driver to be able to access the extended config
>> space.
> 
> So, we have a bit of an oddity here.  It looks like we currently set
> 'ibm,pci-config-space-type' to 1 in the PHB, rather than individual
> device nodes.  Which, AFAICT, is simply incorrect in terms of PAPR.


I asked Paul how to read the spec. Setting it on the PHB is actually
correct, but it is not enough: having type=1 on a PHB means that extended
config access requests can pass through it, but the devices and bridges
below it still need to have type=1 themselves if they support extended
config space. Having type set to 0 (or not present at all) on a PHB would
mean that extended config space is not available on anything under that PHB.
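
A rough sketch of that reading in Python, just to spell out the logic.
The function name and arguments are hypothetical, and requiring type=1 on
every intermediate bridge as well is my assumption about how the rule
extends down the path, not something quoted from PAPR.

```python
def extended_config_accessible(phb_type, path_types):
    """Decide whether a device's extended config space is reachable.

    phb_type:   value of "ibm,pci-config-space-type" on the PHB node,
                or None if the property is absent.
    path_types: values of the same property on each node from just below
                the PHB down to the target device (bridges first, device
                last); None where the property is absent.
    """
    def allows_extended(value):
        # Only an explicit 1 enables extended access; 0 or a missing
        # property both mean "no extended config space".
        return value == 1

    # The PHB must let extended requests through, and (by assumption)
    # so must every bridge and the device itself.
    return allows_extended(phb_type) and all(
        allows_extended(value) for value in path_types
    )
```

So with type=1 only on the PHB, as qemu does today, a device node that
lacks the property would still be treated as having no extended config
space.
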


-- 
Alexey
