[libvirt] new libvirt "pci" controller type and pcie/q35 (was Re: [PATCH 4/7] add pci-bridge controller type)

Michael S. Tsirkin mst at redhat.com
Mon Apr 15 21:58:40 UTC 2013


On Mon, Apr 15, 2013 at 11:27:03AM -0600, Alex Williamson wrote:
> On Fri, 2013-04-12 at 11:46 -0400, Laine Stump wrote:
> > On 04/11/2013 07:23 AM, Michael S. Tsirkin wrote:
> > > On Thu, Apr 11, 2013 at 07:03:56AM -0400, Laine Stump wrote:
> > >> On 04/10/2013 05:26 AM, Daniel P. Berrange wrote:
> > >>> On Tue, Apr 09, 2013 at 04:06:06PM -0400, Laine Stump wrote:
> > >>>> On 04/09/2013 04:58 AM, Daniel P. Berrange wrote:
> > >>>>> On Mon, Apr 08, 2013 at 03:32:07PM -0400, Laine Stump wrote:
> > >>>>> Actually I do wonder if we should represent a PCI root as two
> > >>>>> <controller> elements, one representing the actual PCI root
> > >>>>> device, and the other representing the host bridge that is
> > >>>>> built-in.
> > >>>>>
> > >>>>> Also we should use the actual model names, not 'pci-root' or
> > >>>>> 'pcie-root' but rather i440FX for "pc" machine type, and whatever
> > >>>>> the q35 model name is.
> > >>>>>
> > >>>>>  - One PCI root with built-in PCI bus (ie today's setup)
> > >>>>>
> > >>>>>    <controller type="pci-root" index="0">
> > >>>>>      <model name="i440FX"/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci" index="0"> <!-- Host bridge -->
> > >>>>>      <address type='pci' domain='0' bus='0' slot='0'/>
> > >>>> Isn't this saying that the bridge connects to itself? (since bus 0 is
> > >>>> this bus)
> > >>>>
> > >>>> I understand (again, possibly wrongly) that the builtin PCI bus connects
> > >>>> to the chipset using its own slot 0 (that's why it's reserved), but
> > >>>> that's its address on itself. How is this bridge associated with the
> > >>>> pci-root?
> > >>>>
> > >>>> Ah, I *think* I see it - the domain attribute of the pci controller is
> > >>>> matched to the index of the pci-root controller, correct? But there's
> > >>>> still something strange about the <address> of the pci controller being
> > >>>> self-referential.
> > >>> Yes, the index of the pci-root matches the 'domain' of <address>
> > >>
> > >> Okay, then the way that libvirt differentiates between a pci bridge that
> > >> is connected to the root, and one that is connected to a slot of another
> > >> bridge is 1) the "bus" attribute of the bridge's <address> matches the
> > >> "index" attribute of the bridge itself, and 2) "slot" is always 0. Correct?
> > >>
> > >> (The corollary of this is that if slot == 0 and bus != index, or bus ==
> > >> index and slot != 0, it is a configuration error).
> > >>
> > >> I'm still unclear on the usefulness of the pci-root controller though -
> > >> all the necessary information is contained in the pci controller, except
> > >> for the type of root. But in the case of pcie root, I think you're not
> > >> allowed to connect a standard bridge to it, only a "dmi-to-pci-bridge"
> > >> (i82801b11-bridge)
> > > Yes you can connect a pci bridge to pcie-root.
> > > It's represented as a root complex integrated device.
> 
> Is this accurate?  Per the PCI express spec, any PCI express device
> needs to have a PCI express capability, which our pci-bridge does not.
> I think this is one of the main differences for our i82801b11-bridge,
> that it exposes itself as a root complex integrated endpoint, so we know
> it's effectively a PCIe-to-PCI bridge.

If it does not have an express link upstream it's not a
PCIe-to-PCI bridge, is it?

>  We'll be asking for trouble
> if/when we get guest IOMMU support if we are lax about using PCI-to-PCI
> bridges where we should have PCIe-to-PCI bridges.

I recall the spec saying somewhere that integrated endpoints are outside
the root complex hierarchy.  I think IOMMU will simply not apply to
these.

> There are plenty of
> examples to the contrary of root complex integrated endpoints without an
> express capability, but that doesn't make it correct to the spec.

Is there something in the spec explicitly forbidding this?  I merely
find: The PCI Express Capability structure is required for PCI Express
device Functions.
So if it's not an express device it does not have to have
an express capability?

Maybe we should send an example dump to pci sig and ask them...

> > ARGHH!! Just when I think I'm starting to understand *something* about
> > these devices...
> > 
> > (later edit: after some coaching on IRC, I *think* I've got a bit better
> > handle on it.)
> > 
> > >>>>>    </controller>
> > >>>>>    <interface type='direct'>
> > >>>>>       ...
> > >>>>>      <address type='pci' domain='0' bus='0' slot='3'/>
> > >>>>>    </interface>
> > >>>>>
> > >>>>>  - One PCI root with built-in PCI bus and extra PCI bridge
> > >>>>>
> > >>>>>    <controller type="pci-root" index="0">
> > >>>>>      <model name="i440FX"/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci" index="0"> <!-- Host bridge -->
> > >>>>>      <address type='pci' domain='0' bus='0' slot='0'/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci" index="1"> <!-- Additional bridge -->
> > >>>>>      <address type='pci' domain='0' bus='0' slot='1'/>
> > >>>>>    </controller>
> > >>>>>    <interface type='direct'>
> > >>>>>       ...
> > >>>>>      <address type='pci' domain='0' bus='1' slot='3'/>
> > >>>>>    </interface>
> > >>>>>
> > >>>>>  - One PCI root with built-in PCI bus, PCI-E bus and an extra PCI bridge
> > >>>>>    (ie possible q35 setup)
> > >>>> Why would a q35 machine have an i440FX pci-root?
> > >>> It shouldn't, that's a typo
> > >>>
> > >>>>>    <controller type="pci-root" index="0">
> > >>>>>      <model name="i440FX"/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci" index="0"> <!-- Host bridge -->
> > >>>>>      <address type='pci' domain='0' bus='0' slot='0'/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci" index="1"> <!-- Additional bridge -->
> > >>>>>      <address type='pci' domain='0' bus='0' slot='1'/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci" index="1"> <!-- Additional bridge -->
> > >>>>>      <address type='pci' domain='0' bus='0' slot='1'/>
> > >>>>>    </controller>
> > >>>> I think you did a cut-paste here and intended to change something, but
> > >>>> didn't - those two bridges are identical.
> > >>> Yep, the slot should be 2 in the second one
> > >>>
> > >>>>>    <interface type='direct'>
> > >>>>>       ...
> > >>>>>      <address type='pci' domain='0' bus='1' slot='3'/>
> > >>>>>    </interface>
> > >>>>>
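For reference, a sketch of the q35 example above with the two corrections
acknowledged in the replies folded in: the root is no longer i440FX, and the
second additional bridge sits in slot 2. The "q35" model name and index="2"
for the second bridge are assumptions, not something settled in this thread.

```xml
<devices>
  <controller type="pci-root" index="0">
    <!-- assumed model name; the thread only says "whatever the q35 model name is" -->
    <model name="q35"/>
  </controller>
  <controller type="pci" index="0"> <!-- Host bridge -->
    <address type='pci' domain='0' bus='0' slot='0'/>
  </controller>
  <controller type="pci" index="1"> <!-- Additional bridge -->
    <address type='pci' domain='0' bus='0' slot='1'/>
  </controller>
  <!-- index 2 is an assumption; only the slot change was confirmed -->
  <controller type="pci" index="2"> <!-- Additional bridge -->
    <address type='pci' domain='0' bus='0' slot='2'/>
  </controller>
  <interface type='direct'>
    <!-- remaining interface settings elided, as in the original example -->
    <address type='pci' domain='0' bus='1' slot='3'/>
  </interface>
</devices>
```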
> > >>>>> So if we later allowed for multiple PCI roots, then we'd have something
> > >>>>> like
> > >>>>>
> > >>>>>    <controller type="pci-root" index="0">
> > >>>>>      <model name="i440FX"/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci-root" index="1">
> > >>>>>      <model name="i440FX"/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci" index="0"> <!-- Host bridge 1 -->
> > >>>>>      <address type='pci' domain='0' bus='0' slot='0'/>
> > >>>>>    </controller>
> > >>>>>    <controller type="pci" index="0"> <!-- Host bridge 2 -->
> > >>>>>      <address type='pci' domain='1' bus='0' slot='0'/>
> > >>>>>    </controller>
> > 
> > 
> > There is a problem here - within a given controller type, we will now
> > have the possibility of multiple controllers with the same index - the
> > differentiating attribute will be in the <address> subelement, which
> > could create some awkwardness. Maybe instead this should be handled with
> > a different model of pci controller, and we can add a "domain" attribute
> > at the toplevel rather than specifying an <address>?
> 
> On real hardware, the platform can specify the _BBN (Base Bus Number =
> bus) and the _SEG (Segment = domain) of the host bridge.  So perhaps you
> want something like:
> 
> <controller type="pci-host-bridge">
>   <model name="i440FX"/>
>   <address type="pci-host-bridge-addr" domain='1' bus='0'/>
> </controller>

Yes, we could specify segments, though a segment is not the same as
a domain as Linux guests define it (I assume this is what libvirt wants
to call a domain): if memory serves, a segment does not have to be a
root-based hierarchy, whereas Linux domains are all root-based.

We are better off not specifying BBN for all buses I think -
it's intended for multi-root support for legacy OSes.

> "index" is confusing to me.

I'd prefer an ID for the bus rather than a number; I'm concerned users
will assume it's the bus number and get confused by a mismatch.

> > >>>>>    <interface type='direct'> <!-- NIC on host bridge 2 -->
> > >>>>>       ...
> > >>>>>      <address type='pci' domain='1' bus='0' slot='3'/>
> > >>>>>    </interface>
> > >>>>>
> > >>>>>
> > >>>>> NB this means that 'index' values can be reused against the
> > >>>>> <controller>, provided they are setup on different pci-roots.
> > >>>>>
> > >>>>>> (also note that it might happen that the bus number in libvirt's config
> > >>>>>> will correspond to the bus numbering that shows up in the guest OS, but
> > >>>>>> that will just be a happy coincidence)
> > >>>>>>
> > >>>>>> Does this make sense?
> > >>>>> Yep, I think we're fairly close.
> > >>>> What about the other types of pci controllers that are used by PCIe? We
> > >>>> should make sure they fit in this model before we settle on it.
> > >>> What do they do ?
> > 
> > (The descriptions of different models below tell what each of these
> > other devices does; in short, they're all just some sort of electronic
> > Lego to help connect PCI and PCIe devices into a tree).
> > 
> > Okay, I'll make yet another attempt at understanding these devices, and
> > suggesting how they can all be described in the XML. I'm thinking that
> > *all* of the express hubs, switch ports, bridges, etc can be described
> > in xml in the manner above, i.e.:
> > 
> >    <controller type='pci' index='n'>
> >      <model type='xxx'/>
> >    </controller>
> > 
> > and that the method for connecting a device to any of them would be by
> > specifying:
> > 
> >      <address type='pci' domain='n' bus='n' slot='n' function='n'/>
> > 
> > Any limitations about which devices/controllers can connect to which
> > controllers, and how many devices can connect to any particular
> > controller will be derived from the <model type='xxx'/>. (And, as we've
> > said before, although qemu doesn't assign each of these controllers a
> > numeric bus id, and although we can make no guarantee that the bus id we
> > use for a particular controller is what will be used by the guest
> > BIOS/OS, it's still a convenient notation and works well with other
> > hypervisors as well as qemu. I'll also note that when I run lspci on an
> > X58-based machine I have here, *all* of the relationships between all
> > the devices listed below are described with simple bus:slot.function
> > numbers.)
> > 
> > Here is a list of the pci controller model types and their restrictions
> > (thanks to mst and aw for repeating these over and over to me; I'm sure
> > I still have made mistakes, but at least it's getting closer).
> > 
> > 
> > <controller type='pci-root'>
> > ============================
> > 
> > Upstream:         nothing
> > Downstream:       only a single pci-root-bus (implied)
> > qemu commandline: nothing (it's implied in the q35 machinetype)
> > 
> > Explanation:
> > 
> > Each machine will have a different controller called "pci-root" as
> > outlined above by Daniel. Two types of pci-root will be supported:
> > i440FX and q35. If a pci-root is not spelled out in the config, one will
> > be auto-added (depending on machinetype).
> > 
> > An i440FX pci-root has an implicitly added pci-bridge at 0:0:0.0 (and
> > any bridge that has an address of slot='0' on its own bus is, by
> > definition, connected to a pci-root controller - the two are matched by
> > setting "domain" in the address of the pci-bridge to "index" of the
> > pci-root). This bridge can only have PCI devices added.
> > 
> > A q35 pci-root also implies a different kind of pci-bridge device - one
> > that can only have PCIe devices/controllers attached, but is otherwise
> > identical to the pci-bridge added for i440FX. This bus will be called
> > "root-bus" (Note that there are generally followed conventions for what
> > can be connected to which slot on this bus, and we will probably follow
> > those conventions when building a machine, *but* we will not hardcode
> > this convention into libvirt; each q35 machine will be an empty slate)
> > 
> > 
> > <controller type='pci'>
> > =======================
> > 
> > This will be used for *all* of the following controller devices
> > supported by qemu:
> > 
> > <model type='pcie-root-bus'/> (implicit/integrated)
> > ----------------------------
> > 
> > Upstream:         connect to pci-root controller *only*
> > Downstream:       32 slots, PCIe devices only, no hotplug.
> > qemu commandline: nothing (implicit in the q35-* machinetype)
> > 
> > This controller is the bus described above that connects to a q35's
> > pci-root, and provides places for PCIe devices to connect. Examples are
> > root-ports, dmi-to-pci-bridges, sata controllers, integrated
> > sound/usb/ethernet devices (do any of those that can be connected to the
> > pcie-root-bus exist yet?).
> > 
> > There is only one of these controllers, and it will *always* be
> > index='0', and will always have the following address:
> > 
> >   <address type='pci' domain='0' bus='0' slot='0' function='0'/>
> 
> Implicit devices make me nervous, why wouldn't this just be a pcie-root
> (or pcie-host-bridge)?  If we want to support multiple host bridges,
> there can certainly be more than one, so the index='0' assumption seems
> to fall apart.
> 
> > <model type='root-port'/> (ioh3420)
> > -------------------------
> > 
> > Upstream:         PCIe, connect to pcie-root-bus *only* (?)
> 
> yes
> 
> > Downstream:       1 slot, PCIe devices only (?)
> 
> yes
> 
> > qemu commandline: -device ioh3420,...
> > 
> > These can only connect to the "pcie-root-bus" of a q35 (implying that
> > this bus will need to have a different model name than the simple
> > "pci-bridge").
> > 
> > 
> > <model type='dmi-to-pci-bridge'/> (i82801b11-bridge)
> 
> I'm worried this name is either too specific or too generic.  What
> happens when we add a generic pcie-bridge and want to use that instead
> of the i82801b11-bridge?  The guest really only sees this as a
> PCIe-to-PCI bridge, it just happens that on q35 this attaches at the DMI
> port of the MCH.
> 
> > ---------------------------------
> > 
> > (btw, what does "dmi" mean?)
> 
> http://en.wikipedia.org/wiki/Direct_Media_Interface
> 
> > Upstream:         pcie-root-bus *only*
> 
> And only to a specific q35 slot (1e.0) for the i82801b11-bridge.
> 
> > Downstream:       32 slots, any PCI device, no hotplug (?)
> 
> Not yet, but I think this is where we want to implement ACPI based hotplug.
> 
> > qemu commandline: -device i82801b11-bridge,...
> > 
> > 
> > <model type='upstream-switch-port'/> (x3130-upstream)
> > ------------------------------------
> > 
> > Upstream:         PCIe, connect to pcie-root-bus, root-port, or
> > downstream-switch-port (?)
> 
> yes
> 
> > Downstream:       32 slots, connect *only* to downstream-switch-port
> 
> I can't verify that there are 32 slots, mst?  I've only set up downstream
> ports within slot 0.
> 
> > qemu-commandline: -device x3130-upstream
> > 
> > 
> > This is the upper side of a switch that can multiplex multiple devices
> > onto a single port. It's only useful when one or more downstream switch
> > ports are connected to it.
> > 
> > <model type='downstream-switch-port'/> (xio3130-downstream)
> > --------------------------------------
> > 
> > Upstream:         connect *only* to upstream-switch-port
> > Downstream:       1 slot, any PCIe device
> > qemu commandline: -device xio3130-downstream
> > 
> > You can connect one or more of these to an upstream-switch-port in order
> > to effectively plug multiple devices into a single PCIe port.
> > 
> > <model type='pci-bridge'/> (pci-bridge)
> > --------------------------
> > 
> > Upstream:         PCI, connect to 1) pci-root, 2) dmi-to-pci-bridge, 3)
> > another pci-bridge
> > Downstream:       any PCI device, 32 slots
> > qemu commandline: -device pci-bridge,...
> > 
> > This differs from dmi-to-pci-bridge in that its upstream connection is
> > PCI rather than PCIe (so it will work on an i440FX system, which has no
> > root PCIe bus) and that hotplug is supported. In general, if a guest
> > will have any PCI devices, one of these controllers should be added, and
> > 
> > ===============================================================
> > 
> > 
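As a rough illustration of how the model types listed above might chain
together on a q35 guest, here is a sketch in the notation proposed in this
thread (<controller type='pci'> with a <model type='...'/> child). This is the
thread's proposal, not a finalized schema. The slot numbers other than 0x1e
for the dmi-to-pci-bridge are arbitrary assumptions, and each device's "bus"
is taken to be the "index" of the controller it plugs into.

```xml
<devices>
  <!-- implicit on q35: the root bus, always index 0 per the description above -->
  <controller type='pci' index='0'>
    <model type='pcie-root-bus'/>
    <address type='pci' domain='0' bus='0' slot='0' function='0'/>
  </controller>
  <!-- root port on the root bus; provides a single PCIe slot (slot 4 is assumed) -->
  <controller type='pci' index='1'>
    <model type='root-port'/>
    <address type='pci' domain='0' bus='0' slot='4' function='0'/>
  </controller>
  <!-- upstream switch port occupying the root port's one slot -->
  <controller type='pci' index='2'>
    <model type='upstream-switch-port'/>
    <address type='pci' domain='0' bus='1' slot='0' function='0'/>
  </controller>
  <!-- downstream switch port; connects only to an upstream switch port -->
  <controller type='pci' index='3'>
    <model type='downstream-switch-port'/>
    <address type='pci' domain='0' bus='2' slot='0' function='0'/>
  </controller>
  <!-- i82801b11-bridge at slot 1e.0 of the root bus -->
  <controller type='pci' index='4'>
    <model type='dmi-to-pci-bridge'/>
    <address type='pci' domain='0' bus='0' slot='0x1e' function='0'/>
  </controller>
  <!-- plain pci-bridge behind the dmi-to-pci-bridge (slot 1 is assumed) -->
  <controller type='pci' index='5'>
    <model type='pci-bridge'/>
    <address type='pci' domain='0' bus='4' slot='1' function='0'/>
  </controller>
  <!-- a PCIe device in the downstream switch port's single slot -->
  <interface type='direct'>
    <!-- remaining interface settings elided -->
    <address type='pci' domain='0' bus='3' slot='0' function='0'/>
  </interface>
</devices>
```

A conventional PCI device would instead use bus='5' to land behind the
pci-bridge.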
> > Comment: I'm not quite convinced that we really need the separate
> > "pci-root" device. Since 1) every pci-root will *always* have either a
> > pcie-root-bus or a pci-bridge connected to it, 2) the pci-root-bus will
> > only ever be connected to the pci-root, and 3) the pci-bridge that
> > connects to it will need special handling within the pci-bridge case
> > anyway, why not:
> > 
> > 1) eliminate the separate pci-root controller type
> > 
> > 2) within <controller type='pci'>, a new <model type='pci-root-bus'/>
> > will be added.
> > 
> > 3) a pcie-root-bus will automatically be added for q35 machinetypes, and
> > pci-root-bus for any machinetype that supports a PCI bus (e.g. "pc-*")
> > 
> > 4) model type='pci-root-bus' will behave like pci-bridge, except that it
> > will be an implicit device (nothing on qemu commandline) and it won't
> > need an <address> element (neither will pcie-root-bus).
> 
> I think they should both have a domain + bus address to make it possible
> to build multi-domain/multi-host bridge systems.  They do not use any
> slots though.
> 
> > 5) to support multiple domains, we can simply add a "domain" attribute
> > to the toplevel of controller.
> > 
> 
> Or this wouldn't even be necessary if we supported a 'pci-root-addr'
> address type for the above with the default being domain=0, bus=0?  I
> suppose it doesn't matter whether it's a separate attribute or new
> address type though.  Thanks,
> 
> Alex

Also AFAIK there's nothing in the spec that requires bus=0
to be root. The _BBN hack above is used sometimes to give !=0
bus numbers to roots.
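
To picture the multi-root case, here is a sketch in the hypothetical notation
Alex suggested above ("pci-host-bridge" and "pci-host-bridge-addr" are
proposals from this thread, not an existing schema). It shows two host bridges
sharing segment 0, with the second one given a non-zero base bus number; the
value 0x80 is an arbitrary assumption.

```xml
<devices>
  <!-- first root: segment (domain) 0, base bus number 0 -->
  <controller type="pci-host-bridge">
    <model name="i440FX"/>
    <address type="pci-host-bridge-addr" domain='0' bus='0'/>
  </controller>
  <!-- second root in the same segment; base bus number 0x80 is assumed -->
  <controller type="pci-host-bridge">
    <model name="i440FX"/>
    <address type="pci-host-bridge-addr" domain='0' bus='0x80'/>
  </controller>
</devices>
```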

-- 
MST



