[libvirt] new libvirt "pci" controller type and pcie/q35 (was Re: [PATCH 4/7] add pci-bridge controller type)

Don Dutile ddutile at redhat.com
Mon Apr 15 22:14:19 UTC 2013


On 04/15/2013 04:09 PM, Laine Stump wrote:
> On 04/15/2013 06:29 AM, Daniel P. Berrange wrote:
>> On Fri, Apr 12, 2013 at 11:46:15AM -0400, Laine Stump wrote:
>>> On 04/11/2013 07:23 AM, Michael S. Tsirkin wrote:
>>>> On Thu, Apr 11, 2013 at 07:03:56AM -0400, Laine Stump wrote:
>>>>> On 04/10/2013 05:26 AM, Daniel P. Berrange wrote:
>>>>>>>> So if we later allowed for multiple PCI roots, then we'd have something
>>>>>>>> like
>>>>>>>>
>>>>>>>>     <controller type="pci-root" index="0">
>>>>>>>>       <model name="i440FX"/>
>>>>>>>>     </controller>
>>>>>>>>     <controller type="pci-root" index="1">
>>>>>>>>       <model name="i440FX"/>
>>>>>>>>     </controller>
>>>>>>>>     <controller type="pci" index="0">  <!-- Host bridge 1 -->
>>>>>>>>       <address type='pci' domain='0' bus='0' slot='0'/>
>>>>>>>>     </controller>
>>>>>>>>     <controller type="pci" index="0">  <!-- Host bridge 2 -->
>>>>>>>>       <address type='pci' domain='1' bus='0' slot='0'/>
>>>>>>>>     </controller>
>>>
>>> There is a problem here - within a given controller type, we will now
>>> have the possibility of multiple controllers with the same index - the
>>> differentiating attribute will be in the <address> subelement, which
>>> could create some awkwardness. Maybe instead this should be handled with
>>> a different model of pci controller, and we can add a "domain" attribute
>>> at the toplevel rather than specifying an <address>?
>> IIUC there is a limit on the number of PCI buses you can create per
>> domain, due to fixed size of PCI addresses. Google suggests to me
>> the limit is 256. So for domain 1, we could just start index at
>> 256, and domain 2 at 512, etc
>
> Okay. Whether we choose that method, or a separate domain attribute, I'm
> satisfied that we'll be able to find a way to solve it when the time
> comes (and it hasn't yet), so we can ignore that problem for now.
>
>
*PLEASE* don't create a new/competing naming/numbering scheme for differentiating
PCI domains.... as much as I dislike the overuse of the term 'domain', it's what
is used.  No sane person is going to look to assign PCI bus numbers > 256 in order
to get new/different domains.
The name sucks, but that's what it's called in the code, and what customers are used to.

>>
>>
>>> Comment: I'm not quite convinced that we really need the separate
>>> "pci-root" device. Since 1) every pci-root will *always* have either a
>>> pcie-root-bus or a pci-bridge connected to it, 2) the pci-root-bus will
>>> only ever be connected to the pci-root, and 3) the pci-bridge that
>>> connects to it will need special handling within the pci-bridge case
>>> anyway, why not:
>>>
>>> 1) eliminate the separate pci-root controller type
>> Ok, lets leave it out - we can always add it later if desired.
>
> Okay.
>

Not so fast.... something that represents the PCI Root Complex might be
handy -- error handling and embedded devices (like IOMMUs, intr-remapping table)
come to mind... ACPI tables (if they get duped from real systems) may
need unconventional naming schemes for qemu if an RC isn't modelled.

>>
>>> 2) within <controller type='pci'>, a new <model type='pci-root-bus'/>
>>> will be added.
>>>
>>> 3) a pcie-root-bus will automatically be added for q35 machinetypes, and
>>> pci-root-bus for any machinetype that supports a PCI bus (e.g. "pc-*")
>>>
>>> 4) model type='pci-root-bus' will behave like pci-bridge, except that it
>>> will be an implicit device (nothing on qemu commandline) and it won't
>>> need an <address> element (neither will pcie-root-bus).
>>>
>>> 5) to support multiple domains, we can simply add a "domain" attribute
>>> to the toplevel of controller.
>> Or use index numbers modulo 256 to identify domain numbers.
>
>
> Right. One or the other. But we can defer that discussion.
>
Just say 'domain' .... again! ;-)

>
>
>> One note on q35 - we need to make sure whatever we do in terms of creating
>> default <controller>s in the XML 'just works' for applications. eg if they
>> define a guest using <type machine="q35">hvm</type>, and then add an
>> <interface>, it should do the right thing wrt PCI addressing/connectivity.
>> We must not require applications to manually add <controller> elements
>> for q35 for things to work. Adding <controller>s must purely be an opt-in
>> for apps which have the detailed knowledge required & need full control
>> over bus layout.
>
> Yep. What I see happening is that the place where we currently add
> default controllers will, in the future, automatically add this for
> machinetype pc* and rhel-*:
>
>     <controller type='pci'>  <!-- implied index='0' -->
>       <model type='pci-root'/>
>     </controller>
>
> and for machinetype q35* it will add (something like):
>
>     <controller type='pci'>  <!-- index='0' -->
>       <model type='pcie-root'/>
>     </controller>
>     <controller type='pci'>  <!-- index='1' -->
>       <model type='dmi-to-pci-bridge'/>
>       <address type='pci' bus='0' slot='0x1e'/>
>     </controller>
>     <controller type='pci'>  <!-- index='2' -->
>       <model type='pci-bridge'/>
>       <address type='pci' bus='1' slot='1'/>
>     </controller>
>
> The slot-auto-reserve code will look through all pci controllers and
> only auto-reserve slots on controllers appropriate for the given device
> - controller 0 is already inappropriate for PCI devices, and we can mark
> the dmi-to-pci-bridge type as being inappropriate for auto-reserve
> (since, if I recall correctly, I was told that you can't hotplug devices
> on that bus). So, all new PCI devices in the config will get addresses
> with bus='2'.
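
A sketch of what such an auto-assigned address would look like on that q35
layout (hypothetical <interface>, assuming the three default controllers
shown above):

    <interface type='network'>
      <source network='default'/>
      <!-- auto-assigned onto the pci-bridge, the first hotpluggable PCI bus -->
      <address type='pci' domain='0' bus='2' slot='1' function='0'/>
    </interface>
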
>
> Of course this means that it will not be possible to switch an existing
> domain config from pc to q35 simply by changing the machinetype - the
> bus number in the address of all devices will need to be changed from 0
> to 2. But this is another case of "opt in", and already requires editing
> the domain config anyway. If someone creates a brand new q35 machine
> though, all PCI devices will get added with bus='whatever is the bus
> number of the first pci-root or pci-bridge controller' (in this case, '2').
>
> So, here are the proposed pci controller types cleaned up and
> re-summarized, followed by an example.
>
> <controller type='pci'>
> =======================
>
> This will be used for *all* of the following PCI controller devices
> supported by qemu:
>
>
> <model type='pci-root'/>  (implicit/integrated)
> ------------------------
>
> Upstream:         implicit connection to the host
> Downstream:       32 slots (slot 0 reserved), PCI devices only
> qemu commandline: nothing (implicit in the pc-* etc. machinetypes)
>
> This controller represents a pc* (or rhel-*) machine's integrated PCI
> bus (pci.0) and provides places for PCI devices to connect (including
> the "pci-bridge" type of PCI controller).
>
> There is only one of these controllers, and it will *always* be
> index='0', and will have no <address> element.
>
ok.

>
> <model type='pcie-root'/>  (implicit/integrated)
> -------------------------
>
> Upstream:         implicit connection to the host
> Downstream:       32 slots (slot 0 reserved), PCIe devices only, no hotplug.
> qemu commandline: nothing (implicit in the q35-* machinetype)
>
> This controller represents a q35's PCI "root complex", and provides
> places for PCIe devices to connect. Examples are root-ports,
> dmi-to-pci-bridges, SATA controllers, and integrated sound/usb/ethernet
> devices (do any of those integrated devices that can be connected to
> the pcie-root-bus exist yet?).
>
> There is only one of these controllers, and it will *always* be
> index='0', and will have no <address> element.
>
ok.

>
> <model type='root-port'/>  (ioh3420)
> -------------------------
>
> Upstream:         PCIe, connect to pcie-root-bus *only* (?)
> Downstream:       1 slot, PCIe devices only (?)
> qemu commandline: -device ioh3420,...
>
> These can only connect to the "pcie-root" of a q35. Any PCIe
> devices can connect to it, including an upstream-switch-port.
>
ioh on q35; ich9/10/xx for other intel chipsets

>
> <model type='upstream-switch-port'/>  (x3130-upstream)
> ------------------------------------
>
> Upstream:         PCIe, connect to pcie-root-bus, root-port, or
>                    downstream-switch-port (?)
> Downstream:       32 slots, connect *only* to downstream-switch-port
> qemu commandline: -device x3130-upstream
>
>
> This is the upper side of a switch that can multiplex multiple devices
> onto a single port. It's only useful when one or more downstream switch
> ports are connected to it.
>
>
> <model type='downstream-switch-port'/>  (xio3130-downstream)
> --------------------------------------
>
> Upstream:         connect *only* to upstream-switch-port
> Downstream:       1 slot, any PCIe device
> qemu commandline: -device xio3130-downstream
>
> You can connect one or more of these to an upstream-switch-port in order
> to effectively plug multiple devices into a single PCIe port.
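
A sketch of such a switch (hypothetical indexes and addresses, mirroring the
ports described above - one upstream port fanning out to two downstream
ports):

    <controller type='pci' index='4'>
      <model type='upstream-switch-port'/>
      <address type='pci' bus='1' slot='0'/>   <!-- plugs into a root-port -->
    </controller>
    <controller type='pci' index='5'>
      <model type='downstream-switch-port'/>
      <address type='pci' bus='4' slot='0'/>   <!-- 1 slot for a PCIe device -->
    </controller>
    <controller type='pci' index='6'>
      <model type='downstream-switch-port'/>
      <address type='pci' bus='4' slot='1'/>
    </controller>
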
>
ugh! one cannot have a 3130-downstream w/o a 3130-upstream;
simplify: PCIe-PPB-up, PCIe-PPB-down -- then it can be anything (not TI, not IDT, not Intel, etc.).

>
> <model type='dmi-to-pci-bridge'/>  (i82801b11-bridge)
> ---------------------------------
>
> (btw, what does "dmi" mean?)
>
> Upstream:         pcie-root *only*
> Downstream:       32 slots, any PCI device (including "pci-bridge"),
>                    no hotplug (?)
> qemu commandline: -device i82801b11-bridge,...
>
> This is the gateway to the world of standard old PCI.
>
why needed?

>
> <model type='pci-bridge'/>  (pci-bridge)
> --------------------------
>
> Upstream:         PCI, connect to 1) pci-root, 2) dmi-to-pci-bridge,
>                    3) another pci-bridge
> Downstream:       any PCI device, 32 slots
> qemu commandline: -device pci-bridge,...
>
> This differs from dmi-to-pci-bridge in that its upstream connection is
> PCI rather than PCIe (so it will work on an i440FX system, which has a
> pci-root rather than pcie-root) and that hotplug is supported. In
> general, if a guest will have any PCI devices, one of these
> controllers should be added, and the PCI devices connected to it
> rather than to the dmi-to-pci-bridge.
>
> ************************************
> (For q35, we *may* decide to always auto-add a dmi-to-pci-bridge at
> 00:1E.0, and a pci-bridge on slot 1 of the dmi-to-pci-bridge. This
> will allow a continuation of the tradition of simply adding new
> devices to the config without worrying about where they connect.)
>
>
> ============================================================================
> Just to make sure this config model will work, here is the XML to
> replicate the layout (only the ones involved in the PCI tree, along with
> 3 ethernet devices as examples) of the X58 hardware I have sitting under
> my desk (I've attached lspci and virsh nodedev-list --tree output from
> that machine):
>
>
>     <controller type='pci' index='0'>
>       <model type='pcie-root'/>
>     </controller>
>
>     <controller type='pci' index='1'>
>       <model type='root-port'/>
>       <address type='pci' bus='0' slot='1'/>
>     </controller>
>
> (there is a SCSI controller connected to bus='1')
>
>
>     <controller type='pci' index='2'>
>       <model type='root-port'/>
>       <address type='pci' bus='0' slot='3'/>
>     </controller>
>
> (the VGA controller is connected to bus='2')
>
>     <controller type='pci' index='3'>
>       <model type='root-port'/>
>       <address type='pci' bus='0' slot='7'/>
>     </controller>
>
> (PCIe SRIOV network card (in external PCIe slot) connected to bus='3')
>
>     <controller type='pci' index='4'>
>       <model type='root-port'/>
>       <address type='pci' bus='0' slot='0x1c' function='0'/>
>     </controller>
>
> (unused PCIe slot available on bus='4')
>
>     <!-- pcie-root (0:1c.4) ->  root-port (5:0.0) ->  onboard ethernet -->
>     <controller type='pci' index='5'>
>       <model type='root-port'/>
>       <address type='pci' bus='0' slot='0x1c' function='4'/>
>     </controller>
>     <interface type='blah'>
>       ...
>       <mac address='00:27:13:53:db:76'/>
>       <address type='pci' bus='5' slot='0' function='0'/>
>     </interface>
>
>     <!-- more complicated connection to 2nd systemboard ethernet -->
>     <!-- pcie-root ->  (0:1c.5)root-port ->  (6:0.0)upstream-switch-port
>            ->  (7:3.0)downstream-switch-port ->  (9:0.0)ethernet -->
>     <controller type='pci' index='6'>
>       <model type='root-port'/>
>       <address type='pci' bus='0' slot='0x1c' function='5'/>
>     </controller>
>     <controller type='pci' index='7'>
>       <model type='upstream-switch-port'/>
>       <address type='pci' bus='6' slot='0' function='0'/>
>     </controller>
>     <controller type='pci' index='8'>
>       <model type='downstream-switch-port'/>
>       <address type='pci' bus='7' slot='2' function='0'/>
>     </controller>
>     <controller type='pci' index='9'>
>       <model type='downstream-switch-port'/>
>       <address type='pci' bus='7' slot='3' function='0'/>
>     </controller>
>     <interface type='blah'>
>       ...
>       <mac address='00:27:13:53:db:77'/>
>       <address type='pci' bus='9' slot='0' function='0'/>
>     </interface>
>
>
>     <!-- old-fashioned PCI ethernet in an external PCI slot -->
>     <controller type='pci' index='0x0a'>
>       <model type='dmi-to-pci-bridge'/>
>       <address type='pci' bus='0' slot='0x1e' function='0'/>
>     </controller>
>     <interface type='blah'>
>       ...
>       <mac address='00:03:47:7b:63:e6'/>
>       <address type='pci' bus='0x0a' slot='0x0e' function='0'/>
>     </interface>
>
> So I think this will all work. Does anyone see any problems?
>
> If not, then we can draw it all back to the *current* patchset - support
> for multiple PCI buses using the pci-bridge device. For *that*, we only
> need to implement the following bits of the above:
>
> 1) There will be a new <controller type='pci'> device, with a <model
> type='xyz'/> subelement. Initially we will support types "pci-root" and
> "pci-bridge" (all the other types discussed above can be added later).
> pci-root will have *no <address>* element (and will generate nothing on
> the qemu commandline, but will create a 32 slot "bus='0'" to plug PCI
> devices into). pci-bridge will have an <address> element, will generate
> a -device option on the qemu commandline, and will also create a 32 slot
> "bus='n'" to plug PCI devices into.
>
> 2) for machinetypes that have a PCI bus, the config should have this
> controller auto-added:
>
>     <controller type='pci'>
>       <model type='pci-root'/>
>     </controller>
>
> This will make bus='0' available (but add nothing to the qemu
> commandline). Any attempt to add a PCI device when there is no bus
> available should be an error.
>
> 3) The way to add more buses will be to add a controller like this:
>
>     <controller type='pci'>
>       <model type='pci-bridge'/>
>     </controller>
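
Putting points 2 and 3 together, a minimal sketch of a guest fragment using
the new bus (hypothetical <interface> and bridge address, for illustration
only):

    <controller type='pci' index='0'>
      <model type='pci-root'/>                <!-- implicit bus='0', nothing on qemu commandline -->
    </controller>
    <controller type='pci' index='1'>
      <model type='pci-bridge'/>
      <address type='pci' bus='0' slot='3'/>  <!-- the bridge itself plugs into bus 0 -->
    </controller>
    <interface type='network'>
      <source network='default'/>
      <address type='pci' bus='1' slot='1'/>  <!-- device on the new bus -->
    </interface>
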
>
for legacy PCI, yes; but for PCIe, one needs a PCIe-PPB-up & at least one PCIe-PPB-down.
One _cannot_ have just a single pci-bridge except as a driving bridge from
a root-complex port.

> 4) When <controller type='usb'> was added, resulting in auto-generated
> devices, that caused problems when migrating from a host with newer
> libvirt to one with older libvirt. We need to make sure we don't suffer
> the same problem this time. See the following two BZes for details
> (unless you have a better memory than me! :-):
>
>    https://bugzilla.redhat.com/show_bug.cgi?id=815503
>    https://bugzilla.redhat.com/show_bug.cgi?id=856864
>
> (and note how danpb eerily prophesied the current pending situation :-)
>
>
> I think everything else about Jan's/Liguang's pci-bridge patches can remain.
>



