[libvirt] [PATCH 2/2] HACK: qemu: aarch64: Use virtio-pci if user specifies PCI controller

Fri Mar 4 22:05:18 UTC 2016

On 02/26/2016 10:13 AM, Andrea Bolognani wrote:
> On Wed, 2016-02-17 at 15:03 -0500, Laine Stump wrote:
>> On 01/28/2016 04:14 PM, Cole Robinson wrote:
>>>   
>>> If a user manually specifies this XML snippet for aarch64 machvirt:
>>>   
>>>      <controller type='pci' index='0' model='pci-root'/>
>> As you've noted below, this isn't correct. aarch64 machvirt has no
>> implicit pci-root controller (aka "pci.0"). It instead has a pcie-root
>> controller ("pcie.0"). Since a pci[e]-root controller cannot be
>> explicitly added, by definition this couldn't work.
>>   
>>>   
>>>   
>>> Libvirt will interpret this to mean that the OS supports virtio-pci,
>>> and will allocate PCI addresses (instead of virtio-mmio) for virtio
>>> devices.
>>>   
>>> This is a giant hack. Trying to improve it led me into the maze of PCI
>>> address code and I gave up for now. Here are the issues:
>>>   
>>> * I'd prefer that to be model='pcie-root' which matches what
>>> qemu-system-aarch64 -M virt actually provides by default... however
>>> libvirt isn't happy with a single pcie-root specified by the user, it
>>> will error with:
>>>   
>>> error: unsupported configuration: failed to create PCI bridge on bus 1: too many devices with fixed addresses
>> That's not the right error, but it's caused by the fact that libvirt
>> wants the pci-bridge device to be plugged into a standard PCI slot, but
>> all the slots of pcie-root are PCIe slots. Since we now know that qemu
>> doesn't mind if any standard PCI device is plugged into a PCIe slot,
> Should we rely on this behavior? Isn't this something that might
> change in the future? Or at least be quite puzzling for users?
>
> Just thinking out loud :)

That same concern is what led libvirt to be very strict initially about 
plugging PCI into PCI and PCIe into PCIe. I've since received reasonable 
assurances that qemu will continue to be permissive about plugging PCI 
devices into PCIe slots, so I allow it, but still default to "purity".

>
>> the
>> decision of how we want to solve this problem depends on whether or not
>> we want the devices in question to be hot-pluggable - the ports of
>> pcie-root do not support hot-plugging devices (at least on Q35), while
>> the ports on pci-bridge do. So if we require that all devices be
>> hot-pluggable, then we have a few choices:
>>   
>> 1) create the same PCI controller Frankenstein we currently have for Q35
>> - a dmi-to-pci-bridge plugged into pcie-root, and a pci-bridge plugged
>> into dmi-to-pci-bridge. This is easiest because it already works, but it
>> does create an extra unnecessary controller.
> This is the current situation, right?
>
> qemu-kvm in current aarch64 RHEL doesn't have the i82801b11-bridge
> device compiled in, by the way. However, since qemu-system-aarch64
> in Fedora 23 *does* have it, I assume enabling it would simply be
> a matter of flipping a build configuration bit.
>
>> 2) auto-add a pci-bridge in cases when there is a pcie-root but no
>> standard PCI slots. This would take only a slight amount more work.
>>   
>> 3) auto-add a pcie-root-port to each port of the pcie-root controller.
>> This would still leave us with PCIe ports, so we would need to teach
>> libvirt that it's okay to plug PCI devices into PCIe ports.
> As mentioned above, I'm not sure this is a good idea. Maybe I'm just
> afraid of my own shadow though :)
>
>> If we don't require hot-pluggability, then we can just teach the
>> address-assignment code that PCI devices can plug into non-hotpluggable
>> PCIe ports and we're done.
>>   
>> Or we can do a hybrid that's kind of a continuation of the "use PCI if
>> it's available, otherwise mmio" - we could do this:
>>   
>> A) If there are any standard PCI slots, then auto-assign to PCI slots
>> (creating new pci-bridge controllers as necessary)
>>   
>> B) else if there are any PCIe slots, then auto-assign to hot-pluggable
>> PCIe if available, or straight PCIe if not.
>>   
>> C) else use virtio-mmio.
>>   
>> -------------------------------------------
>>   
>> Mixed in with all of this discussion is my thinking that we should have
>> some way to specify, in XML, constraints for the address of each device
>> *without specifying the address itself*. Things we need to be able to
>> specify:
>>   
>> 1) Is it PCI-only vs. PCIe-only vs. either one (maybe this could be used
>> in the future to constrain to virtio-mmio as well)?
>>   
>> 2) Must the device be hot-pluggable? (default would be yes)
>>   
>> 3) guest-side NUMA node? (I'm not sure if this needs to be user
>> specifiable - in the case of a vfio-assigned device, I think all we need
>> to do is inform the guest which NUMA node the device is on in the host (via
>> putting it on a PXB controller that is configured with that same NUMA
>> node number). For emulated devices - is there any use to putting an
>> *emulated* device on the same controller as a particular vfio-assigned
>> device that is on a specific node? If not, then maybe it will never matter).
>>   
>> It would be better if these "address constraints" were in a different
>> part of the XML than the <address> element itself - this would maintain
>> the simplicity of being able to just remove all <address> elements in
>> order to force libvirt to re-assign all device addresses.
>>   
>> This isn't something that needs doing immediately, but worth keeping in
>> mind while putting together something that works for aarch64.
>>   
>>   
>>   
>>>   
>>>   
>>> Instead this patch uses hacks to make pci-root use the pcie.0 bus for
>>> aarch64, since that code path already works.
>> I think that's a dead-end that we would have to back-track on, so
>> probably not a good solution even temporarily.
>>   
>>   
>> Here's an attempt at a plan:
>>   
>> 1) change the PCI address assignment code so that for aarch64/virt it
>> prefers PCIe addresses, but still requires hot-pluggable (currently it
>> almost always prefers PCI, and requires hot-pluggable). (alternate - if
>> aarch64 doesn't support pcie-root-port or pcie-switch-*-port, then don't
>> require hot-pluggable either).
>>   
>> 2) put something on the front of that that checks for existence of
>> pcie-root, and if it's not found, uses virtio-mmio instead (is there
>> something already that auto-adds the virtio-mmio address? I haven't
>> looked and am too lazy to do so now).
>>   
>> At this point, as long as you manually add a bunch of pcie-root-port
>> controllers along with the manual pcie-root, everything should just
>> work. Then we would go to step 3:
>>   
>> 3) enhance the auto-assign code so that, in addition to auto-adding a
>> pci-bridge when needed, it would auto-add either a single pcie-root-port
>> or a pcie-switch-upstream-port and 32 pcie-switch-downstream-ports
>> anytime a hotpluggable PCIe port was needed and couldn't be found. (the
>> latter assumes that aarch64 supports those controllers).
>>   
>> Does that make any sense? I could try to code some of this up if you
>> could test it (or help me get set up to test it myself).
> I'm not sure I fully understand all of the above, but I'll pitch
> in with my own proposal regardless :)
>
> First, we make sure that
>
>    <controller type='pci' index='0' model='pcie-root'/>
>
> is always added automatically to the domain XML when using the
> mach-virt machine type. Then, if
>
>    <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
>    <controller type='pci' index='2' model='pci-bridge'/>
>
> is present as well we default to virtio-pci, otherwise we use
> the current default of virtio-mmio. This should allow management
> applications, based on knowledge about the guest OS, to easily
> pick between the two address schemes.
>
> Does this sound like a good idea?

... or a variation of that, anyway :-)

What I think: If there are *any* pci controllers *beyond* pcie-root, or 
if there are any devices that already have a PCI address, then assign 
PCI addresses, else use mmio.
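
To make that rule concrete, here is a rough Python sketch. It is not 
libvirt code: the Controller and Device classes, the address_type field, 
and the function name are all made up for illustration, and it only 
models the decision itself, not the address assignment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Controller:
    # Hypothetical stand-in for a <controller> element.
    type: str
    model: str


@dataclass
class Device:
    # Hypothetical stand-in for any device; address_type is None
    # when the device has no <address> yet.
    name: str
    address_type: Optional[str] = None


def pick_address_scheme(controllers: Sequence[Controller],
                        devices: Sequence[Device]) -> str:
    """Return "pci" or "mmio" following the rule described above."""
    # Any PCI controller other than the implicit pcie-root counts.
    extra_pci = any(c.type == "pci" and c.model != "pcie-root"
                    for c in controllers)
    # So does any device that already carries a PCI address.
    pci_device = any(d.address_type == "pci" for d in devices)
    return "pci" if (extra_pci or pci_device) else "mmio"
```

With only pcie-root present and no addressed devices this returns 
"mmio"; adding a single pcie-root-port flips it to "pci".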

To make this more useful, we will want some enhancements:

1) add a "hotpluggable" attribute to <target> for every device (this 
will be a bit ugly, because there isn't a unified parser/formatter for 
the <target> element :-( )

2) add a "busHint" ("busType", "preferredBus", ??) attribute too, to 
allow specifying pci vs pcie (is it worth adding "mmio" here? Seems it 
will be deprecated before long anyway...). It will default to pcie 
rather than pci on platforms that support it.

3) change the auto-assign to pay attention to hotpluggable and 
preferredBus (damn! I need to think of a better name!)

This way someone can define a new aarch64 domain and just toss a bunch 
of pcie-root-ports into it:

   <controller type='pci' model='pcie-root-port'/>
   <controller type='pci' model='pcie-root-port'/>
   <controller type='pci' model='pcie-root-port'/>
   <controller type='pci' model='pcie-root-port'/>
   ...

then add a bunch of devices. The pcie-root-ports will be auto-assigned 
to ports on pcie.0, and the devices will be auto-assigned to root-ports. 
(alternately, they could toss in a bunch of 
pcie-switch-downstream-ports. These would trigger an auto-add of a 
pcie-switch-upstream-port, which would trigger an auto-add of a 
pcie-root-port, and those would all be connected).

Ooh! Or instead of hotpluggable and preferredBus in the devices' 
<target>, just "preferredConnection" which would be exactly the model of 
pci controller preferred. So you'd do something like this:

    <interface type='network'>
      <target preferredConnection='pcie-switch-downstream-port'/>
      ...
    </interface>

This would look for an existing pcie-switch-downstream-port. If it 
couldn't find one available, it would add one (and all the underlying 
controllers necessary).
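
The "find one, else add it plus whatever it needs underneath" behavior 
could look roughly like the Python sketch below. Everything here is an 
assumption for illustration: controllers are just a list of model names, 
PARENT_MODEL and ensure_controller are invented names, and port capacity 
is not modeled at all (any existing parent is treated as having room).

```python
from typing import Dict, List, Set

# Which controller model each model plugs into (illustrative only).
PARENT_MODEL: Dict[str, str] = {
    "pcie-switch-downstream-port": "pcie-switch-upstream-port",
    "pcie-switch-upstream-port": "pcie-root-port",
    "pcie-root-port": "pcie-root",
}


def ensure_controller(controllers: List[str], model: str,
                      in_use: Set[int]) -> int:
    """Return the index of a free controller of the given model,
    appending one (and any missing parents) if none is available."""
    # Reuse an existing controller of this model if it is free.
    for idx, existing in enumerate(controllers):
        if existing == model and idx not in in_use:
            return idx
    # Make sure something exists for the new controller to plug into.
    parent = PARENT_MODEL.get(model)
    if parent is not None and parent not in controllers:
        ensure_controller(controllers, parent, in_use)
    controllers.append(model)
    return len(controllers) - 1
```

Starting from just ["pcie-root"], asking for a 
pcie-switch-downstream-port appends a pcie-root-port, a 
pcie-switch-upstream-port, and then the downstream port itself.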

I'm just rambling now, so I'll stop. But I think a good first step is 
the simple thing in the "What I think" paragraph.