[libvirt] [PATCH 2/2] HACK: qemu: aarch64: Use virtio-pci if user specifies PCI controller

Fri Feb 26 15:13:43 UTC 2016

On Wed, 2016-02-17 at 15:03 -0500, Laine Stump wrote:
> On 01/28/2016 04:14 PM, Cole Robinson wrote:
> > 
> > If a user manually specifies this XML snippet for aarch64 machvirt:
> > 
> >    <controller type='pci' index='0' model='pci-root'/>
> As you've noted below, this isn't correct. aarch64 machvirt has no 
> implicit pci-root controller (aka "pci.0"). It instead has a pcie-root 
> controller ("pcie.0"). Since a pci[e]-root controller cannot be 
> explicitly added, by definition this couldn't work.
> 
> > 
> > 
> > Libvirt will interpret this to mean that the OS supports virtio-pci,
> > and will allocate PCI addresses (instead of virtio-mmio) for virtio
> > devices.
> > 
> > This is a giant hack. Trying to improve it led me into the maze of PCI
> > address code and I gave up for now. Here are the issues:
> > 
> > * I'd prefer that to be model='pcie-root' which matches what
> > qemu-system-aarch64 -M virt actually provides by default... however
> > libvirt isn't happy with a single pcie-root specified by the user, it
> > will error with:
> > 
> > error: unsupported configuration: failed to create PCI bridge on bus 1: too many devices with fixed addresses
> That's not the right error, but it's caused by the fact that libvirt 
> wants the pci-bridge device to be plugged into a standard PCI slot, but 
> all the slots of pcie-root are PCIe slots. Since we now know that qemu 
> doesn't mind if any standard PCI device is plugged into a PCIe slot,

Should we rely on this behavior? Isn't this something that might
change in the future? Or at least be quite puzzling for users?

Just thinking out loud :)

> the 
> decision of how we want to solve this problem depends on whether or not 
> we want the devices in question to be hot-pluggable - the ports of 
> pcie-root do not support hot-plugging devices (at least on Q35), while 
> the ports on pci-bridge do. So if we require that all devices be 
> hot-pluggable, then we have a few choices:
> 
> 1) create the same PCI controller Frankenstein we currently have for Q35 
> - a dmi-to-pci-bridge plugged into pcie-root, and a pci-bridge plugged 
> into dmi-to-pci-bridge. This is easiest because it already works, but it 
> does create an extra unnecessary controller.

This is the current situation, right?

qemu-kvm in current aarch64 RHEL doesn't have the i82801b11-bridge
device compiled in, by the way. However, since qemu-system-aarch64
in Fedora 23 *does* have it, I assume enabling it would simply be
a matter of flipping a build configuration bit.

> 2) auto-add a pci-bridge in cases when there is a pcie-root but no 
> standard PCI slots. This would take only slightly more work.
> 
> 3) auto-add a pcie-root-port to each port of the pcie-root controller. 
> This would still leave us with PCIe ports, so we would need to teach 
> libvirt that it's okay to plug PCI devices into PCIe ports.

As mentioned above, I'm not sure this is a good idea. Maybe I'm just
afraid of my own shadow though :)

> If we don't require hot-pluggability, then we can just teach the 
> address-assignment code that PCI devices can plug into non-hotpluggable 
> PCIe ports and we're done.
> 
> Or we can do a hybrid that's kind of a continuation of the "use PCI if 
> it's available, otherwise mmio" - we could do this:
> 
> A) If there are any standard PCI slots, then auto-assign to PCI slots 
> (creating new pci-bridge controllers as necessary)
> 
> B) else if there are any PCIe slots, then auto-assign to hot-pluggable 
> PCIe if available, or straight PCIe if not.
> 
> C) else use virtio-mmio.
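
A rough sketch of the A/B/C fallback order above, in Python purely for
illustration (libvirt itself is C, and none of these names exist there).
The grouping of controller models by the kind of slot they provide is an
assumption made for this sketch, as is the choose_slot() helper itself:

```python
from enum import Enum


class Slot(Enum):
    PCI = "pci"
    PCIE_HOTPLUG = "pcie-hotplug"
    PCIE = "pcie"
    VIRTIO_MMIO = "virtio-mmio"


# Assumed grouping, for illustration only: which controller models
# expose which kind of slot.
PCI_MODELS = {"pci-root", "pci-bridge", "dmi-to-pci-bridge"}
PCIE_HOTPLUG_MODELS = {"pcie-root-port", "pcie-switch-downstream-port"}
PCIE_PLAIN_MODELS = {"pcie-root"}


def choose_slot(controller_models):
    """Pick an address type following the A/B/C order (hypothetical helper)."""
    models = set(controller_models)
    # A) any standard PCI slots: use them
    if models & PCI_MODELS:
        return Slot.PCI
    # B) otherwise prefer hot-pluggable PCIe, then plain PCIe
    if models & PCIE_HOTPLUG_MODELS:
        return Slot.PCIE_HOTPLUG
    if models & PCIE_PLAIN_MODELS:
        return Slot.PCIE
    # C) no PCI or PCIe at all: fall back to virtio-mmio
    return Slot.VIRTIO_MMIO
```

With only pcie-root present this yields plain PCIe; adding a
pcie-root-port switches it to hot-pluggable PCIe; with no PCI
controllers at all it falls through to virtio-mmio.
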
> 
> -------------------------------------------
> 
> Mixed in with all of this discussion is my thinking that we should have 
> some way to specify, in XML, constraints for the address of each device 
> *without specifying the address itself*. Things we need to be able to 
> specify:
> 
> 1) Is the device PCI-only, PCIe-only, or either one (maybe this could be 
> used in the future to constrain to virtio-mmio as well)?
> 
> 2) Must the device be hot-pluggable? (default would be yes)
> 
> 3) guest-side NUMA node? (I'm not sure if this needs to be user 
> specifiable - in the case of a vfio-assigned device, I think all we need 
> to do is inform the guest which NUMA node the device is on in the host (via 
> putting it on a PXB controller that is configured with that same NUMA 
> node number). For emulated devices - is there any use to putting an 
> *emulated* device on the same controller as a particular vfio-assigned 
> device that is on a specific node? If not, then maybe it will never matter).
> 
> It would be better if these "address constraints" were in a different 
> part of the XML than the <address> element itself - this would maintain 
> the simplicity of being able to just remove all <address> elements in 
> order to force libvirt to re-assign all device addresses.
> 
> This isn't something that needs doing immediately, but worth keeping in 
> mind while putting together something that works for aarch64.
> 
> 
> 
> > 
> > 
> > Instead this patch uses hacks to make pci-root use the pcie.0 bus for
> > aarch64, since that code path already works.
> I think that's a dead-end that we would have to back-track on, so 
> probably not a good solution even temporarily.
> 
> 
> Here's an attempt at a plan:
> 
> 1) change the PCI address assignment code so that for aarch64/virt it 
> prefers PCIe addresses, but still requires hot-pluggable (currently it 
> almost always prefers PCI, and requires hot-pluggable). (alternate - if 
> aarch64 doesn't support pcie-root-port or pcie-switch-*-port, then don't 
> require hot-pluggable either).
> 
> 2) put something on the front of that that checks for existence of 
> pcie-root, and if it's not found, uses virtio-mmio instead (is there 
> something already that auto-adds the virtio-mmio address? I haven't 
> looked and am too lazy to do so now).
> 
> At this point, as long as you manually add a bunch of pcie-root-port 
> controllers along with the manual pcie-root, everything should just 
> work. Then we would go to step 3:
> 
> 3) enhance the auto-assign code so that, in addition to auto-adding a 
> pci-bridge when needed, it would auto-add either a single pcie-root-port 
> or a pcie-switch-upstream-port and 32 pcie-switch-downstream-ports 
> anytime a hotpluggable PCIe port was needed and couldn't be found. (the 
> latter assumes that aarch64 supports those controllers).
> 
> Does that make any sense? I could try to code some of this up if you 
> could test it (or help me get set up to test it myself).

I'm not sure I fully understand all of the above, but I'll pitch
in with my own proposal regardless :)

First, we make sure that

  <controller type='pci' index='0' model='pcie-root'/>

is always added automatically to the domain XML when using the
mach-virt machine type. Then, if

  <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
  <controller type='pci' index='2' model='pci-bridge'/>

are present as well, we default to virtio-pci; otherwise we use
the current default of virtio-mmio. This should allow management
applications, based on knowledge about the guest OS, to easily
pick between the two address schemes.
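
To make the rule concrete, here is a minimal sketch of the check in
Python (illustration only; libvirt is C and nothing below is existing
libvirt code). The function name default_virtio_transport() is
hypothetical, and the sketch assumes the decision can be made just by
looking at which PCI controller models appear in the domain XML:

```python
import xml.etree.ElementTree as ET

# Controllers whose combined presence would opt the guest into virtio-pci
# under the proposal above.
REQUIRED_FOR_PCI = {"dmi-to-pci-bridge", "pci-bridge"}


def default_virtio_transport(domain_xml):
    """Return 'virtio-pci' or 'virtio-mmio' for a domain (hypothetical helper)."""
    root = ET.fromstring(domain_xml)
    models = {
        ctrl.get("model")
        for ctrl in root.iter("controller")
        if ctrl.get("type") == "pci"
    }
    # Both bridges present: default to virtio-pci.
    if REQUIRED_FOR_PCI <= models:
        return "virtio-pci"
    # Otherwise keep the current default.
    return "virtio-mmio"


mmio_guest = """
<domain type='kvm'>
  <devices>
    <controller type='pci' index='0' model='pcie-root'/>
  </devices>
</domain>
"""

pci_guest = """
<domain type='kvm'>
  <devices>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
    <controller type='pci' index='2' model='pci-bridge'/>
  </devices>
</domain>
"""

print(default_virtio_transport(mmio_guest))
print(default_virtio_transport(pci_guest))
```

A management application would then only need to add or omit the two
bridge controllers to choose between the address schemes.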

Does this sound like a good idea?

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team