[libvirt] [RFC] qemu: Use virtio-pci by default for mach-virt guests

Mon Oct 24 15:17:07 UTC 2016

On Sat, 2016-10-22 at 23:07 -0400, Laine Stump wrote:
> (When I first saw your mail I didn't realize it was a patch, because it 
> didn't have "PATCH" in the subject)

Fair enough, will send the next iteration as [RFC PATCH v2] :)

> I like that this makes pci truly the default in a simple manner, but 
> still allows switching back to mmio if necessary. On the other hand, it 
> puts the potential "switch" to decide whether or not to use mmio for all 
> devices down into the config of a single device, which is a bit weird to 
> explain. (On the other hand, how often will mmio be used in the future? 
> Maybe it doesn't matter if it's weird to explain...)

We really want to push for virtio-pci going forward as it has
a number of advantages, but there are still legacy guest OSs
out there that don't support virtio-pci at all yet we still
want to be able to run.

virt-install is already capable of overriding the default
address type on a per-device basis, eg.

  --network network=default,model=virtio,address.type=pci

or

  --disk size=10,bus=virtio,address.type=virtio-mmio

I think we should have some sort of global switch that sets
address.type for all devices, so you don't have to repeat
yourself. Or does something like that exist already?

I'm also not sure whether virt-manager is using libosinfo,
but ideally the default address type would be something that
can be chosen automatically based on the guest OS. CC'ing
Cole and Pavel so they can provide some insights.

I feel like I've strayed quite a bit from the original
question, so let me get back to it: going forward, the only
situation where we'd find ourselves choosing virtio-mmio
addressess instead of virtio-pci addresses is when dealing
with legacy guests. We definitely don't want to mix
virtio-pci and virtio-mmio in the same guest. So this
approach, I think, makes perfect sense and "just works" in
all reasonable situations.

We should also document this and other stuff in the same
general area somewhere. I'll try not to forget about it by
the time I submit this for real, so I can include some sort
of attempt at documenting along with it.

[...]
> > > +static bool
> > > +qemuDomainAnyDeviceHasAddressOfType(virDomainDefPtr def,
> > > +                                    virDomainDeviceAddressType type)
> > > +{
> > > +    size_t i;
> > > +
> > 
> > It's super-easy to miss something here, moreover it's easy to forget
> > adding stuff here in the future.  You should either use
> > virDomainDeviceInfoIterate() or at least (not a fan of that) copy the
> > check from virDomainDeviceInfoIterateInternal() here, so that people are
> > forced to add new device types here.
> 
> I agree with this (and I wish that the address assignment used 
> virDomainDeviceInfoIterate() when assigning addresses for the same 
> reasons (for brevity and to be sure new device types aren't forgotten); 
> the problem is that the order of devices during address assignment is 
> different, which would result in different PCI addresses for the same 
> input XML if we were to changeit, so we're stuck with that particular 
> extra manual enumeration of all the devices. But definitely let's not 
> make another.)

While I was writing this POC, I kept thinking "there has to
be a better way"... Turns out I was right, and
virDomainDeviceInfoIterate() is exactly what I was looking
for. Thanks to both of you for pointing me in the right
direction! The next version will use it :)

[...]
> > > static void
> > > qemuDomainPrimeVirtioDeviceAddresses(virDomainDefPtr def,
> > >                                      virDomainDeviceAddressType type)
> > > @@ -390,6 +502,8 @@ static void
> > > qemuDomainAssignARMVirtioMMIOAddresses(virDomainDefPtr def,
> > >                                        virQEMUCapsPtr qemuCaps)
> > > {
> > > +    bool usesMMIO;
> > > +
> > >     if (def->os.arch != VIR_ARCH_ARMV7L &&
> > >         def->os.arch != VIR_ARCH_AARCH64)
> > >         return;
> > > @@ -398,9 +512,17 @@ 
> > > qemuDomainAssignARMVirtioMMIOAddresses(virDomainDefPtr def,
> > >           qemuDomainMachineIsVirt(def)))
> > >         return;
> > > 
> > > -    if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MMIO)) {
> > > -        qemuDomainPrimeVirtioDeviceAddresses(
> > > -            def, VIR_DOMAIN_DEVICE_ADDRESS_TYPE_VIRTIO_MMIO);
> > > +    /* We use virtio-mmio by default on mach-virt guests only if 
> > > they already
> > > +     * have at least one virtio-mmio device: in all other cases, we 
> > > prefer
> > > +     * virtio-pci */
> > > +    usesMMIO = qemuDomainAnyDeviceHasAddressOfType(def,
> > > + VIR_DOMAIN_DEVICE_ADDRESS_TYPE_VIRTIO_MMIO);
> 
> Any reason you created the temporary variable just to use it once? Also, 
> by calling this unconditionally in advance you eliminate the possibility 
> of avoiding that big long enumeration of all the devices in the case 
> that qemuDomainMachineHasPCIeRoot() returns false.

Fixed in the next version.

> > > +    if (qemuDomainMachineHasPCIeRoot(def) && !usesMMIO) {
> > 
> > I'm guessing you are checking for the pcie-root so that we're not adding
> > it for users that don't want any.  That way for us to actually use
> > virtio-pci by default you need to add that as well, which you haven't
> > mentioned in the commit message.  Not sure which one of those things was
> > intentional.
> 
> Actually that's not true. By the time this function is called (during 
> device address assignment), default devices have already been added to 
> the domain, which will include pci-root or pcie-root as appropriate.

Except we will only add pcie-root to a virt guest if the QEMU
binary has the QEMU_CAPS_OBJECT_GPEX capability: that's the
reason why we can't just check to see if the machine type is
virt.

[...]
> > > +    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
> > > +      <model name='i82801b11-bridge'/>
> > > +      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' 
> > > function='0x0'/>
> > > +    </controller>
> > > +    <controller type='pci' index='2' model='pci-bridge'>
> > > +      <model name='pci-bridge'/>
> > > +      <target chassisNr='2'/>
> > > +      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' 
> > > function='0x0'/>
> > > +    </controller>
> 
> ?? It shouldn't have needed to add these...

[...]
> Aha! You need to add the .....DISABLE_LEGACY capability to the test 
> cases so that libvirt will recognize that it can put virtio devices on a 
> PCIE slot.

[...]
> This is also caused by lack of DISABLE_LEGACY

[...]
> As is this. Once you add that cap to the test cases (for both xml2xml 
> and xml2argv) the test cases should come out right.

Should have pointed that out, sorry. I was aware of that, but
this commit is not the place to add capabilities to existing
test cases: it should be done in a separate commit, either
before or after this one.

Or not done at all: I think it's good that our test cases are
not too uniform in this case, because they cover more use
cases and, in a way, reflect more realistically the kind of
guest configuration we can expect to have found their way
into the wild over time. So I'd vote for keeping it this way.

-- 
Andrea Bolognani / Red Hat / Virtualization