[libvirt] [RFC] qemu: Use virtio-pci by default for mach-virt guests

Laine Stump laine at laine.org
Mon Oct 24 16:04:32 UTC 2016


On 10/24/2016 02:43 AM, Martin Kletzander wrote:
> On Sat, Oct 22, 2016 at 11:07:39PM -0400, Laine Stump wrote:
>> (When I first saw your mail I didn't realize it was a patch, because it
>> didn't have "PATCH" in the subject)
>>
>> On 10/22/2016 05:50 PM, Martin Kletzander wrote:
>>> On Fri, Oct 21, 2016 at 07:24:28PM +0200, Andrea Bolognani wrote:
>>>> virtio-pci is the way forward for aarch64 guests: it's faster
>>>> and less alien to people coming from other architectures.
>>>> Now that guest support is finally getting there, we'd like to
>>>> start using it by default instead of virtio-mmio.
>>>>
>>>> Users and applications can already opt-in by explicitly using
>>>>
>>>>  <address type='pci'/>
>>>>
>>>> inside the relevant elements, but that's kinda cumbersome and
>>>> requires all users and management applications to adapt, which
>>>> we'd really like to avoid.
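>>>>
>>>> For example, a virtio disk can be opted in to virtio-pci like this
>>>> (a hypothetical fragment for illustration; the domain/bus/slot
>>>> attributes are optional and can be dropped to let libvirt pick a
>>>> free slot):
>>>>
>>>>  <disk type='file' device='disk'>
>>>>    <target dev='vda' bus='virtio'/>
>>>>    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>>>>  </disk>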
>>>>
>>>> What we can do instead is use virtio-mmio only if the guest
>>>> already has at least one virtio-mmio device, and use virtio-pci
>>>> in all other situations.
>>>>
>>>> That means existing virtio-mmio guests will keep using the old
>>>> addressing scheme, and new guests will automatically be created
>>>> using virtio-pci instead. Users can still override the default
>>>> in either direction.
>>>> ---
>>>> Sending this as an RFC for the time being because it clearly
>>>> needs some more polish, but I wanted to get the idea out
>>>> there sooner rather than later.
>>>>
>>>
>>> Makes sense for a non-user of this (or rather a not-yet-user,
>>> maybe).  So I mention only a few details inline.
>>
>> I like that this makes pci truly the default in a simple manner, but
>> still allows switching back to mmio if necessary. On the other hand, it
>> puts the potential "switch" that decides whether or not to use mmio for
>> all devices down into the config of a single device, which is a bit
>> weird to explain. (Then again, how often will mmio be used in the
>> future? Maybe it doesn't matter if it's weird to explain...)
>>
>>>
>>>> It needs to be applied on top of Laine's PCI series[1].
>>>>
>>>>
>>>> [1]
>>>> https://www.redhat.com/archives/libvir-list/2016-October/msg00699.html
>>>>
>>>>  src/qemu/qemu_domain_address.c                     | 128 ++++++++++++++++++++-
>>>>  ...l2argv-aarch64-virt-2.6-virtio-pci-default.args |  14 ++-
>>>>  .../qemuxml2argv-aarch64-virtio-pci-default.args   |  14 ++-
>>>>  .../qemuxml2xmlout-aarch64-virtio-pci-default.xml  |  24 +++-
>>>>  4 files changed, 162 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
>>>> index f27d1e3..7f07764 100644
>>>> --- a/src/qemu/qemu_domain_address.c
>>>> +++ b/src/qemu/qemu_domain_address.c
>>>> @@ -265,6 +265,118 @@ qemuDomainAssignSpaprVIOAddresses(virDomainDefPtr def,
>>>>  }
>>>>
>>>>
>>>> +static bool
>>>> +qemuDomainAnyDeviceHasAddressOfType(virDomainDefPtr def,
>>>> +                                    virDomainDeviceAddressType type)
>>>> +{
>>>> +    size_t i;
>>>> +
>>>
>>> It's super-easy to miss something here, and it's easy to forget to
>>> add new device types here in the future.  You should either use
>>> virDomainDeviceInfoIterate() or at least (not a fan of that) copy the
>>> check from virDomainDeviceInfoIterateInternal() here, so that people
>>> are forced to add new device types here.
>>
>> I agree with this (and I wish the address assignment itself used
>> virDomainDeviceInfoIterate(), for the same reasons: brevity, and to be
>> sure new device types aren't forgotten. The problem is that the order
>> of devices during address assignment is different, which would result
>> in different PCI addresses for the same input XML if we were to change
>> it, so we're stuck with that particular extra manual enumeration of
>> all the devices. But definitely let's not make another.)
>>
>
> I'm not sure, but I would bet quite some sum of money that we don't
> have to guarantee how we allocate addresses for XMLs that don't have
> any addresses assigned.  That's because it wouldn't make any sense to
> guarantee that, and no piece of code should depend on it.

Yes and no. Certainly it would be bad to knowingly make code dependent 
on it, but back when explicit PCI address support was first added, it 
was implemented (I *think* - I wasn't around then, or at least wasn't 
paying attention to PCI addresses) such that the explicit addresses 
assigned by libvirt would exactly match the addresses qemu would have 
assigned if we hadn't given any address info on the command line. This 
was done to make sure that guests started on newer libvirt/qemu would 
experience no changes in guest ABI. In other words, the order of 
devices during address assignment was very important.

At this point it's almost a certainty that all persistent guests that 
were originally defined on libvirt old enough to lack explicit PCI 
addresses have had their config modified and resaved at least once, so 
changing the order of devices in PCI address allocation *probably* 
wouldn't have an effect on them. However, there is still the case of 
transient guests, where libvirt is handed the "original" config each 
time the guest is started. If any management application is naive 
enough to do that without "priming" the PCI addresses, then we need to 
preserve the order of assigning addresses.

Personally I'd be completely happy to start using 
virDomainDeviceInfoIterate() to assign addresses (damn the torpedoes, 
full steam ahead!), and I've even made the function that calculates 
pciConnectFlags put a 0 in the flags for those devices that don't need 
a PCI address, which should make it quite easy to implement. I wouldn't 
want the finger pointed at me as the cause of some horrid regression in 
oVirt or whatever random management application, though. :-)



>
> Having said that, I should probably clean-up the patches for the address
> assignment and send them to the list as it looks like there will be no
> future work from the GSoC student we had for that.

Anything PCI-related may run into lots of merge conflicts. Especially 
after my next series...



