[libvirt] [PATCH 2/2] HACK: qemu: aarch64: Use virtio-pci if user specifies PCI controller

Thu Mar 10 22:46:53 UTC 2016

On 03/09/2016 02:09 PM, Laine Stump wrote:
> On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
>> On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
>>> On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
>>>>> I'm not sure I fully understand all of the above, but I'll pitch
>>>>> in with my own proposal regardless :)
>>>>>   First, we make sure that
>>>>>        <controller type='pci' index='0' model='pcie-root'/>
>>>>>   is always added automatically to the domain XML when using the
>>>>> mach-virt machine type. Then, if
>>>>>        <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
>>>>>      <controller type='pci' index='2' model='pci-bridge'/>
>>>>>   is present as well we default to virtio-pci, otherwise we use
>>>>> the current default of virtio-mmio. This should allow management
>>>>> applications, based on knowledge about the guest OS, to easily
>>>>> pick between the two address schemes.
>>>>>   Does this sound like a good idea?
>>>> ... or a variation of that, anyway :-)
>>>>   What I think: If there are *any* pci controllers *beyond* pcie-root, or
>>>> if there are any devices that already have a PCI address, then assign
>>>> PCI addresses, else use mmio.
>>> This sounds reasonable.
>>>
>>> However, I'm wondering if we shouldn't be even more explicit about
>>> this... From libvirt's point of view we just need to agree on some
>>> sort of "trigger" that causes it to allocate PCI addresses instead
>>> of MMIO addresses, and for that purpose "any PCI controller or any
>>> device with a PCI address" is perfectly fine; looking at that from
>>> the point of view of an user, though? Not sure about it.
>>>
>>> What about adding something like
>>>
>>>    <preferences>
>>>      <preference name="defaultAddressType" value="pci"/>
>>>    </preferences>
>>>
>>> to the domain XML? AFAICT libvirt doesn't have any element that
>>> could be used for this purpose at the moment, but maybe having a
>>> generic way to set domain-wide preferences could be useful in
>>> other situations as well?
>> [snip]
>>
>> Looking at this mail and laine's before I really get the impression we
>> are over-thinking this all. The automatic address assignment by libvirt
>> was written such that it matched what QEMU would have done, so that we
>> could introduce the concept of device addressing without breaking
>> existing apps which didn't supply addresses. The automatic addressing
>> logic certainly wasn't ever intended to be able to implement arbitrary
>> policies.
>>
>> As a general rule libvirt has always aimed to stay out of the policy
>> business, and focus on providing the mechanism only. So when we start
>> talking about adding  <preferences> to the XML this is a sure sign
>> that we've gone too far into trying to implement policy in libvirt.
>>
>> >From the POV of being able to tell libvirt to assign PCI instead of
>> MMIO addresses for the virt machine, I think it suffices to have
>> libvirt accept a simple  <address type="pci"/> element (ie with no
>> bus/slot/func filled in).
> 
> That's just a more limited case of the same type of feature creep though -
> you're specifying a preference for what type of address you want, just in a
> different place. (The odd thing is that we're adding it only to remedy what is
> apparently a temporary problem, and even then it's a problem that wouldn't
> exist if we just said simply this:  "If virtio-mmio is specified, then use
> virtio-mmio; If no address is specified, use pci". This puts the burden on
> people insisting on the old / soon-to-be-deprecated (?) virtio-mmio address
> type to add a little something to the XML (and a very simple little something
> at that!). For a short time, this could cause problems for people who create
> new domains using qemu binaries that don't have a pci bus on aarch64/virt yet,
> but that will be easily solvable (by adding "<address type='virtio-mmio'/>",
> which is nearly as simple as typing "<address type='pci'/>").
> 
> 
> A downside of setting your preference with <address type="pci"/> vs. having
> the preference in a separate element is that you can no longer cause a
> "re-addressing" of all the devices by simply removing all the <address>
> elements (and it's really that that got me thinking about adding
> hints/preferences in the <target> element). I don't know how important that
> is; I suppose when the producer of the XML is some other software it's a
> non-issue, when the producer is a human using virsh edit it would make life
> much simpler once in awhile (and the suggestion in the previous paragraph
> would eliminate that problem anyway).
> 
> So is there a specific reason why we need to keep virtio-mmio as the default,
> and require explicitly saying "address type='pci'" in order to get a PCI address?
> 

I explained this in my reply to Dan just now, but basically all currently
released distros don't work with virtio-pci. However long term I've suggested
from the start that we switch to virtio-pci by default, keying off a new
enough -M virt version, as an arbitrary marker in time when hopefully most
common distros work with virtio-pci

>> If applictions care about assigning devices to specify PCI buses,
>> ie to distinguish between PCI & PCIe, or to pick a different bus
>> when there is one bus per NUMA node, then really it is time for
>> the application to start specifying the full <address> element
>> with all details, and not try to invent ever more complex policies
>> inside libvirt which will never deal with all application use cases.
> 
> First, I agree with your vigilance against unnecessary feature creep. Both
> because keeping things simple makes it easier to maintain and use, and also
> because it's very difficult (or even impossible) to change something once it's
> gone in, in the case that you have second thoughts.
> 
> But, expanding beyond just the aarch64/virt mmio vs. pci problem (which I had
> already done in this thread, and which is what got us into this "feature
> creep" discussion in the first place :-)...
> 
> I remember when we were first adding support for Q35 that we said it should be
> as easy to create a domain with Q35 machinetype as it is for 440fx, i.e. you
> should be able to do it without manually specifying any PCI addresses (that's
> why we have the strange default bus hierarchy, with a dmi-to-pci-bridge and a
> pci-bridge, and all devices going onto the pci-bridge). But of course 440fx is
> very simplistic - all slots are standard PCI and all are hotpluggable. On Q35
> there could be room for choices though, in particular PCI vs PCIe and
> hotpluggable vs. not. (and yeah, also what NUMA node the bus is on; I agree
> that one is "getting out there"). So since there are choices, libvirt has by
> definition begun implementing policy when it auto-assigns *any* PCI address
> (current policy is "require all auto-assigned device addresses to be
> hotpluggable standard PCI). And if libvirt doesn't provide an easy way to
> choose, it will have implemented a defacto mandatory policy (unless you force
> full specification of PCI addresses, which goes against the "make it easy" goal).
> 
> And the current policy is looking (for Q35 and aarch64/virt at least) less and
> less like the correct one - in the years since it was put in, 1) we've learned
> that qemu doesn't care, and will not ever care, that a PCI device has been
> plugged into a PCIe slot, 2) we now have support for several PCIe controllers
> we didn't previously have, 3) people have started complaining that their
> devices are in PCI slots rather than PCIe, 4) even though it was stated at the
> time I wrote the code to auto-add a pci-bridge to every Q35 domain that
> pci-bridge's failure to support hotplug was a temporary bug, I just learned
> yesterday that it apparently still doesn't work, and 5) some platforms (e.g.
> our favorite aarch64/virt) are emulating a hardware platform that has *never*
> had standard PCI slots, only PCIe.
> 
> Beyond that, there is no place that provides a simple encoding of which type
> of controller provides which type of slot, and what is allowed to plug into
> what. If you require the management application/human to manually specify all
> the PCI addresses as soon as they have a need for one of these basic
> characteristics, then not only has it become cumbersome to define a domain
> (because the management app has to maintain a data structure to keep track of
> which PCI addresses are in use and which aren't), but it means that the
> management application also needs to know all sorts of rules about which PCI
> controllers are actually pcie vs. pci, and which accept hotplug devices vs.
> which don't, as well as things like the min/max slot number for each
> controller, and which ones can plug into where, e.g. a pcie-root-port can only
> plug into pcie-root or pcie-expander-bus, and a pcie-switch-downstream-port
> can only plug into a pcie-switch-upstream-port, etc. Requiring a management
> app to get all of that right just so that they can pick between a hotpluggable
> and non-hotpluggable slot seems like an overly large burden (and prone to error).
> 
> In the end, if libvirt makes it simple for the management app to specify what
> kind of slot it wants, rather than requiring it to specify the exact slot,
> then libvirt isn't making any policy decisions, it's just making it easier for
> the management app to implement its own policy decisions, without requiring
> the management app to know all the details about which controller implements
> what kind of connection etc, and that does seem to fit within libvirt's purpose.

I guess the main thing would be to understand usecases... outside of aarch64
is there a real reason for q35  that apps might need one address policy over
another? If it's something super obscure, like only for testing or dev, then I
think asking people to do it manually is fine.

At least for the arm case I think we can side step the question by adding the
<address type='pci'/> allocation request, and flipping the default from
virtio-mmio to virtio-pci at some future point

- Cole