[libvirt] [Qemu-devel] clean/simple Q35 support in libvirt+QEMU for guest OSes that don't support virtio-1.0

Fri Aug 17 09:29:36 UTC 2018

On Thu, Aug 16, 2018 at 06:20:29PM -0400, Laine Stump wrote:
> Summary of the problem:
> 
> 1) We want to persuade libvirt+QEMU users to move away from the i440fx
> machinetype in favor of Q35. (NB: Someday this *might* lead to the
> ability to deprecate and even remove the 440fx machinetype, but even if
> that were to happen, it would be a *very long* time from now, so this
> discussion is *not* about that!)

There are plenty of OSes that will never support Q35 and are still interesting
to use under QEMU. The set which could use Q35 but lacks virtio-1.0 is fairly
small. So removal of i440fx is really only something for downstream KVM vendors
to consider. Those vendors only care about modern OSes, but upstream is much
more open minded about what QEMU is used for, so I see it probably living
forever in upstream, or at least long enough that current maintainers will
be retired ;-P.

> 2) When Q35 machinetype is used, libvirt assigns virtio devices to a
> slot on a PCI Express controller (because why have modern PCIe
> controllers/slots available but force everything onto clunky old legacy
> controllers?).
> 
> 3) When a virtio device is plugged into an Express controller, QEMU
> disables the device's IO port space, and it is put into "modern-only"
> mode (this is done to avoid a rapid exhaustion of limited IO port space).
> 
> 4) modern-only virtio devices won't work with a legacy (virtio-0.9-only)
> guest driver, because virtio-0.9 requires IO port space.
> 
> 5) Some guest OSes that we still want to support (and which would
> otherwise work okay on a Q35 virtual machine) have virtio drivers too
> old to support virtio-1.0 (CentOS6 and RHEL6 are examples of such OSes),
> but due to the chain of reasons listed above, the "standard" config for
> a Q35 guest generated by libvirt doesn't support virtio-0.9, hence
> doesn't support these guest OSes.

Note when talking about "support" you're really saying it from the
downstream vendor, specifically RHEL, POV. From upstream or Fedora POV
essentially all x86 OS ever made are in scope for running under QEMU
if suitable virtual hardware models have been provided. QEMU doesn't
maintain any whitelist of "supported" OS that differs from what is
technically capable of being run, in the way downstream vendors do.

> And here's a list of possible solutions to this problem (note that
> "consumers" means management applications such as OpenStack, oVirt,
> virt-manager, virt-install, gnome-boxes, etc. In all cases, it's assumed
> that the consumer's decision on the action to take will be based on
> information from libosinfo). For completeness, I've included even the
> possibilities that have been rejected, along with a brief synopsis of
> (at least part of) the reason for rejection:
> 
>   (1) Add some way libvirt consumers can ask libvirt to place
>       virtio devices on a legacy pci slot instead of pcie when
>       the machinetype is q35 (qemu sets virtio devices in legacy
>       PCI slots to transitional mode, so io port space is enabled
>       and virtio-0.9 drivers will work).
> 
>       This has been proposed on libvir-list, but rejected. Here is
>       the most eloquently stated reasoning for the rejection I could
>       find (with thanks to Dan Berrange):
> 
>          The domain XML is a way to express the configuration
>          of the guest virtual machine.  What we're talking about
>          here is a policy tunable for an internal libvirt QEMU
>          driver algorithm, as so does not belong anywhere in the
>          domain XML.

Indeed, that's a guiding principle in general, not just for this PCI
question.

>   (2) Add full-blown pci enumeration support to all libvirt consumers
>       (i.e. they will need to build a model of the PCI bus topology
>       of each guest, and keep track of which addresses are in use).
>       They can then manually place virtio devices on legacy pci slots
>       (again, triggering transitional mode) when the intended guest
>       OS doesn't support virtio-1.0.
> 
>       (This is seen as requiring too much duplicated effort for
>       development and support/maintenance, since up until now libvirt
>       has been the single point of action for PCI address assignment
>       (well, QEMU can do it too, but for > 10 years libvirt has
>       *always* provided full PCI addresses for all devices).)

It really depends on the scope of the mgmt app - at some point the mgmt
app needs to take charge to some degree if it has particular ideas
about how a machine should look. Libvirt's placement strategy is a good
default for 95% of use cases, but it'll never be 100%. An example is
setting up a particular PCI topology that is guest NUMA node aware,
using expander buses.

So some apps might take this option, but in the common case it is
undesirable.

>   (3) Add virtio-1.0 support to all guest OSes. If this is done,
>       existing libvirt configs will work.
> 
>       (Aside from the difficulty of backporting, and the fact that
>       there are going to be some OSes that don't get it *at all*,
>       there will always be older releases that haven't gotten the
>       backport. So this isn't a complete solution).

Yep, there will always be guest OS that don't support 1.0. So that's
only a solution if the person who cares about Q35 support also controls
the guest OS in question.

>   (4) Consumers can continue using the 440fx machinetype for guest
>       OSes that don't support virtio-1.0
> 
>       (This would work, but perpetuates use of the 440fx
>       machinetype, and all for just this one reason (at least in
>       the case of CentOS6/RHEL6, which otherwise work just fine with
>       Q35)).

From an upstream POV this is always going to be the case. This is
really only an undesirable thing for downstream who are trying to
artificially restrict what QEMU features users have available to
them.

>   (5) Introduce virtio-0.9, virtio-1.0 models in libvirt
>       which are explicitly legacy-only and modern-only.
>       QEMU doesn't need to change, as libvirt can simply set
>       the right params on existing QEMU models to force the
>       behavior.
> 
>       (NB: it's unclear to me whether virtio-0.9 simply won't
>       work without forcing the device to be on a legacy PCI
>       slot, or if that's just "a very bad idea" because it
>       will mean that the device uses up extra io port space)

> As a starter for continuing the discussion, it seems to me that for
> option (5):
> 
> a) we don't really need the virtio-1.0 model, since that's what you
> currently get anyway when you ask for "virtio" on Q35 (and on 440fx,
> "virtio" gives you transitional, which works for everybody).

At some point we might have a virtio-2.0 and find ourselves in a
similar problem again. IMHO it is preferable to have both explicit
versioned models, and discourage use of the magical 'virtio' model from
mgmt apps. Use libosinfo to identify which virtio model is supported
for the OS in question and use that explicitly.  Only use the magical
'virtio' model if there's no information about what version the OS
supports.
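
A minimal sketch of that selection rule, from the mgmt app's side. The
function name and the "0.9" / "1.0" version tags are hypothetical stand-ins
for whatever the app derives from libosinfo, and preferring 1.0 when an OS
supports both versions is an assumption of the sketch, not something stated
above.

```python
from typing import Optional, Set


def pick_virtio_model(supported: Optional[Set[str]]) -> str:
    """Return the libvirt virtio model name to request for a guest OS.

    `supported` holds hypothetical version tags ("0.9", "1.0") derived
    from libosinfo data, or None when nothing is known about the OS.
    """
    if not supported:
        # No information about the OS: fall back to the magical model.
        return "virtio"
    if "1.0" in supported:
        # Assumption: prefer the newer version when both are supported.
        return "virtio-1.0"
    if "0.9" in supported:
        return "virtio-0.9"
    # Unrecognised tags only: treat as "no information".
    return "virtio"
```

With this, an OS known to support only 0.9 gets an explicit "virtio-0.9",
and the unversioned model is only used when the lookup yields nothing.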

> b) Rather than a "legacy-only" model for virtio-0.9, it would be more
> useful to have "transitional". This way the config would work for older
> OSes that don't support virtio-1.0, and when/if the OS was upgraded such
> that it supported virtio-1.0, that would be automatically used without
> needing to change the config.

I don't think the case of OS suddenly gaining support for 1.0 in an update
is frequent enough to be worth worrying about.

> c) Even if it's possible to force a device on an Express slot into
> transitional mode, this is extremely wasteful of io port space, so
> libvirt should consider virtio-0.9 devices to be legacy PCI, and thus
> plug them into legacy PCI slots. And once we're doing this, it's
> unnecessary to add any extra option to the qemu commandline to force
> legacy support (i.e. transitional mode), as that is what QEMU already
> does when the device is connected to a legacy PCI slot.

Yes, it should plug them into legacy PCI slots by default, but if a
mgmt app has done explicit placement itself, it should honour that
even if it means wasting IO space.
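
Roughly, the placement rule for a Q35 guest could look like the sketch
below. This is not libvirt code; the function name, the return values and
the address string format are all made up for illustration.

```python
from typing import Optional, Tuple


def place_device(model: str,
                 explicit_address: Optional[str] = None) -> Tuple[str, str]:
    """Decide where a virtio device goes on a Q35 guest.

    Returns ("explicit", address) when the mgmt app chose the address,
    otherwise ("auto", "pci") or ("auto", "pcie") for default placement.
    """
    if explicit_address is not None:
        # The app placed the device itself: honour that, even if it
        # means wasting IO port space on an Express slot.
        return ("explicit", explicit_address)
    if model == "virtio-0.9":
        # Legacy devices default to a legacy PCI slot.
        return ("auto", "pci")
    return ("auto", "pcie")
```

The explicit address is checked first so that the default only ever applies
when the app expressed no preference.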

> So making the naive assumption that we agree on implementing option (5)
> and there are no objections to my points a-c (Hah! As if!), how does
> this sound as a plan:
> 
> 
> A) libosinfo starts telling consumers that the preferred virtio device
> model for the relevant OSes is "virtio-0.9", and leaves the
> recommendation for other OSes as "virtio".

Libosinfo already uses 'virtio' as the prefix identifying virtio-0.9
support (the old PCI product IDs), and 'virtio-1.0' as the prefix for
identifying virtio-1.0 support (the new PCI product IDs).  That these
don't match libvirt model names doesn't matter.

> B) libvirt adds a "virtio-0.9" model for all virtio devices that
> actually have virtio-0.9 support (a couple of devices never existed
> prior to virtio-1.0 (rng and ???) so virtio-0.9 would be nonsensical for
> them).

> 
> C) inside libvirt, the implementation of the "virtio-0.9" model is
> identical to "virtio", except that the VIR_PCI_CONNECT_TYPE flags for
> these devices contain VIR_PCI_CONNECT_TYPE_PCI rather than
> VIR_PCI_CONNECT_TYPE_PCIE, resulting in those devices being assigned to
> a legacy PCI slot, and thus they would be transitional mode by default.

For 'virtio-0.9' libvirt should set "disable-modern=yes" in QEMU args

For 'virtio-1.0' libvirt should set "disable-legacy=yes" in QEMU args
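
As a sketch of that mapping, assuming the proposed model names and using
virtio-net-pci as the example device. The helper names are hypothetical, and
the sketch spells the property value "on" where the text above writes "yes".

```python
from typing import Dict, List

# Extra -device properties per proposed libvirt model name.
# The unversioned "virtio" model adds nothing and keeps QEMU's defaults.
MODEL_PROPS: Dict[str, List[str]] = {
    "virtio": [],
    "virtio-0.9": ["disable-modern=on"],
    "virtio-1.0": ["disable-legacy=on"],
}


def device_arg(qemu_device: str, model: str) -> str:
    """Build the value passed to QEMU's -device option for one device."""
    return ",".join([qemu_device] + MODEL_PROPS[model])
```

For example, device_arg("virtio-net-pci", "virtio-0.9") gives
"virtio-net-pci,disable-modern=on".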

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|