libvirt-devaddr: a new library for device address assignment

Dan Kenigsberg danken at redhat.com
Sat Mar 14 10:32:54 UTC 2020


On Fri, Mar 13, 2020 at 12:47 PM Daniel P. Berrangé <berrange at redhat.com> wrote:
>
> On Fri, Mar 13, 2020 at 11:23:44AM +0200, Dan Kenigsberg wrote:
> > On Wed, 4 Mar 2020, 14:51 Daniel P. Berrangé, <berrange at redhat.com> wrote:
> > >
> > > We've been doing a lot of refactoring of code in recent times, and also
> > > have plans for significant infrastructure changes. We still need to
> > > spend time delivering interesting features to users / applications.
> > > This mail is to introduce an idea for a solution to a specific
> > > area where applications have had long term pain with libvirt's current
> > > "mechanism, not policy" approach - device addressing. This is a way
> > > for us to show brand new ideas & approaches for what the libvirt
> > > project can deliver in terms of management APIs.
> > >
> > > To set expectations straight: I have written no code for this yet,
> > > merely identified the gap & conceptual solution.
> > >
> > >
> > > The device addressing problem
> > > =============================
> > >
> > > One of the key jobs libvirt does when processing a new domain XML
> > > configuration is to assign addresses to all devices that are present.
> > > This involves adding various device controllers (PCI bridges, PCI root
> > > ports, IDE/SCSI buses, USB controllers, etc) if they are not already
> > > present, and then assigning PCI, USB, IDE, SCSI, etc, addresses to each
> > > device so they are associated with controllers. When libvirt spawns a
> > > QEMU guest, it will pass full address information to QEMU.
> > >
> > > Libvirt, as a general rule, aims to avoid defining and implementing
> > > policy around expansion of guest configuration / defaults, however, it
> > > is inescapable in the case of device addressing due to the need to
> > > guarantee a stable hardware ABI to make live migration and save/restore
> > > to disk work.  The policy that libvirt has implemented for device
> > > addressing is, as much as possible, the same as the addressing scheme
> > > QEMU would apply itself.
> > >
> > > While libvirt succeeds in its goal of providing a stable hardware ABI,
> > > the addressing scheme used is not well suited to all deployment
> > > scenarios of QEMU. This is an inevitable result of having a specific
> > > assignment policy implemented in libvirt which has to trade off mutually
> > > incompatible use cases/goals.
> > >
> > > When the libvirt addressing policy is not sufficient, management
> > > applications are forced to take on address assignment themselves,
> > > which is a massive non-trivial job with many subtle problems to
> > > consider.
> > >
> > > Places where libvirt's addressing is insufficient for PCI include
> > >
> > >  * Setting up multiple guest NUMA nodes and associating devices to
> > >    specific nodes
> > >  * Pre-emptive creation of extra PCIe root ports, to allow for later
> > >    device hotplug on PCIe topologies
> > >  * Determining whether to place a device on a PCI or PCIe bridge
> > >  * Controlling whether a device is placed into a hotpluggable slot
> > >  * Controlling whether a PCIe root port supports hotplug or not
> > >  * Determining whether to place all devices on distinct slots or
> > >    buses, vs grouping them all into functions on the same slot
> > >  * Ability to expand the device addressing without being on the
> > >    hypervisor host
> >
> > (I don't understand the last bullet point)
>
> I'm not sure if this is still the case, but at some point in time
> there was a desire from KubeVirt to be able to expand the users'
> configuration when loaded in KubeVirt, filling in various defaults
> for devices. This would run when the end user YAML/JSON config
> was first posted to the k8s API for storage, some arbitrary amount
> of time later the config gets chosen to run on a virtualization
> host at which point it is turned into libvirt domain XML.

Ah, I did not hear about this before, but I see why something like
this would be useful even without libvirt-devaddr. Having something
like virDomainDryRunXML() would have eliminated old race conditions we
had in oVirt.

>
> > > Libvirt wishes to avoid implementing many different address assignment
> > > policies. It also wishes to keep the domain XML as a representation
> > > of the virtual hardware, not add a bunch of properties to it which
> > > merely serve as tunable input parameters for device addressing
> > > algorithms.
> > >
> > > There is thus a dilemma here. Management applications increasingly
> > > need fine grained control over device addressing, while libvirt
> > > doesn't want to expose fine grained policy controls via the XML.
> > >
> > >
> > > The new libvirt-devaddr API
> > > ===========================
> > >
> > > The way out of this is to define a brand new virt management API
> > > which tackles this specific problem in a way that addresses all the
> > > problems mgmt apps have with device addressing and explicitly
> > > provides a variety of policy impls with tunable behaviour.
> > >
> > > By "new API", I actually mean an entirely new library, completely
> > > distinct from libvirt.so, or anything else we've delivered so
> > > far. The closest we've come to delivering something at this kind
> > > of conceptual level, would be the abortive attempt we made with
> > > "libvirt-builder" to deliver a policy-driven API instead of mechanism
> > > based. This proposal is still quite different from that attempt.
> > >
> > > At a high level
> > >
> > >  * The new API is "libvirt-devaddr" - short for "libvirt device addressing"
> > >
> > >  * As input it will take
> > >
> > >    1. The guest CPU architecture and machine type
> > >    2. A list of global tunables specifying desired behaviour of the
> > >       address assignment policy
> > >    3. A minimal list of devices needed in the virtual machine, with
> > >       optional addresses and optional per-device tunables to override
> > >       the global tunables
> > >
> > >  * As output it will emit
> > >
> > >    1. fully expanded list of devices needed in the virtual machine,
> > >       with addressing information sufficient to ensure stable hardware ABI
> > >
> > > Initially the API would implement something that behaves the same
> > > way as libvirt's current address assignment API.
> > >
> > > The intended usage would be
> > >
> > >  * Mgmt application makes a minimal list of devices they want in
> > >    their guest
> > >  * List of devices is fed into libvirt-devaddr API
> > >  * Mgmt application gets back a full list of devices & addresses
> > >  * Mgmt application writes a libvirt XML doc using this full list &
> > >    addresses
> > >  * Mgmt application creates the guest in libvirt
> > >
> > > IOW, this new "libvirt-devaddr" API is intended to be used prior to
> > > creating the XML that is used by libvirt. The API could also be used
> > > prior to needing to hotplug a new device to an existing guest.
> > > This API is intended to be a deliverable of the libvirt project, but
> > > it would be completely independent of the current libvirt API. Most
> > > especially note that it would NOT use the domain XML in any way.
> > > This gives applications maximum flexibility in how they consume this
> > > functionality, not trying to force a way to build domain XML.
> >
> > This procedure forces Mgmt to learn a new language to describe device
> > placement. Mgmt (or should I just say "we"?) currently expresses the
> > "minimal list of devices" in XML form and passes it to libvirt. Here we
> > are asked to pass it once to libvirt-devaddr, parse its output, and
> > feed it as XML to libvirt.
>
> I'm not necessarily suggesting we even need a document format at the
> core API level. I could easily see the API working in terms of a
> list of Go structs, with tunables being normal method parameters.
> A JSON format could be an optional way to serialize the Go structs,
> but if the app were written in Go the JSON may not be needed at all.
>
> > I believe it would be easier to use the domxml as the base language
> > for the new library, too. libvirt-devaddr would accept it with various
> > hints (expressed as its own extension to the XML?) such as "place all
> > of these devices in the same NUMA node", "keep on root bus" or
> > "separate these two chattering devices to their own bus". The output
> > of libvirt-devaddr would be a domxml with <devices> filled with
> > controllers and addresses, readily available for consumption by
> > libvirt.
>
> I don't believe that using the libvirt domain XML is a good idea for
> this as it unnecessarily constrains the usage scenarios. Most management
> applications do not use the domain XML as their canonical internal storage
> format. KubeVirt has its JSON/YAML schema for k8s API, OpenStack/RHEV just
> store metadata in their DB, others vary again. Some of these applications
> benefit from being able to expand device topology/addressing, a long time
> before they get anywhere near use of domain XML - the latter only matters
> when you come to instantiate a VM on a particular host.

Nevertheless, your suggested Go struct would become a third
representation of virtual devices, on top of domxml and the
Mgmt-canonical one. Maybe I'm just overconservative. Let us ask
kubevirt-dev what their preferred form for consuming this suggested
API would be.
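
To make the discussion concrete, here is a minimal sketch of what a
Go-struct-based interface along the lines described above could look like.
Nothing here is an existing API: no code has been written for
libvirt-devaddr, so the names PCIAddress, Device, Tunables and Assign are
all hypothetical. The placement policy is a toy assumed only for
illustration (one pcie-root-port per unaddressed device on an x86_64/q35
guest) and is not libvirt's current policy.

```go
package main

import (
	"errors"
	"fmt"
)

// All types and functions below are hypothetical illustrations;
// they are not part of libvirt or of any published library.

// PCIAddress is a simplified bus:slot.function triple.
type PCIAddress struct {
	Bus, Slot, Function uint8
}

// Device is one entry in the input or output device list.
type Device struct {
	Name    string
	Kind    string      // e.g. "virtio-net", "pcie-root-port"
	Address *PCIAddress // nil means "let the library choose"
}

// Tunables holds global policy knobs, passed as a plain parameter.
type Tunables struct {
	SpareRootPorts int // extra empty root ports kept for later hotplug
}

// Assign expands a minimal device list into a full one with addresses.
// Toy policy: every device without an address gets its own root port on
// bus 0 and is placed at slot 0 of the bus behind that port. Addresses
// supplied by the caller are kept and only checked for collisions.
func Assign(arch, machine string, t Tunables, devs []Device) ([]Device, error) {
	if arch != "x86_64" || machine != "q35" {
		return nil, errors.New("sketch only handles x86_64/q35")
	}
	used := map[PCIAddress]bool{}
	for _, d := range devs {
		if d.Address == nil {
			continue
		}
		if used[*d.Address] {
			return nil, fmt.Errorf("address of %s is already taken", d.Name)
		}
		used[*d.Address] = true
	}

	var out []Device
	slot, bus, ports := uint8(1), uint8(1), 0
	// addPort appends a root port and returns the bus number behind it.
	addPort := func() (uint8, error) {
		for slot <= 31 && used[PCIAddress{Bus: 0, Slot: slot}] {
			slot++
		}
		for bus < 255 && used[PCIAddress{Bus: bus}] {
			bus++
		}
		if slot > 31 || bus == 255 {
			return 0, errors.New("out of root bus slots or bus numbers")
		}
		ports++
		out = append(out, Device{
			Name: fmt.Sprintf("rp%d", ports), Kind: "pcie-root-port",
			Address: &PCIAddress{Bus: 0, Slot: slot},
		})
		slot++
		bus++
		return bus - 1, nil
	}

	for _, d := range devs { // d is a copy, so the input is not modified
		if d.Address == nil {
			b, err := addPort()
			if err != nil {
				return nil, err
			}
			d.Address = &PCIAddress{Bus: b}
		}
		out = append(out, d)
	}
	for i := 0; i < t.SpareRootPorts; i++ {
		if _, err := addPort(); err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	out, err := Assign("x86_64", "q35", Tunables{SpareRootPorts: 1}, []Device{
		{Name: "net0", Kind: "virtio-net"},
		{Name: "disk0", Kind: "virtio-blk"},
	})
	if err != nil {
		panic(err)
	}
	for _, d := range out {
		a := d.Address
		fmt.Printf("%-6s %-15s %02x:%02x.%d\n", d.Name, d.Kind, a.Bus, a.Slot, a.Function)
	}
}
```

A caller would then translate the returned list into whatever it stores
canonically (domxml, a k8s object, DB rows). Whether that extra
representation is acceptable is exactly the open question here.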

>
> We could of course have a convenience method which optionally generates
> a domain XML template from the output list of devices, if someone believes
> that's useful to standardize on, but I don't think the domain XML should
> be the core format.
>
> I would also like this library to be usable for scenarios in which libvirt
> is not involved at all. One of the strange things about the QEMU driver
> in libvirt compared to the other hypervisor drivers is that it is missing
> an intermediate API layer. In other drivers the hypervisor platform itself
> provides a full management API layer, and libvirt merely maps the libvirt
> APIs to the underlying mgmt API or data formats. IOW, libvirt is just a
> mapping layer.
>
> QEMU though only really provides a few low level building blocks, alongside
> other building blocks you have to pull in from Linux. It doesn't even provide
> a configuration file. Libvirt pulls all these pieces together to form the
> complete QEMU management API, as well as mapping everything onto the libvirt
> domain XML & APIs. I think there is scope & interest/demand to look at
> creating an intermediate layer that provides a full management layer for
> QEMU, such that libvirt can eventually become just a mapping layer for
> QEMU. In such a scenario the libvirt-devaddr library is still very useful
> but you don't want it using the libvirt domain XML, as that's not likely
> to be the format in use.
>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>




