[virt-tools-list] libvirt profiles (a.k.a. virtuned) design ideas draft

Mon Jul 9 17:10:28 UTC 2018

On Mon, Jul 09, 2018 at 05:01:25PM +0300, Martin Kletzander wrote:
> On Thu, Jul 05, 2018 at 05:58:46PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Jul 03, 2018 at 04:41:52PM -0400, Cole Robinson wrote:
> > > > ## Brief specification of functionality
> > > >
> > > > Currently virtuned aims to provide a consistent way of applying profiles to
> > > > libvirt VM definitions.  That way management applications don't need to
> > > > duplicate the implementation in their codebases.
> > > >
> > > > ### Functions
> > > >
> > > > As a starting point virtuned exposes one function.  As input the function
> > > > accepts a VM definition with the only restriction being that it is a libvirt
> > > > domain XML.  However it doesn't have to be complete.  The function applies all
> > > > relevant profiles to that XML and produces a complete libvirt domain XML.
> > > >
> > > > The outcome of this is twofold:
> > > > - Every libvirt domain XML is already working virtuned XML.
> > > > - Applications can select, by arbitrarily small steps, how much functionality
> > > >   they want to use from virtuned.
> > 
> > I'm not sure I understand this second point. IIUC, the contents of the profiles
> > are supposed to be opaque to the mgmt application. So while they use virtuned,
> > they'll be exposed to whatever arbitrary XML the profile contains, whether
> > they understand it or not.
> > 
> 
> Why would they need to be opaque to the mgmt app?  Either you are using some of
> profiles that are shipped with it (in which case the mgmt app developers should
> know what they are using in the code) or the mgmt app can construct their own
> profile to be used in which case it should know what it is asking for.

In previous discussions on this topic it was suggested that the selling point
for profiles was to allow new features to be enabled in multiple mgmt apps
without having to add support to each mgmt app to format XML, potentially with
the end user providing arbitrary profiles. This implies that the mgmt app
considers the profile contents to be opaque. Based on your answer though, it
seems this is not in fact a goal.

Not allowing arbitrary black-box profiles would indeed be my preference,
since I don't think it is practical to support it in the real world given
the complex interactions that will fall out of that.

> > > > ### API endpoints ###
> > > >
> > > > For now the API will be exposed as:
> > > >
> > > > 1. Python module - trivial if we're basing it on virt-manager codebase which is
> > > >    using python
> > 
> > What are the key reasons/benefits to being part of the virt-manager codebase
> > as opposed to a standalone project?
> > 
> 
> Few things:
> 
> 1) The XMLBuilder makes it easier to work with the XML, particularly the domain
>    XML.  This is not that big of a deal since libvirt-go-xml does a good job of
>    that as well
> 
> 2) There is an existing logic for "intermediate" devices.  By that I mean the
>    devices that are needed to add the requested one.  For example when
>    requesting an addition of a SATA disk, there is already a logic that figures
>    out if there is an existing SATA controller with a free slot and adds one if
>    there is not.  The reason for this is that there might be some defaults
>    specified which affect the intermediate devices.
> 
> 3) The possibility of exposing virt-xml and virt-install in the future.  The
>    former would be used for making changes to the XML and the latter is
>    something that stateless mgmt apps would like to use (cockpit currently).
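
The "intermediate device" logic in point 2 can be sketched roughly as below.
This is only an illustration using Python's standard library, not
virt-manager's actual XMLBuilder code: the function name add_sata_disk, the
limit of six ports per controller and the naive target naming are all
assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: each SATA controller offers six ports.
SATA_PORTS_PER_CONTROLLER = 6


def add_sata_disk(domain, path):
    """Add a SATA disk; create a SATA controller only if no port is free."""
    devices = domain.find("devices")
    if devices is None:
        devices = ET.SubElement(domain, "devices")

    controllers = devices.findall("controller[@type='sata']")
    sata_disks = [d for d in devices.findall("disk")
                  if d.find("target[@bus='sata']") is not None]

    # Simplification: disks are assumed to fill the controllers in order,
    # so a new controller is needed once every existing port is taken.
    if len(sata_disks) >= len(controllers) * SATA_PORTS_PER_CONTROLLER:
        ET.SubElement(devices, "controller",
                      {"type": "sata", "index": str(len(controllers))})

    disk = ET.SubElement(devices, "disk", {"type": "file", "device": "disk"})
    ET.SubElement(disk, "source", {"file": path})
    # Naive naming (sda, sdb, ...), good enough for a handful of disks.
    ET.SubElement(disk, "target",
                  {"dev": "sd" + chr(ord("a") + len(sata_disks)),
                   "bus": "sata"})
    return domain
```

Any defaults that apply to controllers would be consulted at the point where
the controller element is created, which is why this logic has to live next
to the code that applies profiles.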

FWIW, great as virt-install is, if I was writing a new mgmt application, I'd
really use GNOME Boxes installer as the benchmark. Most importantly it is
able to fully automate the installation process from installer media, by
generating the requisite kickstart files from data provided by libosinfo.

> > > > The above example will request a video card with model QXL to exist in the VM
> > > > definition.  The precise outcome of this depends on the existing devices in the
> > > > VM definition:
> > > >
> > > > - **VM has no video device:** the XML snippet (`qxl` video card) will simply be
> > > >   added to the list of devices.
> > > > - **VM has video device with no model specified:** Just fill in the video model
> > > >   for the existing video card.
> > > > - **VM has video device with different model:** Add one more video device with
> > > >   the specified model since multiple video cards are perfectly fine.
> > > >
> > > > The above is a very concrete example, but it can be very easily and efficiently
> > > > generalized for any `<add/>` sub-element.  The only information which is
> > > > required for said generalization is the knowledge of libvirt's domain XML
> > > > format.  This could be one of the reasons for virtuned to be spun off of
> > > > virt-manager's codebase (since most of that information is already there).  The
> > > > other option would be using
> > > > [libvirt-go-xml](https://libvirt.org/git/?p=libvirt-go-xml.git) as that should
> > > > have enough information for this as well <sup id='fn3'>[[3]](#fn3d)</sup>.
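
The three cases listed for the QXL example can be written down as a short
sketch. This is not virtuned code, just one possible reading of the rule using
Python's standard library; the function name add_video is hypothetical, and
the "already present, do nothing" branch is an assumption derived from the
wording "request a video card with model QXL to exist".

```python
import xml.etree.ElementTree as ET


def add_video(domain, model_type):
    """Ensure a <video> device with the given model exists in the domain."""
    devices = domain.find("devices")
    if devices is None:
        devices = ET.SubElement(domain, "devices")
    videos = devices.findall("video")

    # Assumed extra case: a matching device already exists, nothing to do.
    for video in videos:
        model = video.find("model")
        if model is not None and model.get("type") == model_type:
            return domain

    # Video device with no model specified: fill in the model.
    for video in videos:
        model = video.find("model")
        if model is None:
            ET.SubElement(video, "model", {"type": model_type})
            return domain
        if model.get("type") is None:
            model.set("type", model_type)
            return domain

    # No video device, or only devices with a different model: add one more.
    video = ET.SubElement(devices, "video")
    ET.SubElement(video, "model", {"type": model_type})
    return domain
```

Generalizing this to any `<add/>` sub-element needs per-element knowledge
(which attribute identifies the "model", whether duplicates are allowed),
which is exactly the schema knowledge discussed here.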
> > 
> > FYI, libvirt-go-xml should have 100% coverage of all XML constructs in the
> > libvirt schema. Any omissions are entirely due to libvirt's own master XML
> > test files being incomplete. libvirt-go-xml unit tests check that it can
> > roundtrip all XML files in libvirt.git without data loss. I don't think any
> > other XML parser impl for libvirt has the same level of coverage, principally
> > because none of them do similar kind of testing to prove it.
> > 
> 
> Coverage is one thing, but another thing is the logic that is in XMLBuilder
> (even though it's not there for all the elements).  For example if there are
> different sub-elements allowed based on an attribute.  But even simpler,
> elements that cannot be duplicated, but in the struct it is saved in a list.  If
> that is not fully introspectable from the struct tags, then we will need to
> duplicate the code that already exists in virt-manager if this is a side
> project.

The way I've modelled things in Go is that when there is a type=XXXX attribute
that controls which sub-elements are permitted, I've created dedicated structs
for each sub-schema. In fact you never set any 'type' attribute - we generate
the type attribute based on which struct you've created for the child content.
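
For readers who don't follow Go, the same idea can be shown as a loose Python
analogue. It is emphatically not the libvirt-go-xml API: the class names
FileSource and BlockSource and the helper format_disk are made up for this
sketch. The point is only that the caller never sets 'type'; it is derived
from which class was instantiated.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FileSource:
    file: str
    # Unannotated, so a plain class attribute rather than a dataclass field.
    type_name = "file"

    def to_element(self):
        return ET.Element("source", {"file": self.file})


@dataclass
class BlockSource:
    dev: str
    type_name = "block"

    def to_element(self):
        return ET.Element("source", {"dev": self.dev})


def format_disk(source):
    """Format a <disk>; the 'type' attribute comes from the source class."""
    disk = ET.Element("disk", {"type": source.type_name, "device": "disk"})
    disk.append(source.to_element())
    return ET.tostring(disk, encoding="unicode")
```

Since each sub-schema has its own class, a block-only attribute cannot be set
on a file-backed disk by accident, and no separate validation step is needed
for that case.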

> > > > Yet another simple profile can look like this:
> > > > ``` xml
> > > > <profile name='some-interesting-things'>
> > > >   <add>
> > > >     <iothreads>2</iothreads>
> > > >   </add>
> > > >   <add>
> > > >     <devices>
> > > >       <disk device='cdrom'/>
> > > >     </devices>
> > > >   </add>
> > > >   <add multiple='yes'>
> > > >     <devices>
> > > >       <redirdev bus='usb' type='spicevmc'/>
> > > >       <redirdev bus='usb' type='spicevmc'/>
> > > >     </devices>
> > > >   </add>
> > > >   <remove type='hard'>
> > > >     <features>
> > > >       <apic/>
> > > >     </features>
> > > >   </remove>
> > > >   <defaults>
> > > >     <devices>
> > > >       <interface>
> > > >         <model type='virtio'/>
> > > >       </interface>
> > > >     </devices>
> > > >   </defaults>
> > > > </profile>
> > > > ```
> > 
> > This is where I really start to get very concerned. The examples you're giving
> > are nice and simple, so composition of arbitrary profiles, together with application
> > written XML looks like it'll work.
> > 
> > I think it will be all too easy, however, to write profiles where the result of
> > composing profiles and merging with app XML is an XML document that is
> > semantically invalid / unrunnable.
> > 
> > Consider if you have two profiles, one sets up an XML doc with 'pc' machine
> > type and the other profile sets up an XML doc with 'q35' machine type.
> > 
> > Now a third profile wants to setup NUMA for the guest such that PCI devices
> > are associated with NUMA nodes. The way you do this is very different for
> > 'pc' and 'q35' machine types due to PCI vs PCI-Express topology changes.
> > So if the 'numa' profile assumes 'pc' it will break if the app composes it
> > with the 'q35' profile, or vice versa.
> > 
> > Now consider you have a 'networking-nfv' profile that is supposed to setup
> > NICs in a way that is optimized for NFV use cases. This profile now needs
> > to know if it should put the NICs in the default PCI bus, or in the NUMA
> > specific PCI bus. So the result may or may not do the right thing if you
> > compose it with the 'numa' profile.
> > 
> > Solving these problems would require a combinatorial expansion in the
> > number of profiles. eg a numa-pc, numa-q35 profile, and then a
> > networking-nfv-pc, networking-nfv-q35, networking-nfv-numa-pc, and
> > networking-nfv-numa-q35 profiles. There would then have to be dependencies
> > expressed to tell the app which profiles can be composed with each other.
> > 
> 
> So this is how tuned does it and I didn't really like the way the matrix
> explodes with added dimensions.

At least with tuned I think the range of profiles is probably fairly
small, since there's only so many tunables that are going to be
relevant. With the domain XML, our schema is huge, so I could easily
imagine getting into high double-figures number of profiles. So this
will explode the matrix way worse than seen with tuned.
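
To put a rough number on that growth: the variant count is the product of the
number of choices along each dimension. The sketch below reproduces the four
networking-nfv variants named earlier; the third "firmware" dimension in the
comment is purely hypothetical and only there to show the doubling.

```python
from itertools import product


def profile_variants(base, dimensions):
    """List every profile name needed to cover all combinations."""
    names = []
    for combo in product(*dimensions.values()):
        # An empty string means "dimension not used" and adds no suffix.
        names.append("-".join([base] + [part for part in combo if part]))
    return names


dims = {"numa": ["", "numa"], "machine": ["pc", "q35"]}
variants = profile_variants("networking-nfv", dims)
# variants == ['networking-nfv-pc', 'networking-nfv-q35',
#              'networking-nfv-numa-pc', 'networking-nfv-numa-q35']

# Adding one more two-way dimension (hypothetical) doubles the count to 8.
dims["firmware"] = ["bios", "uefi"]
more_variants = profile_variants("networking-nfv", dims)
```

Every further independent dimension multiplies the total again, and that is
for a single base profile.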

> > This still only solves the problem of composing profiles, and does not
> > consider how to merge with the application defined XML parts. The only
> > way an application can know if the XML it wants to write, is compatible
> > with the profiles it has used, is if it parses and understands all the
> > parts of the profile.
> > 
> 
> I hear what you are saying, but I don't see why the app would need to parse the
> profiles.  There can be conditions in profiles (proposed in open questions) that
> would eliminate the need for multiple profiles for the same thing.  Yes, DSL
> would be better for this.  We could just right away use what "xq" provides (see
> open questions).  That would also solve erroring out.

My point touches slightly on the possible misunderstanding I mention above
about the scope wrt allowing end user blackbox profiles to be provided.

> 
> > If something was used in the profile that the app doesn't know about,
> > it could ignore it, but the resulting VM config may well be unrunnable,
> > or worse, runnable but doing something completely inappropriate.
> > 
> > 
> > I think these kinds of problems are inherent in any approach which allows
> > arbitrary user defined XML as the schema for the profiles.
> > 
> > This is one of reasons why libosinfo didn't base the information it
> > provides around the libvirt XML schema. Instead it defines its own
> > domain specific language, and applications only use the features in
> > it that they actually know how to handle.
> > 
> > This means if we add some new concept to libosinfo database, applications
> > are not going to automagically use it, and instead have to add explicit
> > support. As above though, I think this is inevitable, because it is too
> > easy to create unrunnable/nonsensical XML configs if you allow arbitrary
> > user specified XML inputs.
> > 
> 
> Thanks for the info with the NUMA locality example.  On one hand it would really
> save us a lot of work if we just used something that exists (by just extending
> it) and for DSL there is a solution we can use as well.  If not then we can
> build it from existing parts at least partially.

BTW, I meant to include this link to illustrate the NUMA locality example:

  https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/

> > > I didn't really know where to cut in so this is a big comment...
> > > 
> > > The idea here is that virtuned will ship with something like a
> > > profile/add-qxl.xml, and profile=add-qxl will then effectively be part
> > > of the virtuned API, like an osinfo ID value is to libosinfo; the
> > > profile will never go away, so apps can depend on it being there.
> > > Presumably we can extend the profile as necessary as long as it
> > > accomplishes its stated goal and we confirm it doesn't break apps.
> > > 
> 
> Yes, we're probably going to need to version it as well.

Hmm, yes, versioning would be key for being able to reconstruct the
exact same machine each time, even after upgrades. That said, it would
be valid to declare that profiles need to be persisted at time of VM
creation, per VM. This is how openstack deals with its "flavour"
concept - at time of VM create we copy the data for the flavour, so
we always use the original values for the life of that specific VM.
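
A minimal sketch of that "copy at create time" approach follows. The record
layout and the name create_vm are invented for illustration and are not taken
from OpenStack or virtuned; the profile content is a placeholder dict.

```python
import copy


def create_vm(name, profile_names, available_profiles):
    """Return a VM record holding a private snapshot of each profile used."""
    snapshot = {p: copy.deepcopy(available_profiles[p])
                for p in profile_names}
    return {"name": name, "profiles": snapshot}


profiles = {"add-qxl": {"version": 1, "content": "placeholder"}}
vm = create_vm("guest1", ["add-qxl"], profiles)

# A later upgrade modifies the shipped profile in place...
profiles["add-qxl"]["version"] = 2

# ...but the VM record still carries the data it was created with.
original_version = vm["profiles"]["add-qxl"]["version"]
```

With per-VM snapshots, versioning the shipped profiles matters mostly for new
VMs; existing ones can always be rebuilt from their own copy.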

> > > Using XML for this kind of thing makes me nervous, trying to model
> > > conditional actions with XML. I feel like it's a real quick slippery
> > > slope to implementing a turing complete schema. For example how would we
> > > handle complex examples like:
> > > 
> 
> The idea to use XML was sparked by two facts:
> 
> 1) Apps will be able to create their own profiles.
> 
> 2) Simple profiles (addition of few elements) could be created by just taking
>    the specific part of the domain XML and wrapping it in a tag that says what
>    to do (e.g. `<add><existing_xml_snippet/></add>`).

FWIW, I'm not opposed to using XML - I think it is valuable to be able
to use standardized tools for parsing / formatting / editor syntax
highlighting etc. I'm just wary about using the Domain XML schema itself,
as opposed to a custom XML schema explicitly designed for this job. If
nothing else, we've got lots of stupid mistakes in our domain XML schema,
such as the way we litter CPU/NUMA related bits across 6 different places
in the schema, making it hard to understand wtf we're expressing.

> > > What's the motivation for doing this in XML? So apps or distros can drop
> > > in their own profiles? Or extend system profiles? I'm wondering why XML
> > > over privately implemented. Maybe you can explain some specific app
> > > usecases that motivated this? I feel like I missed a lot in the previous
> > > discussion
> > > 
> 
> You didn't miss much and you hit the two points nicely, dropping in own profiles
> and, possibly, extend existing ones.
> 
> > > Also do we expect the API to talk directly to libvirt? Like for checking
> > > domcapabilities?
> > 
> 
> For KubeVirt that wouldn't be that much of a help as they need to do bunch of
> these things without libvirt running.  Also not being dependent on libvirt makes
> it independent from the host.  Capabilities might be provided as another input,
> but question is whether it should be full blown libvirt (dom)capabilities.  The
> reason is that you might need to migrate between various nodes and the mgmt
> app/cluster knows the minimal requirements better than host-oriented daemon.

I don't think it is so clearcut for KubeVirt. It is entirely possible for them
to have a libvirtd spawned to be able to query the capabilities, independently
of them launching the guest if this is a compelling benefit. It also depends
on exactly where in their code flow they'll slot in the usage and expansion
of profiles into full domain XML.

> > I tend to think writing the profiles is going to be more complex and
> > error prone than directly writing the XML, because of the composability
> > problems I mention above.
> > 
> > My gut feeling is that it would be a more tractable problem if the profiles
> > used a domain specific language (DSL), possibly still XML, but not libvirt
> > domain XML. Applications would have to explicitly know about individual
> > features in the DSL, but they could consume it in a way that the way they
> > generate libvirt XML is more fully data-driven.
> > 
> > ie, taking my example above, applications would need explicit knowledge
> > of machine types, NUMA topologies, and attaching devices to NUMA nodes.
> > Given that knowledge though, the decision about /when/ to use these
> > respective features would be data driven from profiles that simply
> > stated desired traits.
> > 
> 
> I lost you at the last paragraph.  Could you rephrase it or maybe give another
> example?  The idea is that mgmt app knows when it wants to use what profile.
> And what is provided as an API is the composition of the XML.  But you were
> probably addressing something else, right?  As I said, I lost you here.

This goes back to the question of scope wrt whether profiles are blackboxes
that administrators can augment at will, or whether it is strictly limited
to stuff the application developer has decided to express. If it is the
latter, then it simplifies the process of expanding the profile to form
domain XML.

To be clear though, my thought was that if you have a DSL, you could say

  "Place guest on host node 0"

in the profile, and the application would have logic to turn that into
the domain XML that sets appropriate NUMA tunables in the various different
places, giving the application the chance to customize them taking into
account other factors. For example, the app might have been told not to use
host CPUs 0 and 1, as they're reserved for OS processes. It can use that
knowledge to filter out pinning to CPUs 0 and 1, and only pin to CPUs 2-3
in that node.

If the profile is expressed in terms of domain XML, then the profile would
be encoding specific host CPU information, and the application would have
to parse the domain XML and modify all the places which list CPUs to
remove CPUs 0 and 1. So in that sense having the profile use domain XML
isn't really simplifying life for the app - it would have been easier to
just generate the domain XML from scratch rather than parse & modify
what was written in the profile to remove 2 CPUs.
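
As a rough illustration of the DSL route: the application receives the trait
"place guest on host node 0" and expands it itself. The function name
expand_placement, the topology dict and the choice of strict memory mode are
assumptions made for this sketch; only the cputune/vcpupin and numatune/memory
elements are real libvirt domain XML.

```python
import xml.etree.ElementTree as ET


def expand_placement(host_node_cpus, node, reserved, vcpus):
    """Expand "place guest on host node N" into pinning elements."""
    # The app applies its own policy here: drop CPUs reserved for the OS.
    usable = sorted(set(host_node_cpus[node]) - set(reserved))
    if not usable:
        raise ValueError("no usable CPUs left on node %d" % node)
    cpuset = ",".join(str(cpu) for cpu in usable)

    cputune = ET.Element("cputune")
    for vcpu in range(vcpus):
        ET.SubElement(cputune, "vcpupin",
                      {"vcpu": str(vcpu), "cpuset": cpuset})

    numatune = ET.Element("numatune")
    ET.SubElement(numatune, "memory",
                  {"mode": "strict", "nodeset": str(node)})
    return cputune, numatune


# Hypothetical host: node 0 has CPUs 0-3, and CPUs 0-1 are reserved.
topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
cputune, numatune = expand_placement(topology, node=0,
                                     reserved={0, 1}, vcpus=2)
```

The profile never mentions a host CPU number, so there is nothing for the
application to parse and strip out afterwards.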

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|