[libvirt] RFC: Creating mediated devices with libvirt

Thu Jun 22 14:32:58 UTC 2017

On Thu, Jun 22, 2017 at 02:05:26PM +0200, Erik Skultety wrote:
> On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote:
> > On 16/06/17 18:14 +0100, Daniel P. Berrange wrote:
> > > On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:
> > > >
> > > > I'm fine with libvirt having APIs in the node device APIs to enable
> > > > create/delete with libvirt, as well as using managed=yes in the same
> > > > manner that we do for regular PCI devices (the bind/unbind to vfio
> > > > or pci-back)
> > >
> > > Oh, and we really need to fix the big missing feature in the node
> > > device APIs of persistent, inactive configs. eg we should be able
> > > to record XML configs of mdevs (and npiv devices too), in /etc/libvirt
> > > so they persist across reboots, and can be setup for auto-start on
> > > boot too.
> >
> > That doesn't help mdev in any way though. It doesn't make sense to
> > generate new UUID for given VM at each start. So in case of
> 
> What statement does this^^ refer to? Why would you generate a new UUID for a VM
> at each start, you'd generate it only once and then store it, the same way as
> domain UUIDs work.
> 
> > single host, the persistent file is redundant to the domain XML (as
> > long as uuid+parent is in the xml) and in case of cluster we'd have to
> 
> Right now you don't have any info about the parent device in the domain XML and
> such data would only exist in the XML if we all agreed on auto-creating mdevs,
> in which case persistent configs in nodedev would be unnecessary and vice-versa.
> 
> > copy all possible VM mdev definitions to all the hosts.
> 
> ^For mdev configs, you might be better off with creating them explicitly than
> copying configs, simply because given the information the XML has, you might
> conflict with UUIDs between hosts, so you'd have to take care for that. Parents
> have different PCI addresses that most probably wouldn't match across hosts, so
> from automation point of view, I think writing a stub recreating the whole set
> of devices/configs might actually be easier than copying & handling them
> (solely because the 2 things left - after the ones I mentioned - in the XML are
> the vgpu type and IOMMU group number which AFAIK cannot be requested explicitly).

Yep, separating the mdev config from the domain config is a significant
benefit, as it makes the domain config independent of the particular device
you've attached to, which can vary across hosts.

> > The idea works nicely if you had such definitions accessible in the
> > cluster and could define a group of devices (gpu+soundcard, single
> > mdev, single vf, ...) that would later be assigned to a VM (let's hope
> > kubevirt can get there).
> >
> > As for automatic creation, I think it's on the "nice to have" level.
> > So far libvirt is close to useless when working with mdevs as all the
> > data is in the same sysfs place where create/delete endpoints are - as
> > mentioned earlier, we can just get the data and do everything directly
> > from there instead of dealing with XML and bunch of new API calls.
> > Having at least some *configurable* auto create policy might add some
> 
> ^this is the thing we constantly keep discussing as everyone has a slightly
> different angle of view - libvirt does not implement any kind of policy,
> therefore the only "configuration" would be the PCI parent placement - you say
> what to do and we do it, no logic in it, that's it. Now, I don't understand
> taking care of the guesswork for the user in the simplest manner possible as
> policy rather as a mere convenience, be it just for developers and testers, but
> even that might apparently be perceived as a policy and therefore unacceptable.
> 
> I still stand by idea of having auto-creation as unfortunately, I sort of still
> fail to understand what the negative implications of having it are - is that it
> would get just unnecessarily too complex to maintain in the future that we would
> regret it or that we'd get a huge amount of follow-up requests for extending the
> feature or is it just that simply the interpretation of auto-create == policy?

The increasing complexity of the QEMU driver is a significant concern with
adding policy-based logic to the code. Thinking about this though, if we
provide the inactive node device feature, then we can avoid essentially
all new code and complexity in the QEMU driver, and still support auto-create.

ie, in the domain XML we just continue to have the exact same XML that
we already have today for mdevs, but with a single new attribute
autocreate=yes|no

  <devices>
    <hostdev mode='subsystem' type='mdev' model='vfio-pci' autocreate='yes'>
      <source>
        <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
      </source>
    </hostdev>
  </devices>

In the QEMU driver, the only change required is then roughly

   if (def->autocreate)
       virNodeDeviceCreate(dev)

and the opposite in shutdown. This also avoids pulling the whole node
device XML schema into the domain XML schema, which is something I dislike
about the previous proposals.
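
To make the flow concrete, here is a rough Python sketch of the start and
shutdown hooks. It is only an illustration, not libvirt code: the autocreate
attribute is the proposal from this mail and not part of any existing schema,
and FakeNodeDevices, domain_start and domain_shutdown are hypothetical names
standing in for the node device driver and the QEMU driver hooks.

```python
import xml.etree.ElementTree as ET

# Domain XML fragment using the proposed (not yet existing) autocreate attribute.
DOMAIN_XML = """
<devices>
  <hostdev mode='subsystem' type='mdev' model='vfio-pci' autocreate='yes'>
    <source>
      <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
    </source>
  </hostdev>
</devices>
"""


def autocreate_uuids(devices_xml):
    """Return the UUIDs of mdev hostdevs that carry autocreate='yes'."""
    root = ET.fromstring(devices_xml)
    uuids = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "mdev":
            continue
        if hostdev.get("autocreate", "no") != "yes":
            continue
        addr = hostdev.find("source/address")
        if addr is not None and addr.get("uuid"):
            uuids.append(addr.get("uuid"))
    return uuids


class FakeNodeDevices:
    """Hypothetical stand-in for the node device driver."""

    def __init__(self, defined):
        self.defined = set(defined)  # inactive configs known to the host
        self.active = set()          # devices that currently exist

    def create(self, uuid):
        # Only a device with an inactive config can be started.
        if uuid not in self.defined:
            raise LookupError("no inactive config for mdev " + uuid)
        self.active.add(uuid)

    def destroy(self, uuid):
        self.active.discard(uuid)


def domain_start(devices_xml, nodedevs):
    """On domain start, create every mdev that asked for autocreate."""
    for uuid in autocreate_uuids(devices_xml):
        nodedevs.create(uuid)


def domain_shutdown(devices_xml, nodedevs):
    """On domain shutdown, do the opposite."""
    for uuid in autocreate_uuids(devices_xml):
        nodedevs.destroy(uuid)
```

The point of the sketch is that the domain XML only carries the UUID and the
flag; the parent device and mdev type stay in the node device config.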

The inactive node device concept is also more broadly useful than just
this mdev scenario - it's been something we would have liked for NPIV
in the past too, and it gives users a nice way to have a set of mdevs
precreated on nodes independently of VM usage, so it solves multiple use
cases / scenarios at once.
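
As a minimal sketch of what "persistent, inactive" means here, assuming a
made-up on-disk format (JSON files in a directory; libvirt itself would store
XML) and hypothetical names throughout (NodeDeviceStore, simulate_reboot),
the lifecycle could look something like this:

```python
import json
import tempfile
from pathlib import Path


class NodeDeviceStore:
    """Hypothetical model of persistent node device configs."""

    def __init__(self, config_dir):
        self.config_dir = Path(config_dir)
        self.active = set()  # runtime state only, lost on reboot

    def define(self, uuid, parent, mdev_type, autostart=False):
        """Record an inactive config on disk without creating the device."""
        cfg = {"uuid": uuid, "parent": parent,
               "type": mdev_type, "autostart": autostart}
        (self.config_dir / (uuid + ".json")).write_text(json.dumps(cfg))

    def defined(self):
        """List the UUIDs of all persistent configs."""
        return sorted(p.stem for p in self.config_dir.glob("*.json"))

    def create(self, uuid):
        """Activate a previously defined device."""
        if uuid not in self.defined():
            raise LookupError("no persistent config for " + uuid)
        self.active.add(uuid)

    def boot(self):
        """At boot, start only the configs marked for autostart."""
        for path in self.config_dir.glob("*.json"):
            cfg = json.loads(path.read_text())
            if cfg["autostart"]:
                self.active.add(cfg["uuid"])


def simulate_reboot(config_dir):
    """A fresh store has no active devices until boot() replays autostart."""
    store = NodeDeviceStore(config_dir)
    store.boot()
    return store


def demo():
    """Define two devices, reboot, and report what is defined and active."""
    with tempfile.TemporaryDirectory() as config_dir:
        store = NodeDeviceStore(config_dir)
        store.define("mdev-a", "parent0", "example-type", autostart=True)
        store.define("mdev-b", "parent0", "example-type")
        rebooted = simulate_reboot(config_dir)
        return rebooted.defined(), sorted(rebooted.active)
```

Both definitions survive the simulated reboot, but only the one marked
autostart comes back active; the other stays defined until something
explicitly creates it.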

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|