[libvirt] RFC: Creating mediated devices with libvirt

Daniel P. Berrange berrange at redhat.com
Thu Jun 22 16:42:49 UTC 2017

On Thu, Jun 22, 2017 at 12:33:16PM -0400, Laine Stump wrote:
> On 06/22/2017 11:28 AM, Alex Williamson wrote:
> > On Thu, 22 Jun 2017 17:14:48 +0200
> > Erik Skultety <eskultet at redhat.com> wrote:
> > 
> >> [...]
> >>>>
> >>>> ^this is the thing we constantly keep discussing as everyone has a slightly
> >>>> different angle of view - libvirt does not implement any kind of policy,
> >>>> therefore the only "configuration" would be the PCI parent placement - you say
> >>>> what to do and we do it, no logic in it, that's it. Now, I don't understand
> >>>> taking care of the guesswork for the user in the simplest manner possible as
> >>>> policy rather as a mere convenience, be it just for developers and testers, but
> >>>> even that might apparently be perceived as a policy and therefore unacceptable.
> >>>>
> >>>> I still stand by idea of having auto-creation as unfortunately, I sort of still
> >>>> fail to understand what the negative implications of having it are - is that it
> >>>> would get just unnecessarily too complex to maintain in the future that we would
> >>>> regret it or that we'd get a huge amount of follow-up requests for extending the
> >>>> feature or is it just that simply the interpretation of auto-create == policy?  
> >>>
> >>> The increasing complexity of the qemu driver is a significant concern with
> >>> adding policy based logic to the code. Thinking about this though, if we
> >>> provide the inactive node device feature, then we can avoid essentially
> >>> all new code and complexity in the QEMU driver, and still support auto-create.
> >>>
> >>> ie, in the domain XML we just continue to have the exact same XML that
> >>> we already have today for mdevs, but with a single new attribute
> >>> autocreate=yes|no
> >>>
> >>>   <devices>
> >>>     <hostdev mode='subsystem' type='mdev' model='vfio-pci' autocreate="yes">
> >>>     <source>
> >>>       <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
> >>
> >> So, just for clarification of the concept, the device with ^this UUID will have
> >> had to be defined by the nodedev API by the time we start to edit the domain
> >> XML in this manner in which case the only thing the autocreate=yes would do is
> >> to actually create the mdev according to the nodedev config, right? Continuing
> >> with that thought, if UUID doesn't refer to any of the inactive configs it will
> >> be an error I suppose? What about the fact that only one vgpu type can live on
> >> the GPU? even if you can successfully identify a device using the UUID in this
> >> way, you'll still face the problem, that other types might be currently
> >> occupying the GPU and need to be torn down first, will this be automated as
> >> well in what you suggest? I assume not.
> >>
> >>>     </source>
> >>>     </hostdev>
> >>>   </devices>
> >>>
> >>> In the QEMU driver, then the only change required is
> >>>
> >>>    if (def->autocreate)
> >>>        virNodeDeviceCreate(dev)  
> >>
> >> Aha, so if a device gets torn down on shutdown, we won't face the problem with
> >> some other devices being active, all of them will have to be in the inactive
> >> state because they got torn down during the last shutdown - that would work.
> > 
> > 
> > I'm not familiar with how inactive devices would be defined in the
> > nodedev API, would someone mind explaining or providing an example
> > please?  I don't understand where the metadata is stored that describes
> > the what and where of a given UUID.  Thanks,
>
> You don't understand it because it doesn't exist yet :-)
>
> The idea is essentially the same that we've talked about, except that
> all the information about parent PCI address, desired type of child, and
> anything else (is there anything else?) is stored in some
> not-yet-specified persistent node device config rather than directly in
> the domain XML. Maybe something like:
>
>   <nodedevice>
>     <uuid>BobLobLaw</uuid>
>     <parent>
>       <address type='pci' .... />
>     </parent>
>     <child type='MoreBlah'/>
>   </nodedevice>
>
> I haven't thought about how it would show the difference between active
> and inactive - didn't get enough coffee today and I have a headache.

The XML doesn't need to show the difference between active & inactive.

That distinction is something you filter on when querying the list
of devices. We'd want to add a virNodeDeviceIsActive() API, like
we have for other objects, so you can query it afterwards too.
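
To illustrate the filtering idea, here is a rough Python sketch. It uses a
toy in-memory model rather than the libvirt bindings: NodeDevice, ListFlags,
list_devices() and is_active() are hypothetical names invented for this
example, and the second UUID is made up. The point is only that active vs
inactive is state you filter on or query, not something stored in the XML.

```python
from dataclasses import dataclass
from enum import Flag, auto


class ListFlags(Flag):
    """Hypothetical filter flags for listing node devices."""
    ACTIVE = auto()
    INACTIVE = auto()


@dataclass(frozen=True)
class NodeDevice:
    """Toy stand-in for a node device; 'active' is runtime state, not config."""
    uuid: str
    active: bool


def is_active(dev):
    """Toy analogue of the proposed virNodeDeviceIsActive() query."""
    return dev.active


def list_devices(devices, flags=ListFlags.ACTIVE | ListFlags.INACTIVE):
    """Return only the devices whose state matches the requested flags."""
    wanted = []
    for dev in devices:
        state = ListFlags.ACTIVE if is_active(dev) else ListFlags.INACTIVE
        if state in flags:
            wanted.append(dev)
    return wanted


devices = [
    NodeDevice("c2177883-f1bb-47f0-914d-32a22e3a8804", active=True),
    # Made-up UUID standing in for a defined-but-not-created mdev.
    NodeDevice("00000000-0000-0000-0000-000000000001", active=False),
]

inactive_only = list_devices(devices, ListFlags.INACTIVE)
```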

> ... okay, another "shower thought" is coming in... One deficiency of
> this comes to mind - since the domain config references the device by
> uuid, and an existing child device's uuid can't be changed, the unique
> uuid used by a particular domain must be defined on all of the hosts
> that the domain might be moved to. And since other domains can't share
> that uuid (unless you're 100% sure they'll never be active at the same
> time), you won't be able to implement the alternate idea of "pre-create
> all the devices, then assign them to domains as needed"; instead, you'll
> be forced to use the "create-on-demand" model.

You can still pre-create them all, as you still have the option of
providing updated XML when you migrate VMs across hosts, so that
it refers to a different UUID on the target host.

Also, since you're not actually starting them all at once, you can have
the option of precreating more vGPU definitions than you can actually
concurrently support - you're only limited when you go to start them.
Though you probably wouldn't want to do that beyond a certain scale - just
changing XML on migrate is simpler.
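
As a rough illustration of changing the XML on migrate, the Python sketch
below rewrites the mdev UUID in a domain XML document using only the
standard library. The helper name retarget_mdev_uuid() and the target UUID
are made up for this example, and the XML is a trimmed version of the
hostdev snippet quoted earlier, not a complete domain definition.

```python
import xml.etree.ElementTree as ET

# Trimmed domain XML based on the hostdev example quoted above.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
      <source>
        <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

OLD_UUID = "c2177883-f1bb-47f0-914d-32a22e3a8804"
# Made-up UUID standing in for an mdev defined on the target host.
NEW_UUID = "11111111-2222-3333-4444-555555555555"


def retarget_mdev_uuid(domain_xml, old_uuid, new_uuid):
    """Return domain XML with the mdev hostdev UUID swapped for new_uuid."""
    root = ET.fromstring(domain_xml)
    changed = 0
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "mdev":
            continue  # leave PCI/USB/etc. hostdevs untouched
        for addr in hostdev.findall("./source/address"):
            if addr.get("uuid") == old_uuid:
                addr.set("uuid", new_uuid)
                changed += 1
    if changed == 0:
        raise ValueError("no mdev hostdev with UUID %s" % old_uuid)
    return ET.tostring(root, encoding="unicode")


target_xml = retarget_mdev_uuid(DOMAIN_XML, OLD_UUID, NEW_UUID)
```

The resulting document is what you would hand over as the updated XML for
the destination, for example via the --xml option of virsh migrate.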

|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
