[libvirt] RFC: Creating mediated devices with libvirt

Daniel P. Berrange berrange at redhat.com
Thu Jun 22 16:15:09 UTC 2017

On Thu, Jun 22, 2017 at 05:14:48PM +0200, Erik Skultety wrote:
> [...]
> > >
> > > ^this is the thing we constantly keep discussing as everyone has a slightly
> > > different angle of view - libvirt does not implement any kind of policy,
> > > therefore the only "configuration" would be the PCI parent placement - you say
> > > what to do and we do it, no logic in it, that's it. Now, I don't understand
> > > taking care of the guesswork for the user in the simplest manner possible as
> > > policy rather than as a mere convenience, be it just for developers and testers, but
> > > even that might apparently be perceived as a policy and therefore unacceptable.
> > >
> > > I still stand by the idea of having auto-creation as unfortunately, I sort of still
> > > fail to understand what the negative implications of having it are - is that it
> > > would get just unnecessarily too complex to maintain in the future that we would
> > > regret it or that we'd get a huge amount of follow-up requests for extending the
> > > feature or is it just that simply the interpretation of auto-create == policy?
> >
> > The increasing complexity of the qemu driver is a significant concern with
> > adding policy based logic to the code. Thinking about this though, if we
> > provide the inactive node device feature, then we can avoid essentially
> > all new code and complexity in the QEMU driver, and still support auto-create.
> >
> > ie, in the domain XML we just continue to have the exact same XML that
> > we already have today for mdevs, but with a single new attribute
> > autocreate=yes|no
> >
> >   <devices>
> >     <hostdev mode='subsystem' type='mdev' model='vfio-pci' autocreate="yes">
> >     <source>
> >       <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
> So, just for clarification of the concept, the device with ^this UUID will have
> had to be defined by the nodedev API by the time we start to edit the domain
> XML in this manner in which case the only thing the autocreate=yes would do is
> to actually create the mdev according to the nodedev config, right? Continuing
> with that thought, if UUID doesn't refer to any of the inactive configs it will
> be an error I suppose? What about the fact that only one vgpu type can live on
> the GPU? Even if you can successfully identify a device using the UUID in this
> way, you'll still face the problem that other types might be currently
> occupying the GPU and need to be torn down first, will this be automated as
> well in what you suggest? I assume not.

Technically we shouldn't need the node device to exist at the time we
define the XML - it only has to exist at the time we start the guest.
E.g. it's the same way you can list a virtual network as the source of a
guest NIC, but that virtual network doesn't have to actually have been
defined & started until the guest starts.
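
To illustrate the late binding described above, here is a minimal sketch in
plain C. It is a toy model, not libvirt code: the registry, the guest struct
and every toy_* function are hypothetical names invented for this example.
Defining the guest only records the UUID, and the lookup is deferred until
the guest is started.

```c
/* Toy model only: none of these types or functions are libvirt APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DEVS 8

/* Hypothetical registry of node device configs, keyed by UUID string. */
struct toy_nodedev_registry {
    const char *uuids[TOY_MAX_DEVS];
    size_t n;
};

/* Hypothetical guest config that only records the UUID it refers to. */
struct toy_guest {
    const char *mdev_uuid;
};

/* Register a node device config. Returns 0 on success, -1 if full. */
static int toy_registry_add(struct toy_nodedev_registry *reg, const char *uuid)
{
    assert(reg != NULL && uuid != NULL);
    if (reg->n >= TOY_MAX_DEVS)
        return -1;
    reg->uuids[reg->n++] = uuid;
    return 0;
}

/* Defining the guest stores the reference without resolving it. */
static void toy_guest_define(struct toy_guest *g, const char *uuid)
{
    assert(g != NULL && uuid != NULL);
    g->mdev_uuid = uuid;
}

/* Starting the guest is the point where the reference must resolve.
 * Returns 0 if the UUID is known to the registry, -1 otherwise. */
static int toy_guest_start(const struct toy_guest *g,
                           const struct toy_nodedev_registry *reg)
{
    for (size_t i = 0; i < reg->n; i++) {
        if (strcmp(reg->uuids[i], g->mdev_uuid) == 0)
            return 0;
    }
    return -1;
}
```

In this model a guest can be defined against a UUID that nothing provides
yet; only toy_guest_start() fails until the matching config has been added.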

If there are constraints that a pGPU can only support a certain combination
of vGPUs at any single point in time, doesn't the kernel already enforce
that when you try to create the vGPU in sysfs? IOW, we merely need to try
to create the vGPU, and if the kernel mdev driver doesn't allow you to mix
that with the other vGPUs that already exist, then we'd just report an
error from virNodeDeviceCreate, and that'd get propagated back as the
error for the virDomainCreate call.
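
A rough sketch of that error propagation, again as a toy model in plain C.
Nothing here is a libvirt or kernel interface: toy_pgpu, toy_mdev_create and
toy_domain_start are hypothetical stand-ins, and the "one vGPU type at a time"
rule is just an assumed example of a constraint the kernel driver might
enforce. The point is only that the guest startup path tries the create and
passes any failure straight back to the caller.

```c
/* Toy model only: none of these types or functions are libvirt or kernel APIs. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_VGPUS 4

/* Hypothetical physical GPU that admits a single vGPU type at a time. */
struct toy_pgpu {
    const char *active_type;   /* NULL while no vGPU exists */
    int count;                 /* number of vGPUs currently created */
};

/* Stand-in for the check the kernel mdev driver would apply on creation.
 * Returns 0 on success, -1 if the requested type cannot be added. */
static int toy_mdev_create(struct toy_pgpu *gpu, const char *type)
{
    assert(gpu != NULL && type != NULL);
    if (gpu->count >= TOY_MAX_VGPUS)
        return -1;
    if (gpu->active_type != NULL && strcmp(gpu->active_type, type) != 0)
        return -1;
    gpu->active_type = type;
    gpu->count++;
    return 0;
}

/* Stand-in for guest startup: with autocreate set, try to create the vGPU
 * and report a creation failure as the startup error. */
static int toy_domain_start(struct toy_pgpu *gpu, const char *type,
                            int autocreate, char *err, size_t errlen)
{
    if (autocreate && toy_mdev_create(gpu, type) < 0) {
        snprintf(err, errlen, "cannot create vGPU of type '%s'", type);
        return -1;
    }
    return 0;
}
```

The startup path carries no placement logic of its own in this sketch; it
only attempts the create and relays whatever the lower layer decides.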

> >     </source>
> >     </hostdev>
> >   </devices>
> >
> > In the QEMU driver, then the only change required is
> >
> >    if (def->autocreate)
> >        virNodeDeviceCreate(dev);
> Aha, so if a device gets torn down on shutdown, we won't face the problem with
> some other devices being active, all of them will have to be in the inactive
> state because they got torn down during the last shutdown - that would work.

I'm not sure why the relationship with other active devices is relevant
here. The virNodeDevicePtr we're accessing here is a single vGPU - if other
running guests have further vGPUs on the same pGPU, that's not really
relevant. Each vGPU is created/deleted as required.

|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
