[libvirt PATCH v4 00/25] Add support for persistent mediated devices

Erik Skultety eskultet at redhat.com
Mon Mar 1 15:34:17 UTC 2021


On Wed, Feb 24, 2021 at 04:48:34PM -0600, Jonathon Jongsma wrote:
> On Mon, 22 Feb 2021 11:08:12 +0100
> Erik Skultety <eskultet at redhat.com> wrote:
> 
> > On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote:
> > > This patch series follows the previously-merged series which added
> > > support for transient mediated devices. This series expands mdev
> > > support to include persistent device definitions. Again, it relies
> > > on mdevctl as the backend.
> > > 
> > > It follows the common libvirt pattern of APIs by adding the
> > > following new APIs for node devices:
> > >     - virNodeDeviceDefineXML() - defines a persistent device
> > >     - virNodeDeviceUndefine() - undefines a persistent device
> > >     - virNodeDeviceCreate() - starts a previously-defined device  
> > 
> > So, I tested the patches on Friday and have a few notes:
> > - all of the driver implementations mentioned ^above need to pass an
> > error buffer to the respective mdevctl wrapper, because the generic
> > error "Unable to XZY mediated device" doesn't help at all. Kind of
> > like with QEMU errors, we need to extract it from an error buffer
> > (you already have input and output, so just add another one) --> this
> > relates to patches 14,18,21
> > 
> > - trying to undefine an active mdev reports an error in libvirt
> >     --> this is both inconsistent with the rest of libvirt's
> > subsystems and there's also no reason for it since mdevctl supports it
> >     --> we need to support transient devices as well  
> > 
> > - trying to define the following XML hangs for 5s and then fails:
> >     <device>
> >       <parent>pci_0000_06_00_0</parent>
> >       <driver>
> >         <name>vfio_mdev</name>
> >       </driver>
> >       <capability type='mdev'>
> >         <type id='nvidia-11'/>
> >         <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid>
> >       </capability>
> >     </device>
> 
> I assume that you have a parent device that supports creating an mdev
> of this type? In other words, this was expected to succeed, right?

Yeah, there's is a real NVIDIA card backing it.

> 
> > 
> >     -> I debugged this on Friday evening and for some reason
> > driver->devs hash table of devices was NULL going through the
> > nodeDeviceFindNewMediatedDevice call, but I haven't managed to figure
> > out why it was NULL, listing devices worked just fine
> 
> So, I often get a 5s delay when trying to define a new mdev. But it
> always succeeds after the delay. 
> 
> The reason for the delay is that the device is generally not in our
> device list the first time we call nodeDeviceFindNewMediatedDevice().
> It's usually not in our device list because the device is only added
> to the driver's device list after we get a notification from mdevctl
> (via monitoring the mdevctl config directory) and then we re-query
> mdevctl for the list of defined devices. Because mdevctl is not
> blazingly fast[1], and because we do the querying in a separate thread,
> the new device has usually not been discovered when we first call
> nodeDeviceFindNewMediatedDevice(). So we sleep for 5s before trying
> again.

Yeah, I suspected that we didn't get notified quickly enough, but the weird
thing was that "driver->devs" was NULL, which cannot be, the list couldn't have
disappeared. Actually, what happened in my case was that even after 5s there
was nothing so the code assumed the event would never come and failed, no idea
why.



> 
> For udev devices, this 5s delay is usually not hit because
> nodeDeviceFindNewDevice() first calls 'udevadm settle' before
> checking the device list. This waits just long enough to ensure that
> pending udev events are handled. Unfortunately we don't have a similar
> "wait just long enough" command for mdevctl, so we almost always hit
> the 5s retry timeout. 
> 
> There are a couple of potential ways to avoid hitting this 5s timeout:
> 1. directly add a placeholder device to the driver's device list
> immediately after calling 'mdevctl define' so that it is guaranteed to
> be in the device list when we call nodeDeviceFindNewMediatedDevice().
> We can then update that device definition with additional info
> when mdevctl is eventually queried.

Hmm, ^this would work for define. Mdevctl has returned the UUID, so we have all
the information to start the device, so we don't care when the monitoring
thread wakes up. But we should hit the same issue with "create" right, because
we don't need to wait for anything.

> 2. don't wait 5s every time a device is not found. Instead, start with
> a small sleep timeout and increase it gradually until we hit the max
> timeout. In other words, instead of check; sleep(5); check; sleep(5);
> check; sleep(5); we could instead do something like: check; sleep(1);
> check; sleep(2); check; sleep(4); check; sleep(8)...

IIRC we do something similar when we're spawning a machine and waiting for an
mdev, because it takes time before the sysfs entry for mdev is created and
complete directory hierarchy is created for it, so libvirt has to wait for it.
We wait for several rounds and then we fail if a cap was hit.
Anyway, 5s is a long time, I can imagine this timing out in the management
layer easily.

> 
> 
> [1]$ time mdevctl list --defined
> 49bf2346-6747-4ad6-be5a-adc2f0a10b5c 0000:00:02.0 i915-GVTg_V5_8 manual
> 4fcd3666-e58a-4c63-969c-bd616a158c0d 0000:00:02.0 i915-GVTg_V5_8 manual
> 5c152919-3a34-4338-b7c9-532f73fa5440 0000:00:02.0 i915-GVTg_V5_4 manual
> 
> real	0m0.782s
> user	0m0.735s
> sys	0m0.056s
> 
> 
> 
> > 
> > - I also prepared several adjustments to how we define the mdevctl
> > wrappers + some test adjustments which I wanted to share with you,
> > but I haven't tested those thoroughly since I was debugging why that
> > XML couldn't be defined, I'll share it when we eliminate the
> > underlying problem -> if you're wondering why I didn't just put it
> > into the review, well, I figured sharing the code was more
> > descriptive than if I used words
> 
> Would you like me to wait for this before sending a new series?

No need, I won't provide more comments at this point. My changes can be applied
as a follow up patch/series anyway, so even if we don't make them part of your
series immediately, we can always apply it later.
I'll reply with the patch once I figure out why define isn't working for me
properly.

Regards,
Erik




More information about the libvir-list mailing list