[libvirt PATCH] nodedev: wait a bit longer for new node devices

Erik Skultety eskultet at redhat.com
Thu Aug 25 07:06:27 UTC 2022


On Wed, Aug 24, 2022 at 06:14:51PM +0100, Daniel P. Berrangé wrote:
> On Wed, Aug 24, 2022 at 10:56:22AM -0500, Jonathon Jongsma wrote:
> > On 8/24/22 2:09 AM, Erik Skultety wrote:
> > > On Tue, Aug 23, 2022 at 12:43:03PM -0500, Jonathon Jongsma wrote:
> > > > Openstack developers reported that newly-created mdevs were not
> > > > recognized by libvirt until after a libvirt daemon restart. The source
> > > > of the problem appears to be that when libvirt gets the udev 'add'
> > > > event, the sysfs tree for that device might not be ready and so libvirt
> > > > waits 100ms for it to appear (max 100 waits of 1ms each). But in the
> > > > OpenStack environment, the sysfs tree for new mediated devices was
> > > > taking closer to 250ms to appear and therefore libvirt gave up waiting
> > > > and didn't add these new devices to its list of nodedevs.
> > > > 
> > > > By changing the wait time to 1 second (max 100 waits of 10ms each), this
> > > > should provide enough time to enable these deployments to recognize
> > > > newly-created mediated devices, but it shouldn't increase the delay for
> > > > more traditional deployments too much.
> > > > 
> > > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2109450
> > > > 
> > > > Signed-off-by: Jonathon Jongsma <jjongsma at redhat.com>
> > > > ---
> > > > 
> > > > Alternatively, we could switch to triggering off of the udev 'bind' event
> > > > rather than the 'add' event, but I wasn't able to convince myself that this
> > > > would result in 100% compatible behavior, so this felt like the safest
> > > > solution. If others can convince me that switching to 'bind' is safe, I can
> > > > re-submit this patch.
> > > 
> > > Is there a guarantee that the filesystem tree is ready by the time the event
> > > arrives? I remember back in the day when I implemented this, this was even
> > > discussed on the kernel list and the outcome was that each application needs to
> > > sort this out on its own hinting that at least at that time there wasn't
> > > any other way to do this reliably? Has something changed in the meantime?
> > > 
> > > Erik
> > > 
> > 
> > I'm afraid I don't actually know if anything has changed in the kernel in
> > this area. That's basically the reason that I proposed the approach that I
> > did. But I do know that in the bug referenced, the 'bind' event comes about
> > 250ms later than the 'add' event. I'm not sure if the filesystem tree is
> > necessarily ready on 'bind', but the fact that it is 250ms later means that,
> > at minimum, there's a significantly better chance that it is ready by that
> > point than at the time of 'add'.
> 
> Looking at the commit message for 'bind' events
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1455cf8dbfd0
> 
> [quote]
> Also, many drivers create additional driver-specific device attributes
> when binding to the device, to provide userspace with additional controls.
> The new events allow userspace to adjust these driver-specific attributes
> without worrying that they are not there yet.
> [/quote]
> 
> My reading of this is that we should NOT assume sysfs files exist
> until we've seen the 'bind' event.
> 
> IOW, the reason we have this problem with sysfs is precisely because
> we hooked into 'add' instead of 'bind'.  The sleep papers over the
> mistake, while using 'bind' instead would address the root cause.

>From the quote you posted I don't see how we're guaranteed that the sysfs tree
is ready and it looks like each driver has too much room for interpreting
'bind' whichever way they want. That said, I agree that switching to listening
to 'bind' instead of 'add' is an improvement, however we still may need to have
a timeout in place anyway as I still don't see any guarantees from the NVIDIA
driver that the sysfs tree will be ready by then.


> 
> NB, if we use 'bind' we need to double check all our supported
> platforms are targetting kerenl >= 4.14 already. 

They are, the oldest we support is 4.18 on AlmaLinux 8.

Erik



More information about the libvir-list mailing list