[libvirt] [PATCH 5/5] nodedev: Protect every access to udev_monitor by locking the driver

Erik Skultety eskultet at redhat.com
Thu Aug 17 14:31:40 UTC 2017


On Tue, Aug 01, 2017 at 04:13:55PM -0400, John Ferlan wrote:
>
>
> On 07/26/2017 02:44 AM, Erik Skultety wrote:
> > Since @udev_monitor isn't immutable within the driver, we need to
> > protect every access to it by locking the driver, so that no spurious
> > action changing the pointer while one thread is actively using it might
> > occur. This patch just takes some precaution measures.
> >
> > Signed-off-by: Erik Skultety <eskultet at redhat.com>
> > ---
> >  src/node_device/node_device_udev.c | 16 ++++++++++++----
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
> > index 7ecb330df..0643a5845 100644
> > --- a/src/node_device/node_device_udev.c
> > +++ b/src/node_device/node_device_udev.c
> > @@ -1623,9 +1623,13 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
> >                          void *data ATTRIBUTE_UNUSED)
> >  {
> >      struct udev_device *device = NULL;
> > -    struct udev_monitor *udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> > +    struct udev_monitor *udev_monitor = NULL;
> >      int udev_fd = -1;
> >
> > +    nodeDeviceLock();
> > +    if (!(udev_monitor = DRV_STATE_UDEV_MONITOR(driver)))
> > +        goto error;
> > +
>
> Originally I thought maybe we should provide some sort of error here,
> but I convinced myself perhaps not - although I wouldn't be opposed to a
> VIR_WARN or something...
>
> My second thought was should we add some sort of EventRemoveHandle type
> operation? But once we get the lock, the only way udev_monitor would be
> NULL is if nodeStateCleanup is run which should already have removed the
> handle, so probably not. But then it struck me, if that's the case, then
> the nodeDeviceLock could fail rather miserably because driver could be
> NULL... (e.g. virMutexLock(&driver->lock);)
>
> Still not quite clear how a path such as this could be triggered other
> than perhaps the last dying breaths of the daemon - unless something
> that the *unref does causes any events to be flushed. Maybe a "delayed"

It doesn't, libudev just closes the socket, and because we already removed the
event handle, the only possible entry point I see is when there's an event to
handle, but the daemon's going down and preemption causes the daemon to grab
the lock first, is that what you're referring to as 'delayed' perhaps?

> event - who knows. Still, if we wait on the lock, we could end up in a
> state where @driver doesn't exist or anything it was referencing has
> been freed...
>
> So I think we need :
>
>     if (!driver)
>         return;
>
> to follow how nodeStateCleanup will bail...
>
> then (considering my thoughts in patch 2)
>
> Add to node_device_udev.h (and use it in Cleanup too, so we don't have
> to create a local @priv variable):
>
> #define DRV_STATE_WATCH(ds) (((udevPrivate *)((ds)->privateData))->watch)
>
>
>     nodeDeviceLock();
>     if (!driver || !driver->privateData ||
>         DRV_STATE_WATCH(driver) == -1 ||
>         !(udev_monitor = DRV_STATE_UDEV_MONITOR(driver))) {
>         nodeDeviceUnlock();
>         return;
>     }

Anyhow, looking at ^this as the potential addition to our best effort to
terminate gracefully, I suddenly don't feel any motivation to pursue this
further, especially since both locking a non-existent mutex and destroying a
locked mutex results into an undefined behaviour according to the
documentation, so even despite our hard effort of checking whether the driver
still exists and whether it's internals are still intact, it still might go
haywire during termination, who knows. Therefore I opt for dropping patches 3
and 5 from the series as I no longer see it as worth the effort.

Thank you for your input.
Erik

>
> then the rest of the code.
>
> If you're "waiting" to get the lock while Cleanup is running, then
> remove the chance that something that Cleanup does could cause awful
> things to happen in this code.
>
> In cleanup, once nodeDeviceUnlock() is run, but before virMutexDestroy,
> we could grab the lock. That would cause virMutexDestroy
> (pthread_mutex_destroy) to fail, but we don't check that. That just
> becomes a side effect of using an event function and taking out the
> mutex I suppose.
>
> Oh and I should point out that it's possible that virMutexDestroy does
> run, thus our getting the lock does nothing, but one would think that
> the subsequent condition checks wouldn't be racing too hard against the
> VIR_FREE(driver) in the other thread.
>
> The impossible part of this race is that how does one tell if something
> is waiting on the mutex other than getting a failure when we destroy it,
> but since we don't manage failures for destroy, we have no way to be
> sure. Still in the long run, both threads are heading to a very quick
> death, so does it really matter.
>
> John
>
> (muttering to myself how much I hate lock races).
>
> >      udev_fd = udev_monitor_get_fd(udev_monitor);
> >      if (fd != udev_fd) {
> >          udevPrivate *priv = driver->privateData;
> > @@ -1640,22 +1644,26 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
> >           * the same error multiple times
> >           */
> >          virEventRemoveHandle(priv->watch);
> > -
> > -        goto cleanup;
> > +        goto error;
> >      }
> >
> >      device = udev_monitor_receive_device(udev_monitor);
> >      if (device == NULL) {
> >          virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> >                         _("udev_monitor_receive_device returned NULL"));
> > -        goto cleanup;
> > +        goto error;
> >      }
> >
> > +    nodeDeviceUnlock();
> >      udevHandleOneDevice(device);
> >
> >   cleanup:
> >      udev_device_unref(device);
> >      return;
> > +
> > + error:
> > +    nodeDeviceUnlock();
> > +    goto cleanup;
> >  }
> >
> >
> >




More information about the libvir-list mailing list