[libvirt] [PATCH 5/5] nodedev: Protect every access to udev_monitor by locking the driver

Tue Aug 1 20:13:55 UTC 2017

On 07/26/2017 02:44 AM, Erik Skultety wrote:
> Since @udev_monitor isn't immutable within the driver, we need to
> protect every access to it by locking the driver, so that no spurious
> action changing the pointer while one thread is actively using it might
> occur. This patch just takes some precaution measures.
> 
> Signed-off-by: Erik Skultety <eskultet at redhat.com>
> ---
>  src/node_device/node_device_udev.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
> index 7ecb330df..0643a5845 100644
> --- a/src/node_device/node_device_udev.c
> +++ b/src/node_device/node_device_udev.c
> @@ -1623,9 +1623,13 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>                          void *data ATTRIBUTE_UNUSED)
>  {
>      struct udev_device *device = NULL;
> -    struct udev_monitor *udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> +    struct udev_monitor *udev_monitor = NULL;
>      int udev_fd = -1;
>  
> +    nodeDeviceLock();
> +    if (!(udev_monitor = DRV_STATE_UDEV_MONITOR(driver)))
> +        goto error;
> +

Originally I thought maybe we should provide some sort of error here,
but I convinced myself perhaps not - although I wouldn't be opposed to a
VIR_WARN or something...

My second thought was should we add some sort of EventRemoveHandle type
operation? But once we get the lock, the only way udev_monitor would be
NULL is if nodeStateCleanup is run which should already have removed the
handle, so probably not. But then it struck me, if that's the case, then
the nodeDeviceLock could fail rather miserably because driver could be
NULL... (e.g. virMutexLock(&driver->lock);)

Still not quite clear how a path such as this could be triggered other
than perhaps the last dying breaths of the daemon - unless something
that the *unref does causes any events to be flushed. Maybe a "delayed"
event - who knows. Still, if we wait on the lock, we could end up in a
state where @driver doesn't exist or anything it was referencing has
been freed...

So I think we need :

    if (!driver)
        return;

to follow how nodeStateCleanup will bail...

then (considering my thoughts in patch 2)

Add to node_device_udev.h (and use it in Cleanup too, so we don't have
to create a local @priv variable):

#define DRV_STATE_WATCH(ds) (((udevPrivate *)((ds)->privateData))->watch)

    nodeDeviceLock();
    if (!driver || !driver->privateData ||
        DRV_STATE_WATCH(driver) == -1 ||
        !(udev_monitor = DRV_STATE_UDEV_MONITOR(driver))) {
        nodeDeviceUnlock();
        return;
    }

then the rest of the code.

If you're "waiting" to get the lock while Cleanup is running, then
remove the chance that something that Cleanup does could cause awful
things to happen in this code.

In cleanup, once nodeDeviceUnlock() is run, but before virMutexDestroy,
we could grab the lock. That would cause virMutexDestroy
(pthread_mutex_destroy) to fail, but we don't check that. That just
becomes a side effect of using an event function and taking out the
mutex I suppose.

Oh and I should point out that it's possible that virMutexDestroy does
run, thus our getting the lock does nothing, but one would think that
the subsequent condition checks wouldn't be racing too hard against the
VIR_FREE(driver) in the other thread.

The impossible part of this race is that how does one tell if something
is waiting on the mutex other than getting a failure when we destroy it,
but since we don't manage failures for destroy, we have no way to be
sure. Still in the long run, both threads are heading to a very quick
death, so does it really matter.

John

(muttering to myself how much I hate lock races).

>      udev_fd = udev_monitor_get_fd(udev_monitor);
>      if (fd != udev_fd) {
>          udevPrivate *priv = driver->privateData;
> @@ -1640,22 +1644,26 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>           * the same error multiple times
>           */
>          virEventRemoveHandle(priv->watch);
> -
> -        goto cleanup;
> +        goto error;
>      }
>  
>      device = udev_monitor_receive_device(udev_monitor);
>      if (device == NULL) {
>          virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>                         _("udev_monitor_receive_device returned NULL"));
> -        goto cleanup;
> +        goto error;
>      }
>  
> +    nodeDeviceUnlock();
>      udevHandleOneDevice(device);
>  
>   cleanup:
>      udev_device_unref(device);
>      return;
> +
> + error:
> +    nodeDeviceUnlock();
> +    goto cleanup;
>  }
>  
>  
>