[libvirt] [PATCH v4 4/7] udev: Convert udevEventHandleThread to an actual thread routine

Thu Oct 5 12:54:36 UTC 2017

[...]
> > +    while (1) {
> > +        virMutexLock(&privateData->lock);
> > +        while (privateData->nevents == 0 && !privateData->threadQuit) {
> > +            if (virCondWait(&privateData->threadCond, &privateData->lock)) {
> > +                virReportSystemError(errno, "%s",
> > +                                     _("handler failed to wait on condition"));
>
> Is a virMutexUnlock required before eventually calling virMutexDestroy...

It is, but even an unlock wouldn't be enough, since there's concurrent access,
even if everything goes smooth, the handle callback could be already waiting in
queue to lock the mutex, which is too an undefined behaviour. So I'll change it
so that the thread doesn't do any thread-local data cleanup (except for mutex
unlock) and I'm going to add the free callback to virEventAddHandle, so that the
data will be freed eventually.

The interesting thing about this is that we never actually check the errno when
we continue with the error path on virCondWait failure, most likely because
it's veeery unlikely that such a thing would happen. If you look at the errnos
it can return, you'll be able to see that a) none of the errors are really
applicable to our usecases, therefore almost impossible to happen and b) one of
them indicates that the mutex was still acquired, therefore needs to be
unlocked, while the other one indicates that the state is unrecoverable, so the
mutex couldn't have been acquired, but if you have a quick look throughout the
code, we always unlock in this case (which is possibly undefined). But I'll
stick with the unlock for the sake of consistency, you see how quickly the
problem changes to something else :).

>
> > +                goto cleanup;
> > +            }
> > +        }
> >
> > -    if (!udevEventCheckMonitorFD(udev_monitor, fd)) {
> > +        privateData->nevents--;
> > +        virMutexUnlock(&privateData->lock);
>
> If we get here, then either nevents > 0 || threadQuit == true, but we
> don't check for threadQuit here before the fetch/check of monitor_fd,
> e.g. the reason for threadQuit = true, so although the following
> udev_monitor check "works", the question thus becomes is it necessary if
> threadQuit == true?  I suppose it could be, but we could also jump to

You'd have to check for threadQuit==true everywhere to avoid any potential
unnecessary work. Also, threadQuit == true only when udevEventCheckMonitorFD
failed in udevEventHandleCallback, so when the thread invokes
udevEventCheckMonitorFD, it will fail equally, thus nothing else will be done
and that's because we can't recover from the previous udev monitor fd error.

> cleanup if threadQuit == true || !udevEventCheckMonitorFD
>
> > +
> > +        nodeDeviceLock();
> > +        udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> > +
> > +        if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
>
> This accesses privateData->monitor_fd without the mutex. So we don't
> have too many lock/unlock - consider a local @monitor_fd which is
> fetched while the lock is held.

True, but @monitor_fd is defacto a constant, noone is touching it since it's
defined during thread creation.

>
> > +            nodeDeviceUnlock();
> > +            goto cleanup;
> > +        }
> > +
> > +        device = udev_monitor_receive_device(udev_monitor);
> >          nodeDeviceUnlock();
> > -        return;
> > -    }
> >
> > -    device = udev_monitor_receive_device(udev_monitor);
> > -    nodeDeviceUnlock();
>
> [1]  could move the comment here since that's what I believe it's meant
> to describe...
>
> > +        if (!device) {
> > +            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> > +                           _("udev_monitor_receive_device returned NULL"));
>
> Perhaps a VIR_WARN? Doesn't perhaps really matter, but it's not an error
> it's just a condition we didn't expect that we're continuing on...

That means that when the user sets global logging level to ERROR, and say
they're not seeing their devices to be updated the way they expect, they're not
going to see any signs of a potential failure, so I'm keeping it as an error.

>
>
> > +            goto next;
>
> This should just be a continue; instead of needing next... Not clear
> what happens if udev_device_unref(NULL) is called.

Very true :).

>
> > +        }
> > +
> > +        udevHandleOneDevice(device);
> >
> > -    if (!device) {
> > -        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> > -                       _("udev_monitor_receive_device returned NULL"));
> > -        return;
> > +    next:
> > +        udev_device_unref(device);
> >      }
> >
> > -    udevHandleOneDevice(device);
> > + cleanup:
> >      udev_device_unref(device);
>
> Should this be:
>
>     if (device)
>         udev_device_unref(device)
>
>
> I think the cleanups are obvious, so
>
> Reviewed-by: John Ferlan <jferlan at redhat.com>

Thanks, but there are more changes needed, so another version will be posted.

>
> John
>
> > +    udevEventThreadDataFree(privateData);
> > +    return;
> >  }
> >
> >
> > @@ -1678,20 +1748,29 @@ static void
> >  udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
> >                          int fd,
> >                          int events ATTRIBUTE_UNUSED,
> > -                        void *data ATTRIBUTE_UNUSED)
> > +                        void *opaque)
> >  {
> >      struct udev_monitor *udev_monitor = NULL;
> > +    udevEventThreadDataPtr threadData = opaque;
> >
> >      nodeDeviceLock();
> >      udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> >
> >      if (!udevEventCheckMonitorFD(udev_monitor, fd)) {
> > +        virMutexLock(&threadData->lock);
> > +        threadData->threadQuit = true;
> > +        virCondSignal(&threadData->threadCond);
> > +        virMutexUnlock(&threadData->lock);
> > +
> >          nodeDeviceUnlock();
> >          return;
> >      }
> >      nodeDeviceUnlock();
> >
> > -    udevEventHandleThread((void *)(intptr_t) fd);
> > +    virMutexLock(&threadData->lock);
> > +    threadData->nevents++;
> > +    virCondSignal(&threadData->threadCond);
> > +    virMutexUnlock(&threadData->lock);
> >  }
> >
> >
> > @@ -1818,6 +1897,9 @@ nodeStateInitialize(bool privileged,
> >  {
> >      udevPrivate *priv = NULL;
> >      struct udev *udev = NULL;
> > +    int monitor_fd = -1;
> > +    virThread th;
> > +    udevEventThreadDataPtr threadData = NULL;
> >
> >      if (VIR_ALLOC(priv) < 0)
> >          return -1;
> > @@ -1878,6 +1960,14 @@ nodeStateInitialize(bool privileged,
> >                                               128 * 1024 * 1024);
> >  #endif
> >
> > +    monitor_fd = udev_monitor_get_fd(priv->udev_monitor);
> > +    if (!(threadData = udevEventThreadDataNew(monitor_fd)) ||
> > +        virThreadCreate(&th, false, udevEventHandleThread, threadData) < 0) {
> > +        virReportSystemError(errno, "%s",
> > +                             _("failed to create udev handling thread"));
> > +        goto cleanup;
> > +    }
> > +
> >      /* We register the monitor with the event callback so we are
> >       * notified by udev of device changes before we enumerate existing
> >       * devices because libvirt will simply recreate the device if we
> > @@ -1886,9 +1976,8 @@ nodeStateInitialize(bool privileged,
> >       * enumeration.  The alternative is to register the callback after
> >       * we enumerate, in which case we will fail to create any devices
> >       * that appear while the enumeration is taking place.  */
> > -    priv->watch = virEventAddHandle(udev_monitor_get_fd(priv->udev_monitor),
> > -                                    VIR_EVENT_HANDLE_READABLE,
> > -                                    udevEventHandleCallback, NULL, NULL);
> > +    priv->watch = virEventAddHandle(monitor_fd, VIR_EVENT_HANDLE_READABLE,
> > +                                    udevEventHandleCallback, threadData, NULL);
> >      if (priv->watch == -1)
> >          goto unlock;
> >
> >