[libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd

John Ferlan jferlan at redhat.com
Fri Sep 29 13:46:48 UTC 2017



On 09/28/2017 06:00 AM, Erik Skultety wrote:
> [...]
> 
>>>
>>>          nodeDeviceLock();
>>> +        priv = driver->privateData;
>>>          udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>>>
>>>          if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
>>> @@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
>>>          device = udev_monitor_receive_device(udev_monitor);
>>>          nodeDeviceUnlock();
>>>
>>> +        /* Re-enable polling for new events on the @udev_monitor */
>>> +        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
>>> +
>>
>> I think this should only be done when privateData->nevents == 0?  If we
>> have multiple events to read, then calling virEventPollUpdateHandle
>> (eventually) for every pass through the loop seems like a bit of
>> overkill, especially if udevEventHandleCallback turns right around and
>> disables it again.
>>
>> Also fortunately there isn't more than one udev thread sending the
>> events since you access the priv->watch without the driver lock...
>>
>> Conversely perhaps we only disable if events > 1... Then again, how does
>> one get to 2 events queued if we're disabling as soon as we increment
> 
> Very good point; technically events would still get queued, we just wouldn't
> check for them, and yes, we would process 1 event at a time. Not optimal, but
> if you look at the original code and compare it with this one performance-wise
> (and I hope I haven't missed anything that would render everything I write
> next complete rubbish), the time complexity hasn't changed and the space
> complexity hasn't changed; what changed is the code complexity, which makes
> the code a bit slower due to the excessive locking and toggling the FD polling
> back and forth. So essentially you've got the same thing as you had
> before... but asynchronous.
> However, yes, the usage of @nevents is completely useless now (I hadn't
> realized that immediately, thanks) and a simple signalling should suffice.

Having it "slower" is necessarily bad ;-)  That gives some of the other
slower buggers a chance to fill in the details we need. Throwing the
control back to udev quicker could aid in that too.
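
For reference, roughly what I had in mind with the nevents == 0 check was
something like this (completely untested sketch - the names are just taken
from the hunks above, and it has the same caveat about touching priv->watch
without the driver lock):

    device = udev_monitor_receive_device(udev_monitor);
    nodeDeviceUnlock();

    virMutexLock(&threadData->lock);
    if (threadData->nevents > 0)
        threadData->nevents--;
    if (threadData->nevents == 0) {
        /* backlog drained - let the event loop poll the fd again */
        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
    }
    virMutexUnlock(&threadData->lock);

That way the handle only gets toggled when the backlog goes from empty to
non-empty and back, rather than once per device we receive.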

> 
> So how could we make it faster, though? I thought more about the idea you
> shared in one of the previous reviews - letting the thread actually pull all
> the data from the monitor - to which IIRC I replied something along the lines
> that the event counting mechanism wouldn't allow that and it would break.
> Okay, let's drop the event counting. What if we now let both the udev handler
> thread and the event loop "poll" the file descriptor, IOW let the event loop
> keep polling the monitor fd, thus invoking udevHandleCallback, which would in
> turn keep signalling the handler thread that there is data. The difference in
> the handler thread would be that it wouldn't blindly trust the callback about
> the data; because of the scheduling issue, it would keep poking the monitor
> itself until it got either EAGAIN or EWOULDBLOCK from recvmsg() called in
> libudev, which would then be the signal to start waiting on the condition.
> The next time an event appears, a signal from udevHandleCallback would
> finally have a meaning and wouldn't be ignored.

Is making it faster really the goal?  I think it's preferable that it works
consistently. What we're really after is handling the various errno
possibilities and avoiding the "udev_monitor_receive_device returned NULL"
message that shows up when we have too many "cooks in the kitchen" trying to
determine whether a new device was really available or whether this was just
another notification for something we're already processing.

Also, not that I expect the udev code to change, but if a new errno is
added then we may have to keep up... Always a fear especially if we're
using the errno to dictate our algorithm.
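
Just so I'm sure I follow what you're proposing, I imagine the thread loop
would end up looking roughly like this (completely untested sketch - the
dataReady flag is just a stand-in for whatever simple signalling replaces
@nevents, the driver locking around the monitor is omitted, and it assumes
libudev really does leave recvmsg()'s errno intact when
udev_monitor_receive_device() returns NULL):

    while (1) {
        virMutexLock(&threadData->lock);
        while (!threadData->dataReady)
            virCondWait(&threadData->threadCond, &threadData->lock);
        threadData->dataReady = false;
        virMutexUnlock(&threadData->lock);

        /* drain everything that is currently queued on the monitor */
        while (1) {
            errno = 0;
            device = udev_monitor_receive_device(udev_monitor);
            if (!device) {
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                    break;  /* nothing left - wait for the next signal */
                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                               _("udev_monitor_receive_device returned NULL"));
                break;
            }
            udevHandleOneDevice(device);
            udev_device_unref(device);
        }
    }

If that's the idea, then most of the extra signals from udevEventHandleCallback
really do just become no-ops, as you say.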

> 
> This way, it's actually the handler thread who's 'the boss', as most of the
> signals from udevHandleCallback would be lost/NOPs.
> 
> Would you agree to such an approach?

At some point we get too fancy and think too hard about a problem, letting it
consume us.  I think when I presented my idea I wasn't really understanding
the details of how we consume the udev data. Now that I have a bit more of a
clue, perhaps the original idea wasn't so good. If we receive multiple
notifications that a device is ready to be processed even though we may
already be processing it, how are we to truly know whether we missed one or
whether we really got it and udev was just reminding us again?

I'm not against looking at a different mechanism - the question then becomes,
from your perspective, is it worth it?

John

> 
> Erik
> 
>> nevents? Wouldn't this totally obviate the need for nevents?
>>
>> I would think it would be overkill to disable/enable for just 1 event.
>> If multiple are queued, then yeah we're starting to get behind... Of
>> course that could affect the re-enable when events == 1 (or some
>> MAGIC_CONSTANT_NUMBER_THRESHOLD_TO_DISABLE_POLLING)
> 
> This magic threshold approach still wouldn't work, because you don't know
> anything about the events at that point; you'd have to process them to
> actually find out whether each one is a legitimate new event or an already
> noted event that hasn't been processed yet, so it would still mess with your
> counters.
> 
>>
>> IIRC from the previous trip down this rabbit hole (I did briefly go back
>> and read the comments), what you're trying to avoid is the following
>> message spamming the error logs because of a scheduling mishap... Thus
>> perhaps the message could only be displayed if nevents > 1 and nothing was
>> found, knowing that we have this "timing problem" between udev, this
>> thread, and the fd polling scanner as to when a device should be
>> "found"... and then reset nevents to 0 with the appropriate lock.
>>
>> Curious on your thoughts before an R-B/ACK though.
>>
>> John
>>
>>
>>>          if (!device) {
>>>              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>>                             _("udev_monitor_receive_device returned NULL"));
>>> @@ -1750,10 +1755,12 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>>>                          int events ATTRIBUTE_UNUSED,
>>>                          void *opaque)
>>>  {
>>> +    udevPrivate *priv = NULL;
>>>      struct udev_monitor *udev_monitor = NULL;
>>>      udevEventThreadDataPtr threadData = opaque;
>>>
>>>      nodeDeviceLock();
>>> +    priv = driver->privateData;
>>>      udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>>>
>>>      if (!udevEventCheckMonitorFD(udev_monitor, fd)) {
>>> @@ -1771,6 +1778,16 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>>>      threadData->nevents++;
>>>      virCondSignal(&threadData->threadCond);
>>>      virMutexUnlock(&threadData->lock);
>>> +
>>> +    /* Due to scheduling, the eventloop might poll the udev monitor before the
>>> +     * handler thread actually removes the data from the socket, thus causing it
>>> +     * to spam errors about data not being ready yet, because
>>> +     * udevEventHandleCallback would be invoked multiple times incrementing the
>>> +     * counter of events queuing for a single event.
>>> +     * Therefore we need to disable polling on the fd until the thread actually
>>> +     * removes the data from the socket.
>>> +     */
>>> +    virEventUpdateHandle(priv->watch, 0);
>>>  }
>>>
>>>
>>>



