[libvirt] [PATCH] events: Fix domain event race on client disconnect
Daniel P. Berrange
berrange at redhat.com
Fri Sep 7 12:24:35 UTC 2012
On Thu, Sep 06, 2012 at 03:06:41PM +0200, Christophe Fergeau wrote:
> GNOME Boxes sometimes stops getting domain events from libvirtd, even
> after restarting it. Further investigation in libvirtd shows that
> events are properly queued with virDomainEventStateQueue, but the
> timer virDomainEventTimer which flushes the events and sends them to
> the clients never gets called. Looking at the event queue in gdb
> shows that it's non-empty and that its size increases with each new
> events.
>
> virDomainEventTimer is set up in virDomainEventStateRegister[ID]
> when going from 0 client connecte to 1 client connected, but is
> initially disabled. The timer is removed in
> virDomainEventStateRegister[ID] when the last client is disconnected
> (going from 1 client connected to 0).
>
> This timer (which handles sending the events to the clients) is
> enabled in virDomainEventStateQueue when queueing an event on an
> empty queue (queue containing 0 events). It's disabled in
> virDomainEventStateFlush after flushing the queue (ie removing all
> the elements from it). This way, no extra work is done when the queue
> is empty, and when the next event comes up, the timer will get
> reenabled because the queue will go from 0 event to 1 event, which
> triggers enabling the timer.
>
> However, with this Boxes bug, we have a client connected (Boxes), a
> non-empty queue (there are events waiting to be sent), but a disabled
> timer, so something went wrong.
>
> When Boxes connects (it's the only client connecting to the libvirtd
> instance I used for debugging), the event timer is not set as expected
> (state->timer == -1 when virDomainEventStateRegisterID is called),
> but at the same time the event queue is not empty. In other words,
> we had no clients connected, but pending events. This also explains
> why the timer never gets enabled as this is only done when an event
> is queued on an empty queue.
>
> I think this can happen if an event gets queued using
> virDomainEventStateQueue and the client disconnection happens before
> the event timer virDomainEventTimer gets a chance to run and flush
> the event. In this situation, virDomainEventStateDeregister[ID] will
> get called with a non-empty event queue, the timer will be destroyed
> if this was the only client connected. Then, when other clients connect
> at a later time, they will never get notified about domain events as
> the event timer will never get enabled because the timer is only
> enabled if the event queue is empty when virDomainEventStateRegister[ID]
> gets called, which will is no longer the case.
A nice long detailed explanation. I agree that this scenario you
outline is plausible as an explanation for why Boxes sometimes
stops getting events from libvirtd. It also explains why we don't
see it with VDSM - since they're a long running service, not a
frequently stoppped/started desktop app, they're much less likely
to hit the race.
> To avoid this issue, this commit makes sure to remove all events from
> the event queue when the last client in unregistered. As there is
> no longer anyone interested in receiving these events, these events
> are stale so there is no need to keep them around. A client connecting
> later will have no interest in getting events that happened before it
> got connected.
Yep, makes sense.
> ---
> src/conf/domain_event.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c
> index 43ecdcf..7ab973b 100644
> --- a/src/conf/domain_event.c
> +++ b/src/conf/domain_event.c
> @@ -525,13 +525,13 @@ void virDomainEventFree(virDomainEventPtr event)
> }
>
> /**
> - * virDomainEventQueueFree:
> + * virDomainEventQueueClear:
> * @queue: pointer to the queue
> *
> - * Free the memory in the queue. We process this like a list here
> + * Removes all elements from the queue
> */
> static void
> -virDomainEventQueueFree(virDomainEventQueuePtr queue)
> +virDomainEventQueueClear(virDomainEventQueuePtr queue)
> {
> int i;
> if (!queue)
> @@ -541,6 +541,22 @@ virDomainEventQueueFree(virDomainEventQueuePtr queue)
> virDomainEventFree(queue->events[i]);
> }
> VIR_FREE(queue->events);
> + queue->count = 0;
> +}
> +
> +/**
> + * virDomainEventQueueFree:
> + * @queue: pointer to the queue
> + *
> + * Free the memory in the queue. We process this like a list here
> + */
> +static void
> +virDomainEventQueueFree(virDomainEventQueuePtr queue)
> +{
> + if (!queue)
> + return;
> +
> + virDomainEventQueueClear(queue);
> VIR_FREE(queue);
> }
>
> @@ -1559,6 +1575,7 @@ virDomainEventStateDeregister(virConnectPtr conn,
> state->timer != -1) {
> virEventRemoveTimeout(state->timer);
> state->timer = -1;
> + virDomainEventQueueClear(state->queue);
> }
>
> virDomainEventStateUnlock(state);
> @@ -1596,6 +1613,7 @@ virDomainEventStateDeregisterID(virConnectPtr conn,
> state->timer != -1) {
> virEventRemoveTimeout(state->timer);
> state->timer = -1;
> + virDomainEventQueueClear(state->queue);
> }
>
> virDomainEventStateUnlock(state);
ACK
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
More information about the libvir-list
mailing list