[libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination

Mon Nov 6 15:43:58 UTC 2017

[...]

> > So first let's agree on what should be done when virNetDaemonClose() is
> > called and the loop in virNetDaemonRun() exits.  My opinion it is:
> >
> > 1) Stop accepting new calls and services:
> >    virNetServerServiceToggle(false);
> >
> > 2) Wait for all the workers to finish:
> >    First part of virThreadPoolFree()
>
> This is a very tricky part which might get overly complex in order to be done
> properly. We currently don't allow the threads finish gracefully, IOW the
> connections have most likely been closed by the time the worker thread got to
> respond back to the client or the thread simply hung for some other reason in
> which case the daemon wouldn't even finish. So to give the threads a chance to
> finish properly, they need to be decommissioned while the event loop is still
> running otherwise there's no way for the worker to e.g. get an event back from
> QEMU's monitor, since there's noone to execute the appropriate callback for
> that anymore.

Actually, I was wrong here, qemuMonitorClose would wake the worker up with an
error, so it wouldn't be stuck indefinitely. However, I'd still prefer the
event loop to decommission the threads and do what Nikolay's qemuStateShutdown
proposal is doing, since eventloop knows about all of its handles, so when it's
finishing, it could simply run an appropriate cleanup callback on all of the
handles, thus closing the qemuMonitor as well.

> However, even if we did this and let's say we waited for a worker thread to
> finish its time-consuming job, you still have the problem I mentioned above
> with distinguishing really long time-consuming jobs from threads that are hung.
> So I'm not really sure whether any attempt to be terminating them gracefully is
> really worth the effort, given the circumstances, but at the same time I
> understand that the outcomes: the daemon might crash (it was terminating anyway)
> or it terminates properly but leaves a QEMU guest in an inconsistent way.
>
> To your idea of splitting virThreadPoolFree, since we most likely have to stop
> the threads within the event loop, thus broadcast 'quit' as well as wait for
> them, I don't see a point in decoupling the function to killing and joining
> threads and then doing a bunch of VIR_FREEs and vir(Mutex|Cond)Destroys, that's
> just scraping of some leftovers, I think it makes sense the way it's now, even
> though the naming is not ideal, I agree with that and we should come up with
> something else. But this is just a matter of taste and very unimportant, of
> course I don't have a strong argument about why the approach would be bad,
> because it's not, I just don't see it worth the effort.
>

Erik