[libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination

Daniel P. Berrange berrange at redhat.com
Fri Dec 22 08:55:18 UTC 2017

On Fri, Dec 22, 2017 at 10:52:45AM +0300, Nikolay Shirokovskiy wrote:
> On 21.12.2017 15:14, Erik Skultety wrote:
> > On Thu, Dec 21, 2017 at 12:48:44PM +0300, Nikolay Shirokovskiy wrote:
> >>
> >>
> >> On 21.12.2017 11:49, Erik Skultety wrote:
> >>> On Thu, Dec 21, 2017 at 09:39:16AM +0100, Cedric Bosdonnat wrote:
> >>>> Hi John,
> >>>>
> >>>> On Wed, 2017-12-20 at 16:52 -0500, John Ferlan wrote:
> >>>>> So I moved the Unref(dmn) after virStateCleanup... That of course led to
> >>>>> the virt-manager issue.  So I figured I'd give adding this patch in and
> >>>>> that then fixed the virt-manager crash (obviously as we'd be Unref'ing
> >>>>> all those servers we added).
> >>>>
> >>>> Seems like I forgot to git-send my patch yesterday evening: here it is:
> >>>>
> >>>> https://www.redhat.com/archives/libvir-list/2017-December/msg00739.html
> >>>>
> >>>> That fixes the virt-manager crash at least.
> >>>
> >>> Hi, so the patch looks good, at first glance it makes complete sense to close
> >>> the client connections when closing the server and unreffing the objects when
> >>> disposing of the server object. However, since this and the other related
> >>> issues have been bugging us for quite some time I'm not going to say ACK just
> >>> yet, I'd like to think about it for a bit - but if someone else already has and
> >>> they feel brave enough :P, then I'm not going to block this to go in.
> >>>
> >>> Erik
> >>>
> >>
> >> Patch looks good to me too. But still original "libvirtd: fix crash on termination"
> >> fixes another issue and if applied fixes "virt-manager issue" as well as John
> >> figured out.
> > 
> > Finally I'm back on track with this (sorry for it taking so long), although
> > you're right about this, it's not the correct fix, it's just a byproduct of
> > your patch, however, the whole thing about closing connections and releasing
> > memory is a bit of a mess. For example, right now, we only dispose of service
> > objs (but we don't close them, we do that in virNetServerClose), but we both
> > close and dispose of the client objs. Another thing, we toggle the service to
> > stop accepting any new connections, but that's irrelevant at that point because
> > we've closed them already by that time as a product of calling
> > virNetDaemonClose->virNetServerClose->virNetServerServiceClose - so that
> > should be removed...I drifted a bit here, anyway, we need to make a clear
> > distinction when we're releasing memory and when we're shutting down some kind
> > of service - sometimes both can be done at the same time (see below), however
> > it's very clear we can't do it here. Your issue is that Close currently only
> > closes the sockets (this reflects my point of "shutting down a service") but
> > does nothing with the threadpool connected to the server, thus leaving the
> > worker threads to continue running and executing APIs the results of which they
> > shouldn't even be able to return back to the client, since we're shutting down.
> To me it is not the first order problem. On daemon shutdown clients will anyway
> most probably receive some kind of error. 
> But we should finish worker threads before we cleanup drivers.

Yes, that is absolutely critical. With current situation where we leave clients
connected & worker threads running, while we cleanup drivers, we're pretty much
guaranteed a tonne of bizarre crashes.

For killing clients we should simply make virNetServerClose() tear down all
client connections, at the same time as it tears down listener sockets. There
is no point separating those actions, as they're conceptually related.

> > 1) split virThreadPoolFree into 2 functions, one which only broadcasts the
> > "die" message and joins the threads (or waits for them in this case...) and the
> > other releasing the resources - can't say I'm a fan of this one
> > 2) call virThreadPoolFree from virNetServerClose instead...
> > 
> > None of those approaches is ideal, but I can't seem to think off anything
> > better at the moment.

Whether the workers threads are running or not is actually not the key
issue here.

What causes the crashes is whether the worker threads are processing jobs
or not.  IOW, all we need todo is first purge all queued jobs, and then
wait for the threads to finish any jobs they already had. The threads
themselves can stay running. This is a conceptually useful operation beyond
just the shutdown scenario.

IOW, we should create  a  virThreadPoolDrain() method to purge all jobs
and wait for idle.

|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

More information about the libvir-list mailing list