[libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination

Martin Kletzander mkletzan at redhat.com
Tue Nov 7 09:46:45 UTC 2017


On Mon, Nov 06, 2017 at 03:20:13PM -0500, John Ferlan wrote:
>
>
>On 11/03/2017 05:20 AM, Nikolay Shirokovskiy wrote:
>>
>>
>> On 03.11.2017 11:42, Martin Kletzander wrote:
>>> On Fri, Nov 03, 2017 at 11:07:36AM +0300, Nikolay Shirokovskiy wrote:
>>>> On 02.11.2017 19:32, Martin Kletzander wrote:
>>>>> This just makes the window of opportunity (between daemonServerClose()
>>>>> and the actual removal of the virNetServerPtr from the hash) smaller.
>>>>> That's why I don't see it as a fix, rather as a workaround.
>>>>>
>>>>> What I think should be done is the order of what's done with services,
>>>>> servers, drivers, daemon, workers, etc.
>>>>>
>>>>> I read the other threds and I think we're on the same note here, it's
>>>>> just that there is lot of confusion and we're all approaching it
>>>>> differently.
>
>Trying to catch up, but this thread has gone beyond where I was
>previously. I still have to push the first two patches of this series...
>
>Yes, libvirtd cleanup path is awful and there's quite a few "issues"
>that have cropped up because previous patches didn't take care of
>refcnt'ing properly and just "added" cleanup "somewhere" in the list of
>things to be cleaned up... Fortunately to a degree I think this is still
>all edge condition type stuff, but getting it right would be good...
>
>For me theory has always been to cleanup in the inverse order that
>something was created - for sure libvirtd doesn't follow that at all.
>For libvirtd the shutdown/cleanup logic just seemed too haphazard and
>fixing never made it to the top of the list of things to think about.
>
>Another "caveat" in libvirtd cleanup logic is that driversInitialized is
>set as part of a thread that could be running or have run if the code
>enters the cleanup: path as a result of virNetlinkEventServiceStart
>failure before we enter the event loop (e.g. virNetDaemonRun).
>
>Still like Erik - I do like Martin's suggestion and would like to see it
>as a formal patch - although perhaps with a few tweaks.
>
>IIRC, this particular patch "worked" because by removing the
>dmn->servers refcnt earlier made sure that we didn't have to wait for
>virObjectUnref(dmn) to occur after virStateCleanup.
>

Would it be to daring to ask you guys to send one series that includes all the
related patchsets?  We can then discuss there, it would be easier to follow as
well.  I don't want anyone to do any extra work, but we might save some review
bandwidth by resending those patches in one series.  Or maybe it's just me and
I have a hard time keeping up with everything at the same time =)

>>>>>
>>>>> So first let's agree on what should be done when virNetDaemonClose() is
>>>>> called and the loop in virNetDaemonRun() exits.  My opinion it is:
>>>>>
>>>>> 1) Stop accepting new calls and services:
>>>>>    virNetServerServiceToggle(false);
>>>>
>>>> This not really necessary as this has effect only if loop is running.
>>>>
>>>
>>> Sure, good point.
>>>
>>>>>
>>>>> 2) Wait for all the workers to finish:
>>>>>    First part of virThreadPoolFree()
>>>>
>>>> I was thinking of splitting this function to finish/free too.
>>>> After finish it is not really usable because there are no
>>>> more threads in pool anymore so we have to introduce state
>>>> finished and check it in every thread pool api function to
>>>> be safe. So this will introduce a bit of complexity only
>>>> for the sake of some scheme IMO.
>>>>
>>>
>>> Yeah, it's similar to the daemon when you are between close and dispose, but for
>>> the daemon we need it.
>>>
>>> What function do you think we would need to check this for?  The finish would
>>> end all the threads and since the event loop is not running no new jobs could be
>>> posted.
>>
>> virThreadPoolSendJob is already good in this respect as it checks @quit at the beginning.
>> Looks like we need only to add same check to virThreadPoolSetParameters. So
>> not so much complexity.
>>
>> I can imagine thread pool function being called from another
>> thread pools thread. AFAIK there are not such situation though. So just to be on a safe side.
>>
>>>
>>>>>
>>>>> 3) Kill off clients:
>>>>>    virNetServerClientClose()
>>>>>
>>>>> 4) Clean up drivers:
>>>>>    virStateCleanup()
>>>>
>>>> I think it is better for net daemon/servers not to know about
>>>> hypervisor drivers directly.
>>>>
>>>
>>> Sure, I was only talking about the libvirtd for the sake of simplicity.
>>>
>>>>>
>>>>> 5) Clean up services, servers, daemon, etc.
>>>>>    including the second part of virThreadPoolFree()
>>>>>
>>>>> The last thing is what should be done in virNetDaemonDispose(), but
>>>>> unfortunately we (mis)use it for more than that.  Numbers 1-4 should be,
>>>>> IMO in virNet{Daemon,Server}Close().
>>>>>
>>>>> Would this solve your problem?  I mean I'm not against more reference
>>>>> counting, it should be done properly, but I think the above would do the
>>>>> trick and also make sense from the naming point of view.
>>>>>
>>>>> Since both the daemon and server will probably not ever be consistent
>>>>> after calling the Close function, it might make sense to just include
>>>>> virNetDaemonClose() into virNetDaemonDispose() and I'm not against that.
>>>>> But the order of the steps should still be as described above IMO.
>>>>
>>>> This won't work IMO. If we are already in virNetDaemonDispose then
>>>> @dmn object is invalid but we don't yet call virStateCleanup yet
>>>> (if it is moved to virNetDaemonClose) and this is the problem 85c3a182 dealt with.
>>>>
>>>
>>> It would solve it if order above is followed.  Sure, virNetDaemon should not
>>
>> It is ok if virStateCleanup is called from virNetDaemonClose somehow. I was
>> only against calling virNetDaemonClose from virNetDaemonDispose in this case.
>>
>>> know about any drivers, but we can simply register a cleanup callback for the
>>> daemon which the Dispose function will call after the thread pools are cleaned
>>> up (just thought of that now).
>>
>> Well to me it is too generic for this case. We can code this order just
>> as order of statements.
>>
>> 1. close netservers (which imply finishing rpc thread pool)
>> 2. close hypervisor drivers
>> 3. unref @dmn
>>
>> Or we can use refcounts and take point 3 out of consideration.
>> Also there is a patch series [1] that introduces 0 step to help 1 step to finish actually
>> in the situation when loop is down.
>>
>> 0. shutdown hypervirsor drivers.
>>
>> [1] https://www.redhat.com/archives/libvir-list/2017-October/msg01134.html
>
>Right this still exists out there, but it's hard enough to focus on the
>issues being discussed here without adding more into the mix.  If we can
>get the shutdown order right as Martin has proposed, then address this
>afterwards, it'll perhaps make things easier.
>

Or just put it all in a series as hinted above ;)

>>>
>>>>>
>>>>> Let me know what you think.  And don't forget to have a nice day!
>>>>>
>>>>> Martin
>>>>>
>>>>> P.S.: The order of clean ups in libvirtd's main() is absolutely terrible and
>>>>>      unreadable.  Also, even though I guess it goes without saying, if proper
>>>>>      reference counting is in place, the order of virObjectUnref()s and
>>>>>      VIR_FREE()s should not matter.  I think it chould be changed like this:
>>>>
>>>> It makes sense that in cleanup section we can first make all cleanup and
>>>> then all dispose. It is only not clear to me can we call virStateCleanup
>>>> before virNetlinkShutdown. Also I would keep comment for @dmn unref
>>>> until proper refcount for @dmn is implemented.
>>>>
>>>
>>> I don't think virStateCleanup() and virNetlinkShutdown() are dependent.  The
>>> comment can stay, I was just trying to show that it's also a mess.  What I would
>>> like, actually, is to have the daemon running with all the references and
>>> pointers (that the main() function doesn't need anymore) freed, but both this
>>> and what I suggested below is unrelated to what we're trying to fix, I don't
>>> know why I started talking about it.
>
>
>So you cut off Martin's patch here, but if one essentially did inverse
>order of startup they'd get the following: (the // comments are my notes
>about startup order oddity).
>
>cleanup:
>    virNetDaemonClose(dmn);
>
>    virNetlinkEventServiceStopAll();
>
>    if (driversInitialized) {
>        /* Set via daemonRunStateInit in daemonStateInit */
>        driversInitialized = false;
>        virStateCleanup();
>    }
>
>    virObjectUnref(adminProgram);
>    virObjectUnref(srvAdm);
>    virObjectUnref(qemuProgram);
>    virObjectUnref(lxcProgram);
>    virObjectUnref(remoteProgram);
>
>    // FWIW: Not exact here, but the order of startup should be
>    // changed since srv is added to dmn, so realistically srv
>    // depends on dmn existence.
>    virObjectUnref(srv);

We can unref all of the above whenever we please.  @dmn keeps references for
everything it needs, we just have @srv in order to set some things up, we can
unref it right after we add it to @dmn.  The dependencies here are not the
concern thanks to (hopefully) proper refcnt'ing.

>    virObjectUnref(dmn);
>
>    virNetlinkShutdown();
>
>    // This one is also strange - seems startup should move the claiming
>    // to before run_dir setup after forking into background.
>    if (pid_file_fd != -1)
>        virPidFileReleasePath(pid_file, pid_file_fd);

During start we should quit immediately, so it should be done before forking
into background, otherwise init scripts will not know whether starting the
service failed or not.

>
>    VIR_FREE(run_dir);

Frees can be together, again, most of them could be freed before virNetDaemonRun
since all the string are just used to initialize stuff and I believe it's
strdup'd everywhere.

>
>    if (statuswrite != -1) {
>        if (ret != 0) {
>            /* Tell parent of daemon what failed */
>            char status = ret;
>            while (write(statuswrite, &status, 1) == -1 &&
>                   errno == EINTR)
>                ;
>        }
>        VIR_FORCE_CLOSE(statuswrite);
>    }
>
>    VIR_FREE(sock_file);
>    VIR_FREE(sock_file_ro);
>    VIR_FREE(sock_file_adm);
>
>    VIR_FREE(pid_file);
>
>    VIR_FREE(remote_config_file);
>    daemonConfigFree(config);
>
>
>Yes, this does differ from Martin's order...
>

Sure, no problem, I just quickly sorted the lines by hand, I just wanted to show
that the readability was bad as well.

>John
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20171107/0fa3e385/attachment-0001.sig>


More information about the libvir-list mailing list