[libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination

John Ferlan jferlan at redhat.com
Tue Nov 7 15:28:10 UTC 2017



On 11/07/2017 04:46 AM, Martin Kletzander wrote:
> On Mon, Nov 06, 2017 at 03:20:13PM -0500, John Ferlan wrote:
>>
>>
>> On 11/03/2017 05:20 AM, Nikolay Shirokovskiy wrote:
>>>
>>>
>>> On 03.11.2017 11:42, Martin Kletzander wrote:
>>>> On Fri, Nov 03, 2017 at 11:07:36AM +0300, Nikolay Shirokovskiy wrote:
>>>>> On 02.11.2017 19:32, Martin Kletzander wrote:
>>>>>> This just makes the window of opportunity (between
>>>>>> daemonServerClose()
>>>>>> and the actual removal of the virNetServerPtr from the hash) smaller.
>>>>>> That's why I don't see it as a fix, rather as a workaround.
>>>>>>
>>>>>> What I think should be done is the order of what's done with
>>>>>> services,
>>>>>> servers, drivers, daemon, workers, etc.
>>>>>>
>>>>>> I read the other threds and I think we're on the same note here, it's
>>>>>> just that there is lot of confusion and we're all approaching it
>>>>>> differently.
>>
>> Trying to catch up, but this thread has gone beyond where I was
>> previously. I still have to push the first two patches of this series...
>>
>> Yes, libvirtd cleanup path is awful and there's quite a few "issues"
>> that have cropped up because previous patches didn't take care of
>> refcnt'ing properly and just "added" cleanup "somewhere" in the list of
>> things to be cleaned up... Fortunately to a degree I think this is still
>> all edge condition type stuff, but getting it right would be good...
>>
>> For me theory has always been to cleanup in the inverse order that
>> something was created - for sure libvirtd doesn't follow that at all.
>> For libvirtd the shutdown/cleanup logic just seemed too haphazard and
>> fixing never made it to the top of the list of things to think about.
>>
>> Another "caveat" in libvirtd cleanup logic is that driversInitialized is
>> set as part of a thread that could be running or have run if the code
>> enters the cleanup: path as a result of virNetlinkEventServiceStart
>> failure before we enter the event loop (e.g. virNetDaemonRun).
>>
>> Still like Erik - I do like Martin's suggestion and would like to see it
>> as a formal patch - although perhaps with a few tweaks.
>>
>> IIRC, this particular patch "worked" because by removing the
>> dmn->servers refcnt earlier made sure that we didn't have to wait for
>> virObjectUnref(dmn) to occur after virStateCleanup.
>>
> 
> Would it be to daring to ask you guys to send one series that includes
> all the
> related patchsets?  We can then discuss there, it would be easier to
> follow as
> well.  I don't want anyone to do any extra work, but we might save some
> review
> bandwidth by resending those patches in one series.  Or maybe it's just
> me and
> I have a hard time keeping up with everything at the same time =)
> 

Originally "this" crash fix was mixed in with the other series to add a
shutdown state option, but I asked they be separate since they were IMO
addressing different problems.

To me this series was attempting to resolve a refcnt problem and a
"latent" free of the dmn->servers that was IIRC attempting to be used
before the state cleanup and the Unref(dmn) occurred to run the
virHashFree in virNetDaemonDispose.

The other series seems to be adding a state shutdown step to be run
before state cleanup; however, if we alter the cleanup code in libvirtd
does that make the state shutdown step unnecessary? I don't know and
really didn't want to think about it until we got through thinking about
the cleanup processing order. Still if one considers that state
initialization is really two steps (init and auto start), then splitting
cleanup into shutdown and cleanup seems reasonable, but I don't think
affects whether we have a crash or latent usage of dmn->servers. I could
be wrong, but it's so hard to tell.

[...]

>>
>>
>> So you cut off Martin's patch here, but if one essentially did inverse
>> order of startup they'd get the following: (the // comments are my notes
>> about startup order oddity).
>>
>> cleanup:
>>    virNetDaemonClose(dmn);
>>
>>    virNetlinkEventServiceStopAll();
>>
>>    if (driversInitialized) {
>>        /* Set via daemonRunStateInit in daemonStateInit */
>>        driversInitialized = false;
>>        virStateCleanup();
>>    }
>>
>>    virObjectUnref(adminProgram);
>>    virObjectUnref(srvAdm);
>>    virObjectUnref(qemuProgram);
>>    virObjectUnref(lxcProgram);
>>    virObjectUnref(remoteProgram);
>>
>>    // FWIW: Not exact here, but the order of startup should be
>>    // changed since srv is added to dmn, so realistically srv
>>    // depends on dmn existence.
>>    virObjectUnref(srv);
> 
> We can unref all of the above whenever we please.  @dmn keeps references
> for
> everything it needs, we just have @srv in order to set some things up,
> we can
> unref it right after we add it to @dmn.  The dependencies here are not the
> concern thanks to (hopefully) proper refcnt'ing.

OK - that's a bit more adjustment, but totally possible...

> 
>>    virObjectUnref(dmn);
>>
>>    virNetlinkShutdown();
>>
>>    // This one is also strange - seems startup should move the claiming
>>    // to before run_dir setup after forking into background.
>>    if (pid_file_fd != -1)
>>        virPidFileReleasePath(pid_file, pid_file_fd);
> 
> During start we should quit immediately, so it should be done before
> forking
> into background, otherwise init scripts will not know whether starting the
> service failed or not.
> 

But don't we want to write the child pid into the pid_file and not the
parent? Or more succinctly the grandchild pid...

e.g. the call after the fork is:

   if ((pid_file_fd = virPidFileAcquirePath(pid_file, false, getpid()))
< 0) {

which essentially writes the getpid value into the pid_file... If we did
this before the fork, then we'd be writing the parent pid into the file
and that parent exits as does it's child, leaving the grandchild as the
daemon (or am I reading all that wrong - it's quite possible ;-))

In any case, I don't mind posting something that alters the cleanup:
logic - I wasn't sure if you were going to do that based on a couple of
replies back in this thread...

John

>>
>>    VIR_FREE(run_dir);
> 
> Frees can be together, again, most of them could be freed before
> virNetDaemonRun
> since all the string are just used to initialize stuff and I believe it's
> strdup'd everywhere.
> 
>>
>>    if (statuswrite != -1) {
>>        if (ret != 0) {
>>            /* Tell parent of daemon what failed */
>>            char status = ret;
>>            while (write(statuswrite, &status, 1) == -1 &&
>>                   errno == EINTR)
>>                ;
>>        }
>>        VIR_FORCE_CLOSE(statuswrite);
>>    }
>>
>>    VIR_FREE(sock_file);
>>    VIR_FREE(sock_file_ro);
>>    VIR_FREE(sock_file_adm);
>>
>>    VIR_FREE(pid_file);
>>
>>    VIR_FREE(remote_config_file);
>>    daemonConfigFree(config);
>>
>>
>> Yes, this does differ from Martin's order...
>>
> 
> Sure, no problem, I just quickly sorted the lines by hand, I just wanted
> to show
> that the readability was bad as well.
> 
>> John




More information about the libvir-list mailing list