[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination




On 11/03/2017 05:20 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 03.11.2017 11:42, Martin Kletzander wrote:
>> On Fri, Nov 03, 2017 at 11:07:36AM +0300, Nikolay Shirokovskiy wrote:
>>> On 02.11.2017 19:32, Martin Kletzander wrote:
>>>> This just makes the window of opportunity (between daemonServerClose()
>>>> and the actual removal of the virNetServerPtr from the hash) smaller.
>>>> That's why I don't see it as a fix, rather as a workaround.
>>>>
>>>> What I think should be done is the order of what's done with services,
>>>> servers, drivers, daemon, workers, etc.
>>>>
>>>> I read the other threds and I think we're on the same note here, it's
>>>> just that there is lot of confusion and we're all approaching it
>>>> differently.

Trying to catch up, but this thread has gone beyond where I was
previously. I still have to push the first two patches of this series...

Yes, libvirtd cleanup path is awful and there's quite a few "issues"
that have cropped up because previous patches didn't take care of
refcnt'ing properly and just "added" cleanup "somewhere" in the list of
things to be cleaned up... Fortunately to a degree I think this is still
all edge condition type stuff, but getting it right would be good...

For me theory has always been to cleanup in the inverse order that
something was created - for sure libvirtd doesn't follow that at all.
For libvirtd the shutdown/cleanup logic just seemed too haphazard and
fixing never made it to the top of the list of things to think about.

Another "caveat" in libvirtd cleanup logic is that driversInitialized is
set as part of a thread that could be running or have run if the code
enters the cleanup: path as a result of virNetlinkEventServiceStart
failure before we enter the event loop (e.g. virNetDaemonRun).

Still like Erik - I do like Martin's suggestion and would like to see it
as a formal patch - although perhaps with a few tweaks.

IIRC, this particular patch "worked" because by removing the
dmn->servers refcnt earlier made sure that we didn't have to wait for
virObjectUnref(dmn) to occur after virStateCleanup.

>>>>
>>>> So first let's agree on what should be done when virNetDaemonClose() is
>>>> called and the loop in virNetDaemonRun() exits.  My opinion it is:
>>>>
>>>> 1) Stop accepting new calls and services:
>>>>    virNetServerServiceToggle(false);
>>>
>>> This not really necessary as this has effect only if loop is running.
>>>
>>
>> Sure, good point.
>>
>>>>
>>>> 2) Wait for all the workers to finish:
>>>>    First part of virThreadPoolFree()
>>>
>>> I was thinking of splitting this function to finish/free too.
>>> After finish it is not really usable because there are no
>>> more threads in pool anymore so we have to introduce state
>>> finished and check it in every thread pool api function to
>>> be safe. So this will introduce a bit of complexity only
>>> for the sake of some scheme IMO.
>>>
>>
>> Yeah, it's similar to the daemon when you are between close and dispose, but for
>> the daemon we need it.
>>
>> What function do you think we would need to check this for?  The finish would
>> end all the threads and since the event loop is not running no new jobs could be
>> posted.
> 
> virThreadPoolSendJob is already good in this respect as it checks @quit at the beginning.
> Looks like we need only to add same check to virThreadPoolSetParameters. So
> not so much complexity. 
> 
> I can imagine thread pool function being called from another
> thread pools thread. AFAIK there are not such situation though. So just to be on a safe side.
> 
>>
>>>>
>>>> 3) Kill off clients:
>>>>    virNetServerClientClose()
>>>>
>>>> 4) Clean up drivers:
>>>>    virStateCleanup()
>>>
>>> I think it is better for net daemon/servers not to know about
>>> hypervisor drivers directly.
>>>
>>
>> Sure, I was only talking about the libvirtd for the sake of simplicity.
>>
>>>>
>>>> 5) Clean up services, servers, daemon, etc.
>>>>    including the second part of virThreadPoolFree()
>>>>
>>>> The last thing is what should be done in virNetDaemonDispose(), but
>>>> unfortunately we (mis)use it for more than that.  Numbers 1-4 should be,
>>>> IMO in virNet{Daemon,Server}Close().
>>>>
>>>> Would this solve your problem?  I mean I'm not against more reference
>>>> counting, it should be done properly, but I think the above would do the
>>>> trick and also make sense from the naming point of view.
>>>>
>>>> Since both the daemon and server will probably not ever be consistent
>>>> after calling the Close function, it might make sense to just include
>>>> virNetDaemonClose() into virNetDaemonDispose() and I'm not against that.
>>>> But the order of the steps should still be as described above IMO.
>>>
>>> This won't work IMO. If we are already in virNetDaemonDispose then
>>> @dmn object is invalid but we don't yet call virStateCleanup yet
>>> (if it is moved to virNetDaemonClose) and this is the problem 85c3a182 dealt with.
>>>
>>
>> It would solve it if order above is followed.  Sure, virNetDaemon should not
> 
> It is ok if virStateCleanup is called from virNetDaemonClose somehow. I was
> only against calling virNetDaemonClose from virNetDaemonDispose in this case.
> 
>> know about any drivers, but we can simply register a cleanup callback for the
>> daemon which the Dispose function will call after the thread pools are cleaned
>> up (just thought of that now).
> 
> Well to me it is too generic for this case. We can code this order just
> as order of statements.
> 
> 1. close netservers (which imply finishing rpc thread pool)
> 2. close hypervisor drivers
> 3. unref @dmn
> 
> Or we can use refcounts and take point 3 out of consideration.
> Also there is a patch series [1] that introduces 0 step to help 1 step to finish actually
> in the situation when loop is down.
> 
> 0. shutdown hypervirsor drivers.
> 
> [1] https://www.redhat.com/archives/libvir-list/2017-October/msg01134.html

Right this still exists out there, but it's hard enough to focus on the
issues being discussed here without adding more into the mix.  If we can
get the shutdown order right as Martin has proposed, then address this
afterwards, it'll perhaps make things easier.

>>
>>>>
>>>> Let me know what you think.  And don't forget to have a nice day!
>>>>
>>>> Martin
>>>>
>>>> P.S.: The order of clean ups in libvirtd's main() is absolutely terrible and
>>>>      unreadable.  Also, even though I guess it goes without saying, if proper
>>>>      reference counting is in place, the order of virObjectUnref()s and
>>>>      VIR_FREE()s should not matter.  I think it chould be changed like this:
>>>
>>> It makes sense that in cleanup section we can first make all cleanup and
>>> then all dispose. It is only not clear to me can we call virStateCleanup
>>> before virNetlinkShutdown. Also I would keep comment for @dmn unref
>>> until proper refcount for @dmn is implemented.
>>>
>>
>> I don't think virStateCleanup() and virNetlinkShutdown() are dependent.  The
>> comment can stay, I was just trying to show that it's also a mess.  What I would
>> like, actually, is to have the daemon running with all the references and
>> pointers (that the main() function doesn't need anymore) freed, but both this
>> and what I suggested below is unrelated to what we're trying to fix, I don't
>> know why I started talking about it.


So you cut off Martin's patch here, but if one essentially did inverse
order of startup they'd get the following: (the // comments are my notes
about startup order oddity).

cleanup:
    virNetDaemonClose(dmn);

    virNetlinkEventServiceStopAll();

    if (driversInitialized) {
        /* Set via daemonRunStateInit in daemonStateInit */
        driversInitialized = false;
        virStateCleanup();
    }

    virObjectUnref(adminProgram);
    virObjectUnref(srvAdm);
    virObjectUnref(qemuProgram);
    virObjectUnref(lxcProgram);
    virObjectUnref(remoteProgram);

    // FWIW: Not exact here, but the order of startup should be
    // changed since srv is added to dmn, so realistically srv
    // depends on dmn existence.
    virObjectUnref(srv);
    virObjectUnref(dmn);

    virNetlinkShutdown();

    // This one is also strange - seems startup should move the claiming
    // to before run_dir setup after forking into background.
    if (pid_file_fd != -1)
        virPidFileReleasePath(pid_file, pid_file_fd);

    VIR_FREE(run_dir);

    if (statuswrite != -1) {
        if (ret != 0) {
            /* Tell parent of daemon what failed */
            char status = ret;
            while (write(statuswrite, &status, 1) == -1 &&
                   errno == EINTR)
                ;
        }
        VIR_FORCE_CLOSE(statuswrite);
    }

    VIR_FREE(sock_file);
    VIR_FREE(sock_file_ro);
    VIR_FREE(sock_file_adm);

    VIR_FREE(pid_file);

    VIR_FREE(remote_config_file);
    daemonConfigFree(config);


Yes, this does differ from Martin's order...

John


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]