[libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination

Martin Kletzander mkletzan at redhat.com
Mon Nov 13 14:59:45 UTC 2017


On Tue, Nov 07, 2017 at 05:06:33PM +0100, Erik Skultety wrote:
>On Tue, Nov 07, 2017 at 10:28:10AM -0500, John Ferlan wrote:
>>
>>
>> On 11/07/2017 04:46 AM, Martin Kletzander wrote:
>> > On Mon, Nov 06, 2017 at 03:20:13PM -0500, John Ferlan wrote:
>> >>
>> >>
>> >> On 11/03/2017 05:20 AM, Nikolay Shirokovskiy wrote:
>> >>>
>> >>>
>> >>> On 03.11.2017 11:42, Martin Kletzander wrote:
>> >>>> On Fri, Nov 03, 2017 at 11:07:36AM +0300, Nikolay Shirokovskiy wrote:
>> >>>>> On 02.11.2017 19:32, Martin Kletzander wrote:
>> >>>>>> This just makes the window of opportunity (between
>> >>>>>> daemonServerClose()
>> >>>>>> and the actual removal of the virNetServerPtr from the hash) smaller.
>> >>>>>> That's why I don't see it as a fix, rather as a workaround.
>> >>>>>>
>> >>>>>> What I think should be done is the order of what's done with
>> >>>>>> services,
>> >>>>>> servers, drivers, daemon, workers, etc.
>> >>>>>>
>> >>>>>> I read the other threds and I think we're on the same note here, it's
>> >>>>>> just that there is lot of confusion and we're all approaching it
>> >>>>>> differently.
>> >>
>> >> Trying to catch up, but this thread has gone beyond where I was
>> >> previously. I still have to push the first two patches of this series...
>> >>
>> >> Yes, libvirtd cleanup path is awful and there's quite a few "issues"
>> >> that have cropped up because previous patches didn't take care of
>> >> refcnt'ing properly and just "added" cleanup "somewhere" in the list of
>> >> things to be cleaned up... Fortunately to a degree I think this is still
>> >> all edge condition type stuff, but getting it right would be good...
>> >>
>> >> For me theory has always been to cleanup in the inverse order that
>> >> something was created - for sure libvirtd doesn't follow that at all.
>> >> For libvirtd the shutdown/cleanup logic just seemed too haphazard and
>> >> fixing never made it to the top of the list of things to think about.
>> >>
>> >> Another "caveat" in libvirtd cleanup logic is that driversInitialized is
>> >> set as part of a thread that could be running or have run if the code
>> >> enters the cleanup: path as a result of virNetlinkEventServiceStart
>> >> failure before we enter the event loop (e.g. virNetDaemonRun).
>> >>
>> >> Still like Erik - I do like Martin's suggestion and would like to see it
>> >> as a formal patch - although perhaps with a few tweaks.
>> >>
>> >> IIRC, this particular patch "worked" because by removing the
>> >> dmn->servers refcnt earlier made sure that we didn't have to wait for
>> >> virObjectUnref(dmn) to occur after virStateCleanup.
>> >>
>> >
>> > Would it be to daring to ask you guys to send one series that includes
>> > all the
>> > related patchsets?  We can then discuss there, it would be easier to
>> > follow as
>> > well.  I don't want anyone to do any extra work, but we might save some
>> > review
>> > bandwidth by resending those patches in one series.  Or maybe it's just
>> > me and
>> > I have a hard time keeping up with everything at the same time =)
>> >
>>
>> Originally "this" crash fix was mixed in with the other series to add a
>> shutdown state option, but I asked they be separate since they were IMO
>> addressing different problems.
>>
>> To me this series was attempting to resolve a refcnt problem and a
>> "latent" free of the dmn->servers that was IIRC attempting to be used
>> before the state cleanup and the Unref(dmn) occurred to run the
>> virHashFree in virNetDaemonDispose.
>
>Exactly, this was only about refcounting of @dmn, see below for @driver issue.
>

OK, sorry for that then, I followed up only more recently and honestly I'm
getting pretty confused with how many thought threads this has so the context
switches get more and more expensive.

>>
>> The other series seems to be adding a state shutdown step to be run
>> before state cleanup; however, if we alter the cleanup code in libvirtd
>> does that make the state shutdown step unnecessary? I don't know and
>
>Actually, no, the stateShutdown was addressing the issue with running
><driver>StateCleanup while there still was an active worker thread that could
>potentially access the @driver object, in which case - SIGSEGV, but as I wrote
>in my response, I don't like the idea of having another state driver callback,
>as I feel it introduces a bit of ambiguity, since by just looking at the
>callback names, you can't have an idea, what the difference between
>stateCleanup and stateShutdown is. I think this should be handled as part of
>the final stage of leaving the event loop and simply register an appropriate
>callback to the handle, that way, you don't need to define a new state callback
>which would most likely used by a couple of drivers.
>

If stateShutdown is what makes everything go away, then why not?  The names are
pretty descriptive, shutdown is called when the daemon is shutting down and
cleanup when you need to clean up after yourself.  Used by only few drivers?
Well, we can implement it everywhere and not make it make optional, but rather
mandatory.  If that is checked on the registration then we'll know right away
when we missed something (aat runtime, but right after someone tries running
it).

>Erik
>
>> really didn't want to think about it until we got through thinking about
>> the cleanup processing order. Still if one considers that state
>> initialization is really two steps (init and auto start), then splitting
>> cleanup into shutdown and cleanup seems reasonable, but I don't think
>> affects whether we have a crash or latent usage of dmn->servers. I could
>> be wrong, but it's so hard to tell.
>>
>> [...]
>
>Erik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20171113/3c942d86/attachment-0001.sig>


More information about the libvir-list mailing list