[libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination

John Ferlan jferlan at redhat.com
Wed Nov 1 18:51:15 UTC 2017



On 10/31/2017 02:54 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 30.10.2017 19:21, Martin Kletzander wrote:
>> On Mon, Oct 30, 2017 at 07:14:39AM -0400, John Ferlan wrote:
>>> From: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
>>>
>>> The problem is incorrect order of qemu driver shutdown and shutdown
>>> of netserver threads that serve client requests (thru qemu driver
>>> particularly).
>>>
>>> Net server threads are shutdown upon dispose which is triggered
>>> by last daemon object unref at the end of main function. At the same
>>> time qemu driver is shutdown earlier in virStateCleanup. As a result
>>> netserver threads see invalid driver object in the middle of request
>>> processing.
>>>
>>> Let's move shutting down netserver threads earlier to virNetDaemonClose.
>>>
>>> Note: order of last daemon unref and virStateCleanup
>>> is introduced in 85c3a182 for a valid reason.
>>>
>>
>> I must say I don't believe that's true.  Reading it back, that patch is
>> wrong IMHO.  I haven't gone through all the details of it and I don't
>> remember them from when I rewrote part of it, but the way this should
>> be handled is different.
>>
>> The way you can clearly identify such issues is when you see that one
>> thread determines the validity of data (pointer in this case).  This
>> must never happen.  That means that the pointer is used from more places
>> than how many references that object has.  However each of the pointers
>> for such object should have their own reference.
>>
>> So the problem is probably somewhere else.
> 
> If I understand you correctly we can fix issue in 85c3a182 in 
> a differenct way. Like adding reference to daemon for every
> driver that uses it for shutdown inhibition. It will require to
> pass unref function to driver init function or a bit of wild casting
> to virObjectPtr for unref. Not sure it is worth it in this case.

caveat: I'm still in a post KVM Forum travel brain fog...

Perhaps a bit more simple than that... Since the 'inhibitCallback' and
the 'inhibitOpaque' are essentially the mechanism that would pass/use
@dmn, then it'd be up to the inhibitCallback to add/remove the reference
with the *given* that the inhibitOpaque is a lockable object anyway...

Thus, virNetDaemonAddShutdownInhibition could use virObjectRef and
virNetDaemonRemoveShutdownInhibition could use virObjectUnref... The
"catch" is that for Shutdown the Unref would need to go after Unlock.

I believe that then would "replicate" the virObjectRef done in
daemonStateInit and the virObjectUnref done at the end of
daemonRunStateInit with respect to "some outside thread" being able to
use @dmn and not have it be Unref'd for the last time at any point in
time until the "consumer" was done with it.

Moving the Unref to after all the StateCleanup calls were done works
because it accomplishesd the same/similar task, but just held onto @dmn
longer because the original implementation didn't properly reference dmn
when it was being used for some other consumer/thread. Of course that
led to other problems which this series is addressing and I'm not clear
yet as to the impact vis-a-vis your StateShutdown series.

> 
> Anyway the initical conditions for current patch stay the same - 
> daemon object will be unrefed after state drivers cleanup and 
> RPC requests could deal with invalid state driver objects.
> 
>>
>>> Signed-off-by: John Ferlan <jferlan at redhat.com>
>>> ---
>>> src/rpc/virnetdaemon.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c
>>> index 8c21414897..33bd8e3b06 100644
>>> --- a/src/rpc/virnetdaemon.c
>>> +++ b/src/rpc/virnetdaemon.c
>>> @@ -881,6 +881,7 @@ virNetDaemonClose(virNetDaemonPtr dmn)
>>>     virObjectLock(dmn);
>>>
>>>     virHashForEach(dmn->servers, daemonServerClose, NULL);
>>> +    virHashRemoveAll(dmn->servers);
>>>
>>
>> If I get this correctly, you are removing the servers so that their workers get
>> cleaned up, but that should be done in a separate step.  Most probably what
>> daemonServerClose should be doing.  This is a workaround, but not a proper fix.

Are you sure?  The daemonServerClose is the callback for virHashForEach
which means the table->iterating would be set thus preventing
daemonServerClose from performing a virHashRemoveEntry of the server
element from the list.

So this is to me the right fix.

John

> 
> Well, original patch has a different approach for stopping RPC calls (see [1]) close
> to what you suggest. I think in this case it is matter of taste because there are
> no usecases which help us to prefer one way over another so I'm ok with both of them.
> 
> [1] https://www.redhat.com/archives/libvir-list/2017-September/msg00999.html
> [2] https://www.redhat.com/archives/libvir-list/2017-October/msg00643.html   (thread dicussing [1])
> 
>> If that's not true than the previous commit mentioned here should also be fixed
>> differently.
> 
> Sorry I don't understand it. Can you elaborate on this point?
> 
> 
> Nikolay
>>
>> So this is a clear NACK from my POV.
>>
>>>     virObjectUnlock(dmn);
>>> }
>>> -- 
>>> 2.13.6
>>>
>>> -- 
>>> libvir-list mailing list
>>> libvir-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/libvir-list
>>
>>
>> --
>> libvir-list mailing list
>> libvir-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/libvir-list
>>




More information about the libvir-list mailing list