[libvirt] [PATCH v2 2/2] remote: free client all event callbacks at remoteClientCloseFunc

Mon Nov 13 13:07:12 UTC 2017


在 2017/11/12 22:45, John Ferlan 写道:
>
> On 11/11/2017 03:20 AM, xinhua.Cao wrote:
>>
>> 在 2017/11/11 5:43, John Ferlan 写道:
>>> On 11/06/2017 12:47 AM, xinhua.Cao wrote:
>>>> base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve
>>>> libvirt coredump problem, but it introduce a memory leak sense:
>>>>
>>>> 1. one client register a domain event such as reboot event
>>>> 2. and client was terminated unexpectly, such as kill -9,
>>> unexpectedly
>>>
>>>> then this client object will not free at libvirtd service program.
>>>>
>>>> remoteDispatchConnectDomainEventCallbackRegisterAny reference the
>>>> client,
>>>> but when client was terminated before it call deRegisterAny,the
>>>> reference
>>>> of client will not reduced to zero. so the memory leak take place.
>>> What's the difference between the path that would terminate normally and
>>> the one that terminates abnormally. It seems to me there might be (an)
>>> extra Ref(s) on the client which prevents virNetServerClientDispose from
>>> calling it's @privateDataFreeFunc which would call remoteClientFreeFunc
>>> since remoteClientInitHook worked properly.
>>>
>>> That's what needs to be found.
>> If client terminate normally, client would call
>> virConnectDomainEventDeregisterAny to
>> clear eventCallbacks. Then libvirt daemon will call
>> remoteDispatchConnectDomainEventDeregisterAny
>> -> remoteEventCallbackFree -> virObjectUnref(callback->client) to free
>> eventCallbacks and unRef client.
>> just think if client don't call  virConnectDomainEventDeregisterAny and
>> be killed. then there is no way to call call
>> remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree
>> -> virObjectUnref(callback->client)
>> so client's ref can't down to zero. so remoteClientFreeFunc can't be
>> called. then client object memory leak.
> So the client/sample code is not calling Deregister because it has no
> "handling" of an abnormal exit. That is no signal handler to call
> Deregister in the event some outside force kills the client/sample code
> while it's processing.
>
> So a different solution would be to add signal handling to the code in
> order to handle the kill and make the Deregister (and close) calls.
signal handler can handle such as kill -15 signal, But can't handle kill 
-9 signal.
so we also need to resolve it in libvirt
>> The difference is call remoteDispatchConnectDomainEventDeregisterAny or
>> don't call remoteDispatchConnectDomainEventDeregisterAny
>> to Unref(callback->client).
>>
> IIUC, the whole reason the Deregister code is in the Free Func is to
> "handle" the case where that last client reference was Unref'd, but
> someone didn't perform the actual Deregister. The logic predates my
> libvirt involvement...
>
> Still because of commit id 'fe8f1c8b' where we generate a REF for the
> Register and that's transparent to the consumer (e.g. how would they
> know they need to ensure that Deregister is called), thus the purpose of
> this patch is to find a way to Deregister if it's determined that the
> consumer hasn't by the time of the "last" REF we'd have [trying to write
> a commit message here too].
>
> This solution to this problem is to alter the processing to have the
> remoteClientCloseFunc handle performing the Deregister calls instead of
> the remoteClientFreeFunc because there's no way FreeFunc would be called
> unless the Deregister was already called. In fact, if you think about it
> if the CloseFunc does the Deregister why would the FreeFunc also include
> that call?  IOW: It should only need to be in one or the other.
>
> I know you've already posted patch v3; however, I also think you need to
> read commit id '8294aa0c'. If you're moving the Deregister handling to
> the close func, then I would think that the "reason" for that commit
> still applies.
>
> When you generate a v4, my suggestion is to generate 2 patches.
>
>    1. Just the remoteClientFreePrivateCallbacks separation including the
> sysident handling. From my read of the code, the virConnectClose doesn't
> have the same issue.
I am unfamiliarity with sysident. then research on it.  It used for 
polkit check
but I still can't confirm it.
>    2. Move the call to remoteClientFreePrivateCallbacks from FreeFunc to
> CloseFunc with a commit message that explains why this is necessary
> since commit id 'fe8f1c8b'  (BTW: No need to supply the complete hash, 8
> significant digits are usually plenty).
this two advices are very useful. thanks very much.
>
> I supplied a bit more details w/r/t what I found for REF/UNREF - I
> assume it essentially matches what you've found. But having it here
> certainly helps - so thanks for providing it along with the example.
>
>> here is my test code
>>
>> #include<stdio.h>
>> #include<pthread.h>
>> #include<libvirt/libvirt.h>
>>
>> int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event,
>>          int detail, void *opaque);
>> void timeOutCallback(int timer, void *opaque);
>>
>> int run = 1;
>>
>> int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event,
>>          int detail, void *opaque)
>> {
>>      char uuid[128] = { 0 };
>>      (void) virDomainGetUUIDString(dom, uuid);
>>      printf("domain %s come a event %d\n", uuid, event);
>> }
>>
>> void _timeOutCallback(int timer, void *opaque)
>> {
>>      printf("time out is comming\n");
>> }
>>
>> void *event_thread(void *opaque)
>> {
>>      int i =0;
>>      if (virEventRegisterDefaultImpl() < 0)
>>      {
>>          return NULL;
>>      }
>>
>>      virConnectPtr conn = virConnectOpen("qemu:///system");
>>      if (conn == NULL)
>>      {
>>          return NULL;
>>      }
>>
>>      int callback = virConnectDomainEventRegisterAny(conn, NULL,
>> VIR_DOMAIN_EVENT_ID_LIFECYCLE,
>>              _DomainStatusCallback, NULL, NULL);
>>      int timeout = virEventAddTimeout(1000,_timeOutCallback,NULL,NULL);
>>
>>      while (run && virConnectIsAlive(conn) == 1)
>>      {
>>         virEventRunDefaultImpl();
>>      }
>>
>>      sleep(5);
>>      virConnectDomainEventDeregisterAny(conn, callback);
>>      virConnectClose(conn);
>> }
>>
>>
>> int main(int argc, char *argv[])
>> {
>>      pthread_t pid;
>>      int ret = 0;
>>      ret = pthread_create(&pid, NULL, event_thread, NULL);
>>      sleep(100);
>>      run = 0;
>>      sleep(10);
>> }
>>
>> when I don't call remoteClientFreePrivateCallbacks at
>> remoteClientCloseFunc,
>> here is client refs change when I kill the client.
>> (gdb) c
>> Continuing.
>> Hardware watchpoint 6: *(int *) 0x5637edc888f4
>>
>> Old value = 4
>> New value = 5
>> virObjectRef (anyobj=anyobj at entry=0x5637edc888f0) at util/virobject.c:296
>> 296         PROBE(OBJECT_REF, "obj=%p", obj);
>> (gdb) bt
>> #0  virObjectRef (anyobj=anyobj at entry=0x5637edc888f0) at
>> util/virobject.c:296
>> #1  0x00005637ec61d643 in
>> remoteDispatchConnectDomainEventCallbackRegisterAny (server=<optimized
>> out>, msg=<optimized out>,
>>      ret=0x7f054c0008e0, args=0x7f054c0008c0, rerr=0x7f057ba6cc90,
>> client=0x5637edc888f0) at remote.c:4418
> Although the line numbers are different with top of tree, this is the
> RegisterAny REF on the client...
>
> [...]
>
> So that says to me we generally run with 3-4 REFs (without any Register
> Event occurring).  AFAICT, the following is the list:
>
>     1. Adding to the srv->clients (virNetServerAddClient)
>     2. Adding IO Callback (virNetServerClientRegisterEvent)
>     3. Adding Keep Alive (virNetServerClientInitKeepAlive and
>                           virKeepAliveStart)
>
> The 4th is a bit trickier, but I believe it's REF'd during the call to
> virNetServerClientDispatchRead...
>
>> Old value = 5
>> New value = 4
>> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> 259         PROBE(OBJECT_UNREF, "obj=%p", obj);
>> Continuing.
>> [Switching to Thread 0x7f058eb1f840 (LWP 3878)]
>> Hardware watchpoint 9: *(int *) 0x5637edc88634
> I'm assuming this comes from the opposite end of the DispatchRead -
> there's no trace here to prove it, but that's not necessarily surprising
> either as it's not easy to find where the Unref would naturally occur
> without an error path occurring....
>
>> Old value = 4
>> New value = 5
>> virObjectRef (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:296
>> 296         PROBE(OBJECT_REF, "obj=%p", obj);
>> #0  virObjectRef (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:296
>> #1  0x00005637ec65299d in virNetServerClientClose
>> (client=0x5637edc88630) at rpc/virnetserverclient.c:978
> [...]
>
>> Old value = 5
>> New value = 4
>> virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:259
>> 259         PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0  virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:259
>> #1  0x00005637ec6529bd in virNetServerClientClose
>> (client=0x5637edc88630) at rpc/virnetserverclient.c:982
> This pair of REF/UNREF in virNetServerClientClose when client->keepalive
> is set.
>
> [...]
>
>> Old value = 4
>> New value = 5
>> virObjectRef (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:296
>> 296         PROBE(OBJECT_REF, "obj=%p", obj);
>> #0  virObjectRef (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:296
>> #1  0x00005637ec6529d1 in virNetServerClientClose
>> (client=0x5637edc88630) at rpc/virnetserverclient.c:987
> [...]
>
>> Old value = 5
>> New value = 4
>> virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:259
>> 259         PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0  virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:259
>> #1  0x00005637ec6529ee in virNetServerClientClose
>> (client=0x5637edc88630) at rpc/virnetserverclient.c:991
> This pair of REF/UNREF in virNetServerClientClose when
> client->privateDataCloseFunc is set.
>
> [...]
>
>> Old value = 4
>> New value = 3
>> virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:259
>> 259         PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0  virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:259
>> #1  0x00005637ec650705 in virNetServerProcessClients
>> (srv=0x5637edc6bf10) at rpc/virnetserver.c:873
> Again my line numbers are off a bit, but this would be the for the UNREF
> after VIR_DELETE_ELEMENT when removed from srv->clients
>
> [...]
>
>> Old value = 3
>> New value = 2
>> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> 259         PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0  virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> #1  0x00007f058de4199b in virObjectUnref (anyobj=<optimized out>) at
>> util/virobject.c:265
>> #2  0x00007f058de0db34 in virEventPollCleanupTimeouts () at
>> util/vireventpoll.c:543
> [...]
>
> This one is the opposite end of the virObjectRef done for the keep alive
> timer. See virNetServerClientInitKeepAlive and virKeepAliveStart usage
> of virEventAddTimeout.
>
>> Old value = 2
>> New value = 1
>> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> 259         PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0  virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> #1  0x00005637ec656bee in virNetSocketEventFree
>> (opaque=opaque at entry=0x5637edc8a5b0) at rpc/virnetsocket.c:2152
>> #2  0x00007f058de0dd34 in virEventPollCleanupHandles () at
>> util/vireventpoll.c:592
> [...]
>
> This one is the opposite end of the virObjectRef done in
> virNetServerClientRegisterEvent prior to virNetSocketAddIOCallback which
> calls virEventAddHandle.
>
> Theoretically the "last" REF...
>
> John
>> Continuing.
>>
>> so the client refs alway have 1 when I register one event callbacks.
>>>> this patch will deRegister all event callbacks when client was close.
>>>> ---
>>> FWIW: This commit message was a bit difficult to follow - I know it's
>>> primarily a language barrier thing - so maybe if you just construct some
>>> function code flow for both initialization/open and destruction/close
>>> it'd help... Paths that work and the path that you think is broken...
>>> That can be done in this section after the "---" and before the changed
>>> API's.
>>>
>>> I assume you've done a lot of research, but the details of that research
>>> can be difficult to just "pick up on".  Typically someone will provide
>>> some sort of valgrind output as an example.
>>>
>>> I just have a sense that this could boil down to a path (or two) that
>>> are doing the virObjectRef without the expected virObjectUnref occurring
>>> - something perhaps to do with how @wantClose is handled in the abnormal
>>> termination case. Nothing jumps out at me though even after looking at
>>> the code for a while.
>>>
>>>
>>>>    daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------
>>>>    1 file changed, 28 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/daemon/remote.c b/daemon/remote.c
>>>> index cbcb6e8..466b2e7 100644
>>>> --- a/daemon/remote.c
>>>> +++ b/daemon/remote.c
>>>> @@ -1689,23 +1689,10 @@ void
>>>> remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED,
>>>> int r
>>>>            neventCallbacks = 0; \
>>>>        } while (0);
>>>>    -/*
>>>> - * You must hold lock for at least the client
>>>> - * We don't free stuff here, merely disconnect the client's
>>>> - * network socket & resources.
>>>> - * We keep the libvirt connection open until any async
>>>> - * jobs have finished, then clean it up elsewhere
>>>> - */
>>>> -void remoteClientFreeFunc(void *data)
>>>> +static void
>>>> +remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv)
>>>>    {
>>>> -    struct daemonClientPrivate *priv = data;
>>>> -
>>>> -    /* Deregister event delivery callback */
>>>> -    if (priv->conn) {
>>>> -        virIdentityPtr sysident = virIdentityGetSystem();
>>>> -
>>>> -        virIdentitySetCurrent(sysident);
>>>> -
>>>> +    if (priv && priv->conn) {
>>> Well each of the callers has already dereferenced @priv so it's a bit
>>> too late to check it now!
>> yes, I also think this check is too late. this placement can only check
>> priv->conn.
>>>>            DEREG_CB(priv->conn, priv->domainEventCallbacks,
>>>>                     priv->ndomainEventCallbacks,
>>>>                     virConnectDomainEventDeregisterAny, "domain");
>>>> @@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data)
>>>>                                                     
>>>> remoteRelayConnectionClosedEvent) < 0)
>>>>                    VIR_WARN("unexpected close callback event
>>>> deregister failure");
>>>>            }
>>>> +    }
>>>> +}
>>>> +#undef DEREG_CB
>>>> +
>>> There should be another blank line here.
>> OK, I lost a blank line.
>>>> +/*
>>>> + * You must hold lock for at least the client
>>>> + * We don't free stuff here, merely disconnect the client's
>>>> + * network socket & resources.
>>>> + * We keep the libvirt connection open until any async
>>>> + * jobs have finished, then clean it up elsewhere
>>>> + */
>>>> +void remoteClientFreeFunc(void *data)
>>>> +{
>>>> +    struct daemonClientPrivate *priv = data;
>>>> +
>>>> +    /* Deregister event delivery callback */
>>>> +    if (priv->conn) {
>>>> +        virIdentityPtr sysident = virIdentityGetSystem();
>>>> +
>>>> +        virIdentitySetCurrent(sysident);
>>>> +
>>>> +        remoteClientFreePrivateCallbacks(priv);
>>>>              virConnectClose(priv->conn);
>>>>              virIdentitySetCurrent(NULL);
>>>>            virObjectUnref(sysident);
>>>> -    }
>>>>    +    }
>>>>        VIR_FREE(priv);
>>>> +
>>>>    }
>>>> -#undef DEREG_CB
>>>>        static void remoteClientCloseFunc(virNetServerClientPtr client)
>>>> @@ -1747,6 +1756,7 @@ static void
>>>> remoteClientCloseFunc(virNetServerClientPtr client)
>>>>        struct daemonClientPrivate *priv =
>>>> virNetServerClientGetPrivateData(client);
>>>>          daemonRemoveAllClientStreams(priv->streams);
>>>> +    remoteClientFreePrivateCallbacks(priv);
>>> Perhaps this just gets a "if (priv->conn)" before calling...
>>>
>>> Still if as I assume the missing Unref is found (someday), then
>>> theoretically remoteClientFreeFunc would be called - of course the
>>> nEventCallbacks = 0 will ensure the Dereg's dont' get called again. So
>>> it shoudn't be a problem
>> In my opinion, it is safe to check point is Null. But check everywhere
>> is stupid.
>> so it is hard to determine where need check.:-)
>>> John
>>>
>>>>    }
>>>>     
>>> .
>>>
>>
> .
>