[libvirt] [PATCH v2 2/2] remote: free client all event callbacks at remoteClientCloseFunc
xinhua.Cao
caoxinhua at huawei.com
Mon Nov 13 13:07:12 UTC 2017
在 2017/11/12 22:45, John Ferlan 写道:
>
> On 11/11/2017 03:20 AM, xinhua.Cao wrote:
>>
>> 在 2017/11/11 5:43, John Ferlan 写道:
>>> On 11/06/2017 12:47 AM, xinhua.Cao wrote:
>>>> base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve
>>>> libvirt coredump problem, but it introduce a memory leak sense:
>>>>
>>>> 1. one client register a domain event such as reboot event
>>>> 2. and client was terminated unexpectly, such as kill -9,
>>> unexpectedly
>>>
>>>> then this client object will not free at libvirtd service program.
>>>>
>>>> remoteDispatchConnectDomainEventCallbackRegisterAny reference the
>>>> client,
>>>> but when client was terminated before it call deRegisterAny,the
>>>> reference
>>>> of client will not reduced to zero. so the memory leak take place.
>>> What's the difference between the path that would terminate normally and
>>> the one that terminates abnormally. It seems to me there might be (an)
>>> extra Ref(s) on the client which prevents virNetServerClientDispose from
>>> calling it's @privateDataFreeFunc which would call remoteClientFreeFunc
>>> since remoteClientInitHook worked properly.
>>>
>>> That's what needs to be found.
>> If client terminate normally, client would call
>> virConnectDomainEventDeregisterAny to
>> clear eventCallbacks. Then libvirt daemon will call
>> remoteDispatchConnectDomainEventDeregisterAny
>> -> remoteEventCallbackFree -> virObjectUnref(callback->client) to free
>> eventCallbacks and unRef client.
>> just think if client don't call virConnectDomainEventDeregisterAny and
>> be killed. then there is no way to call call
>> remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree
>> -> virObjectUnref(callback->client)
>> so client's ref can't down to zero. so remoteClientFreeFunc can't be
>> called. then client object memory leak.
> So the client/sample code is not calling Deregister because it has no
> "handling" of an abnormal exit. That is no signal handler to call
> Deregister in the event some outside force kills the client/sample code
> while it's processing.
>
> So a different solution would be to add signal handling to the code in
> order to handle the kill and make the Deregister (and close) calls.
signal handler can handle such as kill -15 signal, But can't handle kill
-9 signal.
so we also need to resolve it in libvirt
>> The difference is call remoteDispatchConnectDomainEventDeregisterAny or
>> don't call remoteDispatchConnectDomainEventDeregisterAny
>> to Unref(callback->client).
>>
> IIUC, the whole reason the Deregister code is in the Free Func is to
> "handle" the case where that last client reference was Unref'd, but
> someone didn't perform the actual Deregister. The logic predates my
> libvirt involvement...
>
> Still because of commit id 'fe8f1c8b' where we generate a REF for the
> Register and that's transparent to the consumer (e.g. how would they
> know they need to ensure that Deregister is called), thus the purpose of
> this patch is to find a way to Deregister if it's determined that the
> consumer hasn't by the time of the "last" REF we'd have [trying to write
> a commit message here too].
>
> This solution to this problem is to alter the processing to have the
> remoteClientCloseFunc handle performing the Deregister calls instead of
> the remoteClientFreeFunc because there's no way FreeFunc would be called
> unless the Deregister was already called. In fact, if you think about it
> if the CloseFunc does the Deregister why would the FreeFunc also include
> that call? IOW: It should only need to be in one or the other.
>
> I know you've already posted patch v3; however, I also think you need to
> read commit id '8294aa0c'. If you're moving the Deregister handling to
> the close func, then I would think that the "reason" for that commit
> still applies.
>
> When you generate a v4, my suggestion is to generate 2 patches.
>
> 1. Just the remoteClientFreePrivateCallbacks separation including the
> sysident handling. From my read of the code, the virConnectClose doesn't
> have the same issue.
I am unfamiliarity with sysident. then research on it. It used for
polkit check
but I still can't confirm it.
> 2. Move the call to remoteClientFreePrivateCallbacks from FreeFunc to
> CloseFunc with a commit message that explains why this is necessary
> since commit id 'fe8f1c8b' (BTW: No need to supply the complete hash, 8
> significant digits are usually plenty).
this two advices are very useful. thanks very much.
>
> I supplied a bit more details w/r/t what I found for REF/UNREF - I
> assume it essentially matches what you've found. But having it here
> certainly helps - so thanks for providing it along with the example.
>
>> here is my test code
>>
>> #include<stdio.h>
>> #include<pthread.h>
>> #include<libvirt/libvirt.h>
>>
>> int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event,
>> int detail, void *opaque);
>> void timeOutCallback(int timer, void *opaque);
>>
>> int run = 1;
>>
>> int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event,
>> int detail, void *opaque)
>> {
>> char uuid[128] = { 0 };
>> (void) virDomainGetUUIDString(dom, uuid);
>> printf("domain %s come a event %d\n", uuid, event);
>> }
>>
>> void _timeOutCallback(int timer, void *opaque)
>> {
>> printf("time out is comming\n");
>> }
>>
>> void *event_thread(void *opaque)
>> {
>> int i =0;
>> if (virEventRegisterDefaultImpl() < 0)
>> {
>> return NULL;
>> }
>>
>> virConnectPtr conn = virConnectOpen("qemu:///system");
>> if (conn == NULL)
>> {
>> return NULL;
>> }
>>
>> int callback = virConnectDomainEventRegisterAny(conn, NULL,
>> VIR_DOMAIN_EVENT_ID_LIFECYCLE,
>> _DomainStatusCallback, NULL, NULL);
>> int timeout = virEventAddTimeout(1000,_timeOutCallback,NULL,NULL);
>>
>> while (run && virConnectIsAlive(conn) == 1)
>> {
>> virEventRunDefaultImpl();
>> }
>>
>> sleep(5);
>> virConnectDomainEventDeregisterAny(conn, callback);
>> virConnectClose(conn);
>> }
>>
>>
>> int main(int argc, char *argv[])
>> {
>> pthread_t pid;
>> int ret = 0;
>> ret = pthread_create(&pid, NULL, event_thread, NULL);
>> sleep(100);
>> run = 0;
>> sleep(10);
>> }
>>
>> when I don't call remoteClientFreePrivateCallbacks at
>> remoteClientCloseFunc,
>> here is client refs change when I kill the client.
>> (gdb) c
>> Continuing.
>> Hardware watchpoint 6: *(int *) 0x5637edc888f4
>>
>> Old value = 4
>> New value = 5
>> virObjectRef (anyobj=anyobj at entry=0x5637edc888f0) at util/virobject.c:296
>> 296 PROBE(OBJECT_REF, "obj=%p", obj);
>> (gdb) bt
>> #0 virObjectRef (anyobj=anyobj at entry=0x5637edc888f0) at
>> util/virobject.c:296
>> #1 0x00005637ec61d643 in
>> remoteDispatchConnectDomainEventCallbackRegisterAny (server=<optimized
>> out>, msg=<optimized out>,
>> ret=0x7f054c0008e0, args=0x7f054c0008c0, rerr=0x7f057ba6cc90,
>> client=0x5637edc888f0) at remote.c:4418
> Although the line numbers are different with top of tree, this is the
> RegisterAny REF on the client...
>
> [...]
>
> So that says to me we generally run with 3-4 REFs (without any Register
> Event occurring). AFAICT, the following is the list:
>
> 1. Adding to the srv->clients (virNetServerAddClient)
> 2. Adding IO Callback (virNetServerClientRegisterEvent)
> 3. Adding Keep Alive (virNetServerClientInitKeepAlive and
> virKeepAliveStart)
>
> The 4th is a bit trickier, but I believe it's REF'd during the call to
> virNetServerClientDispatchRead...
>
>> Old value = 5
>> New value = 4
>> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
>> Continuing.
>> [Switching to Thread 0x7f058eb1f840 (LWP 3878)]
>> Hardware watchpoint 9: *(int *) 0x5637edc88634
> I'm assuming this comes from the opposite end of the DispatchRead -
> there's no trace here to prove it, but that's not necessarily surprising
> either as it's not easy to find where the Unref would naturally occur
> without an error path occurring....
>
>> Old value = 4
>> New value = 5
>> virObjectRef (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:296
>> 296 PROBE(OBJECT_REF, "obj=%p", obj);
>> #0 virObjectRef (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:296
>> #1 0x00005637ec65299d in virNetServerClientClose
>> (client=0x5637edc88630) at rpc/virnetserverclient.c:978
> [...]
>
>> Old value = 5
>> New value = 4
>> virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:259
>> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0 virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:259
>> #1 0x00005637ec6529bd in virNetServerClientClose
>> (client=0x5637edc88630) at rpc/virnetserverclient.c:982
> This pair of REF/UNREF in virNetServerClientClose when client->keepalive
> is set.
>
> [...]
>
>> Old value = 4
>> New value = 5
>> virObjectRef (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:296
>> 296 PROBE(OBJECT_REF, "obj=%p", obj);
>> #0 virObjectRef (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:296
>> #1 0x00005637ec6529d1 in virNetServerClientClose
>> (client=0x5637edc88630) at rpc/virnetserverclient.c:987
> [...]
>
>> Old value = 5
>> New value = 4
>> virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:259
>> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0 virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:259
>> #1 0x00005637ec6529ee in virNetServerClientClose
>> (client=0x5637edc88630) at rpc/virnetserverclient.c:991
> This pair of REF/UNREF in virNetServerClientClose when
> client->privateDataCloseFunc is set.
>
> [...]
>
>> Old value = 4
>> New value = 3
>> virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at util/virobject.c:259
>> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0 virObjectUnref (anyobj=anyobj at entry=0x5637edc88630) at
>> util/virobject.c:259
>> #1 0x00005637ec650705 in virNetServerProcessClients
>> (srv=0x5637edc6bf10) at rpc/virnetserver.c:873
> Again my line numbers are off a bit, but this would be the for the UNREF
> after VIR_DELETE_ELEMENT when removed from srv->clients
>
> [...]
>
>> Old value = 3
>> New value = 2
>> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> #1 0x00007f058de4199b in virObjectUnref (anyobj=<optimized out>) at
>> util/virobject.c:265
>> #2 0x00007f058de0db34 in virEventPollCleanupTimeouts () at
>> util/vireventpoll.c:543
> [...]
>
> This one is the opposite end of the virObjectRef done for the keep alive
> timer. See virNetServerClientInitKeepAlive and virKeepAliveStart usage
> of virEventAddTimeout.
>
>> Old value = 2
>> New value = 1
>> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
>> #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
>> #1 0x00005637ec656bee in virNetSocketEventFree
>> (opaque=opaque at entry=0x5637edc8a5b0) at rpc/virnetsocket.c:2152
>> #2 0x00007f058de0dd34 in virEventPollCleanupHandles () at
>> util/vireventpoll.c:592
> [...]
>
> This one is the opposite end of the virObjectRef done in
> virNetServerClientRegisterEvent prior to virNetSocketAddIOCallback which
> calls virEventAddHandle.
>
> Theoretically the "last" REF...
>
> John
>> Continuing.
>>
>> so the client refs alway have 1 when I register one event callbacks.
>>>> this patch will deRegister all event callbacks when client was close.
>>>> ---
>>> FWIW: This commit message was a bit difficult to follow - I know it's
>>> primarily a language barrier thing - so maybe if you just construct some
>>> function code flow for both initialization/open and destruction/close
>>> it'd help... Paths that work and the path that you think is broken...
>>> That can be done in this section after the "---" and before the changed
>>> API's.
>>>
>>> I assume you've done a lot of research, but the details of that research
>>> can be difficult to just "pick up on". Typically someone will provide
>>> some sort of valgrind output as an example.
>>>
>>> I just have a sense that this could boil down to a path (or two) that
>>> are doing the virObjectRef without the expected virObjectUnref occurring
>>> - something perhaps to do with how @wantClose is handled in the abnormal
>>> termination case. Nothing jumps out at me though even after looking at
>>> the code for a while.
>>>
>>>
>>>> daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------
>>>> 1 file changed, 28 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/daemon/remote.c b/daemon/remote.c
>>>> index cbcb6e8..466b2e7 100644
>>>> --- a/daemon/remote.c
>>>> +++ b/daemon/remote.c
>>>> @@ -1689,23 +1689,10 @@ void
>>>> remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED,
>>>> int r
>>>> neventCallbacks = 0; \
>>>> } while (0);
>>>> -/*
>>>> - * You must hold lock for at least the client
>>>> - * We don't free stuff here, merely disconnect the client's
>>>> - * network socket & resources.
>>>> - * We keep the libvirt connection open until any async
>>>> - * jobs have finished, then clean it up elsewhere
>>>> - */
>>>> -void remoteClientFreeFunc(void *data)
>>>> +static void
>>>> +remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv)
>>>> {
>>>> - struct daemonClientPrivate *priv = data;
>>>> -
>>>> - /* Deregister event delivery callback */
>>>> - if (priv->conn) {
>>>> - virIdentityPtr sysident = virIdentityGetSystem();
>>>> -
>>>> - virIdentitySetCurrent(sysident);
>>>> -
>>>> + if (priv && priv->conn) {
>>> Well each of the callers has already dereferenced @priv so it's a bit
>>> too late to check it now!
>> yes, I also think this check is too late. this placement can only check
>> priv->conn.
>>>> DEREG_CB(priv->conn, priv->domainEventCallbacks,
>>>> priv->ndomainEventCallbacks,
>>>> virConnectDomainEventDeregisterAny, "domain");
>>>> @@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data)
>>>>
>>>> remoteRelayConnectionClosedEvent) < 0)
>>>> VIR_WARN("unexpected close callback event
>>>> deregister failure");
>>>> }
>>>> + }
>>>> +}
>>>> +#undef DEREG_CB
>>>> +
>>> There should be another blank line here.
>> OK, I lost a blank line.
>>>> +/*
>>>> + * You must hold lock for at least the client
>>>> + * We don't free stuff here, merely disconnect the client's
>>>> + * network socket & resources.
>>>> + * We keep the libvirt connection open until any async
>>>> + * jobs have finished, then clean it up elsewhere
>>>> + */
>>>> +void remoteClientFreeFunc(void *data)
>>>> +{
>>>> + struct daemonClientPrivate *priv = data;
>>>> +
>>>> + /* Deregister event delivery callback */
>>>> + if (priv->conn) {
>>>> + virIdentityPtr sysident = virIdentityGetSystem();
>>>> +
>>>> + virIdentitySetCurrent(sysident);
>>>> +
>>>> + remoteClientFreePrivateCallbacks(priv);
>>>> virConnectClose(priv->conn);
>>>> virIdentitySetCurrent(NULL);
>>>> virObjectUnref(sysident);
>>>> - }
>>>> + }
>>>> VIR_FREE(priv);
>>>> +
>>>> }
>>>> -#undef DEREG_CB
>>>> static void remoteClientCloseFunc(virNetServerClientPtr client)
>>>> @@ -1747,6 +1756,7 @@ static void
>>>> remoteClientCloseFunc(virNetServerClientPtr client)
>>>> struct daemonClientPrivate *priv =
>>>> virNetServerClientGetPrivateData(client);
>>>> daemonRemoveAllClientStreams(priv->streams);
>>>> + remoteClientFreePrivateCallbacks(priv);
>>> Perhaps this just gets a "if (priv->conn)" before calling...
>>>
>>> Still if as I assume the missing Unref is found (someday), then
>>> theoretically remoteClientFreeFunc would be called - of course the
>>> nEventCallbacks = 0 will ensure the Dereg's dont' get called again. So
>>> it shoudn't be a problem
>> In my opinion, it is safe to check point is Null. But check everywhere
>> is stupid.
>> so it is hard to determine where need check.:-)
>>> John
>>>
>>>> }
>>>>
>>> .
>>>
>>
> .
>
More information about the libvir-list
mailing list