The unix domain socket remains even after the VM is destroyed

Martin Kletzander mkletzan at redhat.com
Fri Jan 28 20:56:59 UTC 2022


On Tue, Jan 25, 2022 at 02:18:29PM -0500, Masayoshi Mizuma wrote:
>Hello,
>
>I found an issue that libvirt isn't close an unix domain socket to connect to the qemu monitor
>even after the VM is destroyed.
>This issue happens since commit 695bdb3841 ("src: ensure GSource background unref happens in correct event loop")
>on the system whose glib version is 2.56.
>I would appreciate it if you could give any ideas to solve the issue.
>

My very rough and fast guess is that this happens due to our workaround
while running on a system on which the fix was actually backported.  You
can remove the workaround (which is implemented for glib < 2.64.0), keep
only the new code and see if it reproduces.  I know it's not very
scientific, but we already faced so many workarounds for old buggy glib
that it's tiresome to try and fix all that when it goes away with update
of glib.  If the above does not help, then I would look at what could
actually happening (maybe the unref does not get scheduled because the
event loop it is requested on does not run any more, but that's just a
guess).

>The socket is allocated in qemuMonitorOpenUnix(), and used by the monitor->socket
>and monitor->watch:
>
>    qemuMonitorOpen
>      qemuMonitorOpenUnix
>        if ((monfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
>
>      qemuMonitorOpenInternal
>        mon->socket = g_socket_new_from_fd(fd, &gerr);
>
>        qemuMonitorRegister
>          mon->watch = g_socket_create_source(mon->socket,
>
>Usually, the socket is closed when the reference counter of the glib object
>(mon->socket and mon->watch) gets 0:
>
>    qemuMonitorClose
>      qemuMonitorUnregister
>        vir_g_source_unref(mon->watch, mon->context);
>          g_source_set_callback(idle, virEventGLibSourceUnrefIdle, src, NULL);
>            virEventGLibSourceUnrefIdle
>              g_source_unref(src); <== unref monitor->watch
>
>      g_object_unref(mon->socket); <== unref monitor->socket
>
>It seems that the callback virEventGLibSourceUnrefIdle() to unref the monitor->watch doesn't
>work when qemuMonitorUnregister() is called via qemuProcessStop(), so the socket isn't closed.
>I'm not sure why the callback doesn't work at that time. I suspect that the VM is closing
>so the main loop of the monitor doesn't work any more.
>
>We can close the socket to add g_socket_close() before unref the mon->socket, however,
>it may remain a memory leak because of mon->watch (GSource object), so probably
>it isn't a good idea to close the socket.
>
>We can unref the mon->watch to set NULL to the second argument of vir_g_source_unref()
>because the default main loop still works at that time, however, I'm not
>sure it's an appropriate way to avoid the gobject issue which the commit solves...
>
>I found this issue on the qemu monitor, and probably the qemu agent has the same issue
>because the socket procedure is almost the same as the monitor.
>
>I would appreciate it if you could give any ideas to solve this issue.
>
>Following is to observe the callback working:
>---
>diff --git a/src/util/glibcompat.c b/src/util/glibcompat.c
>index eb6dcc0111..b8b1770424 100644
>--- a/src/util/glibcompat.c
>+++ b/src/util/glibcompat.c
>@@ -24,6 +24,9 @@
>
> #include "glibcompat.h"
>
>+#include "virlog.h"
>+
>+VIR_LOG_INIT("util.glibcompat");
> /*
>  * Note that because of the GLIB_VERSION_MAX_ALLOWED constant in
>  * config-post.h, allowing use of functions from newer GLib via
>@@ -244,6 +247,7 @@ virEventGLibSourceUnrefIdle(gpointer data)
>     GSource *src = data;
>
>     g_source_unref(src);
>+    VIR_DEBUG("unrefed: %p", src);
>
>     return FALSE;
> }
>@@ -257,6 +261,7 @@ void vir_g_source_unref(GSource *src, GMainContext *ctx)
>     g_source_attach(idle, ctx);
>
>     g_source_unref(idle);
>+    VIR_DEBUG("unref registered: %p ctx: %p", src, ctx);
> }
>
> #endif
>---
>
>Case the mon->watch (0xffff28008af0) is unreffed correctly
>(via qemuMonitorUpdateWatch()):
>
>18:54:15.403+0000: 16845: debug : qemuMonitorEmitResume:1159 : mon=0xffff683ac020
>18:54:15.403+0000: 16845: debug : qemuProcessHandleResume:713 : Transitioned guest test-vm into running state, reason 'booted', event detail 0
>18:54:15.404+0000: 16845: debug : vir_g_source_unref:264 : unref registered: 0xffff28008af0 ctx: 0xffff780169a0
>18:54:15.404+0000: 16845: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"return": {}, "id": "libvirt-10"}]
>18:54:15.404+0000: 16845: info : qemuMonitorJSONIOProcessLine:242 : QEMU_MONITOR_RECV_REPLY: mon=0xffff683ac020 reply={"return": {}, "id": "libvirt-10"}
>18:54:15.404+0000: 16845: debug : vir_g_source_unref:264 : unref registered: 0xffff2819a260 ctx: 0xffff780169a0
>18:54:15.404+0000: 16845: debug : virEventGLibSourceUnrefIdle:250 : unrefed: 0xffff28008af0
>
>Case the mon->watch (0xffff7802bb30) isn't unreffed
>(via qemuProcessStop()):
>
>18:54:15.642+0000: 16589: debug : qemuProcessStop:8008 : Shutting down vm=0xaaaad40edec0 name=test-vm id=3 pid=16842, reason=destroyed, asyncJob=none, flags=0x0
>18:54:15.642+0000: 16589: debug : qemuDomainLogAppendMessage:6733 : Append log message (vm='test-vm' message='2022-01-25 18:54:15.642+0000: shutting down, reason=destroyed) stdioLogD=1
>18:54:15.643+0000: 16589: info : qemuMonitorClose:834 : QEMU_MONITOR_CLOSE: mon=0xffff683ac020
>18:54:15.643+0000: 16589: debug : vir_g_source_unref:264 : unref registered: 0xffff7802bb30 ctx: 0xffff780169a0
>18:54:15.643+0000: 16845: debug : qemuMonitorJSONIOProcessEvent:209 : handle SHUTDOWN handler=0xffff4ef057c0 data=0xffff28007da0
>18:54:15.643+0000: 16845: debug : qemuMonitorEmitShutdown:1132 : mon=0xffff683ac020 guest=2
>18:54:15.643+0000: 16845: debug : qemuProcessHandleShutdown:572 : vm=0xaaaad40edec0
>18:54:15.643+0000: 16845: debug : qemuProcessHandleShutdown:592 : Transitioned guest test-vm to shutdown state
>18:54:15.643+0000: 16845: debug : qemuProcessKill:7921 : vm=0xaaaad40edec0 name=test-vm pid=16842 flags=0x2
>18:54:15.643+0000: 16845: debug : qemuMonitorDispose:216 : mon=0xffff683ac020
>18:54:15.643+0000: 16589: debug : qemuProcessKill:7921 : vm=0xaaaad40edec0 name=test-vm pid=16842 flags=0x5
>18:54:15.643+0000: 16589: debug : qemuDomainCleanupRun:7313 : driver=0xffff281441b0, vm=test-vm
>18:54:15.644+0000: 16589: debug : qemuProcessAutoDestroyRemove:8395 : vm=test-vm
>18:54:15.645+0000: 16589: debug : qemuRemoveCgroup:1221 : Failed to terminate cgroup for test-vm
>18:54:15.647+0000: 16589: debug : qemuDomainObjEndJob:1141 : Stopping job: destroy (async=none vm=0xaaaad40edec0 name=test-vm)
>
>Thanks!
>Masa
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20220128/0db5c9e9/attachment-0001.sig>


More information about the libvir-list mailing list