The unix domain socket remains even after the VM is destroyed

Masayoshi Mizuma msys.mizuma at gmail.com
Tue Jan 25 19:18:29 UTC 2022


Hello,

I found an issue that libvirt isn't close an unix domain socket to connect to the qemu monitor
even after the VM is destroyed.
This issue happens since commit 695bdb3841 ("src: ensure GSource background unref happens in correct event loop")
on the system whose glib version is 2.56.
I would appreciate it if you could give any ideas to solve the issue.

The socket is allocated in qemuMonitorOpenUnix(), and used by the monitor->socket
and monitor->watch:

    qemuMonitorOpen
      qemuMonitorOpenUnix
        if ((monfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {

      qemuMonitorOpenInternal
        mon->socket = g_socket_new_from_fd(fd, &gerr);

        qemuMonitorRegister
          mon->watch = g_socket_create_source(mon->socket,

Usually, the socket is closed when the reference counter of the glib object
(mon->socket and mon->watch) gets 0:

    qemuMonitorClose
      qemuMonitorUnregister
        vir_g_source_unref(mon->watch, mon->context);
          g_source_set_callback(idle, virEventGLibSourceUnrefIdle, src, NULL);
            virEventGLibSourceUnrefIdle
              g_source_unref(src); <== unref monitor->watch

      g_object_unref(mon->socket); <== unref monitor->socket

It seems that the callback virEventGLibSourceUnrefIdle() to unref the monitor->watch doesn't
work when qemuMonitorUnregister() is called via qemuProcessStop(), so the socket isn't closed.
I'm not sure why the callback doesn't work at that time. I suspect that the VM is closing
so the main loop of the monitor doesn't work any more.

We can close the socket to add g_socket_close() before unref the mon->socket, however,
it may remain a memory leak because of mon->watch (GSource object), so probably
it isn't a good idea to close the socket.

We can unref the mon->watch to set NULL to the second argument of vir_g_source_unref()
because the default main loop still works at that time, however, I'm not
sure it's an appropriate way to avoid the gobject issue which the commit solves...

I found this issue on the qemu monitor, and probably the qemu agent has the same issue
because the socket procedure is almost the same as the monitor.

I would appreciate it if you could give any ideas to solve this issue.

Following is to observe the callback working:
---
diff --git a/src/util/glibcompat.c b/src/util/glibcompat.c
index eb6dcc0111..b8b1770424 100644
--- a/src/util/glibcompat.c
+++ b/src/util/glibcompat.c
@@ -24,6 +24,9 @@
 
 #include "glibcompat.h"
 
+#include "virlog.h"
+
+VIR_LOG_INIT("util.glibcompat");
 /*
  * Note that because of the GLIB_VERSION_MAX_ALLOWED constant in
  * config-post.h, allowing use of functions from newer GLib via
@@ -244,6 +247,7 @@ virEventGLibSourceUnrefIdle(gpointer data)
     GSource *src = data;
 
     g_source_unref(src);
+    VIR_DEBUG("unrefed: %p", src);
 
     return FALSE;
 }
@@ -257,6 +261,7 @@ void vir_g_source_unref(GSource *src, GMainContext *ctx)
     g_source_attach(idle, ctx);
 
     g_source_unref(idle);
+    VIR_DEBUG("unref registered: %p ctx: %p", src, ctx);
 }
 
 #endif
---

Case the mon->watch (0xffff28008af0) is unreffed correctly
(via qemuMonitorUpdateWatch()):

18:54:15.403+0000: 16845: debug : qemuMonitorEmitResume:1159 : mon=0xffff683ac020
18:54:15.403+0000: 16845: debug : qemuProcessHandleResume:713 : Transitioned guest test-vm into running state, reason 'booted', event detail 0
18:54:15.404+0000: 16845: debug : vir_g_source_unref:264 : unref registered: 0xffff28008af0 ctx: 0xffff780169a0
18:54:15.404+0000: 16845: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"return": {}, "id": "libvirt-10"}]
18:54:15.404+0000: 16845: info : qemuMonitorJSONIOProcessLine:242 : QEMU_MONITOR_RECV_REPLY: mon=0xffff683ac020 reply={"return": {}, "id": "libvirt-10"}
18:54:15.404+0000: 16845: debug : vir_g_source_unref:264 : unref registered: 0xffff2819a260 ctx: 0xffff780169a0
18:54:15.404+0000: 16845: debug : virEventGLibSourceUnrefIdle:250 : unrefed: 0xffff28008af0

Case the mon->watch (0xffff7802bb30) isn't unreffed
(via qemuProcessStop()):

18:54:15.642+0000: 16589: debug : qemuProcessStop:8008 : Shutting down vm=0xaaaad40edec0 name=test-vm id=3 pid=16842, reason=destroyed, asyncJob=none, flags=0x0
18:54:15.642+0000: 16589: debug : qemuDomainLogAppendMessage:6733 : Append log message (vm='test-vm' message='2022-01-25 18:54:15.642+0000: shutting down, reason=destroyed) stdioLogD=1
18:54:15.643+0000: 16589: info : qemuMonitorClose:834 : QEMU_MONITOR_CLOSE: mon=0xffff683ac020
18:54:15.643+0000: 16589: debug : vir_g_source_unref:264 : unref registered: 0xffff7802bb30 ctx: 0xffff780169a0
18:54:15.643+0000: 16845: debug : qemuMonitorJSONIOProcessEvent:209 : handle SHUTDOWN handler=0xffff4ef057c0 data=0xffff28007da0
18:54:15.643+0000: 16845: debug : qemuMonitorEmitShutdown:1132 : mon=0xffff683ac020 guest=2
18:54:15.643+0000: 16845: debug : qemuProcessHandleShutdown:572 : vm=0xaaaad40edec0
18:54:15.643+0000: 16845: debug : qemuProcessHandleShutdown:592 : Transitioned guest test-vm to shutdown state
18:54:15.643+0000: 16845: debug : qemuProcessKill:7921 : vm=0xaaaad40edec0 name=test-vm pid=16842 flags=0x2
18:54:15.643+0000: 16845: debug : qemuMonitorDispose:216 : mon=0xffff683ac020
18:54:15.643+0000: 16589: debug : qemuProcessKill:7921 : vm=0xaaaad40edec0 name=test-vm pid=16842 flags=0x5
18:54:15.643+0000: 16589: debug : qemuDomainCleanupRun:7313 : driver=0xffff281441b0, vm=test-vm
18:54:15.644+0000: 16589: debug : qemuProcessAutoDestroyRemove:8395 : vm=test-vm
18:54:15.645+0000: 16589: debug : qemuRemoveCgroup:1221 : Failed to terminate cgroup for test-vm
18:54:15.647+0000: 16589: debug : qemuDomainObjEndJob:1141 : Stopping job: destroy (async=none vm=0xaaaad40edec0 name=test-vm)

Thanks!
Masa




More information about the libvir-list mailing list