[libvirt] [PATCH 03/14] Fix crash when deleting monitor while a command is in progress

Matthias Bolte matthias.bolte at googlemail.com
Wed Dec 2 23:47:02 UTC 2009


2009/11/26 Daniel P. Berrange <berrange at redhat.com>:
> If QEMU shuts down while we're in the middle of processing a
> monitor command, the monitor will be freed, and upon cleaning
> up we attempt to do  qemuMonitorUnlock(priv->mon) when priv->mon
> is NULL.
>
> To address this we introduce proper reference counting into
> the qemuMonitorPtr object, and hold an extra reference whenever
> executing a command.
>
> * src/qemu/qemu_driver.c: Hold a reference on the monitor while
>  executing commands, and only NULL-ify the priv->mon field when
>  the last reference is released
> * src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Add reference
>  counting to handle safe deletion of monitor objects

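The locking pattern below results in destroying a locked mutex: when
the last reference is dropped the unlock is skipped, so the monitor is
freed with its mutex still held. Is this intended?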

qemuMonitorLock(mon);
[...]
if (qemuMonitorUnref(mon) > 0)
    qemuMonitorUnlock(mon);
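
One way to avoid destroying a locked mutex is to always unlock before
the final free. This is only a minimal sketch of that idea, not the
patch's code: DemoMonitor, demoMonitorNew(), demoMonitorRef() and
demoMonitorUnrefAndUnlock() are hypothetical names, and it assumes that
holding a reference is the only way another thread can reach the
object. POSIX leaves pthread_mutex_destroy() on a locked mutex
undefined, hence the unlock-then-destroy order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the monitor object; not the libvirt struct. */
typedef struct {
    pthread_mutex_t lock;
    int refs;
} DemoMonitor;

/* Allocate a monitor holding one reference, with its mutex unlocked. */
static DemoMonitor *demoMonitorNew(void)
{
    DemoMonitor *mon = calloc(1, sizeof(*mon));
    if (!mon)
        return NULL;
    if (pthread_mutex_init(&mon->lock, NULL) != 0) {
        free(mon);
        return NULL;
    }
    mon->refs = 1;
    return mon;
}

/* Take an extra reference. The caller must hold mon->lock. */
static void demoMonitorRef(DemoMonitor *mon)
{
    mon->refs++;
}

/*
 * Drop one reference and release the lock in every case. The caller must
 * hold mon->lock. If this was the last reference, the mutex is destroyed
 * only after it has been unlocked, and then the object is freed.
 * Returns the number of references that remain.
 */
static int demoMonitorUnrefAndUnlock(DemoMonitor *mon)
{
    int remaining = --mon->refs;

    pthread_mutex_unlock(&mon->lock);
    if (remaining == 0) {
        pthread_mutex_destroy(&mon->lock);
        free(mon);
    }
    return remaining;
}
```

With a helper like this the caller no longer needs the
"if (unref > 0) unlock" conditional, and must not touch the object
after the helper returns 0.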

Also, this patch makes the TCK deadlock for me. It looks like a lock
ordering issue combined with a race condition, since it doesn't happen
on every run. I don't yet understand all the details of the locking and
refcounting scheme of the QEMU monitor; it is already quite complex and
this patch makes it more so.
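
To illustrate what a consistent lock order would look like, here is a
rough sketch using two plain pthread mutexes. It assumes the canonical
order is "domain lock, then monitor lock", and that the I/O callback
holds the monitor lock when it tries to take the domain lock, which I
have not confirmed. DemoLocks and the demo* functions are hypothetical
names and none of this is taken from the libvirt sources.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical pair of locks standing in for the domain and monitor. */
typedef struct {
    pthread_mutex_t domainLock;
    pthread_mutex_t monitorLock;
} DemoLocks;

static void demoLocksInit(DemoLocks *l)
{
    pthread_mutex_init(&l->domainLock, NULL);
    pthread_mutex_init(&l->monitorLock, NULL);
}

/* Canonical order for every thread: domain first, then monitor. */
static void demoLockBoth(DemoLocks *l)
{
    pthread_mutex_lock(&l->domainLock);
    pthread_mutex_lock(&l->monitorLock);
}

/* Release in the reverse order of acquisition. */
static void demoUnlockBoth(DemoLocks *l)
{
    pthread_mutex_unlock(&l->monitorLock);
    pthread_mutex_unlock(&l->domainLock);
}

/*
 * The caller holds only the monitor lock and also needs the domain lock.
 * Locking the domain directly here would invert the canonical order, so
 * the monitor lock is dropped first and both are retaken in order. The
 * caller must revalidate any monitor state afterwards, because the
 * monitor was briefly unlocked.
 */
static void demoUpgradeFromMonitor(DemoLocks *l)
{
    pthread_mutex_unlock(&l->monitorLock);
    demoLockBoth(l);
}
```

The cost of this approach is the window in which the monitor is
unlocked, so it only helps if the monitor cannot be freed in that
window, for example because the caller holds its own reference.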

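I attached some GDB and Valgrind traces. In both GDB traces, worker
thread 4 is blocked in qemuMonitorLock() while the event loop thread 2
is blocked in virDomainObjLock(), called from qemuMonitorIO(), on the
same domain and monitor objects.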

Debugging is hindered by libvirtd often blocking on poll() in
virEventRunOnce(); I haven't found out why yet.

Matthias
-------------- next part --------------
==8990== Thread #2: lock order "0x9AB9030 before 0x9ABAA80" violated
==8990==    at 0x4C27ADC: pthread_mutex_lock (hg_intercepts.c:464)
==8990==    by 0x432079: virMutexLock (threads-pthread.c:52)
==8990==    by 0x529CABE: virDomainObjLock (domain_conf.c:5344)
==8990==    by 0x45CCB8: qemuMonitorIO (qemu_monitor.c:440)
==8990==    by 0x4132AE: virEventDispatchHandles (event.c:473)
==8990==    by 0x4138B6: virEventRunOnce (event.c:601)
==8990==    by 0x4188F6: qemudOneLoop (libvirtd.c:2165)
==8990==    by 0x418E25: qemudRunLoop (libvirtd.c:2274)
==8990==    by 0x4C2B528: mythread_wrapper (hg_intercepts.c:201)
==8990==    by 0x81C63B9: start_thread (in /lib/libpthread-2.9.so)
==8990==    by 0x84C0FCC: clone (in /lib/libc-2.9.so)
==8990==   Required order was established by acquisition of lock at 0x9AB9030
==8990==    at 0x4C27ADC: pthread_mutex_lock (hg_intercepts.c:464)
==8990==    by 0x432079: virMutexLock (threads-pthread.c:52)
==8990==    by 0x529CABE: virDomainObjLock (domain_conf.c:5344)
==8990==    by 0x43AA71: qemuReconnectDomain (qemu_driver.c:684)
==8990==    by 0x52791F8: virHashForEach (hash.c:495)
==8990==    by 0x43ABD0: qemuReconnectDomains (qemu_driver.c:728)
==8990==    by 0x43B8BC: qemudStartup (qemu_driver.c:987)
==8990==    by 0x52B280F: virStateInitialize (libvirt.c:829)
==8990==    by 0x41BA2B: main (libvirtd.c:3154)
==8990==   followed by a later acquisition of lock at 0x9ABAA80
==8990==    at 0x4C27ADC: pthread_mutex_lock (hg_intercepts.c:464)
==8990==    by 0x432079: virMutexLock (threads-pthread.c:52)
==8990==    by 0x45BE28: qemuMonitorLock (qemu_monitor.c:82)
==8990==    by 0x45CE64: qemuMonitorOpen (qemu_monitor.c:475)
==8990==    by 0x43A9B0: qemuConnectMonitor (qemu_driver.c:663)
==8990==    by 0x43AAC2: qemuReconnectDomain (qemu_driver.c:689)
==8990==    by 0x52791F8: virHashForEach (hash.c:495)
==8990==    by 0x43ABD0: qemuReconnectDomains (qemu_driver.c:728)
==8990==    by 0x43B8BC: qemudStartup (qemu_driver.c:987)
==8990==    by 0x52B280F: virStateInitialize (libvirt.c:829)
==8990==    by 0x41BA2B: main (libvirtd.c:3154)
-------------- next part --------------
(gdb) thread apply all bt

Thread 7 (Thread 0x7f2b0827d950 (LWP 8179)):
#0  0x00007f2b0dd1b2e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x1ba5fc8, m=0x1ba5fa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1baa320) at libvirtd.c:1496
#3  0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f2b08a7e950 (LWP 8176)):
#0  0x00007f2b0dd1b2e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x1ba5fc8, m=0x1ba5fa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1baa308) at libvirtd.c:1496
#3  0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f2b0927f950 (LWP 8175)):
#0  0x00007f2b0dd1b2e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x1ba5fc8, m=0x1ba5fa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1baa2f0) at libvirtd.c:1496
#3  0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f2b09a80950 (LWP 8174)):
#0  0x00007f2b0dd1da94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f2b0dd19190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f2b0dd18a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000043207a in virMutexLock (m=0x1bc3960) at util/threads-pthread.c:52
#4  0x000000000045be29 in qemuMonitorLock (mon=0x1bc3960) at qemu/qemu_monitor.c:82
#5  0x000000000045d168 in qemuMonitorClose (mon=0x1bc3960) at qemu/qemu_monitor.c:541
#6  0x000000000043f59d in qemudShutdownVMDaemon (conn=0x1bb18f0, driver=0x1baa9e0, vm=0x1bc1500) at qemu/qemu_driver.c:2410
#7  0x000000000044103c in qemudDomainDestroy (dom=0x1bb5f00) at qemu/qemu_driver.c:3097
#8  0x00007f2b10b7fbce in virDomainDestroy (domain=0x1bb5f00) at libvirt.c:1978
#9  0x000000000041d708 in remoteDispatchDomainDestroy (server=0x1ba5fa0, client=0x1bb16e0, conn=0x1bb18f0, hdr=0x1c03fb0, 
    rerr=0x7f2b09a7fe30, args=0x7f2b09a7fed0, ret=0x7f2b09a7ff20) at remote.c:925
#10 0x000000000042619f in remoteDispatchClientCall (server=0x1ba5fa0, client=0x1bb16e0, msg=0x1bc3fa0) at dispatch.c:506
#11 0x0000000000425d74 in remoteDispatchClientRequest (server=0x1ba5fa0, client=0x1bb16e0, msg=0x1bc3fa0) at dispatch.c:388
#12 0x00000000004172af in qemudWorker (data=0x1baa2d8) at libvirtd.c:1518
#13 0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#14 0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#15 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f2b0a281950 (LWP 8173)):
#0  0x00007f2b0dd1b2e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x1ba5fc8, m=0x1ba5fa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1baa2c0) at libvirtd.c:1496
#3  0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f2b0aa82950 (LWP 8172)):
#0  0x00007f2b0dd1da94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f2b0dd19190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f2b0dd18a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000043207a in virMutexLock (m=0x1bc1500) at util/threads-pthread.c:52
#4  0x00007f2b10b67abf in virDomainObjLock (obj=0x1bc1500) at conf/domain_conf.c:5344
#5  0x000000000045ccb9 in qemuMonitorIO (watch=9, fd=18, events=0, opaque=0x1bc3960) at qemu/qemu_monitor.c:440
#6  0x00000000004132af in virEventDispatchHandles (nfds=7, fds=0x1bb1bf0) at event.c:473
#7  0x00000000004138b7 in virEventRunOnce () at event.c:601
#8  0x00000000004188f7 in qemudOneLoop () at libvirtd.c:2165
#9  0x0000000000418e26 in qemudRunLoop (opaque=0x1ba5fa0) at libvirtd.c:2274
#10 0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#11 0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f2b114a2780 (LWP 8169)):
#0  0x00007f2b0dd17c95 in pthread_join () from /lib/libpthread.so.0
#1  0x000000000041bb39 in main (argc=1, argv=0x7fff194db738) at libvirtd.c:3183
(gdb) 
-------------- next part --------------
(gdb) thread apply all bt

Thread 7 (Thread 0x7f1081ba7950 (LWP 26832)):
#0  0x00007f10876452e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x162cfc8, m=0x162cfa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1631320) at libvirtd.c:1496
#3  0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f10873adfcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f10823a8950 (LWP 26831)):
#0  0x00007f10876452e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x162cfc8, m=0x162cfa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1631308) at libvirtd.c:1496
#3  0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f10873adfcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f1082ba9950 (LWP 26830)):
#0  0x00007f10876452e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x162cfc8, m=0x162cfa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x16312f0) at libvirtd.c:1496
#3  0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f10873adfcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f10833aa950 (LWP 26829)):
#0  0x00007f1087647a94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f1087643190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f1087642a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000043207a in virMutexLock (m=0x164a0f0) at util/threads-pthread.c:52
#4  0x000000000045be29 in qemuMonitorLock (mon=0x164a0f0) at qemu/qemu_monitor.c:82
#5  0x0000000000439a24 in qemuDomainObjEnterMonitorWithDriver (driver=0x1631720, obj=0x1648da0) at qemu/qemu_driver.c:309
#6  0x000000000043c8d0 in qemudInitCpus (conn=0x1638530, driver=0x1631720, vm=0x1648da0, migrateFrom=0x0) at qemu/qemu_driver.c:1427
#7  0x000000000043f14e in qemudStartVMDaemon (conn=0x1638530, driver=0x1631720, vm=0x1648da0, migrateFrom=0x0, stdin_fd=-1)
    at qemu/qemu_driver.c:2327
#8  0x00000000004449ad in qemudDomainStart (dom=0x1648a80) at qemu/qemu_driver.c:4384
#9  0x00007f108a4ae2b3 in virDomainCreate (domain=0x1648a80) at libvirt.c:4509
#10 0x000000000041d567 in remoteDispatchDomainCreate (server=0x162cfa0, client=0x1638690, conn=0x1638530, hdr=0x168b150, 
    rerr=0x7f10833a9e30, args=0x7f10833a9ed0, ret=0x7f10833a9f20) at remote.c:853
#11 0x000000000042619f in remoteDispatchClientCall (server=0x162cfa0, client=0x1638690, msg=0x164b140) at dispatch.c:506
#12 0x0000000000425d74 in remoteDispatchClientRequest (server=0x162cfa0, client=0x1638690, msg=0x164b140) at dispatch.c:388
#13 0x00000000004172af in qemudWorker (data=0x16312d8) at libvirtd.c:1518
#14 0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#15 0x00007f10873adfcd in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f1083bab950 (LWP 26828)):
#0  0x00007f10876452e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x162cfc8, m=0x162cfa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x16312c0) at libvirtd.c:1496
#3  0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f10873adfcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f10843ac950 (LWP 26826)):
#0  0x00007f1087647a94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f1087643190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f1087642a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000043207a in virMutexLock (m=0x1648da0) at util/threads-pthread.c:52
#4  0x00007f108a491abf in virDomainObjLock (obj=0x1648da0) at conf/domain_conf.c:5344
#5  0x000000000045ccb9 in qemuMonitorIO (watch=8, fd=16, events=0, opaque=0x164a0f0) at qemu/qemu_monitor.c:440
#6  0x00000000004132af in virEventDispatchHandles (nfds=6, fds=0x7f107c0008f0) at event.c:473
#7  0x00000000004138b7 in virEventRunOnce () at event.c:601
#8  0x00000000004188f7 in qemudOneLoop () at libvirtd.c:2165
#9  0x0000000000418e26 in qemudRunLoop (opaque=0x162cfa0) at libvirtd.c:2274
#10 0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#11 0x00007f10873adfcd in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f108adcc780 (LWP 26823)):
#0  0x00007f1087641c95 in pthread_join () from /lib/libpthread.so.0
#1  0x000000000041bb39 in main (argc=1, argv=0x7fff92e04068) at libvirtd.c:3183

