[libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination

Nikolay Shirokovskiy nshirokovskiy at virtuozzo.com
Mon Dec 25 07:42:41 UTC 2017

On 22.12.2017 17:13, John Ferlan wrote:
> [...]
>>> Still adding the "virHashRemoveAll(dmn->servers);" into
>>> virNetDaemonClose doesn't help the situation as I can still either crash
>>> randomly or hang, so I'm less convinced this would really fix anything.
>>> It does change the "nature" of the hung thread stack trace though, as
>>> the second thread is now:
>> virHashRemoveAll is not enough now. Due to unref reordering, the last ref to @srv is
>> unrefed after virStateCleanup. So we need to virObjectUnref(srv|srvAdm) before
>> virStateCleanup. Or we can call virThreadPoolFree from virNetServerClose (
>> as in the first version of the patch and as Erik suggests) instead
>> of virHashRemoveAll.
> Patches w/
>  1. Long pause before GetAllStats (without using [u]sleep)
>  2. Adjustment to call virNetServerServiceToggle in
> virNetServerServiceClose (instead of virNetServerDispose)
>  3. Call virHashRemoveAll in virNetDaemonClose
>  4. Call virThreadPoolFree in virNetServerClose
>  5. Perform Unref (adminProgram, srvAdm, qemuProgram, lxcProgram,
> remoteProgram, and srv) before virNetDaemonClose
> Still has the virCondWait's - so as Daniel points out there's quite a
> bit more work to be done. Like most Red Hat engineers - I will not be
> very active over the next week or so (until the New Year) as it's a
> holiday break/vacation for us.
> So unless you have the burning desire to put together some patches and
> do the work yourself, more thoughts/work will need to wait.
> John

I've checked what happens after applying the patches you described above
(although applying only 3 (or 4) and part of 5, plus the pause hunk, would
be enough). I get hangs too, and this kind of hang is fixed by the
second series, '[PATCH 0/4] libvirtd: fix hang on termination in qemu driver'.
That is, besides the hang in the thread freeing the thread pool that you
already mentioned, there is another hang, with this backtrace:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff7335c58 in virCondWait (c=0x7fffc4000e18, m=0x7fffc4000df0) at util/virthread.c:154
#2  0x00007fffd9605983 in qemuMonitorSend (mon=0x7fffc4000de0, msg=0x7fffe70bd1f0) at qemu/qemu_monitor.c:1067
#3  0x00007fffd961b68f in qemuMonitorJSONCommandWithFd (mon=0x7fffc4000de0, cmd=0x7fffb0005310, scm_fd=-1, 
    reply=0x7fffe70bd2d0) at qemu/qemu_monitor_json.c:300
#4  0x00007fffd961b7c1 in qemuMonitorJSONCommand (mon=0x7fffc4000de0, cmd=0x7fffb0005310, reply=0x7fffe70bd2d0)
    at qemu/qemu_monitor_json.c:330
#5  0x00007fffd9629f0b in qemuMonitorJSONGetObjectListPaths (mon=0x7fffc4000de0, 
    path=0x7fffd96a7c96 "/machine/peripheral", paths=0x7fffe70bd380) at qemu/qemu_monitor_json.c:5715
#6  0x00007fffd962dcc4 in qemuMonitorJSONFindObjectPathByAlias (mon=0x7fffc4000de0, 
    name=0x7fffd969f3cd "virtio-balloon-pci", alias=0x7fffcc1e8d30 "balloon0", path=0x7fffe70bd450)
    at qemu/qemu_monitor_json.c:7235
#7  0x00007fffd962e231 in qemuMonitorJSONFindLinkPath (mon=0x7fffc4000de0, name=0x7fffd969f3cd "virtio-balloon-pci", 
    alias=0x7fffcc1e8d30 "balloon0", path=0x7fffe70bd450) at qemu/qemu_monitor_json.c:7349
#8  0x00007fffd9605bf7 in qemuMonitorInitBalloonObjectPath (mon=0x7fffc4000de0, balloon=0x7fffcc1e8e60)
    at qemu/qemu_monitor.c:1157
#9  0x00007fffd96098d3 in qemuMonitorGetMemoryStats (mon=0x7fffc4000de0, balloon=0x7fffcc1e8e60, 
    stats=0x7fffe70bd5b0, nr_stats=10) at qemu/qemu_monitor.c:2133
#10 0x00007fffd964e70c in qemuDomainMemoryStatsInternal (driver=0x7fffcc1872a0, vm=0x7fffcc2737e0, 
    stats=0x7fffe70bd5b0, nr_stats=10) at qemu/qemu_driver.c:11453
#11 0x00007fffd9667013 in qemuDomainGetStatsBalloon (driver=0x7fffcc1872a0, dom=0x7fffcc2737e0, 
    record=0x7fffb00008c0, maxparams=0x7fffe70bd6b0, privflags=1) at qemu/qemu_driver.c:19478
#12 0x00007fffd9669597 in qemuDomainGetStats (conn=0x7fffb80030e0, dom=0x7fffcc2737e0, stats=127, 
    record=0x7fffe70bd790, flags=1) at qemu/qemu_driver.c:20133
#13 0x00007fffd966997f in qemuConnectGetAllDomainStats (conn=0x7fffb80030e0, doms=0x7fffb0005220, ndoms=1, 
    stats=127, retStats=0x7fffe70bd8e0, flags=0) at qemu/qemu_driver.c:20226
#14 0x00007ffff7424fd7 in virDomainListGetStats (doms=0x7fffb0005220, stats=0, retStats=0x7fffe70bd8e0, flags=0)
    at libvirt-domain.c:11595
#15 0x00005555555ac030 in remoteDispatchConnectGetAllDomainStats (server=0x55555612a3a0, client=0x555556151d10, 
    msg=0x555556152540, rerr=0x7fffe70bda20, args=0x7fffb00036e0, ret=0x7fffb0002d20) at remote.c:6538
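
To illustrate the shape of this hang: the worker sits in a condition wait
for a monitor reply, and if nothing signals that condition at shutdown the
thread never returns. Below is a minimal pthreads sketch of the general
pattern, not libvirt code and not the fix from the second series. All names
(demo_monitor, demo_monitor_send, demo_monitor_close, demo_shutdown) are
hypothetical. The assumption is that the waiter also checks a "closing"
flag and that the shutdown path sets the flag and broadcasts on the
condition.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a monitor object; NOT libvirt's qemuMonitor. */
struct demo_monitor {
    pthread_mutex_t lock;
    pthread_cond_t notify;
    bool reply_ready;   /* would be set when a reply arrives (never here) */
    bool closing;       /* set by the shutdown path */
};

/* Wait for a reply. Returns 0 on reply, -1 if closed while waiting.
 * Without the 'closing' check this loop would block forever when no
 * reply ever comes, which is the kind of hang seen in the backtrace. */
static int demo_monitor_send(struct demo_monitor *mon)
{
    int ret;

    pthread_mutex_lock(&mon->lock);
    while (!mon->reply_ready && !mon->closing)
        pthread_cond_wait(&mon->notify, &mon->lock);
    ret = mon->reply_ready ? 0 : -1;
    pthread_mutex_unlock(&mon->lock);
    return ret;
}

/* Shutdown path: mark the monitor as closing and wake every waiter. */
static void demo_monitor_close(struct demo_monitor *mon)
{
    pthread_mutex_lock(&mon->lock);
    mon->closing = true;
    pthread_cond_broadcast(&mon->notify);
    pthread_mutex_unlock(&mon->lock);
}

/* Worker thread body: plays the role of an RPC worker stuck in a send. */
static void *demo_worker(void *opaque)
{
    struct demo_monitor *mon = opaque;

    return (void *)(intptr_t)demo_monitor_send(mon);
}

/* Start one worker, close the monitor, join the worker.
 * Returns the worker's result, or -2 if the thread could not be created. */
static int demo_shutdown(void)
{
    struct demo_monitor mon = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false
    };
    pthread_t thr;
    void *res = NULL;
    int rc;

    if (pthread_create(&thr, NULL, demo_worker, &mon) != 0)
        return -2;
    demo_monitor_close(&mon);
    rc = pthread_join(thr, &res);
    assert(rc == 0);
    (void)rc;
    return (int)(intptr_t)res;
}
```

Calling demo_shutdown() from a main() returns once the worker has been
woken and joined. The flag is checked under the same mutex the waiter
holds, so the wakeup is not lost whether the close happens before or after
the worker starts waiting.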

I'm writing this not to pull you back into the work, and I do not expect a reply. It is the holidays)
It is only to document my research.


