Squelch 'eof from qemu monitor' error on normal VM shutdown

Jim Fehlig jfehlig at suse.com
Fri Oct 8 21:42:35 UTC 2021


On 10/7/21 20:45, Jim Fehlig wrote:
> On 10/4/21 08:55, Michal Prívozník wrote:
>> On 9/30/21 7:15 PM, Jim Fehlig wrote:
>>> On 9/29/21 21:29, Jim Fehlig wrote:
>>>> Hi All,
>>>>
>>>> Likely Christian received a bug report that motivated commit
>>>> aeda1b8c56, which was later reverted by Michal with commit 72adaf2f10.
>>>> In the past, I recall being asked about "internal error: End of file
>>>> from qemu monitor" on normal VM shutdown and gave a hand wavy response
>>>> using some of Michal's words from the revert commit message.
>>>>
>>>> I recently received a bug report (sorry, but no public link) from a
>>>> concerned user about this error and wondered if there is some way to
>>>> improve it? I went down some dead ends before circling back to
>>>> Christian's patch. When rebased to latest master, I cannot reproduce
>>>> the hangs reported by Michal [1]. Perhaps Nikolay's series to resolve
>>>> hangs/crashes of libvirtd [2] has now made Christian's patch viable?
>>>
>>> Hmm, Nikolay's series improves thread management at daemon shutdown and
>>> doesn't touch VM shutdown logic. But there has been some behavior change
>>> from the time aeda1b8c56 was committed (3.4.0 dev cycle) to current git
>>> master. At least I don't see libvirtd hanging when running Michal's test
>>> on master + rebased aeda1b8c56.
> 
> After reworking the tests a bit, I still don't see libvirtd hanging, but I do 
> see the VM stuck in the "in shutdown" state. Attaching gdb shows the following 
> thread blocked waiting for a response from the monitor that will never come, 
> since EOF has already occurred on the socket:
> 
> Thread 21 (Thread 0x7fdb557fa700 (LWP 9110) "rpc-worker"):
> #0  0x00007fdbc922a4dc in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
> #1  0x00007fdbccee2310 in virCondWait (c=0x7fdba403cd40, m=0x7fdba403cd18) at ../src/util/virthread.c:156
> #2  0x00007fdb87150a8b in qemuMonitorSend (mon=0x7fdba403cd00, msg=0x7fdb557f95b0) at ../src/qemu/qemu_monitor.c:964
> #3  0x00007fdb8715fbf1 in qemuMonitorJSONCommandWithFd (mon=0x7fdba403cd00, cmd=0x7fdb4015bae0, scm_fd=-1, reply=0x7fdb557f9678) at ../src/qemu/qemu_monitor_json.c:327
> #4  0x00007fdb8715fda0 in qemuMonitorJSONCommand (mon=0x7fdba403cd00, cmd=0x7fdb4015bae0, reply=0x7fdb557f9678) at ../src/qemu/qemu_monitor_json.c:352
> #5  0x00007fdb87174b71 in qemuMonitorJSONGetIOThreads (mon=0x7fdba403cd00, iothreads=0x7fdb557f9798, niothreads=0x7fdb557f9790) at ../src/qemu/qemu_monitor_json.c:7838
> #6  0x00007fdb8715d059 in qemuMonitorGetIOThreads (mon=0x7fdba403cd00, iothreads=0x7fdb557f9798, niothreads=0x7fdb557f9790) at ../src/qemu/qemu_monitor.c:4083
> #7  0x00007fdb870e8ae3 in qemuDomainGetIOThreadsMon (driver=0x7fdb6c06a4f0, vm=0x7fdb6c063940, iothreads=0x7fdb557f9798, niothreads=0x7fdb557f9790) at ../src/qemu/qemu_driver.c:4941
> #8  0x00007fdb871129bf in qemuDomainGetStatsIOThread (driver=0x7fdb6c06a4f0, dom=0x7fdb6c063940, params=0x7fdb401c0cc0, privflags=1) at ../src/qemu/qemu_driver.c:18292
> #9  0x00007fdb871130ee in qemuDomainGetStats (conn=0x7fdb9c006760, dom=0x7fdb6c063940, stats=1023, record=0x7fdb557f98d0, flags=1) at ../src/qemu/qemu_driver.c:18504
> #10 0x00007fdb87113526 in qemuConnectGetAllDomainStats (conn=0x7fdb9c006760, doms=0x0, ndoms=0, stats=1023, retStats=0x7fdb557f9990, flags=0) at ../src/qemu/qemu_driver.c:18598
> #11 0x00007fdbcd163e4e in virConnectGetAllDomainStats (conn=0x7fdb9c006760, stats=0, retStats=0x7fdb557f9990, flags=0) at ../src/libvirt-domain.c:11975
> ...
> 
> So indeed, reporting the error when processing monitor IO is needed to prevent 
> other threads from subsequently writing to the socket and blocking forever. One 
> idea I had was to postpone reporting the error until someone actually tries to 
> write to the socket, although not reporting an error at the point it is 
> encountered seems distasteful. I've been testing such a hack (attached), and 
> along with squelching the error message, I no longer see VMs stuck in the 
> "in shutdown" state after 32 iterations of the test. A simple rebase of 
> aeda1b8c56 on current git master never survived more than a dozen iterations. 
> I'll let the test continue to run.
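
To illustrate why an unreported EOF leaves callers stuck: the send path in frame 
#2 of the backtrace boils down to "queue the message, then wait on a condition 
that only the monitor I/O handler signals". A heavily simplified sketch of that 
pattern, using the real virCondWait() from the backtrace but otherwise invented 
type, field, and helper names that do not match the actual sources:

/*
 * Illustration only; the types and fields below are invented and do not
 * match the real libvirt code.
 */
#include <stdbool.h>
#include <stddef.h>
#include "virthread.h"        /* virMutex, virCond, virCondWait */

typedef struct {
    bool finished;            /* set by the I/O handler once a reply (or a
                                 failure) has been processed */
} sketchMessage;

typedef struct {
    virMutex lock;            /* protects the fields below */
    virCond notify;           /* signalled by the monitor I/O handler */
    sketchMessage *msg;       /* message currently in flight */
    bool ioError;             /* set by the I/O handler on EOF/error */
} sketchMonitor;

/* Called with mon->lock held. */
static int
sketchMonitorSend(sketchMonitor *mon, sketchMessage *msg)
{
    if (mon->ioError)
        return -1;            /* socket already dead, fail immediately */

    mon->msg = msg;           /* hand the message to the event loop */

    while (!mon->msg->finished) {
        /* Only the monitor I/O handler sets 'finished' and signals
         * 'notify'.  After EOF the handler never runs again, so if no
         * error was recorded for the check above, this wait never
         * returns -- the "in shutdown" hang seen in the backtrace. */
        if (virCondWait(&mon->notify, &mon->lock) < 0)
            return -1;
    }
    mon->msg = NULL;
    return 0;
}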

The test has now survived 1134 iterations. I adjusted the patch to be more 
upstream-friendly and submitted it as an RFC:

https://listman.redhat.com/archives/libvir-list/2021-October/msg00484.html

I restarted the tests with this version of the patch; they are now at iteration 75.
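
For anyone not wanting to read the RFC, the rough shape of the idea is: when the 
monitor I/O handler hits EOF it only records that fact on the monitor object 
instead of immediately reporting "internal error: End of file from qemu monitor", 
and the error is raised later, if and when another thread actually tries to use 
the monitor. Continuing the simplified sketch from above (invented names again, 
not the code actually posted):

/* Illustration only; not the code in the RFC. */

/* Called from the monitor I/O handler when read() on the monitor socket
 * returns 0 (EOF), with mon->lock held. */
static void
sketchMonitorHandleEOF(sketchMonitor *mon)
{
    /* Remember the EOF but do not report an error here: a normal VM
     * shutdown ends exactly this way, and logging an "internal error"
     * for it only alarms users. */
    mon->ioError = true;

    /* Wake any thread already waiting in sketchMonitorSend() so it can
     * notice the failure instead of sleeping forever. */
    if (mon->msg)
        mon->msg->finished = true;
    virCondSignal(&mon->notify);
}

/* The check at the top of sketchMonitorSend() then becomes the point
 * where the error is finally reported, i.e. only when someone actually
 * tries to talk to the dead monitor. */
static int
sketchMonitorCheckOpen(sketchMonitor *mon)
{
    if (mon->ioError) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       "End of file from qemu monitor");
        return -1;
    }
    return 0;
}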

Regards,
Jim




