[libvirt] libvirtd not responding to virsh, results in virsh hanging

Chris Friesen chris.friesen at windriver.com
Fri Mar 17 22:21:02 UTC 2017


Hi,

We've recently run into an issue with libvirt 1.2.17 in the context of an 
OpenStack deployment.

Occasionally, after live migrations from a compute node running libvirt 1.2.17 
to a compute node running libvirt 2.0.0, libvirtd on the 1.2.17 side stops 
responding.  When this happens, a command like "sudo virsh list" just hangs 
waiting for a response from libvirtd.
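
A rough sketch of how this state could be detected from a script without the 
check itself hanging: run the command under a timeout.  This is Python; the 
helper name responds_within and the 10-second limit are arbitrary choices for 
illustration, not anything provided by libvirt.

```python
import subprocess


def responds_within(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, else False.

    The exit status is not checked; only whether the command finished.
    """
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so the
        # check does not leave a stuck virsh process behind.
        return False


if __name__ == "__main__":
    # Assumption: a healthy libvirtd answers "virsh list" well within 10s.
    if responds_within(["virsh", "list"], timeout_s=10):
        print("libvirtd responded")
    else:
        print("virsh timed out; libvirtd may be hung")
```

A timeout only shows that virsh did not finish in time; it does not say why.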

Running "ps -elfT|grep libvirtd" shows many threads waiting on a futex, but two 
threads in poll_schedule_timeout() as part of the poll() syscall.  On a non-hung 
libvirtd I only see one thread in poll_schedule_timeout().
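
To make the hung and non-hung cases easier to compare, the per-thread wait 
channels can be counted rather than read by eye.  A minimal sketch in Python, 
assuming ps output with the columns PID, LWP, WCHAN and COMMAND (for example 
from "ps -eLo pid,lwp,wchan:32,comm"); the function name and the sample input 
are made up for illustration and are not real output from the affected host.

```python
from collections import Counter

# Invented sample input in the assumed column layout (not real data).
SAMPLE = """\
  PID   LWP WCHAN                            COMMAND
 1000  1000 poll_schedule_timeout            libvirtd
 1000  1001 futex_wait_queue_me              libvirtd
 1000  1002 futex_wait_queue_me              libvirtd
 1000  1003 poll_schedule_timeout            libvirtd
 2000  2000 poll_schedule_timeout            sshd
"""


def count_wait_channels(ps_output, comm="libvirtd"):
    """Count threads of process `comm` per kernel wait channel."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        wchan, name = fields[2], fields[3]
        if name == comm:
            counts[wchan] += 1
    return counts


if __name__ == "__main__":
    for wchan, n in count_wait_channels(SAMPLE).most_common():
        print(n, wchan)
```

More than one libvirtd thread in poll_schedule_timeout would match the hung 
case described above.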

If I kill and restart libvirtd (this took two tries; it didn't actually die the 
first time), the problem seems to go away.

I just tried attaching gdb to the "hung" libvirtd process and running "thread 
apply all backtrace".  This printed backtraces for the threads, including the 
one that was apparently stuck in poll():

Thread 17 (Thread 0x7f0573fff700 (LWP 186865)):
#0  0x00007f05b59d769d in poll () from /lib64/libc.so.6
#1  0x00007f05b7f01b9a in virNetClientIOEventLoop () from /lib64/libvirt.so.0
#2  0x00007f05b7f0234b in virNetClientSendInternal () from /lib64/libvirt.so.0
#3  0x00007f05b7f036f3 in virNetClientSendWithReply () from /lib64/libvirt.so.0
#4  0x00007f05b7f04eb3 in virNetClientStreamSendPacket () from /lib64/libvirt.so.0
#5  0x00007f05b7ed8db5 in remoteStreamFinish () from /lib64/libvirt.so.0
#6  0x00007f05b7ec7eaa in virStreamFinish () from /lib64/libvirt.so.0
#7  0x00007f059bd9323d in qemuMigrationIOFunc () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007f05b7e09aa2 in virThreadHelper () from /lib64/libvirt.so.0
#9  0x00007f05b5cb4dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f05b59e1ced in clone () from /lib64/libc.so.6
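
For capturing the same backtraces non-interactively the next time this 
happens, gdb can be run in batch mode.  The sketch below is Python; the helper 
names and the output file name are hypothetical.  It uses only gdb's -p, -batch 
and -ex options.  As the rest of this message shows, attaching a debugger can 
itself change the state of the process, so the dump may be a one-shot 
opportunity.

```python
import subprocess
import sys


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to pid, prints a backtrace
    for every thread, and then exits (batch mode)."""
    return ["gdb", "-p", str(pid), "-batch",
            "-ex", "thread apply all backtrace"]


def dump_backtraces(pid, out_path):
    """Run gdb non-interactively and save everything it prints."""
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_cmd(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=False)


if __name__ == "__main__":
    # Usage (hypothetical script name): python dump_bt.py <libvirtd pid>
    dump_backtraces(int(sys.argv[1]), "libvirtd-backtraces.txt")
```

Installing the matching debuginfo packages first would give backtraces with 
line numbers and arguments rather than bare symbol names.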


Interestingly, when I hit "c" to continue in the debugger, I got this:

(gdb) c
Continuing.

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7f0573fff700 (LWP 186865)]
0x00007f05b5cbb1cd in write () from /lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7f0573fff700 (LWP 186865) exited]
(gdb) quit
A debugging session is active.

         Inferior 1 [process 37471] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/sbin/libvirtd, process 37471


Now thread 186865 seems to be gone, and libvirtd is no longer hung.
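
For reference, SIGPIPE is what a write() gets when the other end of a pipe or 
socket has already been closed.  The Python sketch below reproduces only that 
mechanism, in isolation, with an ordinary pipe; it says nothing about why the 
connection in libvirtd was in that state.  Python ignores SIGPIPE by default, 
so the failure shows up as EPIPE (BrokenPipeError) rather than as a signal. 
The function name is made up.

```python
import os


def write_after_reader_closed():
    """Write to a pipe whose read end is already closed.

    Returns the BrokenPipeError raised by the write, or None if the
    write unexpectedly succeeded.
    """
    r, w = os.pipe()
    os.close(r)  # the reading side goes away first
    try:
        os.write(w, b"x")  # fails with EPIPE: nobody can ever read this
        return None
    except BrokenPipeError as err:
        return err
    finally:
        os.close(w)


if __name__ == "__main__":
    print(repr(write_after_reader_closed()))
```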

Has anyone seen anything like this before?  Anyone have an idea where to start 
looking?

Thanks,
Chris




