[libvirt] [BUG] libvirtd on destination crash frequently while migrating vms concurrently

Michal Privoznik mprivozn at redhat.com
Fri Oct 11 07:31:50 UTC 2013


On 27.09.2013 09:55, Wangyufei (A) wrote:
> Hello,
> I found that libvirtd on the destination crashes frequently while
> migrating VMs concurrently. For example, if I migrate 10 VMs
> concurrently and continuously, then after about 30 minutes libvirtd on
> the destination crashes. I analyzed this and found two bugs in the
> migration process.
> First, during the migration prepare phase on the destination, libvirtd
> assigns ports to the qemu processes to be started there. But the port
> increment operation is not atomic, so there’s a chance that multiple VMs
> get the same port; only the first one can start successfully, and the
> others fail to start. I’ve applied a patch to solve this bug and tested
> it; it works well. If only this bug existed, libvirtd would not crash.
> The second bug is fatal.
> Second, I found that libvirtd crashes because of a segmentation fault
> caused by accessing a vm object that has already been released.
> Apparently this is a multi-threading problem: thread A accesses vm data
> that has been released by thread B. In the end I proved this theory
> right.
>  
> Step 1. Because of bug one, the port is already occupied, so qemu on
> the destination failed to start and sent a HANGUP signal to libvirtd.
> libvirtd then received this VIR_EVENT_HANDLE_HANGUP event, and thread A,
> which handles events, called qemuProcessHandleMonitorEOF as follows:
>  
> #0  qemuProcessHandleMonitorEOF (mon=0x7f4dcd9c3130, vm=0x7f4dcd9c9780)
>     at qemu/qemu_process.c:399
> #1  0x00007f4dc18d9e87 in qemuMonitorIO (watch=68, fd=27, events=8,
>     opaque=0x7f4dcd9c3130) at qemu/qemu_monitor.c:668
> #2  0x00007f4dccae6604 in virEventPollDispatchHandles (nfds=18,
>     fds=0x7f4db4017e70) at util/vireventpoll.c:500
> #3  0x00007f4dccae7ff2 in virEventPollRunOnce () at util/vireventpoll.c:646
> #4  0x00007f4dccae60e4 in virEventRunDefaultImpl () at util/virevent.c:273
> #5  0x00007f4dccc40b25 in virNetServerRun (srv=0x7f4dcd8d26b0)
>     at rpc/virnetserver.c:1106
> #6  0x00007f4dcd6164c9 in main (argc=3, argv=0x7fff8d8f9f88)
>     at libvirtd.c:1518
> 
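
To illustrate the first bug described above (a non-atomic port increment
handing the same port to several VMs), here is a minimal sketch of a port
allocator that serializes the "pick a free port and mark it used" step under
a mutex. This is not libvirt code: the names port_acquire/port_release and
the PORT_MIN/PORT_MAX range are hypothetical and chosen only for the example.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical port range, for illustration only. */
#define PORT_MIN   49152
#define PORT_MAX   49215
#define PORT_COUNT (PORT_MAX - PORT_MIN + 1)

static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;
static bool port_used[PORT_COUNT];
static int next_idx; /* where the next search starts */

/* Returns a free port, or -1 if the whole range is in use.
 * The search and the "mark as used" step happen under one lock,
 * so two concurrent callers can never be given the same port. */
int port_acquire(void)
{
    int port = -1;

    pthread_mutex_lock(&port_lock);
    for (int i = 0; i < PORT_COUNT; i++) {
        int idx = (next_idx + i) % PORT_COUNT;
        if (!port_used[idx]) {
            port_used[idx] = true;
            next_idx = (idx + 1) % PORT_COUNT;
            port = PORT_MIN + idx;
            break;
        }
    }
    pthread_mutex_unlock(&port_lock);
    return port;
}

/* Gives a port back once the process using it is gone. */
void port_release(int port)
{
    if (port < PORT_MIN || port > PORT_MAX)
        return;
    pthread_mutex_lock(&port_lock);
    port_used[port - PORT_MIN] = false;
    pthread_mutex_unlock(&port_lock);
}
```

Tracking used ports in a table, rather than only bumping a counter, also
lets a port be reused after its qemu process exits; whether that matches the
patch mentioned in the report is not something this sketch can confirm.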

In fact I saw the very same issue and I proposed a patch:

https://www.redhat.com/archives/libvir-list/2013-October/msg00347.html

It got ACKed; however, prior to pushing it I did some testing, and it
seems that under heavy load it doesn't play nice (the qemuhotplug test is
getting a NULL monitor, ouch). But if you can apply the patch and see
whether it fixes your problem, that would be helpful; at least I'd know
I'm going in the right direction.
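
For readers unfamiliar with this class of bug (one thread using a vm object
that another thread has already freed), the usual general remedy is
reference counting: whoever uses the object holds a reference for as long as
it needs it, and the memory is freed only when the last reference is
dropped. The sketch below shows that technique in isolation. It is not taken
from the patch linked above, and the vm_obj type and the vm_new/vm_ref/
vm_unref helpers are hypothetical names invented for this example.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-domain object. */
typedef struct {
    atomic_int refs;
    int id;
} vm_obj;

/* Creates an object owned by the caller (reference count 1). */
vm_obj *vm_new(int id)
{
    vm_obj *vm = malloc(sizeof(*vm));
    if (!vm)
        return NULL;
    atomic_init(&vm->refs, 1);
    vm->id = id;
    return vm;
}

/* Takes an extra reference, e.g. before handing vm to a callback. */
vm_obj *vm_ref(vm_obj *vm)
{
    atomic_fetch_add(&vm->refs, 1);
    return vm;
}

/* Drops one reference; frees the object when the last one goes away.
 * Returns 1 if the object was freed, 0 otherwise. */
int vm_unref(vm_obj *vm)
{
    if (atomic_fetch_sub(&vm->refs, 1) == 1) {
        free(vm);
        return 1;
    }
    return 0;
}
```

With this pattern, an event handler that calls vm_ref() before touching the
object and vm_unref() when done keeps the object alive even if another
thread drops its own reference in the meantime.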

Michal



