[libvirt-users] Migration hangs on Gentoo with KVM

Eric Blake eblake at redhat.com
Wed Aug 17 20:45:13 UTC 2011


On 08/17/2011 02:30 PM, Jonathan Stoppani wrote:
>>>>> Thanks for the prompt answer Eric! Yes, nc has a q option:
>>>>>
>>>>> -q, --hold-timeout=SEC1[:SEC2]   Set hold timeout(s) for local [and remote]
>>>>

We still haven't incorporated patches to autodetect nc usage on the 
remote side (some have been proposed by Guido, but there were some 
additional issues to address first).  Hopefully by 0.9.5...

Until that is fixed, it very well could be that nc is holding the 
connection open too long and deadlocking libvirtd's handling of the 
remote connection, which would explain why all further attempts to do 
something with the domain get stuck waiting for the nc connection to 
resolve.
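
To see whether the nc on a given host advertises a -q option at all, 
something like the following could be used.  This is only a rough 
sketch, not the autodetection patch mentioned above: the helper name 
nc_help_has_q is made up for illustration, and it assumes that your nc 
variant lists its options one per line in its help output.

```shell
# Hypothetical helper (not part of libvirt): reads nc help text on stdin
# and succeeds if some line starts with a "-q" option.
nc_help_has_q() {
    grep -Eq '^[[:space:]]*-q([,[:space:]]|$)'
}

# Possible use on the remote host; nc variants differ on whether help
# goes to stdout or stderr, hence the 2>&1:
#   nc -h 2>&1 | nc_help_has_q && echo "this nc lists a -q option"
```

The help line you quoted ("-q, --hold-timeout=SEC1[:SEC2] ...") would 
match this pattern.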

>> Tested using qemu+tcp and it hangs the same. If I interrupt the migration (^C), the domain is correctly destroyed on the destination but left in the paused state on the source. If I try to start it manually, I obtain this error:
>>
>> # virsh resume 1
>> error: Failed to resume domain 1
>> error: Timed out during operation: cannot acquire state change lock

This is the internal mutex lock used for serializing access to libvirt 
internal structures, such as when coordinating with a remote server 
(coordination that involves the use of nc).  When you get this message, 
about the only thing you can do is restart libvirtd.  Which version of 
libvirt were you testing?  0.9.4 adds quite a few improvements for 
recovering gracefully from failed migrations.
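
As a rough sketch of checking whether an installed version is at least 
0.9.4: the helper name version_ge below is made up for illustration, and 
it relies on "sort -V", which is a GNU coreutils extension.  The restart 
command in the comment assumes a Gentoo OpenRC init script; adjust for 
your setup.

```shell
# Hypothetical helper: version_ge A B succeeds if dotted version A >= B.
# Relies on GNU sort's -V (version sort) extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible use; take the version string from "virsh version" or
# "libvirtd --version" on your host:
#   version_ge 0.9.3 0.9.4 || echo "older than 0.9.4"
#
# Restarting the daemon once it is stuck (assumed Gentoo OpenRC path):
#   /etc/init.d/libvirtd restart
```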

>>
>> Any insights?
>
> Can someone shed some light on the libvirt locking possibilities? It seems to me that sanlock is not supported on gentoo (and libvirt is compiled using --without-sanlock); could this be the cause of the problem?

Completely unrelated.  sanlock is a program for controlling access to 
shared file storage, and has nothing to do with the internal mutex lock 
failure message you quoted above.

> Is there some way to explicitly set the locking mechanism to a noop in the libvirt configuration?

You are confusing two terms; using the sanlock or no-op disk manager has 
nothing to do with libvirtd getting confused and deadlocking on internal 
data structures.  If you built --without-sanlock, then you are already 
using the no-op disk manager; but if sanlock is compiled in, you control 
whether to use it by modifying /etc/libvirt/qemu.conf.  But making a 
configuration change there won't affect the problem you actually saw above.
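
For reference, the setting in question is the lock_manager line in 
qemu.conf.  A rough way to see which one a given config file selects is 
sketched below; the function name active_lock_manager is made up for 
illustration, and printing "no-op" when no uncommented lock_manager line 
is found is my shorthand for the default, not something libvirt itself 
prints.

```shell
# Hypothetical helper: print the value of the last uncommented
# lock_manager = "..." line in a qemu.conf-style file, or "no-op"
# (an illustrative label for the default) if there is none.
active_lock_manager() {
    conf=${1:-/etc/libvirt/qemu.conf}
    value=$(sed -n 's/^[[:space:]]*lock_manager[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$conf" | tail -n 1)
    printf '%s\n' "${value:-no-op}"
}

# Possible use:
#   active_lock_manager /etc/libvirt/qemu.conf
```

Again, whatever this reports has no bearing on the state change lock 
timeout.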

-- 
Eric Blake   eblake at redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



