[libvirt-users] Live migration with non-shared storage leads to corrupted file system

Xinglong Wu cakuba at gmail.com
Sun Nov 25 10:57:19 UTC 2012


Hi,

      We have the following environment for live migration with
non-shared storage between two nodes:

      Host OS: RHEL 6.3
      Kernel: 2.6.32-279.el6.x86_64
      Qemu-kvm: 1.2.0
      libvirt: 0.10.1

and we use "virsh" to run the migration as follows:

virsh -c 'qemu:///system' migrate --live --persistent \
    --copy-storage-all <guest-name> qemu+ssh://<target-node>/system

The above command itself returns no error, and the migrated domain on
the destination node starts fine. But when I log into the migrated
domain, some commands fail immediately. And if I shut down the
domain, it won't boot any more, complaining about a corrupted
file system. Furthermore, I can confirm after thorough testing that
the domain works flawlessly before migration.

The log file in /var/log/libvirt/qemu looks fine, without any warnings
or errors. The only error messages I can observe are in
/var/log/libvirt/libvirtd.log:

2012-11-25 10:00:55.001+0000: 15398: warning : qemuDomainObjBeginJobInternal:838 : Cannot start job (query, none) for domain testVM; current job is (async nested, migration out) owned by (15397, 15397)
2012-11-25 10:00:55.001+0000: 15398: error : qemuDomainObjBeginJobInternal:842 : Timed out during operation: cannot acquire state change lock
2012-11-25 10:00:57.009+0000: 15393: error : virNetSocketReadWire:1184 : End of file while reading data: Input/output error
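
For anyone trying to reproduce this, a small filter can pull only the
warning and error entries out of libvirtd.log. This is a minimal sketch,
not part of libvirt: the function name libvirtd_problems is hypothetical,
and it assumes each entry is on one line in the
"timestamp: pid: level : function:line : message" layout seen in the
excerpt above.

```shell
#!/bin/sh
# libvirtd_problems: hypothetical helper, not a libvirt tool.
# Prints only the warning and error entries of a libvirtd log file.
# Assumes one entry per line in the layout
#   "YYYY-MM-DD HH:MM:SS.mmm+ZZZZ: pid: level : function:line : message".
libvirtd_problems() {
    # Default path is the one mentioned in this report.
    logfile="${1:-/var/log/libvirt/libvirtd.log}"
    grep -E '^[0-9-]+ [0-9:.+]+: [0-9]+: (warning|error) :' "$logfile"
}
```

Usage, on the source node after a migration attempt:
libvirtd_problems /var/log/libvirt/libvirtd.log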

I also noticed that the raw image file used by the migrated domain has
different sizes (as reported by "du") before and after the migration.
Has anybody had a similar experience with live migration on
non-shared storage? It apparently leads to failed migrations in
libvirt, yet no critical errors are ever reported.
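
A note on the "du" comparison: "du" reports allocated blocks, so a sparse
raw image can show a different size after being copied even when its
contents are identical. A minimal sketch for a more reliable comparison
follows. The function name image_report is hypothetical, and it assumes
GNU coreutils (stat -c, du, sha256sum). The checksum is only meaningful
if the guest has not written to the disk since the switchover, for
example when it is shut off or paused on both sides.

```shell
#!/bin/sh
# image_report: hypothetical helper for comparing a disk image on two nodes.
# Assumes GNU coreutils (stat -c, du, sha256sum).
image_report() {
    img="$1"
    # Apparent size in bytes: the file length, i.e. the disk size the guest sees.
    apparent=$(stat -c %s "$img")
    # Allocated size in KiB: what plain "du -k" reports; varies with sparseness.
    allocated=$(du -k "$img" | cut -f1)
    # Content checksum: equal content gives equal sums, sparse or not.
    sum=$(sha256sum "$img" | cut -d' ' -f1)
    printf 'apparent_bytes=%s allocated_kib=%s sha256=%s\n' \
        "$apparent" "$allocated" "$sum"
}
```

Run it against the image path on the source and on the destination. If
apparent_bytes and sha256 match, the images are identical and only the
allocation differs; a mismatch in either points to a real difference in
the copied data.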

Brett



