[libvirt] When a VM's status file is left over, some persistent but inactive VMs are lost by libvirtd after libvirtd restarts

Wangyufei (A) james.wangyufei at huawei.com
Fri Oct 18 03:00:22 UTC 2013


Hello,
  I found the following problem:
  A VM's status file may be left over in /var/run/libvirt/qemu in some situations, such as a host reboot. When a VM's status file is left over, some
persistent but inactive VMs are lost by libvirtd after it is restarted. You can reproduce the problem as follows:
  1. Create a VM and start it with the commands: virsh define vm-xml and virsh start vm-name.
  2. Stop libvirtd with the command: service libvirtd stop.
  3. Kill the qemu process belonging to the VM, so that the VM's status file is left over.
  4. Start libvirtd.
  After starting the libvirtd service, the command "virsh list --all" shows that the VM has been lost by libvirtd.
What we expect is that the VM is shown in the shut off state. Is that right?

The reason for the problem is:
  During startup, libvirtd first loads the VM status files under /var/run/libvirt/qemu, creates a virDomainObj for each VM, and adds it to the
driver->domains list.
  Then it creates a thread for each virDomainObj in the domains list to connect to the related qemu process. Because the qemu process has been killed, connecting to
qemu fails. When the connection fails, the connection thread does the following:
  1. Checks whether vm->persistent is 1.
  2. If vm->persistent is not 1, calls qemuDomainRemoveInactive() to remove the virDomainObj.
  3. The following call sequence then occurs: qemuDomainRemoveInactive() --> virDomainObjListRemove() --> virHashRemoveEntry(). Around virHashRemoveEntry(),
  domlist and dom are locked and unlocked in sequence.
  The problem with the above steps is that vm->persistent may already have been set to 1 by the libvirtd main thread by the time the connection thread calls virHashRemoveEntry() to
remove the dom. That is, a persistent virDomainObj is removed during libvirtd startup.
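
  To make the ordering concrete, here is a small single-threaded sketch that replays the interleaving described above. It is not libvirt code: struct toy_dom and all function names are made up for illustration, and the two "threads" are simply function calls made in the problematic order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for virDomainObj; not libvirt's real struct. */
struct toy_dom {
    bool persistent;  /* mirrors vm->persistent */
    bool in_list;     /* true while the object is still in the domain list */
};

/* Connection thread, step 1: sample the flag. The result can go stale. */
bool conn_thread_wants_removal(const struct toy_dom *d)
{
    return !d->persistent;
}

/* Main thread: marks the domain persistent in the meantime. */
void main_thread_mark_persistent(struct toy_dom *d)
{
    d->persistent = true;
}

/* Connection thread, steps 2 and 3: act on the earlier decision. */
void conn_thread_remove(struct toy_dom *d, bool wants_removal)
{
    if (wants_removal)
        d->in_list = false;
}

/* Replays the ordering; returns true if a persistent domain was removed. */
bool replay_race(void)
{
    struct toy_dom d = { .persistent = false, .in_list = true };

    bool stale = conn_thread_wants_removal(&d);  /* check happens first */
    main_thread_mark_persistent(&d);             /* flag changes afterwards */
    conn_thread_remove(&d, stale);               /* removal uses stale check */

    return d.persistent && !d.in_list;
}
```

  In this toy model replay_race() returns true, i.e. the domain is persistent and is nevertheless gone from the list, which matches what "virsh list --all" shows.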

There are two ways to resolve the above problem:
  1. Extend the locking range for virDomainObj and virDomainObjList: lock the virDomainObj and the virDomainObjList in the connection thread before checking vm->persistent.
  2. Check vm->persistent again before calling virHashRemoveEntry().
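
  As a rough sketch of what I mean, the following toy code checks the flag while both locks are held, so the check and the removal cannot be separated. The types and function names (toy_dom, toy_list, toy_remove_if_transient and so on) are hypothetical and only model the idea. The lock order, list first and then domain, is my assumption, and the sketch also assumes that the main thread takes the domain lock when it sets the flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; libvirt's real objects are different. */
struct toy_dom {
    pthread_mutex_t lock;
    bool persistent;  /* mirrors vm->persistent */
    bool in_list;     /* true while the object is in the domain list */
};

struct toy_list {
    pthread_mutex_t lock;
};

/* Main thread: set the flag under the domain lock (assumed). */
void toy_mark_persistent(struct toy_dom *d)
{
    pthread_mutex_lock(&d->lock);
    d->persistent = true;
    pthread_mutex_unlock(&d->lock);
}

/*
 * Connection thread: re-check the flag with both locks held and remove
 * the domain only if it is still transient. Returns true if removed.
 */
bool toy_remove_if_transient(struct toy_list *l, struct toy_dom *d)
{
    bool removed = false;

    pthread_mutex_lock(&l->lock);
    pthread_mutex_lock(&d->lock);
    if (!d->persistent && d->in_list) {
        d->in_list = false;
        removed = true;
    }
    pthread_mutex_unlock(&d->lock);
    pthread_mutex_unlock(&l->lock);

    return removed;
}

/* Replays the problematic ordering; returns true if the domain survives. */
bool toy_replay_fixed(void)
{
    struct toy_list l = { .lock = PTHREAD_MUTEX_INITIALIZER };
    struct toy_dom d = { .lock = PTHREAD_MUTEX_INITIALIZER,
                         .persistent = false, .in_list = true };

    bool stale = !d.persistent;   /* early, unlocked check */
    toy_mark_persistent(&d);      /* main thread changes the flag */
    if (stale)
        toy_remove_if_transient(&l, &d);  /* re-check blocks the removal */

    return d.persistent && d.in_list;
}
```

  With the re-check in place, toy_replay_fixed() returns true: the stale early decision no longer removes a domain that has become persistent, while a domain that is still transient is removed as before.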

  Do you agree that what is described above is a problem? If so, which of the two ways listed above is more suitable for resolving it, or is there a better idea? Any suggestions are welcome.

Best Regards,
-WangYufei





