[libvirt] libvirt tracking of hung/stopped QEMU VMs

Ryan Harper ryanh at us.ibm.com
Tue May 3 21:50:13 UTC 2011


I've encountered an interesting scenario:

1. define a guest via virsh define <xml>
2. start this guest via virsh
3. one of the disk elements is a multipath device that is currently
   misconfigured such that any IO to the device hangs the calling process
4. libvirt times out when attempting to communicate with the guest via
   the monitor (btw, this timeout isn't configurable AFAICT)
5. libvirt returns an error from create indicating that it failed to
   create the VM

At this point:

1) libvirt reports that the VM is stopped (and this is true, the qemu
   process has never been issued the 'cont' command and thus won't ever
   execute guest code)
2) the qemu process for this VM is still running (just blocked on IO)
3) if the process ever becomes unblocked, the QEMU process will be
   functional again, but it won't be started, and it won't be terminated
   either, since libvirt isn't tracking it any more. In the meantime it
   keeps consuming the resources that were allocated at start up.
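
For anyone wanting to confirm point 2 on a Linux host, here is a rough
sketch of reading the process state from /proc. The helper names
(parse_stat_state, process_state, is_blocked_on_io) are made up for this
example, and treating state 'D' (uninterruptible sleep) as "blocked on IO"
is an assumption. Since qemu is multi-threaded, the thread that is actually
stuck may not be the one reported here; per-thread entries live under
/proc/<pid>/task/.

```python
def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The line looks like "pid (comm) state ...". comm may itself contain
    # spaces or parentheses, so split after the LAST ')' rather than on
    # whitespace from the start.
    return stat_line.rsplit(")", 1)[1].split()[0]


def process_state(pid):
    """Read the state letter for a pid (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


def is_blocked_on_io(pid):
    """Assumption for this sketch: 'D' means the process is stuck in IO."""
    return process_state(pid) == "D"
```

Calling is_blocked_on_io(pid) with the pid of the leftover qemu process
would then tell you whether it is still stuck or has become runnable again.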


How can we clean up from this failure scenario?  Would it make sense for
libvirt to send a SIGTERM to the qemu process if the create fails?  In
the above scenario, the signal would stay pending while the process is
blocked, which would allow us to reap the process if it ever became
unblocked.
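
To make the SIGTERM idea concrete, here is a rough Python sketch of the
behaviour I have in mind. It is not libvirt code: the function name
start_guest, the "ready" line standing in for the monitor handshake, and
both timeouts are assumptions made purely for illustration.

```python
import select
import signal
import subprocess


def start_guest(argv, monitor_timeout=1.0, reap_timeout=1.0):
    """Start a child and send SIGTERM if the startup handshake times out.

    Returns (proc, state), where state is "running", "reaped" or
    "term-pending".
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)

    # Stand-in for the monitor handshake: wait for the child to print a
    # line reading "ready" on stdout (POSIX only, select() on a pipe).
    ready, _, _ = select.select([proc.stdout], [], [], monitor_timeout)
    if ready and proc.stdout.readline().strip() == b"ready":
        return proc, "running"

    # Handshake failed: do not just forget the process, ask it to exit.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=reap_timeout)
        return proc, "reaped"
    except subprocess.TimeoutExpired:
        # A process stuck in uninterruptible IO cannot act on the signal
        # yet. The signal stays pending, so the caller has to keep the
        # handle and wait() again later to reap it.
        return proc, "term-pending"
```

The point of the "term-pending" case is that the caller keeps tracking the
process instead of dropping it, so it can still be reaped once the IO
unblocks and the pending signal is delivered.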


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh at us.ibm.com


