[libvirt] libvirt tracking of hung/stopped QEMU VMs

Ryan Harper ryanh at us.ibm.com
Wed May 4 01:34:09 UTC 2011


* Ryan Harper <ryanh at us.ibm.com> [2011-05-03 16:57]:
> I've encountered an interesting scenario:
> 
> 1. define a guest via virsh define <xml>
> 2. start this guest via virsh
> 3. one of the disk elements is a multipath device that is currently
>    misconfigured such that any io to the device hangs the calling process
> 4. libvirt times out when attempting to communicate via the monitor to
> the guest (btw, this timeout isn't configurable AFAICT)
> 5. returns an error from create indicating that we failed to create the VM
> 
> At this point:
> 
> 1) libvirt reports that the VM is stopped (and this is true, the qemu
>    process has never been issued the 'cont' command and thus won't ever
>    execute guest code)
> 2) the qemu process for this VM is still running (just blocked on IO)
> 
> 3) if the process becomes unblocked, the QEMU process may be functional
> again but will never be started; it also won't be terminated, since
> libvirt isn't tracking it any more, so it keeps consuming the resources
> that were allocated at start up.
> 
> 
> How can we clean up from this failure scenario?  Would it make sense for
> libvirt to send a SIGTERM to a qemu if it failed to create?  In the
> above scenario, this would allow us to reap the process if it ever
> became unblocked.

Looks like I completely missed

src/qemu/qemu_process.c:qemuProcessStop() which does indeed send SIGTERM
and SIGKILL.  

This should be sufficient to clean up in the above case.
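
For illustration, the SIGTERM-then-SIGKILL escalation can be sketched
roughly as below. This is not libvirt's code (qemuProcessStop() is C and
does more than this); the function name terminate_with_escalation and the
grace_seconds parameter are made up for the example. Note that a process
stuck in uninterruptible IO will not act on either signal until it
becomes unblocked, so the final wait can still block in that case.

```python
import signal
import subprocess


def terminate_with_escalation(proc, grace_seconds=1.0):
    """Ask a child process to exit, then force it if it does not.

    proc is a subprocess.Popen object. Returns the name of the last
    signal that was sent ("SIGTERM" or "SIGKILL").
    """
    # Polite request first: the child may clean up and exit on its own.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The child ignored or could not handle SIGTERM in time.
        # SIGKILL cannot be caught, but it is only delivered once the
        # process leaves any uninterruptible (blocked on IO) state.
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        return "SIGKILL"
```

The grace period is a trade-off: too short and a healthy process gets
no chance to shut down cleanly, too long and the caller stalls on a
process that is never going to respond.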


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh at us.ibm.com


