[libvirt] Failed to terminate process 1275 with SIGTERM: Device or resource busy
Richard W.M. Jones
rjones at redhat.com
Tue Jan 19 13:39:41 UTC 2016
On Tue, Jan 19, 2016 at 12:31:48PM +0100, Kashyap Chamarthy wrote:
> On Mon, Jan 18, 2016 at 04:19:58PM +0000, Richard W.M. Jones wrote:
> > On Mon, Jan 18, 2016 at 03:33:25PM +0000, Richard W.M. Jones wrote:
> > > I tried another workaround which was to get virt-resize to fsync the
> > > output file before closing the libvirt connection, but that doesn't
> > > work for reasons I don't understand so far - still studying this.
> >
> > I worked out what was happening here -- I'd inserted the fsync at the
> > wrong place in virt-resize. So I have now successfully worked around
> > this for the virt-resize case, however it's still a problem that could
> > manifest itself in other uses of libvirt + qemu + slow devices.
>
> We've seen the "Failed to terminate process 1275 with SIGTERM: Device or
> resource busy" error occur in context of OpenStack as well[1][2].
>
> The behavior is from virDomainDestroy() API (src/libvirt-domain.c):
>
> [...]
> * virDomainDestroy first requests that a guest terminate (e.g.
> * SIGTERM), then waits for it to comply. After a reasonable timeout,
> * if the guest still exists, virDomainDestroy will forcefully
> * terminate the guest (e.g. SIGKILL) if necessary (which may produce
> * undesirable results, for example unflushed disk cache in the
> * guest). To avoid this possibility, it's recommended to instead
> * call virDomainDestroyFlags, sending the
> * VIR_DOMAIN_DESTROY_GRACEFUL flag.
> [...]
>
> Dan Berrange explains[1]:
>
> There are two reasons why you'd get this failure ("Failed to terminate
> process: Device or resource busy") from libvirt.
>
>  - The host is so overloaded that the kernel was not able to clean up
>    the process in the time that libvirt was prepared to wait. If this
>    is the case, the process should eventually go away on its own
>    after a short while longer and everything should return to normal.
>
>  - There is some problem, causing the process to get stuck in an
>    uninterruptible wait state. This is usually due to something going
>    wrong in the storage stack, causing some I/O read/write operation
>    to hang in kernel space. In this case the process will stay around
>    in the zombie state forever, or until the storage problem is
>    resolved.
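
One rough way to tell the two cases above apart is to look at the state
field in /proc/<pid>/stat while the process is still around. The sketch
below is only an illustration and is not part of libvirt or libguestfs.
The helper names (parse_state, process_state) are made up here, and it
assumes a Linux /proc as described in proc(5). A "D" state that persists
would point at the second case.

```python
import os

# Single-letter process states as documented in proc(5).
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually I/O)",
    "Z": "zombie",
    "T": "stopped",
}


def parse_state(stat_line):
    """Return the state letter from the contents of /proc/<pid>/stat.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or parentheses, so split on the last ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Return the state letter for pid, or None if it cannot be read."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            return parse_state(f.read())
    except (FileNotFoundError, ProcessLookupError):
        # The process has gone away (or there is no Linux-style /proc).
        return None


if __name__ == "__main__":
    # Hypothetical usage: inspect our own process.
    state = process_state(os.getpid())
    print(state, STATE_NAMES.get(state, "other"))
```

Polling this for the qemu PID named in the error message, for a few
seconds after the failure, should show whether the process simply
disappears (first case) or sits in one state indefinitely (second case).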

Thanks for finding this documentation.

The problem with this theory is that we are already passing the
VIR_DOMAIN_DESTROY_GRACEFUL flag, which would indicate that this
flag is buggy.
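
For reference, the SIGTERM, wait, SIGKILL sequence described in the
quoted comment can be sketched roughly as below. This is a simplified
illustration in Python, not libvirt's code. The function names, the
timeouts and the polling interval are all assumptions, and so is the
behaviour in the graceful case once the timeout expires: here it just
raises EBUSY ("Device or resource busy") without escalating to SIGKILL.

```python
import errno
import signal
import subprocess
import sys
import time


def wait_for_exit(proc, timeout, interval=0.05):
    """Poll until proc exits or timeout seconds elapse; True if it exited."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return True
        time.sleep(interval)
    return proc.poll() is not None


def terminate(proc, term_timeout=10.0, kill_timeout=5.0, graceful=False):
    """Send SIGTERM, wait, then optionally escalate to SIGKILL.

    Raises OSError(EBUSY) if the process is still present at the end.
    The timeouts are illustrative values, not the ones libvirt uses.
    """
    proc.send_signal(signal.SIGTERM)
    if wait_for_exit(proc, term_timeout):
        return
    if graceful:
        # Graceful mode (assumed): never escalate, just report failure.
        raise OSError(errno.EBUSY,
                      "Failed to terminate process %d with SIGTERM" % proc.pid)
    proc.send_signal(signal.SIGKILL)
    if wait_for_exit(proc, kill_timeout):
        return
    raise OSError(errno.EBUSY,
                  "Failed to terminate process %d with SIGKILL" % proc.pid)


if __name__ == "__main__":
    # Hypothetical usage: a child that exits promptly on SIGTERM.
    child = subprocess.Popen([sys.executable, "-c",
                              "import time; time.sleep(60)"])
    terminate(child, graceful=True)
    print("child exited with", child.returncode)
```

In this sketch, a process that is slow to exit (for example because it
is still flushing to a slow device) produces the EBUSY error in graceful
mode even though nothing is permanently stuck.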

I think what we need is a test case, so here goes. Note you must run
these steps as *non-root*.

(1) Download the attachment to /var/tmp

(2) chmod +x /var/tmp/qemu.sh

(3) killall libvirtd ;# kills the session libvirtd

(4) LIBGUESTFS_HV=/var/tmp/qemu.sh guestfish -N fs exit -vx

You should see at the end of the output:

libguestfs: calling virDomainDestroy "guestfs-q94hsiz89t8jp418" flags=VIR_DOMAIN_DESTROY_GRACEFUL
[pause of a few seconds]
libguestfs: error: could not destroy libvirt domain: Failed to terminate process 11412 with SIGTERM: Device or resource busy [code=38 domain=0]
If someone else can reproduce this, then I will file a bug.
Rich.
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1205647 --
> nova.virt.libvirt.driver fails to shutdown reboot instance with
> error 'Code=38 Error=Failed to terminate process 4260 with SIGKILL:
> Device or resource busy'
> [2] https://bugs.launchpad.net/nova/+bug/1353939 -- Rescue fails with
> 'Failed to terminate process: Device or resource busy' in the n-cpu
> log
>
> --
> /kashyap
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qemu.sh
Type: application/x-sh
Size: 444 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20160119/1e60ce3c/attachment-0001.sh>