[libvirt] [PATCH] fix deadlock when qemu cannot start
Serge Hallyn
serge.hallyn at canonical.com
Fri Mar 30 13:45:35 UTC 2012
Quoting Daniel Veillard (veillard at redhat.com):
> On Fri, Mar 16, 2012 at 02:37:42PM +0800, Wen Congyang wrote:
> > When qemu cannot start, we may call qemuProcessStop() twice.
> > We have check whether the vm is running at the beginning of
> > qemuProcessStop() to avoid libvirt deadlock. We call
> > qemuProcessStop() with driver and vm locked. It seems that
> > we can avoid libvirt deadlock. But unfortunately we may
> > unlock driver and vm in the function qemuProcessKill() while
> > vm->def->id is not -1. So qemuProcessStop() will be run twice,
> > and monitor will be freed unexpectedly. So we should set
> > vm->def->id to -1 at the beginning of qemuProcessStop().
Oh, uh, Huh. This seems like it could be responsible for what I was
reporting a few days ago (*1). I spent most of yesterday trying to
track it down, only to finally realize that the QemuProcessStart would
silently die at various places. So i was getting ready to send an email
postulating that what's happening is that a virsh list starts, then
a virsh start starts, and when the virsh list ends it somehow causes
the virsh start to be killed.
I'll test and see if this patch fixes it.
thanks,
-serge
*1: If I create 4 vms and do
for i in `seq 1 4`; do virsh start o$i > /tmp/o$i 2>&1 &; done; for i in `seq 1 4`; do virsh list > /dev/null 2>&1; done; sleep 1; virsh list
most of the time at least one vm won't start.
> > ---
> > src/qemu/qemu_process.c | 20 ++++++++++++++------
> > src/qemu/qemu_process.h | 2 ++
> > 2 files changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 0af3751..44814df 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -3633,10 +3633,11 @@ qemuProcessKill(struct qemud_driver *driver,
> > VIR_DEBUG("vm=%s pid=%d flags=%x",
> > vm->def->name, vm->pid, flags);
> >
> > - if (!virDomainObjIsActive(vm)) {
> > - VIR_DEBUG("VM '%s' not active", vm->def->name);
> > - return 0;
> > - }
> > + if (!(flags & VIR_QEMU_PROCESS_KILL_NOCHECK))
> > + if (!virDomainObjIsActive(vm)) {
> > + VIR_DEBUG("VM '%s' not active", vm->def->name);
> > + return 0;
> > + }
> >
> > /* This loop sends SIGTERM (or SIGKILL if flags has
> > * VIR_QEMU_PROCESS_KILL_FORCE and VIR_QEMU_PROCESS_KILL_NOWAIT),
> > @@ -3739,6 +3740,13 @@ void qemuProcessStop(struct qemud_driver *driver,
> > return;
> > }
> >
> > + /*
> > + * We may unlock driver and vm in qemuProcessKill(), so the other thread
> > + * can lock driver and vm, and then call qemuProcessStop(). So we should
> > + * set vm->def->id to -1 here to avoid qemuProcessStop() called twice.
> > + */
> > + vm->def->id = -1;
> > +
> > if ((logfile = qemuDomainCreateLog(driver, vm, true)) < 0) {
> > /* To not break the normal domain shutdown process, skip the
> > * timestamp log writing if failed on opening log file. */
> > @@ -3801,7 +3809,8 @@ void qemuProcessStop(struct qemud_driver *driver,
> > }
> >
> > /* shut it off for sure */
> > - ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE));
> > + ignore_value(qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE|
> > + VIR_QEMU_PROCESS_KILL_NOCHECK));
> >
> > /* Stop autodestroy in case guest is restarted */
> > qemuProcessAutoDestroyRemove(driver, vm);
> > @@ -3892,7 +3901,6 @@ retry:
> >
> > vm->taint = 0;
> > vm->pid = -1;
> > - vm->def->id = -1;
> > virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
> > VIR_FREE(priv->vcpupids);
> > priv->nvcpupids = 0;
> > diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
> > index 761db6f..891f7a5 100644
> > --- a/src/qemu/qemu_process.h
> > +++ b/src/qemu/qemu_process.h
> > @@ -72,6 +72,8 @@ int qemuProcessAttach(virConnectPtr conn,
> > typedef enum {
> > VIR_QEMU_PROCESS_KILL_FORCE = 1 << 0,
> > VIR_QEMU_PROCESS_KILL_NOWAIT = 1 << 1,
> > + VIR_QEMU_PROCESS_KILL_NOCHECK = 1 << 2, /* donot check whether the vm is
> > + running */
> > } virQemuProcessKillMode;
> >
> > int qemuProcessKill(struct qemud_driver *driver,
>
> Hi Wen,
>
> sorry for the delay in reviewing the patch. I think I understand the
> issue, and removing the pid and using an extra flag to qemuProcessKill
> to avoid the race sounds reasonable. I rebased the patch a bit, made
> some cosmetic changes especially on the comments and pushed it so it is
> included in rc2 for more testing,
>
> thanks !
>
> Daniel
>
> --
> Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
> daniel at veillard.com | Rpmfind RPM search engine http://rpmfind.net/
> http://veillard.com/ | virtualization library http://libvirt.org/
>
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
More information about the libvir-list
mailing list