[libvirt] qemu: Add timeout for monitor to avoid virsh getting stuck when monitor gets die.

Thu Apr 7 02:42:53 UTC 2011

On Wed, Apr 06, 2011 at 11:02:58AM +0100, Daniel P. Berrange wrote:
> On Wed, Apr 06, 2011 at 12:13:10PM +0800, Mark Wu wrote:
> > Hello Guys,
> > 
> > When the qemu process becomes hung,  virsh will get stuck on the
> > hung guest and can't move forward.  It can be reproduced by the
> > following steps:
> > 
> > 1. setup a virt guest with qemu-kvm, and start it
> > 2. stop qemu process with following:
> >  kill -STOP  `ps aux | grep qemu | grep -v grep | awk '{print $2}'`
> > 3. run the following command:
> > virsh list
> > 
> > I think we can add a timeout for qemu monitor to resolve this
> > problem: using virCondWaitUntil instead of virCondWait in
> > qemuMonitorSend. What's your opinions?
> > 
> > Thanks!
> > 
> 
> > diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> > index fca8590..65d8de9 100644
> > --- a/src/qemu/qemu_monitor.c
> > +++ b/src/qemu/qemu_monitor.c
> > @@ -25,6 +25,7 @@
> >  
> >  #include <poll.h>
> >  #include <sys/un.h>
> > +#include <sys/time.h>
> >  #include <unistd.h>
> >  #include <fcntl.h>
> >  
> > @@ -691,11 +692,14 @@ int qemuMonitorClose(qemuMonitorPtr mon)
> >      return refs;
> >  }
> >  
> > +#define QEMU_JOB_WAIT_TIME (1000ull * 30)
> >  
> >  int qemuMonitorSend(qemuMonitorPtr mon,
> >                      qemuMonitorMessagePtr msg)
> >  {
> >      int ret = -1;
> > +    struct timeval now;
> > +    unsigned long long then;
> >  
> >      if (mon->eofcb) {
> >          msg->lastErrno = EIO;
> > @@ -706,7 +710,14 @@ int qemuMonitorSend(qemuMonitorPtr mon,
> >      qemuMonitorUpdateWatch(mon);
> >  
> >      while (!mon->msg->finished) {
> > -        if (virCondWait(&mon->notify, &mon->lock) < 0)
> > +       if (gettimeofday(&now, NULL) < 0) {
> > +               virReportSystemError(errno, "%s",
> > +                             _("cannot get time of day"));
> > +               return -1;
> > +       }
> > +       then = (now.tv_sec * 1000ull) + (now.tv_usec / 1000);
> > +       then += QEMU_JOB_WAIT_TIME;
> > +        if (virCondWaitUntil(&mon->notify, &mon->lock, then) < 0)
> >              goto cleanup;
> >      }
> 
> This may seem simple, but it has a lot of nasty consequences.
> 
> Adding the timeout causes the thread to stop waiting for the
> monitor command reply, and returns an error for that API call.
> 
> If QEMU should recover though, and more API calls are made
> which issue monitor commands, all those future commands will
> receive the wrong reply data.
> 
> If we are going to allow a timeout when waiting for a reply
> to a monitor command, then we need to mark the monitor as
> 'broken' and forbid all future use of it until the VM is
> restarted.
> 
> If it was a simple 'info' command, then we could potentially
> add code to read+discard the delayed reply later.
> 
> If it was an action command though, we can't simply discard
> the delayed reply, because that will result in libvirt's
> view of the world becoming out of sync with QEMU. In certain
> cases we may be able to cope with this, by listening for
> event notifications.
> 
> eg, if 'stop' command times out, and libvirt will thing the
> VM is still running, but if QEMU later completes it, then
> it will in fact be paused. We will see a 'PAUSED' event from
> QEMU that lets us re-sync our state.
> 
> If something like a 'device_add' commands time out though, we
> have no way to gracefully recover.
> 
> 
> NB, I'm not saying your patch is wrong. In fact I think it is
> potentially a good idea, but we need to make sure we are able
> to safely deal with the consequences of timing out first.

  What I wonder is if we shouldn't do more tight monitoring of
the subprocesses instead of just relying on the pipe to the monitor
to just detect exit. At least on linux we could use /proc/$pid/status
to get more information about the process state, this would allow to
detect maybe ahead of time things like stopped processes or abnormal
activity. The problem as you said is that once we sent a command to
the monitor, not waiting for the answer put the whole communication at
risk, Ideally we should be able to signal the qemu child that we want
to reset the interface like the reset button on a VT220 terminal sending
a break through the serial line, it would be really useful to have
something to reset asynchronously data on a QEmu monitor. I don't know
what would be the best way to implement this on an Unix/Linux pipe
though.
  But I agree using a timeout and aborting in the current framework
doesn't work.
  I also agree with Jirka that a simpler API for just listing state
not involving a round-trip to the app would be a significant
improvement.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel at veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/