[libvirt] cont command failing via JSON monitor on restore

Laine Stump laine at laine.org
Fri Jan 14 20:20:21 UTC 2011


On 01/13/2011 04:29 PM, Jim Fehlig wrote:
> Daniel P. Berrange wrote:
>> >
>> >  Yep, it wasn't really intended as a fix. It was intended to make
>> >  the error scenario clearly detectable, which has succeeded as per
>> >  Jim's report. The fact that QMP returned an error in this way,
>> >  means we can now reliably detect, usleep(1000), and then retry
>> >  the 'cont' again at which point we should succeed.
>> >  
> Something like the attached patch?  I'm not quite sure about the retry
> bounds (currently 1 second).  In my testing, it always succeeds on the
> second attempt - even with large memory vms.

In my tests, 250ms was more than enough, so I'm guessing it's okay, 
although there's probably no hurt in making it a bit larger - it's not 
like this is something that repeats all day every day :-)

Would it be possible for you to add the same thing into the text monitor 
version of the code, so both fixes would travel together as the patch 
gets cherry-picked around?

> 0001-qemu-Retry-JSON-monitor-cont-cmd-on-MigrationExpecte.patch
>
>
> > From ec9109b40a4b2c45035495e0e4f65824a92dcf3d Mon Sep 17 00:00:00 2001
> From: Jim Fehlig<jfehlig at novell.com>
> Date: Thu, 13 Jan 2011 12:52:23 -0700
> Subject: [PATCH] qemu: Retry JSON monitor cont cmd on MigrationExpected error
>
> When restoring a saved qemu instance via JSON monitor, the vm is
> left in a paused state.  Turns out the 'cont' cmd was failing with
> "MigrationExpected" error class and "An incoming migration is
> expected before this command can be executed" error description
> due to migration (restore) not yet complete.
>
> Detect if 'cont' cmd fails with "MigrationExpecte" error class and
> retry 'cont' cmd.
> ---
>   src/qemu/qemu_monitor_json.c |   20 +++++++++++++++++---
>   1 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
> index 7877731..63b736e 100644
> --- a/src/qemu/qemu_monitor_json.c
> +++ b/src/qemu/qemu_monitor_json.c
> @@ -702,13 +702,27 @@ qemuMonitorJSONStartCPUs(qemuMonitorPtr mon,
>       int ret;
>       virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("cont", NULL);
>       virJSONValuePtr reply = NULL;
> +    int i = 0, timeout = 3;
>       if (!cmd)
>           return -1;
>
> -    ret = qemuMonitorJSONCommand(mon, cmd,&reply);
> +    do {
> +        ret = qemuMonitorJSONCommand(mon, cmd,&reply);
>
> -    if (ret == 0)
> -        ret = qemuMonitorJSONCheckError(cmd, reply);
> +        if (ret != 0)
> +            break;
> +
> +        /* If no error, we're done */
> +        if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0)
> +            break;
> +
> +        /* If error class is not MigrationExpected, we're done.
> +         * Otherwise try 'cont' cmd again */
> +        if (!qemuMonitorJSONHasError(reply, "MigrationExpected"))
> +            break;
> +
> +        virJSONValueFree(reply);
> +    } while ((++i<= timeout)&&  (usleep(250000)<=0));
>
>       virJSONValueFree(cmd);
>       virJSONValueFree(reply);

Doesn't this end up doing a double-free of reply if it times out? 
virJSONValueFree doesn't update the pointer that's free'd like VIR_FREE 
does (it can't, since it's a function call rather than a macro).




More information about the libvir-list mailing list