[libvirt] cont command failing via JSON monitor on restore

Jim Fehlig jfehlig at novell.com
Fri Jan 14 21:26:31 UTC 2011


Laine Stump wrote:
> On 01/13/2011 04:29 PM, Jim Fehlig wrote:
>> Daniel P. Berrange wrote:
>>> >
>>> >  Yep, it wasn't really intended as a fix. It was intended to make
>>> >  the error scenario clearly detectable, which has succeeded as per
>>> >  Jim's report. The fact that QMP returned an error in this way,
>>> >  means we can now reliably detect, usleep(1000), and then retry
>>> >  the 'cont' again at which point we should succeed.
>>> >  
>> Something like the attached patch?  I'm not quite sure about the retry
>> bounds (currently 1 second).  In my testing, it always succeeds on the
>> second attempt - even with large memory vms.
>
> In my tests, 250ms was more than enough, so I'm guessing it's okay,
> although there's probably no hurt in making it a bit larger - it's not
> like this is something that repeats all day every day :-)
>
> Would it be possible for you to add the same thing into the text
> monitor version of the code, so both fixes would travel together as
> the patch gets cherry-picked around?

I can, but have not seen this issue with the text monitor.  And the
error reporting is not so fine grained correct?

>
>> 0001-qemu-Retry-JSON-monitor-cont-cmd-on-MigrationExpecte.patch
>>
>>
>> > From ec9109b40a4b2c45035495e0e4f65824a92dcf3d Mon Sep 17 00:00:00 2001
>> From: Jim Fehlig<jfehlig at novell.com>
>> Date: Thu, 13 Jan 2011 12:52:23 -0700
>> Subject: [PATCH] qemu: Retry JSON monitor cont cmd on
>> MigrationExpected error
>>
>> When restoring a saved qemu instance via JSON monitor, the vm is
>> left in a paused state.  Turns out the 'cont' cmd was failing with
>> "MigrationExpected" error class and "An incoming migration is
>> expected before this command can be executed" error description
>> due to migration (restore) not yet complete.
>>
>> Detect if 'cont' cmd fails with "MigrationExpecte" error class and
>> retry 'cont' cmd.
>> ---
>>   src/qemu/qemu_monitor_json.c |   20 +++++++++++++++++---
>>   1 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
>> index 7877731..63b736e 100644
>> --- a/src/qemu/qemu_monitor_json.c
>> +++ b/src/qemu/qemu_monitor_json.c
>> @@ -702,13 +702,27 @@ qemuMonitorJSONStartCPUs(qemuMonitorPtr mon,
>>       int ret;
>>       virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("cont", NULL);
>>       virJSONValuePtr reply = NULL;
>> +    int i = 0, timeout = 3;
>>       if (!cmd)
>>           return -1;
>>
>> -    ret = qemuMonitorJSONCommand(mon, cmd,&reply);
>> +    do {
>> +        ret = qemuMonitorJSONCommand(mon, cmd,&reply);
>>
>> -    if (ret == 0)
>> -        ret = qemuMonitorJSONCheckError(cmd, reply);
>> +        if (ret != 0)
>> +            break;
>> +
>> +        /* If no error, we're done */
>> +        if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0)
>> +            break;
>> +
>> +        /* If error class is not MigrationExpected, we're done.
>> +         * Otherwise try 'cont' cmd again */
>> +        if (!qemuMonitorJSONHasError(reply, "MigrationExpected"))
>> +            break;
>> +
>> +        virJSONValueFree(reply);
>> +    } while ((++i<= timeout)&&  (usleep(250000)<=0));
>>
>>       virJSONValueFree(cmd);
>>       virJSONValueFree(reply);
>
> Doesn't this end up doing a double-free of reply if it times out?
> virJSONValueFree doesn't update the pointer that's free'd like
> VIR_FREE does (it can't, since it's a function call rather than a macro).

virJSONValueFree() calls VIR_FREE() on the value passed to it, so reply
should be set to NULL when virJSONValueFree() returns.

While looking at the patch again, not sure I'm fond of the

while ((++i<= timeout) &&  (usleep(250000)<=0));

Perhaps should move the usleep inside the loop and clean that up a bit.

Regards,
Jim




More information about the libvir-list mailing list