[libvirt] cont command failing via JSON monitor on restore

Laine Stump laine at laine.org
Wed Jan 19 15:09:48 UTC 2011


On 01/18/2011 01:15 PM, Jim Fehlig wrote:
> Laine Stump wrote:
>> On 01/14/2011 04:26 PM, Jim Fehlig wrote:
>>> Laine Stump wrote:
>>>> On 01/13/2011 04:29 PM, Jim Fehlig wrote:
>>>>> Daniel P. Berrange wrote:
>>>>>>>    Yep, it wasn't really intended as a fix. It was intended to make
>>>>>>>    the error scenario clearly detectable, which has succeeded as per
>>>>>>>    Jim's report. The fact that QMP returned an error in this way,
>>>>>>>    means we can now reliably detect, usleep(1000), and then retry
>>>>>>>    the 'cont' again at which point we should succeed.
>>>>>>>
>>>>> Something like the attached patch?  I'm not quite sure about the retry
>>>>> bounds (currently 1 second).  In my testing, it always succeeds on the
>>>>> second attempt - even with large memory vms.
>>>> In my tests, 250ms was more than enough, so I'm guessing it's okay,
>>>> although there's probably no hurt in making it a bit larger - it's not
>>>> like this is something that repeats all day every day :-)
>>>>
>>>> Would it be possible for you to add the same thing into the text
>>>> monitor version of the code, so both fixes would travel together as
>>>> the patch gets cherry-picked around?
>>> I can, but have not seen this issue with the text monitor.  And the
>>> error reporting is not so fine grained correct?
>> Truthfully I haven't spent any time with the JSON code, and only
>> enough with the text monitor to (locally) put in a rough hack for the
>> same problem - I just delayed 250ms unconditionally.
>>
>> Maybe it is safest for your patch to just have what you're able to
>> test for.
> Agreed.  What error do you get back from the text monitor when cont cmd
> fails?

Right now the machine that experiences this failure is stuck on 
qemu-0.12.5, so it doesn't fail, it "succeeds" :-) (since it doesn't 
have the necessary qemu patch, when "cont" is run before the migration 
has been able to start, it just believes that it was successful, and 
starts the CPU up with garbage in the guest memory.)

I'll try to get qemu-0.13 installed on this machine as soon as I can, 
but I'm trying to get something else fixed before Thursday.

> I'd be interested in the following output from
> $root/src/qemu/qemu_monitor_text.c:qemuMonitorCommandWithHandler()
>
>      VIR_DEBUG("Receive command reply ret=%d errno=%d %d bytes '%s'",
>                ret, msg.lastErrno, msg.rxLength, msg.rxBuffer);
>
>> Of course my system isn't setup to test exactly this fix either - the
>> machine that displays the problem when resuming is still running
>> qemu-kvm-0.12.5.
>>
>>
>>>>> - if (ret == 0)
>>>>> -        ret = qemuMonitorJSONCheckError(cmd, reply);
>>>>> +        if (ret != 0)
>>>>> +            break;
>>>>> +
>>>>> +        /* If no error, we're done */
>>>>> +        if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0)
>>>>> +            break;
>>>>> +
>>>>> +        /* If error class is not MigrationExpected, we're done.
>>>>> +         * Otherwise try 'cont' cmd again */
>>>>> +        if (!qemuMonitorJSONHasError(reply, "MigrationExpected"))
>>>>> +            break;
>>>>> +
>>>>> +        virJSONValueFree(reply);
>>>>> +    } while ((++i<= timeout)&&    (usleep(250000)<=0));
>>>>>
>>>>>         virJSONValueFree(cmd);
>>>>>         virJSONValueFree(reply);
>>>> Doesn't this end up doing a double-free of reply if it times out?
>>>> virJSONValueFree doesn't update the pointer that's free'd like
>>>> VIR_FREE does (it can't, since it's a function call rather than a
>>>> macro).
>>> virJSONValueFree() calls VIR_FREE() on the value passed to it, so reply
>>> should be set to NULL when virJSONValueFree() returns.
>>
>> Yes, but it's pass by value, not reference, so VIR_FREE() is NULLing
>> the copy of the pointer that was passed as an argument, not the
>> original pointer itself.
> Err, right.  I'll fix this in v2 - but need to understand the text
> monitor error first.

I'll do my best to get to it as soon as possible.

> Thanks,
> Jim
>
>




More information about the libvir-list mailing list