[libvirt] cont command failing via JSON monitor on restore
Jim Fehlig
jfehlig at novell.com
Tue Jan 18 18:15:07 UTC 2011
Laine Stump wrote:
> On 01/14/2011 04:26 PM, Jim Fehlig wrote:
>> Laine Stump wrote:
>>> On 01/13/2011 04:29 PM, Jim Fehlig wrote:
>>>> Daniel P. Berrange wrote:
>>>>>> Yep, it wasn't really intended as a fix. It was intended to make
>>>>>> the error scenario clearly detectable, which has succeeded as per
>>>>>> Jim's report. The fact that QMP returned an error in this way,
>>>>>> means we can now reliably detect, usleep(1000), and then retry
>>>>>> the 'cont' again at which point we should succeed.
>>>>>>
>>>> Something like the attached patch? I'm not quite sure about the retry
>>>> bounds (currently 1 second). In my testing, it always succeeds on the
>>>> second attempt - even with large memory vms.
>>> In my tests, 250ms was more than enough, so I'm guessing it's okay,
>>> although there's probably no hurt in making it a bit larger - it's not
>>> like this is something that repeats all day every day :-)
>>>
>>> Would it be possible for you to add the same thing into the text
>>> monitor version of the code, so both fixes would travel together as
>>> the patch gets cherry-picked around?
>> I can, but have not seen this issue with the text monitor. And the
>> error reporting is not so fine grained correct?
>
> Truthfully I haven't spent any time with the JSON code, and only
> enough with the text monitor to (locally) put in a rough hack for the
> same problem - I just delayed 250ms unconditionally.
>
> Maybe it is safest for your patch to just have what you're able to
> test for.
Agreed. What error do you get back from the text monitor when cont cmd
fails? I'd be interested in the following output from
$root/src/qemu/qemu_monitor_text.c:qemuMonitorCommandWithHandler()
VIR_DEBUG("Receive command reply ret=%d errno=%d %d bytes '%s'",
ret, msg.lastErrno, msg.rxLength, msg.rxBuffer);
>
> Of course my system isn't setup to test exactly this fix either - the
> machine that displays the problem when resuming is still running
> qemu-kvm-0.12.5.
>
>
>>>> - if (ret == 0)
>>>> - ret = qemuMonitorJSONCheckError(cmd, reply);
>>>> + if (ret != 0)
>>>> + break;
>>>> +
>>>> + /* If no error, we're done */
>>>> + if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0)
>>>> + break;
>>>> +
>>>> + /* If error class is not MigrationExpected, we're done.
>>>> + * Otherwise try 'cont' cmd again */
>>>> + if (!qemuMonitorJSONHasError(reply, "MigrationExpected"))
>>>> + break;
>>>> +
>>>> + virJSONValueFree(reply);
>>>> + } while ((++i<= timeout)&& (usleep(250000)<=0));
>>>>
>>>> virJSONValueFree(cmd);
>>>> virJSONValueFree(reply);
>>> Doesn't this end up doing a double-free of reply if it times out?
>>> virJSONValueFree doesn't update the pointer that's free'd like
>>> VIR_FREE does (it can't, since it's a function call rather than a
>>> macro).
>> virJSONValueFree() calls VIR_FREE() on the value passed to it, so reply
>> should be set to NULL when virJSONValueFree() returns.
>
>
> Yes, but it's pass by value, not reference, so VIR_FREE() is NULLing
> the copy of the pointer that was passed as an argument, not the
> original pointer itself.
Err, right. I'll fix this in v2 - but need to understand the text
monitor error first.
Thanks,
Jim
More information about the libvir-list
mailing list