[libvirt] [PATCH] qemu: agent: Fix QEMU guest agent is not available due to an error

Xiubo Li lixiubo at cmss.chinamobile.com
Thu Jul 7 08:42:49 UTC 2016


Hi,

For this bug, by checking the object's type is valid or not will always 
be true.
We should check for the object's value type instead.

I will send the V2 patch.

Thanks,

BRs
Xiubo


On 07/07/2016 09:12, Xiubo Li wrote:
> Hi All,
>
> This patch I have test for more than 3 days using 5 different scripts at
> the same time, and the test scripts and libvirtd are all running happily
> till now.
>
>
> Thanks,
> BRs
> Xiubo
>
>
> On 06/07/2016 15:21, Xiubo Li wrote:
>> These days, we experienced one qemu guest agent is not available
>> error. Then the guest agent couldn't be used forever before rebooting
>> the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can
>> also reboot the VM) to clear the priv->agentError.
>>
>> This is one bug for a long time in many verisons of libvirtd:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1090551
>>
>> And there is one python script to reproduce this bug from the bugzilla:
>> https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py
>> Just set the timeout to 0 at Line26, you can reproduce it very quickly:
>> rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0
>>
>> The reason why this could happen is that:
>>
>> In case we received something like:  {"return": {}} for example, it is
>> likely that somebody:
>>
>> 1) Started GA which sent one "guest-ping" command with seconds equals
>> to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0), which makes this function
>> return "immediately" without waiting, and sleeps in virCondWait().
>>
>> 2) While in AgentIO thread the "guest-ping" command is sent successfully,
>> and then waiting for reply.
>>
>> 3) Then the GA thread is woken up and the mon->msg is set to NULL and
>> exit.
>>
>> 4) Now the AgentIO has received the reply for "guest-ping" command with
>> the mon->msg == NULL.
>>
>> Before we just check the guest-sync case, and should also check for all
>> the other GA commands.
>>
>> I have test many of the GA commands, which all can reporduce this bug.
>> This patch will check if this is the case and don't report an error but
>> return silently.
>>
>> Signed-off-by: Xiubo Li <lixiubo at cmss.chinamobile.com>
>> Signed-off-by: Zhuoyu Zhang <zhangzhuoyu at cmss.chinamobile.com>
>> Signed-off-by: Wei Tang <tangwei at cmss.chinamobile.com>
>> Signed-off-by: Yaowei Bai <baiyaowei at cmss.chinamobile.com>
>> Signed-off-by: Qide Chen <chenqide at cmss.chinamobile.com>
>> ---
>>   src/qemu/qemu_agent.c | 30 +++++++++++++++++++++---------
>>   src/util/virjson.h    |  7 +++++++
>>   2 files changed, 28 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
>> index eeede6b..5f08ba0 100644
>> --- a/src/qemu/qemu_agent.c
>> +++ b/src/qemu/qemu_agent.c
>> @@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
>>   {
>>       virJSONValuePtr obj = NULL;
>>       int ret = -1;
>> -    unsigned long long id;
>>
>>       VIR_DEBUG("Line [%s]", line);
>>
>> @@ -333,16 +332,29 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
>>               obj = NULL;
>>               ret = 0;
>>           } else {
>> -            /* If we've received something like:
>> -             *  {"return": 1234}
>> -             * it is likely that somebody started GA
>> -             * which is now processing our previous
>> -             * guest-sync commands. Check if this is
>> -             * the case and don't report an error but
>> +            /* In case we received something like:
>> +             *  {"return": {}} for example,
>> +             * it is likely that somebody:
>> +             *
>> +             * 1) Started GA which sent one "guest-ping" command with
>> +             * seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0)
>> +             * makes this function return immediately without waiting,
>> +             * and sleeps in virCondWait().
>> +             *
>> +             * 2) While in AgentIO thread the "guest-ping" command is
>> sent
>> +             * successfully, and the waiting for reply.
>> +             *
>> +             * 3) Then the GA thread is woken up and the mon->msg is set
>> +             * to NULL and exit.
>> +             *
>> +             * 4) Now the AgentIO has received the reply for
>> "guest-ping"
>> +             * command with the mon->msg == NULL.
>> +             *
>> +             * Check if this is the case and don't report an error but
>>                * return silently.
>>                */
>> -            if (virJSONValueObjectGetNumberUlong(obj, "return", &id)
>> == 0) {
>> -                VIR_DEBUG("Ignoring delayed reply to guest-sync:
>> %llu", id);
>> +            if (virIsValidJSONType(obj)) {
>> +                VIR_DEBUG("Ignoring delayed command reply");
>>                   ret = 0;
>>                   goto cleanup;
>>               }
>> diff --git a/src/util/virjson.h b/src/util/virjson.h
>> index 66ed48a..d36ce8e 100644
>> --- a/src/util/virjson.h
>> +++ b/src/util/virjson.h
>> @@ -80,6 +80,13 @@ struct _virJSONValue {
>>       } data;
>>   };
>>
>> +static inline bool
>> +virIsValidJSONType(virJSONValuePtr object)
>> +{
>> +    return ((object->type >= VIR_JSON_TYPE_OBJECT)
>> +        && (object->type <= VIR_JSON_TYPE_BOOLEAN));
>> +}
>> +
>>   void virJSONValueFree(virJSONValuePtr value);
>>
>>   int virJSONValueObjectCreate(virJSONValuePtr *obj, ...)
>>





More information about the libvir-list mailing list