[libvirt] [PATCH] qemu: agent: Fix QEMU guest agent is not available due to an error

Xiubo Li lixiubo at cmss.chinamobile.com
Thu Jul 7 01:12:10 UTC 2016


Hi All,

This patch I have test for more than 3 days using 5 different scripts at 
the same time, and the test scripts and libvirtd are all running happily 
till now.


Thanks,
BRs
Xiubo


On 06/07/2016 15:21, Xiubo Li wrote:
> These days, we experienced one qemu guest agent is not available
> error. Then the guest agent couldn't be used forever before rebooting
> the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can
> also reboot the VM) to clear the priv->agentError.
>
> This is one bug for a long time in many verisons of libvirtd:
> https://bugzilla.redhat.com/show_bug.cgi?id=1090551
>
> And there is one python script to reproduce this bug from the bugzilla:
> https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py
> Just set the timeout to 0 at Line26, you can reproduce it very quickly:
> rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0
>
> The reason why this could happen is that:
>
> In case we received something like:  {"return": {}} for example, it is
> likely that somebody:
>
> 1) Started GA which sent one "guest-ping" command with seconds equals
> to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0), which makes this function
> return "immediately" without waiting, and sleeps in virCondWait().
>
> 2) While in AgentIO thread the "guest-ping" command is sent successfully,
> and then waiting for reply.
>
> 3) Then the GA thread is woken up and the mon->msg is set to NULL and
> exit.
>
> 4) Now the AgentIO has received the reply for "guest-ping" command with
> the mon->msg == NULL.
>
> Before we just check the guest-sync case, and should also check for all
> the other GA commands.
>
> I have test many of the GA commands, which all can reporduce this bug.
> This patch will check if this is the case and don't report an error but
> return silently.
>
> Signed-off-by: Xiubo Li <lixiubo at cmss.chinamobile.com>
> Signed-off-by: Zhuoyu Zhang <zhangzhuoyu at cmss.chinamobile.com>
> Signed-off-by: Wei Tang <tangwei at cmss.chinamobile.com>
> Signed-off-by: Yaowei Bai <baiyaowei at cmss.chinamobile.com>
> Signed-off-by: Qide Chen <chenqide at cmss.chinamobile.com>
> ---
>   src/qemu/qemu_agent.c | 30 +++++++++++++++++++++---------
>   src/util/virjson.h    |  7 +++++++
>   2 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
> index eeede6b..5f08ba0 100644
> --- a/src/qemu/qemu_agent.c
> +++ b/src/qemu/qemu_agent.c
> @@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
>   {
>       virJSONValuePtr obj = NULL;
>       int ret = -1;
> -    unsigned long long id;
>
>       VIR_DEBUG("Line [%s]", line);
>
> @@ -333,16 +332,29 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
>               obj = NULL;
>               ret = 0;
>           } else {
> -            /* If we've received something like:
> -             *  {"return": 1234}
> -             * it is likely that somebody started GA
> -             * which is now processing our previous
> -             * guest-sync commands. Check if this is
> -             * the case and don't report an error but
> +            /* In case we received something like:
> +             *  {"return": {}} for example,
> +             * it is likely that somebody:
> +             *
> +             * 1) Started GA which sent one "guest-ping" command with
> +             * seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0)
> +             * makes this function return immediately without waiting,
> +             * and sleeps in virCondWait().
> +             *
> +             * 2) While in AgentIO thread the "guest-ping" command is sent
> +             * successfully, and the waiting for reply.
> +             *
> +             * 3) Then the GA thread is woken up and the mon->msg is set
> +             * to NULL and exit.
> +             *
> +             * 4) Now the AgentIO has received the reply for "guest-ping"
> +             * command with the mon->msg == NULL.
> +             *
> +             * Check if this is the case and don't report an error but
>                * return silently.
>                */
> -            if (virJSONValueObjectGetNumberUlong(obj, "return", &id) == 0) {
> -                VIR_DEBUG("Ignoring delayed reply to guest-sync: %llu", id);
> +            if (virIsValidJSONType(obj)) {
> +                VIR_DEBUG("Ignoring delayed command reply");
>                   ret = 0;
>                   goto cleanup;
>               }
> diff --git a/src/util/virjson.h b/src/util/virjson.h
> index 66ed48a..d36ce8e 100644
> --- a/src/util/virjson.h
> +++ b/src/util/virjson.h
> @@ -80,6 +80,13 @@ struct _virJSONValue {
>       } data;
>   };
>
> +static inline bool
> +virIsValidJSONType(virJSONValuePtr object)
> +{
> +    return ((object->type >= VIR_JSON_TYPE_OBJECT)
> +	    && (object->type <= VIR_JSON_TYPE_BOOLEAN));
> +}
> +
>   void virJSONValueFree(virJSONValuePtr value);
>
>   int virJSONValueObjectCreate(virJSONValuePtr *obj, ...)
>





More information about the libvir-list mailing list