[libvirt] [PATCHv2] qemu: agent: Fix QEMU guest agent is not available due to an error
John Ferlan
jferlan at redhat.com
Wed Sep 21 11:01:37 UTC 2016
On 07/07/2016 05:53 AM, Xiubo Li wrote:
> These days, we experienced one qemu guest agent is not available
> error. Then the guest agent couldn't be used forever before rebooting
> the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can
> also reboot the VM) to clear the priv->agentError.
>
> This is one bug for a long time in many verisons of libvirtd:
> https://bugzilla.redhat.com/show_bug.cgi?id=1090551
>
> And there is one python script to reproduce this bug from the bugzilla:
> https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py
> Just set the timeout to 0 at Line26, you can reproduce it very quickly:
> rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0
>
> The reason why this could happen is that:
>
> In case we received something like {"return": {}} for example, it is
> likely that somebody:
>
> 1) Started GA thread which sent one "guest-ping" command with seconds
> equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0) makes this function
> return immediately without waiting, but after updating the events, it
> will sleep in virCondWait() or virCondWaitUntil().
>
> 2) While in AgentIO thread the "guest-ping" command is sent successfully,
> with updating the events it will wait for reply.
>
> 3) Then the GA thread is woken up and the mon->msg is set to NULL and exit.
>
> 4) At last the AgentIO thread receives the reply of "guest-ping" command
> with the mon->msg == NULL.
>
> This case could be adapt to all the GA commands, including guest-sync, etc.
>
> This patch will check if this is the case and don't report an error but
> return silently.
>
> Signed-off-by: Xiubo Li <lixiubo at cmss.chinamobile.com>
> Signed-off-by: Zhuoyu Zhang <zhangzhuoyu at cmss.chinamobile.com>
> Signed-off-by: Wei Tang <tangwei at cmss.chinamobile.com>
> Signed-off-by: Yaowei Bai <baiyaowei at cmss.chinamobile.com>
> Signed-off-by: Qide Chen <chenqide at cmss.chinamobile.com>
> ---
>
> Changes in V2:
> - check value type instead of object type.
>
>
>
> src/qemu/qemu_agent.c | 37 ++++++++++++++++++++++++++++---------
> src/util/virjson.h | 7 +++++++
> 2 files changed, 35 insertions(+), 9 deletions(-)
>
Since this has been sitting around for a long time with a response -
figured I'd point out something that was pushed today:
http://www.redhat.com/archives/libvir-list/2016-September/msg00570.html
It addresses/resolves the issue in essentially the same manner, although
it doesn't check the empty {} reply, rather it just makes the else
condition be a ignored delay reply.... There's also a few more patches
to the series.
John
> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
> index eeede6b..f5408cc 100644
> --- a/src/qemu/qemu_agent.c
> +++ b/src/qemu/qemu_agent.c
> @@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
> {
> virJSONValuePtr obj = NULL;
> int ret = -1;
> - unsigned long long id;
>
> VIR_DEBUG("Line [%s]", line);
>
> @@ -333,16 +332,36 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
> obj = NULL;
> ret = 0;
> } else {
> - /* If we've received something like:
> - * {"return": 1234}
> - * it is likely that somebody started GA
> - * which is now processing our previous
> - * guest-sync commands. Check if this is
> - * the case and don't report an error but
> + virJSONValuePtr val;
> +
> + /* In case we received something like:
> + * {"return": {}} for example,
> + * it is likely that somebody:
> + *
> + * 1) Started GA thread which sent one "guest-ping" command
> + * with seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0)
> + * makes this function return immediately without waiting, but
> + * after updating the events, it will sleep in virCondWait() or
> + * virCondWaitUntil().
> + *
> + * 2) While in AgentIO thread the "guest-ping" command is sent
> + * successfully, with updating the events it will wait for reply.
> + *
> + * 3) Then the GA thread is woken up and the mon->msg is set
> + * to NULL and exit.
> + *
> + * 4) At last the AgentIO thread receives the reply of "guest-ping"
> + * command with the mon->msg == NULL.
> + *
> + * This case could be adapt to all the GA commands, including
> + * guest-sync, etc.
> + *
> + * Check if this is the case and don't report an error but
> * return silently.
> */
> - if (virJSONValueObjectGetNumberUlong(obj, "return", &id) == 0) {
> - VIR_DEBUG("Ignoring delayed reply to guest-sync: %llu", id);
> + val = virJSONValueObjectGet(obj, "return");
> + if (val && virJSONValueTypeIsValid(val)) {
> + VIR_DEBUG("Ignoring delayed command reply");
> ret = 0;
> goto cleanup;
> }
> diff --git a/src/util/virjson.h b/src/util/virjson.h
> index 66ed48a..cb1d973 100644
> --- a/src/util/virjson.h
> +++ b/src/util/virjson.h
> @@ -80,6 +80,13 @@ struct _virJSONValue {
> } data;
> };
>
> +static inline bool
> +virJSONValueTypeIsValid(virJSONValuePtr value)
> +{
> + return ((value->type >= VIR_JSON_TYPE_OBJECT) &&
> + (value->type <= VIR_JSON_TYPE_BOOLEAN));
> +}
> +
> void virJSONValueFree(virJSONValuePtr value);
>
> int virJSONValueObjectCreate(virJSONValuePtr *obj, ...)
>
More information about the libvir-list
mailing list