[libvirt] [RFC] Add API to change qemu agent response timeout

Peter Krempa pkrempa at redhat.com
Fri Oct 4 06:14:39 UTC 2019


On Thu, Oct 03, 2019 at 16:52:12 -0500, Jonathon Jongsma wrote:
> Some layered products such as oVirt have requested a way to avoid being
> blocked by guest agent commands when querying a loaded vm. For example,
> many guest agent commands are polled periodically to monitor changes,
> and rather than blocking the calling process, they'd prefer to simply
> time out when an agent query is taking too long.
> 
> This patch adds a way for the user to specify a custom agent timeout
> that is applied to all agent commands.
> 
> One special case to note here is the 'guest-sync' command. 'guest-sync'
> is issued internally prior to calling any other command. (For example,
> when libvirt wants to call 'guest-get-fsinfo', we first call
> 'guest-sync' and then call 'guest-get-fsinfo').
> 
> Previously, the 'guest-sync' command used a 5-second timeout
> (VIR_DOMAIN_QEMU_AGENT_COMMAND_DEFAULT), whereas the actual command that
> followed always blocked indefinitely
> (VIR_DOMAIN_QEMU_AGENT_COMMAND_BLOCK). As part of this patch, if a
> custom timeout is specified that is shorter than
> 5 seconds,  this new timeout also used for 'guest-sync'. If there is no
> custom timeout or if the custom timeout is longer than 5 seconds, we
> will continue to use the 5-second timeout.
> 
> See https://bugzilla.redhat.com/show_bug.cgi?id=1705426 for additional details.
> 
> Signed-off-by: Jonathon Jongsma <jjongsma at redhat.com>
> ---
>  include/libvirt/libvirt-qemu.h |  2 +
>  src/driver-hypervisor.h        |  5 +++
>  src/libvirt-qemu.c             | 40 ++++++++++++++++++++
>  src/libvirt_qemu.syms          |  4 ++
>  src/qemu/qemu_agent.c          | 69 +++++++++++++++++++---------------
>  src/qemu/qemu_agent.h          |  3 ++
>  src/qemu/qemu_driver.c         | 24 ++++++++++++
>  src/qemu_protocol-structs      |  8 ++++
>  src/remote/qemu_protocol.x     | 18 ++++++++-
>  src/remote/remote_driver.c     |  1 +
>  10 files changed, 143 insertions(+), 31 deletions(-)

[...]

> @@ -2737,3 +2730,19 @@ qemuAgentGetTimezone(qemuAgentPtr mon,
>  
>      return 0;
>  }
> +
> +int
> +qemuAgentSetTimeout(qemuAgentPtr mon,
> +                    int timeout)
> +{
> +    if (timeout < VIR_DOMAIN_QEMU_AGENT_COMMAND_MIN) {
> +        virReportError(VIR_ERR_INVALID_ARG,
> +                       _("guest agent timeout '%d' is "
> +                         "less than the minimum '%d'"),
> +                       timeout, VIR_DOMAIN_QEMU_AGENT_COMMAND_MIN);

This error is misleading as -1 and -2 are special values and not actual
timeout.

> +        return -1;
> +    }
> +
> +    mon->timeout = timeout;
> +    return 0;
> +}

[...]

> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 1e041a8bac..09251cc9e2 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -23434,6 +23434,29 @@ qemuDomainGetGuestInfo(virDomainPtr dom,
>      return ret;
>  }
>  
> +static int
> +qemuDomainQemuAgentSetTimeout(virDomainPtr dom,
> +                              int timeout)
> +{
> +    virDomainObjPtr vm = NULL;
> +    qemuAgentPtr agent;
> +    int ret = -1;
> +
> +    if (!(vm = qemuDomObjFromDomain(dom)))
> +        goto cleanup;
> +
> +    if (virDomainQemuAgentSetTimeoutEnsureACL(dom->conn, vm->def) < 0)
> +        goto cleanup;
> +
> +    agent = qemuDomainObjEnterAgent(vm);

You must acquire a job on @vm if you want to call this:

/*
 * obj must be locked before calling
 *
 * To be called immediately before any QEMU agent API call.
 * Must have already called qemuDomainObjBeginAgentJob() or
 * qemuDomainObjBeginJobWithAgent() and checked that the VM is
 * still active.
 *
 * To be followed with qemuDomainObjExitAgent() once complete
 */
qemuAgentPtr
qemuDomainObjEnterAgent(virDomainObjPtr obj)


> +    ret = qemuAgentSetTimeout(agent, timeout);
> +    qemuDomainObjExitAgent(vm, agent);

Also this API is inherently racy if you have two clients setting the
timeout and it will influence calls of a different client.

IMO the only reasonable approach is to add new APIs which have a
'timeout' parameter for any agent API which requires tunable timeout to
prevent any races and unexpected behaviour.

Other possibility may be to add a qemu config file option for this but
that is not really dynamic.




More information about the libvir-list mailing list