[libvirt] [PATCH 1/2] qemu: Fix @vm locking issue when connecting to the monitor

Daniel Henrique Barboza danielhb413 at gmail.com
Tue Oct 8 20:15:24 UTC 2019



On 10/8/19 6:15 AM, Michal Privoznik wrote:
> When connecting to qemu's monitor the @vm object is unlocked.
> This is justified - connecting may take a long time and we don't
> want to wait with the domain object locked. However, just before
> the domain object is locked again, the monitor's FD is registered
> in the event loop. Therefore, there is a small window where the
> event loop has a chance to call a handler for an event that
> occurred on the monitor FD but vm is not initalized properly just
> yet (i.e. priv->mon is not set). For instance, if there's an
> incoming migration, qemu creates its socket but then fails to
> initialize (for various reasons, I'm reproducing this by using
> hugepages but leaving the HP pool empty) then the following may
> happen:
>
> 1) qemuConnectMonitor() unlocks @vm
>
> 2) qemuMonitorOpen() connects to the monitor socket and by
>     calling qemuMonitorOpenInternal() which subsequently calls
>     qemuMonitorRegister() the event handler is installed
>
> 3) qemu fails to initialize and exit()-s, which closes the
>     monitor
>
> 4) The even loop sees EOF on the monitor and the control gets to
>     qemuProcessEventHandler() which locks @vm and calls
>     processMonitorEOFEvent() which then calls
>     qemuMonitorLastError(priv->mon). But priv->mon is not set just
>     yet.
>
> 5) qemuMonitorLastError() dereferences NULL pointer
>
> The solution is to unlock the domain object for a shorter time
> and most importantly, register event handler with domain object
> locked so that any possible event processing is done only after
> @vm's private data was properly initialized.
>
> This issue is also mentioned in v4.2.0-99-ga5a777a8ba.
>
> Since we are unlocking @vm and locking it back, another thread
> might have destroyed the domain meanwhile. Therefore we have to
> check if domain is still activem, and we have to do it at the

s/activem/active


> same place where domain lock is acquired back, i.e. in
> qemuMonitorOpen(). This creates a small problem for our test
> suite which calls qemuMonitorOpen() directly and passes @vm which
> has no definition. This makes virDomainObjIsActive() call crash.
> Fortunately, allocating empty domain definition is sufficient.
>
> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> ---

LGTM. Hopefully this will be enough to prevent this vm lock race
with the monitor at VM startup.


Reviewed-by: Daniel Henrique Barboza <danielhb413 at gmail.com>

>
> This is an alternative approach to:
>
> https://www.redhat.com/archives/libvir-list/2019-September/msg00749.html
>
>   src/qemu/qemu_monitor.c      | 33 +++++++++++++++++++++++++--------
>   src/qemu/qemu_process.c      | 12 ------------
>   tests/qemumonitortestutils.c |  2 ++
>   3 files changed, 27 insertions(+), 20 deletions(-)
>
> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index 58de26a276..a53d3b1d60 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -810,35 +810,52 @@ qemuMonitorOpen(virDomainObjPtr vm,
>                   qemuMonitorCallbacksPtr cb,
>                   void *opaque)
>   {
> -    int fd;
> +    int fd = -1;
>       bool hasSendFD = false;
> -    qemuMonitorPtr ret;
> +    qemuMonitorPtr ret = NULL;
>   
>       timeout += QEMU_DEFAULT_MONITOR_WAIT;
>   
> +    /* Hold an extra reference because we can't allow 'vm' to be
> +     * deleted until the monitor gets its own reference. */
> +    virObjectRef(vm);
> +
>       switch (config->type) {
>       case VIR_DOMAIN_CHR_TYPE_UNIX:
>           hasSendFD = true;
> -        if ((fd = qemuMonitorOpenUnix(config->data.nix.path,
> -                                      vm->pid, retry, timeout)) < 0)
> -            return NULL;
> +        virObjectUnlock(vm);
> +        fd = qemuMonitorOpenUnix(config->data.nix.path,
> +                                 vm->pid, retry, timeout);
> +        virObjectLock(vm);
>           break;
>   
>       case VIR_DOMAIN_CHR_TYPE_PTY:
> -        if ((fd = qemuMonitorOpenPty(config->data.file.path)) < 0)
> -            return NULL;
> +        virObjectUnlock(vm);
> +        fd = qemuMonitorOpenPty(config->data.file.path);
> +        virObjectLock(vm);
>           break;
>   
>       default:
>           virReportError(VIR_ERR_INTERNAL_ERROR,
>                          _("unable to handle monitor type: %s"),
>                          virDomainChrTypeToString(config->type));
> -        return NULL;
> +        break;
> +    }
> +
> +    if (fd < 0)
> +        goto cleanup;
> +
> +    if (!virDomainObjIsActive(vm)) {
> +        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
> +                       _("domain is not running"));
> +        goto cleanup;
>       }
>   
>       ret = qemuMonitorOpenInternal(vm, fd, hasSendFD, cb, opaque);
> + cleanup:
>       if (!ret)
>           VIR_FORCE_CLOSE(fd);
> +    virObjectUnref(vm);
>       return ret;
>   }
>   
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index a50cd54393..b303e5ba05 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -1955,13 +1955,8 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int asyncJob,
>        * 1GiB of guest RAM. */
>       timeout = vm->def->mem.total_memory / (1024 * 1024);
>   
> -    /* Hold an extra reference because we can't allow 'vm' to be
> -     * deleted until the monitor gets its own reference. */
> -    virObjectRef(vm);
> -
>       ignore_value(virTimeMillisNow(&priv->monStart));
>       monConfig = virObjectRef(priv->monConfig);
> -    virObjectUnlock(vm);
>   
>       mon = qemuMonitorOpen(vm,
>                             monConfig,
> @@ -1978,15 +1973,8 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int asyncJob,
>                                   qemuProcessMonitorLogFree);
>       }
>   
> -    virObjectLock(vm);
>       virObjectUnref(monConfig);
> -    virObjectUnref(vm);
>       priv->monStart = 0;
> -
> -    if (!virDomainObjIsActive(vm)) {
> -        qemuMonitorClose(mon);
> -        mon = NULL;
> -    }
>       priv->mon = mon;
>   
>       if (qemuSecurityClearSocketLabel(driver->securityManager, vm->def) < 0) {
> diff --git a/tests/qemumonitortestutils.c b/tests/qemumonitortestutils.c
> index e9dff123f8..c7580c5f28 100644
> --- a/tests/qemumonitortestutils.c
> +++ b/tests/qemumonitortestutils.c
> @@ -1085,6 +1085,8 @@ qemuMonitorCommonTestNew(virDomainXMLOptionPtr xmlopt,
>           test->vm = virDomainObjNew(xmlopt);
>           if (!test->vm)
>               goto error;
> +        if (!(test->vm->def = virDomainDefNew()))
> +            goto error;
>       }
>   
>       if (virNetSocketNewListenUNIX(path, 0700, geteuid(), getegid(),




More information about the libvir-list mailing list