[PATCH v2 10/13] qemu: avoid deadlock in qemuDomainObjStopWorker

Daniel Henrique Barboza danielhb413 at gmail.com
Mon Aug 10 17:40:09 UTC 2020



On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
> We are dropping the only reference here so that the event loop thread
> is going to be exited synchronously. In order to avoid deadlocks we
> need to unlock the VM so that any handler being called can finish
> execution and thus even loop thread be finished too.
> 
> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
> ---
>   src/qemu/qemu_domain.c | 18 ++++++++++++++----
>   1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 5b22eb2..82b3d11 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -1637,11 +1637,21 @@ void
>   qemuDomainObjStopWorker(virDomainObjPtr dom)
>   {
>       qemuDomainObjPrivatePtr priv = dom->privateData;
> +    virEventThread *eventThread;
>   
> -    if (priv->eventThread) {
> -        g_object_unref(priv->eventThread);
> -        priv->eventThread = NULL;
> -    }
> +    if (!priv->eventThread)
> +        return;
> +
> +    /*
> +     * We are dropping the only reference here so that the event loop thread
> +     * is going to be exited synchronously. In order to avoid deadlocks we
> +     * need to unlock the VM so that any handler being called can finish
> +     * execution and thus even loop thread be finished too.
> +     */
> +    eventThread = g_steal_pointer(&priv->eventThread);
> +    virObjectUnlock(dom);
> +    g_object_unref(eventThread);
> +    virObjectLock(dom);

I understand the intention of the code thanks to the comment just before
it, but this is not robust. This unlocking -> do stuff -> lock will only
work if the caller of qemuDomainObjStopWorker() is holding the mutex.

I see that this is the case for qemuDomainObjStopWorkerIter(), but
qemuDomainObjStopWorker() is also called by qemuProcessStop(). qemuProcessStop()
does not make any mutex lock/unlock, so you'll be assuming that all callers of
qemuProcessStop() will hold the mutex before calling it. One of them is qemuProcessInit(),
which calls qemuProcessStop() in the 'stop' jump, and there is no mutex lock
beforehand as well.

Now we're assuming that all callers of qemuProcessInit() are holding the mutex
lock beforehand. In a quick glance in the code I saw at least 2 instances that
this isn't the case, then we'll need to assume that the callers of those functions
are holding the mutex lock. So on and so forth.


Given that this code only makes sense when called from qemuDomainObjStopWorkerIter(),
I'd suggest removing the lock/unlock of this function (turning it into just a call
to qemuDomainObjStopWorker()) and move them inside the body of qemuDomainObjStopWorker(),
locking and unlocking the mutex inside the scope of the same function.


Thanks,


DHB

>   }
>   
>   
> 




More information about the libvir-list mailing list