[libvirt] [PATCH v2] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix

Nikolay Shirokovskiy nshirokovskiy at virtuozzo.com
Sat Jan 30 15:38:39 UTC 2016


Sorry. I need to rework it.

I've missed the option to wait for job condition outside the
callback. That is before calling the generic rename function.

Also the proposed version has the problem of inconsistent
list state too, just as rejected version with generic rename
that unlocks list lock before calling the callback.


On 30.01.2016 18:03, Nikolay Shirokovskiy wrote:
> A pretty nasty deadlock occurs while trying to rename a VM in parallel
> with virDomainObjListNumOfDomains.
> The short description of the problem is as follows:
> 
> Thread #1:
> 
> qemuDomainRename:
>     ------> aquires domain lock by qemuDomObjFromDomain
>        ---------> waits for domain list lock in any of the listed functions:
>           - virDomainObjListFindByName
>           - virDomainObjListRenameAddNew
>           - virDomainObjListRenameRemove
> 
> Thread #2:
> 
> virDomainObjListNumOfDomains:
>     ------> aquires domain list lock
>        ---------> waits for domain lock in virDomainObjListCount
> 
> Consider another solution for the deadlock besides those proposed by Maxim
> and Michael. Simply unlock before calling any function that internally
> grabs domains list lock.
> 
> First it looks correct. We use qemuDomainObjBeginJob/qemuDomainObjEndJob to
> wrap the operation so no one will run another operation on that domain in
> parallel. In THREADS.txt this unlocking/locking is described explicitly for the
> cases when we need to wait/sleep. Here we use same means just to different
> purpose. Also called functions do not use domain internals in any way.
> 
> Second I find approach proposed by Michael difficult to implement correctly.
> Imagine we write such a generic rename function with callbacks. Then it would
> look like this:
> 
> lock domains list
> ...
> add new name to list
> unlock domain list
> 
> call callback to change driver state
> 
> lock domain list
> remove old or new name (depends on callback result)
> unlock domain list
> 
> We need to unlock domain list or we block operations on all domains while we
> wait for job conditions in the callback. But now we are in potential trouble as
> we leaving list in an inconsistent state. We can't move remove operation before
> calling callback or we will be in trouble on callback failures - we would need
> to add the old name back and this is not always possible. Current rename
> implementation leaves config with new name on disk in worst case on rollback
> but does not break the list.
> ---
>  src/qemu/qemu_driver.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index abcdbe6..26a8fb0 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -19987,6 +19987,7 @@ static int qemuDomainRename(virDomainPtr dom,
>      virObjectEventPtr event_new = NULL;
>      virObjectEventPtr event_old = NULL;
>      int ret = -1;
> +    int rc;
>      char *new_dom_name = NULL;
>      char *old_dom_name = NULL;
>      char *old_dom_cfg_file = NULL;
> @@ -20036,10 +20037,11 @@ static int qemuDomainRename(virDomainPtr dom,
>  
>      /*
>       * This is a rather racy check, but still better than reporting
> -     * internal error.  And since new_name != name here, there's no
> -     * deadlock imminent.
> +     * internal error.
>       */
> +    virObjectUnlock(vm);
>      tmp_dom = virDomainObjListFindByName(driver->domains, new_name);
> +    virObjectLock(vm);
>      if (tmp_dom) {
>          virObjectUnlock(tmp_dom);
>          virObjectUnref(tmp_dom);
> @@ -20057,7 +20059,10 @@ static int qemuDomainRename(virDomainPtr dom,
>          goto endjob;
>      }
>  
> -    if (virDomainObjListRenameAddNew(driver->domains, vm, new_name) < 0)
> +    virObjectUnlock(vm);
> +    rc = virDomainObjListRenameAddNew(driver->domains, vm, new_name);
> +    virObjectLock(vm);
> +    if (rc < 0)
>          goto endjob;
>  
>      event_old = virDomainEventLifecycleNewFromObj(vm,
> @@ -20081,7 +20086,9 @@ static int qemuDomainRename(virDomainPtr dom,
>      }
>  
>      /* Remove old domain name from table. */
> +    virObjectUnlock(vm);
>      virDomainObjListRenameRemove(driver->domains, old_dom_name);
> +    virObjectLock(vm);
>  
>      event_new = virDomainEventLifecycleNewFromObj(vm,
>                                                VIR_DOMAIN_EVENT_DEFINED,
> @@ -20110,7 +20117,9 @@ static int qemuDomainRename(virDomainPtr dom,
>          old_dom_name = NULL;
>      }
>  
> +    virObjectUnlock(vm);
>      virDomainObjListRenameRemove(driver->domains, new_name);
> +    virObjectLock(vm);
>      goto endjob;
>  }
>  
> 




More information about the libvir-list mailing list