[libvirt] [PATCH v2] qemu: qemuDomainRename and virDomainObjListNumOfDomains ABBA deadlock fix

Nikolay Shirokovskiy nshirokovskiy at virtuozzo.com
Sat Jan 30 15:03:43 UTC 2016


A pretty nasty deadlock occurs while trying to rename a VM in parallel
with virDomainObjListNumOfDomains.
The short description of the problem is as follows:

Thread #1:

qemuDomainRename:
    ------> aquires domain lock by qemuDomObjFromDomain
       ---------> waits for domain list lock in any of the listed functions:
          - virDomainObjListFindByName
          - virDomainObjListRenameAddNew
          - virDomainObjListRenameRemove

Thread #2:

virDomainObjListNumOfDomains:
    ------> aquires domain list lock
       ---------> waits for domain lock in virDomainObjListCount

Consider another solution for the deadlock besides those proposed by Maxim
and Michael. Simply unlock before calling any function that internally
grabs domains list lock.

First it looks correct. We use qemuDomainObjBeginJob/qemuDomainObjEndJob to
wrap the operation so no one will run another operation on that domain in
parallel. In THREADS.txt this unlocking/locking is described explicitly for the
cases when we need to wait/sleep. Here we use same means just to different
purpose. Also called functions do not use domain internals in any way.

Second I find approach proposed by Michael difficult to implement correctly.
Imagine we write such a generic rename function with callbacks. Then it would
look like this:

lock domains list
...
add new name to list
unlock domain list

call callback to change driver state

lock domain list
remove old or new name (depends on callback result)
unlock domain list

We need to unlock domain list or we block operations on all domains while we
wait for job conditions in the callback. But now we are in potential trouble as
we leaving list in an inconsistent state. We can't move remove operation before
calling callback or we will be in trouble on callback failures - we would need
to add the old name back and this is not always possible. Current rename
implementation leaves config with new name on disk in worst case on rollback
but does not break the list.
---
 src/qemu/qemu_driver.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index abcdbe6..26a8fb0 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -19987,6 +19987,7 @@ static int qemuDomainRename(virDomainPtr dom,
     virObjectEventPtr event_new = NULL;
     virObjectEventPtr event_old = NULL;
     int ret = -1;
+    int rc;
     char *new_dom_name = NULL;
     char *old_dom_name = NULL;
     char *old_dom_cfg_file = NULL;
@@ -20036,10 +20037,11 @@ static int qemuDomainRename(virDomainPtr dom,
 
     /*
      * This is a rather racy check, but still better than reporting
-     * internal error.  And since new_name != name here, there's no
-     * deadlock imminent.
+     * internal error.
      */
+    virObjectUnlock(vm);
     tmp_dom = virDomainObjListFindByName(driver->domains, new_name);
+    virObjectLock(vm);
     if (tmp_dom) {
         virObjectUnlock(tmp_dom);
         virObjectUnref(tmp_dom);
@@ -20057,7 +20059,10 @@ static int qemuDomainRename(virDomainPtr dom,
         goto endjob;
     }
 
-    if (virDomainObjListRenameAddNew(driver->domains, vm, new_name) < 0)
+    virObjectUnlock(vm);
+    rc = virDomainObjListRenameAddNew(driver->domains, vm, new_name);
+    virObjectLock(vm);
+    if (rc < 0)
         goto endjob;
 
     event_old = virDomainEventLifecycleNewFromObj(vm,
@@ -20081,7 +20086,9 @@ static int qemuDomainRename(virDomainPtr dom,
     }
 
     /* Remove old domain name from table. */
+    virObjectUnlock(vm);
     virDomainObjListRenameRemove(driver->domains, old_dom_name);
+    virObjectLock(vm);
 
     event_new = virDomainEventLifecycleNewFromObj(vm,
                                               VIR_DOMAIN_EVENT_DEFINED,
@@ -20110,7 +20117,9 @@ static int qemuDomainRename(virDomainPtr dom,
         old_dom_name = NULL;
     }
 
+    virObjectUnlock(vm);
     virDomainObjListRenameRemove(driver->domains, new_name);
+    virObjectLock(vm);
     goto endjob;
 }
 
-- 
1.8.3.1




More information about the libvir-list mailing list