[libvirt] [PATCH 00/19] Rollback migration when libvirtd restarts

Jiri Denemark jdenemar at redhat.com
Wed Jul 13 07:16:11 UTC 2011


On Fri, Jul 08, 2011 at 01:34:05 +0200, Jiri Denemark wrote:
> This series is also available at
> https://gitorious.org/~jirka/libvirt/jirka-staging/commits/migration-recovery
> 
> The series does several things:
> - persists current job and its phase in status xml
> - allows safe monitor commands to be run during migration/save/dump jobs
> - implements recovery when libvirtd is restarted while a job is active
> - consolidates some code and fixes bugs I found when working in the area
> 
> The series is not perfect and still needs some corner cases to be fixed but I
> think it's better to send the series for review now and add small additional
> fixes in the next version(s) instead of waiting for it to be perfect.

OK, I pushed the following patches (01-08/19 and 13/19) which were already
acked. When doing so, I also updated documentation in src/qemu/THREADS.txt as
part of the "qemu: Allow all query commands to be run during long jobs" patch.
The diff to THREADS.txt is attached.
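The job condition described in THREADS.txt is the classic condition-variable
idiom: the object lock is released while waiting for the current job to end,
and re-acquired before the job flag is set. A minimal self-contained sketch of
that idiom (plain pthreads with made-up names, not libvirt's actual
qemuDomainObjPrivate types):

```c
/* Sketch of the job-condition idiom from THREADS.txt, using plain
 * pthreads.  All names here are illustrative, not libvirt's. */
#include <assert.h>
#include <pthread.h>

struct dom_obj {
    pthread_mutex_t lock;    /* stands in for the virDomainObjPtr lock */
    pthread_cond_t job_cond; /* stands in for job.cond */
    int job_active;          /* 0 = no job, otherwise the job type */
};

/* Acquire the (normal) job: wait while another job is active.
 * pthread_cond_wait atomically releases 'lock' while sleeping and
 * re-acquires it before returning, which is the behaviour THREADS.txt
 * describes for waiting on the job condition. */
static void begin_job(struct dom_obj *obj, int job_type)
{
    pthread_mutex_lock(&obj->lock);
    while (obj->job_active != 0)
        pthread_cond_wait(&obj->job_cond, &obj->lock);
    obj->job_active = job_type;
    pthread_mutex_unlock(&obj->lock);
}

/* End the job and wake one waiter, mirroring qemuDomainObjEndJob. */
static void end_job(struct dom_obj *obj)
{
    pthread_mutex_lock(&obj->lock);
    obj->job_active = 0;
    pthread_cond_signal(&obj->job_cond);
    pthread_mutex_unlock(&obj->lock);
}

static void *worker(void *arg)
{
    struct dom_obj *obj = arg;
    begin_job(obj, 2);       /* blocks until the first job ends */
    end_job(obj);
    return NULL;
}

int run_demo(void)
{
    struct dom_obj obj = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
    };
    pthread_t t;

    begin_job(&obj, 1);
    assert(pthread_create(&t, NULL, worker, &obj) == 0);
    end_job(&obj);           /* lets the worker acquire the job */
    assert(pthread_join(t, NULL) == 0);
    return obj.job_active;   /* 0: all jobs finished */
}
```

The real code adds the async-job layering on top of this (a second flag plus
a compatibility check before the wait), but the lock/wait/re-check shape is
the same.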

>   qemu: Separate job related data into a new object
>   qemu: Consolidate BeginJob{,WithDriver} into a single method
>   qemu: Consolidate {Enter,Exit}Monitor{,WithDriver}
>   qemu: Allow all query commands to be run during long jobs
>   qemu: Save job type in domain status XML
>   qemu: Recover from interrupted jobs
>   qemu: Add support for job phase
>   qemu: Consolidate qemuMigrationPrepare{Direct,Tunnel}
>   qemu: Fix monitor unlocking in some error paths

Jirka
-------------- next part --------------
diff --git a/src/qemu/THREADS.txt b/src/qemu/THREADS.txt
index 1e0b5ab..3a27a85 100644
--- a/src/qemu/THREADS.txt
+++ b/src/qemu/THREADS.txt
@@ -49,17 +49,39 @@ There are a number of locks on various objects
 
 
 
-  * qemuMonitorPrivatePtr: Job condition
+  * qemuMonitorPrivatePtr: Job conditions
 
     Since virDomainObjPtr lock must not be held during sleeps, the job
-    condition provides additional protection for code making updates.
+    conditions provide additional protection for code making updates.
+
+    The qemu driver uses two kinds of job conditions: asynchronous
+    and normal.
+
+    An asynchronous job condition is used for long-running jobs (such
+    as migration) that consist of several monitor commands, where it
+    is desirable to allow a limited set of other monitor commands to
+    run concurrently.  This allows clients to, e.g., query statistical
+    data, cancel the job, or change its parameters.
+
+    A normal job condition is used by all other jobs to get exclusive
+    access to the monitor, and also by every monitor command issued by
+    an asynchronous job.  When acquiring a normal job condition, the
+    job must specify what kind of action it is about to take; this is
+    checked against the set of allowed jobs in case an asynchronous
+    job is running.  If the job is incompatible with the current
+    asynchronous job, it needs to wait until the asynchronous job ends
+    and then try to acquire the job again.
 
     Immediately after acquiring the virDomainObjPtr lock, any method
-    which intends to update state must acquire the job condition. The
-    virDomainObjPtr lock is released while blocking on this condition
-    variable. Once the job condition is acquired, a method can safely
-    release the virDomainObjPtr lock whenever it hits a piece of code
-    which may sleep/wait, and re-acquire it after the sleep/wait.
+    which intends to update state must acquire either asynchronous or
+    normal job condition.  The virDomainObjPtr lock is released while
+    blocking on these condition variables.  Once the job condition is
+    acquired, a method can safely release the virDomainObjPtr lock
+    whenever it hits a piece of code which may sleep/wait, and
+    re-acquire it after the sleep/wait.  Whenever an asynchronous job
+    wants to talk to the monitor, it needs to acquire a nested job (a
+    special kind of normal job) to obtain exclusive access to the
+    monitor.
 
     Since the virDomainObjPtr lock was dropped while waiting for the
     job condition, it is possible that the domain is no longer active
@@ -111,31 +133,74 @@ To lock the virDomainObjPtr
 
 
 
-To acquire the job mutex
+To acquire the normal job condition
 
   qemuDomainObjBeginJob()           (if driver is unlocked)
     - Increments ref count on virDomainObjPtr
-    - Wait qemuDomainObjPrivate condition 'jobActive != 0' using
-      virDomainObjPtr mutex
-    - Sets jobActive to 1
+    - Waits until the job is compatible with the current async job or
+      no async job is running
+    - Waits on the job.cond condition while 'job.active != 0', using
+      the virDomainObjPtr mutex
+    - Rechecks if the job is still compatible and repeats waiting if it
+      isn't
+    - Sets job.active to the job type
 
   qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
-    - Unlocks driver
     - Increments ref count on virDomainObjPtr
-    - Wait qemuDomainObjPrivate condition 'jobActive != 0' using
-      virDomainObjPtr mutex
-    - Sets jobActive to 1
+    - Unlocks driver
+    - Waits until the job is compatible with the current async job or
+      no async job is running
+    - Waits on the job.cond condition while 'job.active != 0', using
+      the virDomainObjPtr mutex
+    - Rechecks if the job is still compatible and repeats waiting if it
+      isn't
+    - Sets job.active to the job type
     - Unlocks virDomainObjPtr
     - Locks driver
     - Locks virDomainObjPtr
 
-   NB: this variant is required in order to comply with lock ordering rules
-   for virDomainObjPtr vs driver
+   NB: this variant is required in order to comply with lock ordering
+   rules for virDomainObjPtr vs driver
 
 
   qemuDomainObjEndJob()
-    - Set jobActive to 0
-    - Signal on qemuDomainObjPrivate condition
+    - Sets job.active to 0
+    - Signals on job.cond condition
+    - Decrements ref count on virDomainObjPtr
+
+
+
+To acquire the asynchronous job condition
+
+  qemuDomainObjBeginAsyncJob()            (if driver is unlocked)
+    - Increments ref count on virDomainObjPtr
+    - Waits until no async job is running
+    - Waits on the job.cond condition while 'job.active != 0', using
+      the virDomainObjPtr mutex
+    - Rechecks if any async job was started while waiting on job.cond
+      and repeats waiting in that case
+    - Sets job.asyncJob to the asynchronous job type
+
+  qemuDomainObjBeginAsyncJobWithDriver()  (if driver needs to be locked)
+    - Increments ref count on virDomainObjPtr
+    - Unlocks driver
+    - Waits until no async job is running
+    - Waits on the job.cond condition while 'job.active != 0', using
+      the virDomainObjPtr mutex
+    - Rechecks if any async job was started while waiting on job.cond
+      and repeats waiting in that case
+    - Sets job.asyncJob to the asynchronous job type
+    - Unlocks virDomainObjPtr
+    - Locks driver
+    - Locks virDomainObjPtr
+
+   NB: this variant is required in order to comply with lock ordering
+   rules for virDomainObjPtr vs driver
+
+
+  qemuDomainObjEndAsyncJob()
+    - Sets job.asyncJob to 0
+    - Broadcasts on job.asyncCond condition
     - Decrements ref count on virDomainObjPtr
 
 
@@ -152,6 +217,11 @@ To acquire the QEMU monitor lock
 
   NB: caller must take care to drop the driver lock if necessary
 
+  These functions automatically begin/end a nested job if called inside
+  an asynchronous job.  The caller must then check the return value of
+  qemuDomainObjEnterMonitor to detect whether the domain died while
+  waiting on the nested job.
+
 
 To acquire the QEMU monitor lock with the driver lock held
 
@@ -167,6 +237,11 @@ To acquire the QEMU monitor lock with the driver lock held
 
   NB: caller must take care to drop the driver lock if necessary
 
+  These functions automatically begin/end a nested job if called inside
+  an asynchronous job.  The caller must then check the return value of
+  qemuDomainObjEnterMonitorWithDriver to detect whether the domain died
+  while waiting on the nested job.
+
 
 To keep a domain alive while waiting on a remote command, starting
 with the driver lock held
@@ -232,7 +307,7 @@ Design patterns
      obj = virDomainFindByUUID(driver->domains, dom->uuid);
      qemuDriverUnlock(driver);
 
-     qemuDomainObjBeginJob(obj);
+     qemuDomainObjBeginJob(obj, QEMU_JOB_TYPE);
 
      ...do work...
 
@@ -253,12 +328,12 @@ Design patterns
      obj = virDomainFindByUUID(driver->domains, dom->uuid);
      qemuDriverUnlock(driver);
 
-     qemuDomainObjBeginJob(obj);
+     qemuDomainObjBeginJob(obj, QEMU_JOB_TYPE);
 
      ...do prep work...
 
      if (virDomainObjIsActive(vm)) {
-         qemuDomainObjEnterMonitor(obj);
+         ignore_value(qemuDomainObjEnterMonitor(obj));
          qemuMonitorXXXX(priv->mon);
          qemuDomainObjExitMonitor(obj);
      }
@@ -280,12 +355,12 @@ Design patterns
      qemuDriverLock(driver);
      obj = virDomainFindByUUID(driver->domains, dom->uuid);
 
-     qemuDomainObjBeginJobWithDriver(obj);
+     qemuDomainObjBeginJobWithDriver(obj, QEMU_JOB_TYPE);
 
      ...do prep work...
 
      if (virDomainObjIsActive(vm)) {
-         qemuDomainObjEnterMonitorWithDriver(driver, obj);
+         ignore_value(qemuDomainObjEnterMonitorWithDriver(driver, obj));
          qemuMonitorXXXX(priv->mon);
          qemuDomainObjExitMonitorWithDriver(driver, obj);
      }
@@ -297,7 +372,47 @@ Design patterns
      qemuDriverUnlock(driver);
 
 
- * Coordinating with a remote server for migraion
+ * Running asynchronous job
+
+     virDomainObjPtr obj;
+     qemuDomainObjPrivatePtr priv;
+
+     qemuDriverLock(driver);
+     obj = virDomainFindByUUID(driver->domains, dom->uuid);
+
+     qemuDomainObjBeginAsyncJobWithDriver(obj, QEMU_ASYNC_JOB_TYPE);
+     qemuDomainObjSetAsyncJobMask(obj, allowedJobs);
+
+     ...do prep work...
+
+     if (qemuDomainObjEnterMonitorWithDriver(driver, obj) < 0) {
+         /* domain died in the meantime */
+         goto error;
+     }
+     ...start qemu job...
+     qemuDomainObjExitMonitorWithDriver(driver, obj);
+
+     while (!finished) {
+         if (qemuDomainObjEnterMonitorWithDriver(driver, obj) < 0) {
+             /* domain died in the meantime */
+             goto error;
+         }
+         ...monitor job progress...
+         qemuDomainObjExitMonitorWithDriver(driver, obj);
+
+         virDomainObjUnlock(obj);
+         sleep(aWhile);
+         virDomainObjLock(obj);
+     }
+
+     ...do final work...
+
+     qemuDomainObjEndAsyncJob(obj);
+     virDomainObjUnlock(obj);
+     qemuDriverUnlock(driver);
+
+
+ * Coordinating with a remote server for migration
 
      virDomainObjPtr obj;
      qemuDomainObjPrivatePtr priv;
@@ -305,7 +420,7 @@ Design patterns
      qemuDriverLock(driver);
      obj = virDomainFindByUUID(driver->domains, dom->uuid);
 
-     qemuDomainObjBeginJobWithDriver(obj);
+     qemuDomainObjBeginAsyncJobWithDriver(obj, QEMU_ASYNC_JOB_TYPE);
 
      ...do prep work...
 
@@ -322,7 +437,7 @@ Design patterns
 
      ...do final work...
 
-     qemuDomainObjEndJob(obj);
+     qemuDomainObjEndAsyncJob(obj);
      virDomainObjUnlock(obj);
      qemuDriverUnlock(driver);
 

