[libvirt] [PATCH 00/19] Rollback migration when libvirtd restarts
Jiri Denemark
jdenemar at redhat.com
Wed Jul 13 07:16:11 UTC 2011
On Fri, Jul 08, 2011 at 01:34:05 +0200, Jiri Denemark wrote:
> This series is also available at
> https://gitorious.org/~jirka/libvirt/jirka-staging/commits/migration-recovery
>
> The series does several things:
> - persists current job and its phase in status xml
> - allows safe monitor commands to be run during migration/save/dump jobs
> - implements recovery when libvirtd is restarted while a job is active
> - consolidates some code and fixes bugs I found when working in the area
>
> The series is not perfect and still needs some corner cases to be fixed but I
> think it's better to send the series for review now and add small additional
> fixes in the next version(s) instead of waiting for it to be perfect.
OK, I pushed the following patches (01-08/19 and 13/19) which were already
acked. When doing so, I also updated documentation in src/qemu/THREADS.txt as
part of the "qemu: Allow all query commands to be run during long jobs" patch.
The diff to THREADS.txt is attached.
> qemu: Separate job related data into a new object
> qemu: Consolidate BeginJob{,WithDriver} into a single method
> qemu: Consolidate {Enter,Exit}Monitor{,WithDriver}
> qemu: Allow all query commands to be run during long jobs
> qemu: Save job type in domain status XML
> qemu: Recover from interrupted jobs
> qemu: Add support for job phase
> qemu: Consolidate qemuMigrationPrepare{Direct,Tunnel}
> qemu: Fix monitor unlocking in some error paths
Jirka
-------------- next part --------------
diff --git a/src/qemu/THREADS.txt b/src/qemu/THREADS.txt
index 1e0b5ab..3a27a85 100644
--- a/src/qemu/THREADS.txt
+++ b/src/qemu/THREADS.txt
@@ -49,17 +49,39 @@ There are a number of locks on various objects
- * qemuMonitorPrivatePtr: Job condition
+ * qemuMonitorPrivatePtr: Job conditions
Since virDomainObjPtr lock must not be held during sleeps, the job
- condition provides additional protection for code making updates.
+ conditions provide additional protection for code making updates.
+
+ The QEMU driver uses two kinds of job conditions: asynchronous and
+ normal.
+
+ The asynchronous job condition is used for long-running jobs (such
+ as migration) that consist of several monitor commands and during
+ which it is desirable to allow a limited set of other monitor
+ commands to be called. This allows clients to, e.g., query
+ statistical data, cancel the job, or change parameters of the job.
+
+ The normal job condition is used by all other jobs to get exclusive
+ access to the monitor and also by every monitor command issued by an
+ asynchronous job. When acquiring the normal job condition, the job
+ must specify what kind of action it is about to take, and this is
+ checked against the set of allowed jobs if an asynchronous job is
+ running. If the job is incompatible with the current asynchronous
+ job, it needs to wait until the asynchronous job ends and then try
+ to acquire the job again.
Immediately after acquiring the virDomainObjPtr lock, any method
- which intends to update state must acquire the job condition. The
- virDomainObjPtr lock is released while blocking on this condition
- variable. Once the job condition is acquired, a method can safely
- release the virDomainObjPtr lock whenever it hits a piece of code
- which may sleep/wait, and re-acquire it after the sleep/wait.
+ which intends to update state must acquire either the asynchronous
+ or the normal job condition. The virDomainObjPtr lock is released
+ while blocking on these condition variables. Once the job condition
+ is acquired, a method can safely release the virDomainObjPtr lock
+ whenever it hits a piece of code which may sleep/wait, and
+ re-acquire it after the sleep/wait. Whenever an asynchronous job
+ wants to talk to the monitor, it needs to acquire a nested job (a
+ special kind of normal job) to obtain exclusive access to the
+ monitor.
Since the virDomainObjPtr lock was dropped while waiting for the
job condition, it is possible that the domain is no longer active
@@ -111,31 +133,74 @@ To lock the virDomainObjPtr
-To acquire the job mutex
+To acquire the normal job condition
qemuDomainObjBeginJob() (if driver is unlocked)
- Increments ref count on virDomainObjPtr
- - Wait qemuDomainObjPrivate condition 'jobActive != 0' using
- virDomainObjPtr mutex
- - Sets jobActive to 1
+ - Waits until the job is compatible with current async job or no
+ async job is running
+ - Waits job.cond condition 'job.active != 0' using virDomainObjPtr
+ mutex
+ - Rechecks if the job is still compatible and repeats waiting if it
+ isn't
+ - Sets job.active to the job type
qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
- - Unlocks driver
- Increments ref count on virDomainObjPtr
- - Wait qemuDomainObjPrivate condition 'jobActive != 0' using
- virDomainObjPtr mutex
- - Sets jobActive to 1
+ - Unlocks driver
+ - Waits until the job is compatible with current async job or no
+ async job is running
+ - Waits job.cond condition 'job.active != 0' using virDomainObjPtr
+ mutex
+ - Rechecks if the job is still compatible and repeats waiting if it
+ isn't
+ - Sets job.active to the job type
- Unlocks virDomainObjPtr
- Locks driver
- Locks virDomainObjPtr
- NB: this variant is required in order to comply with lock ordering rules
- for virDomainObjPtr vs driver
+ NB: this variant is required in order to comply with lock ordering
+ rules for virDomainObjPtr vs driver
qemuDomainObjEndJob()
- - Set jobActive to 0
- - Signal on qemuDomainObjPrivate condition
+ - Sets job.active to 0
+ - Signals on job.cond condition
+ - Decrements ref count on virDomainObjPtr
+
+
+
+To acquire the asynchronous job condition
+
+ qemuDomainObjBeginAsyncJob() (if driver is unlocked)
+ - Increments ref count on virDomainObjPtr
+ - Waits until no async job is running
+ - Waits job.cond condition 'job.active != 0' using virDomainObjPtr
+ mutex
+ - Rechecks if any async job was started while waiting on job.cond
+ and repeats waiting in that case
+ - Sets job.asyncJob to the asynchronous job type
+
+ qemuDomainObjBeginAsyncJobWithDriver() (if driver needs to be locked)
+ - Increments ref count on virDomainObjPtr
+ - Unlocks driver
+ - Waits until no async job is running
+ - Waits job.cond condition 'job.active != 0' using virDomainObjPtr
+ mutex
+ - Rechecks if any async job was started while waiting on job.cond
+ and repeats waiting in that case
+ - Sets job.asyncJob to the asynchronous job type
+ - Unlocks virDomainObjPtr
+ - Locks driver
+ - Locks virDomainObjPtr
+
+ NB: this variant is required in order to comply with lock ordering
+ rules for virDomainObjPtr vs driver
+
+
+ qemuDomainObjEndAsyncJob()
+ - Sets job.asyncJob to 0
+ - Broadcasts on job.asyncCond condition
- Decrements ref count on virDomainObjPtr
@@ -152,6 +217,11 @@ To acquire the QEMU monitor lock
NB: caller must take care to drop the driver lock if necessary
+ These functions automatically begin/end a nested job if called
+ inside an asynchronous job. The caller must then check the return
+ value of qemuDomainObjEnterMonitor to detect whether the domain died
+ while waiting on the nested job.
+
To acquire the QEMU monitor lock with the driver lock held
@@ -167,6 +237,11 @@ To acquire the QEMU monitor lock with the driver lock held
NB: caller must take care to drop the driver lock if necessary
+ These functions automatically begin/end a nested job if called
+ inside an asynchronous job. The caller must then check the return
+ value of qemuDomainObjEnterMonitorWithDriver to detect whether the
+ domain died while waiting on the nested job.
+
To keep a domain alive while waiting on a remote command, starting
with the driver lock held
@@ -232,7 +307,7 @@ Design patterns
obj = virDomainFindByUUID(driver->domains, dom->uuid);
qemuDriverUnlock(driver);
- qemuDomainObjBeginJob(obj);
+ qemuDomainObjBeginJob(obj, QEMU_JOB_TYPE);
...do work...
@@ -253,12 +328,12 @@ Design patterns
obj = virDomainFindByUUID(driver->domains, dom->uuid);
qemuDriverUnlock(driver);
- qemuDomainObjBeginJob(obj);
+ qemuDomainObjBeginJob(obj, QEMU_JOB_TYPE);
...do prep work...
if (virDomainObjIsActive(vm)) {
- qemuDomainObjEnterMonitor(obj);
+ ignore_value(qemuDomainObjEnterMonitor(obj));
qemuMonitorXXXX(priv->mon);
qemuDomainObjExitMonitor(obj);
}
@@ -280,12 +355,12 @@ Design patterns
qemuDriverLock(driver);
obj = virDomainFindByUUID(driver->domains, dom->uuid);
- qemuDomainObjBeginJobWithDriver(obj);
+ qemuDomainObjBeginJobWithDriver(obj, QEMU_JOB_TYPE);
...do prep work...
if (virDomainObjIsActive(vm)) {
- qemuDomainObjEnterMonitorWithDriver(driver, obj);
+ ignore_value(qemuDomainObjEnterMonitorWithDriver(driver, obj));
qemuMonitorXXXX(priv->mon);
qemuDomainObjExitMonitorWithDriver(driver, obj);
}
@@ -297,7 +372,47 @@ Design patterns
qemuDriverUnlock(driver);
- * Coordinating with a remote server for migraion
+ * Running asynchronous job
+
+ virDomainObjPtr obj;
+ qemuDomainObjPrivatePtr priv;
+
+ qemuDriverLock(driver);
+ obj = virDomainFindByUUID(driver->domains, dom->uuid);
+
+ qemuDomainObjBeginAsyncJobWithDriver(obj, QEMU_ASYNC_JOB_TYPE);
+ qemuDomainObjSetAsyncJobMask(obj, allowedJobs);
+
+ ...do prep work...
+
+ if (qemuDomainObjEnterMonitorWithDriver(driver, obj) < 0) {
+ /* domain died in the meantime */
+ goto error;
+ }
+ ...start qemu job...
+ qemuDomainObjExitMonitorWithDriver(driver, obj);
+
+ while (!finished) {
+ if (qemuDomainObjEnterMonitorWithDriver(driver, obj) < 0) {
+ /* domain died in the meantime */
+ goto error;
+ }
+ ...monitor job progress...
+ qemuDomainObjExitMonitorWithDriver(driver, obj);
+
+ virDomainObjUnlock(obj);
+ sleep(aWhile);
+ virDomainObjLock(obj);
+ }
+
+ ...do final work...
+
+ qemuDomainObjEndAsyncJob(obj);
+ virDomainObjUnlock(obj);
+ qemuDriverUnlock(driver);
+
+
+ * Coordinating with a remote server for migration
virDomainObjPtr obj;
qemuDomainObjPrivatePtr priv;
@@ -305,7 +420,7 @@ Design patterns
qemuDriverLock(driver);
obj = virDomainFindByUUID(driver->domains, dom->uuid);
- qemuDomainObjBeginJobWithDriver(obj);
+ qemuDomainObjBeginAsyncJobWithDriver(obj, QEMU_ASYNC_JOB_TYPE);
...do prep work...
@@ -322,7 +437,7 @@ Design patterns
...do final work...
- qemuDomainObjEndJob(obj);
+ qemuDomainObjEndAsyncJob(obj);
virDomainObjUnlock(obj);
qemuDriverUnlock(driver);
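The THREADS.txt text describes the normal/asynchronous job conditions only in prose. Below is a minimal, hypothetical C sketch of that protocol using plain pthreads. It is not libvirt's code: `struct dom_jobs`, `job_begin()`, `async_job_begin()`, `async_job_set_mask()`, the `JOB_*` flags and the mask semantics are illustrative assumptions. The mutex stands in for the virDomainObjPtr lock, `cond` for job.cond and `async_cond` for job.asyncCond. All functions assume the caller already holds `lock`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical job kinds (bit flags so they can form an "allowed" mask). */
enum job_type {
    JOB_NONE   = 0,
    JOB_QUERY  = 1 << 0, /* e.g. query statistics */
    JOB_ABORT  = 1 << 1, /* e.g. cancel the running async job */
    JOB_MODIFY = 1 << 2, /* e.g. change domain state */
    JOB_NESTED = 1 << 3, /* monitor access on behalf of the async job */
};

enum async_job_type { ASYNC_JOB_NONE = 0, ASYNC_JOB_MIGRATION = 1 };

struct dom_jobs {
    pthread_mutex_t lock;          /* stands in for the virDomainObjPtr lock */
    pthread_cond_t cond;           /* normal job condition (job.cond) */
    pthread_cond_t async_cond;     /* async job condition (job.asyncCond) */
    enum job_type active;          /* job.active */
    enum async_job_type async_job; /* job.asyncJob */
    unsigned mask;                 /* normal jobs allowed during the async job */
};

static void jobs_init(struct dom_jobs *j)
{
    pthread_mutex_init(&j->lock, NULL);
    pthread_cond_init(&j->cond, NULL);
    pthread_cond_init(&j->async_cond, NULL);
    j->active = JOB_NONE;
    j->async_job = ASYNC_JOB_NONE;
    j->mask = JOB_NONE;
}

/* A normal job is compatible if no async job runs, if it is a nested job
 * (assumed to be issued only by the async job's owner), or if the mask allows it. */
static bool job_allowed(const struct dom_jobs *j, enum job_type job)
{
    return j->async_job == ASYNC_JOB_NONE || job == JOB_NESTED ||
           (j->mask & job) != 0;
}

static void job_begin(struct dom_jobs *j, enum job_type job)
{
    for (;;) {
        while (!job_allowed(j, job))           /* wait for the async job to end */
            pthread_cond_wait(&j->async_cond, &j->lock);
        while (j->active != JOB_NONE)          /* wait for exclusive access */
            pthread_cond_wait(&j->cond, &j->lock);
        if (job_allowed(j, job))               /* recheck: an async job may have started */
            break;
    }
    j->active = job;
}

static void job_end(struct dom_jobs *j)
{
    j->active = JOB_NONE;
    pthread_cond_broadcast(&j->cond);
}

static void async_job_begin(struct dom_jobs *j, enum async_job_type type)
{
    for (;;) {
        while (j->async_job != ASYNC_JOB_NONE)
            pthread_cond_wait(&j->async_cond, &j->lock);
        while (j->active != JOB_NONE)
            pthread_cond_wait(&j->cond, &j->lock);
        if (j->async_job == ASYNC_JOB_NONE)    /* recheck after waiting on cond */
            break;
    }
    j->async_job = type;
    j->mask = JOB_NONE;                        /* caller sets the mask explicitly */
}

static void async_job_set_mask(struct dom_jobs *j, unsigned allowed)
{
    j->mask = allowed;
}

static void async_job_end(struct dom_jobs *j)
{
    j->async_job = ASYNC_JOB_NONE;
    j->mask = JOB_NONE;
    pthread_cond_broadcast(&j->async_cond);
}
```

The sketch broadcasts on both conditions rather than signalling. A single woken waiter that fails its compatibility recheck would otherwise swallow the only wakeup and leave other eligible waiters blocked.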