[libvirt] [PATCHv6 5/8] blockjob: make drive-reopen safer

Tue Apr 24 02:49:27 UTC 2012

Since libvirt drops locks between issuing a monitor command and
getting a response, it is possible for libvirtd to be restarted
before getting a response on a drive-reopen command; worse, it is
also possible for the guest to shut itself down during the window
while libvirtd is down, ending the qemu process.  A management app
needs to know if the pivot happened (and the destination file
contains guest contents not in the source) or failed (and the source
file contains guest contents not in the destination), but since
the job is finished, 'query-block-jobs' no longer tracks the
status of the job, and if the qemu process itself has disappeared,
even 'query-block' cannot be checked to ask qemu its current state.
An upstream qemu proposal[1] suggested adding an optional witness
fd, passed in by 'getfd', where qemu writes one byte to that file at
the point of a successful pivot, and where restarting libvirtd
would check the contents of the witness file.  However, that
requires quite a bit of libvirt work[2], especially since libvirt
does not currently have a way to expose such a witness fd to the
management application; and it is not guaranteed that upstream qemu
will even accept the witness fd solution.
[1] https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01638.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg02293.html

Meanwhile, we can go with a simpler solution; it is not as elegant,
but is something that can be implemented right away without design
complications, and can be enhanced later if future qemu gives us
more power.  If we surround 'drive-reopen' with a pause/resume
pair, then we can guarantee that the guest cannot modify either
source or destination files in the window of libvirtd uncertainty,
and the management app is guaranteed that either libvirt knows the
outcome and reported it correctly; or that on libvirtd restart, the
guest will still be paused and that the qemu process cannot have
disappeared due to guest shutdown; and use that as a clue that the
management app must implement recovery protocol, with both source
and destination files still being in sync and with 'query-block'
still being an option as part of that recovery.  My testing of
early implementations of 'drive-reopen' show that the pause window
will typically be only a fraction of a second.

* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Pause around
drive-reopen.
(qemuDomainBlockJobImpl): Update caller.
---

v6: new patch

 src/qemu/qemu_driver.c |   38 ++++++++++++++++++++++++++++++++++++--
 1 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index e562844..adbc27d 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -11645,12 +11645,14 @@ cleanup:
  * either success or failure (although there are some forms of
  * catastrophic failure that will leave the VM unusable).  */
 static int
-qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm,
+qemuDomainBlockPivot(virConnectPtr conn,
+                     struct qemud_driver *driver, virDomainObjPtr vm,
                      const char *device, virDomainDiskDefPtr disk)
 {
     int ret = -1;
     qemuDomainObjPrivatePtr priv = vm->privateData;
     virDomainBlockJobInfo info;
+    bool resume = false;

     /* Probe the status, if needed.  */
     if (!disk->mirroring) {
@@ -11677,6 +11679,30 @@ qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm,
         goto cleanup;
     }

+    /* Until 'drive-reopen' is part of a 'transaction', we want to
+     * make sure that management apps can tell whether the command
+     * succeeded, even if libvirtd is restarted at the wrong time.  To
+     * accomplish that, we pause the guest before drive-reopen, and
+     * resume it only when we know the outcome; if libvirtd restarts,
+     * then management will see the guest still paused, and know that
+     * no guest I/O has caused the source and mirror to diverge.
+     *
+     * XXX once 'drive-reopen' is part of 'transaction', then we don't
+     * need to do this; we will want to add an atomic flag that says
+     * whether to fail or fall back to this method for older qemu.  */
+    if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
+        if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE,
+                                QEMU_ASYNC_JOB_NONE) < 0)
+            goto cleanup;
+
+        resume = true;
+        if (!virDomainObjIsActive(vm)) {
+            qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                            _("guest unexpectedly quit"));
+            goto cleanup;
+        }
+    }
+
     /* Attempt the pivot.  */
     qemuDomainObjEnterMonitorWithDriver(driver, vm);
     ret = qemuMonitorDriveReopen(priv->mon, device, disk->mirror,
@@ -11717,6 +11743,14 @@ qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm,
     }

 cleanup:
+    if (resume && virDomainObjIsActive(vm) &&
+        qemuProcessStartCPUs(driver, vm, conn,
+                             VIR_DOMAIN_RUNNING_UNPAUSED,
+                             QEMU_ASYNC_JOB_NONE) < 0 &&
+        virGetLastError() == NULL) {
+        qemuReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                        _("resuming after drive-reopen failed"));
+    }
     return ret;
 }

@@ -11795,7 +11829,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base,

     if (disk->mirror && mode == BLOCK_JOB_ABORT &&
         (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) {
-        ret = qemuDomainBlockPivot(driver, vm, device, disk);
+        ret = qemuDomainBlockPivot(dom->conn, driver, vm, device, disk);
         goto endjob;
     }

-- 
1.7.7.6