[libvirt] [PATCH v5 2/4] blockjob: properly track blockcopy xml changes on disk

Peter Krempa pkrempa at redhat.com
Tue Jul 29 11:40:43 UTC 2014


On 07/29/14 06:20, Eric Blake wrote:
> We were not directly saving the domain XML to file after starting
> or finishing a blockcopy.  Without the startup write, a libvirtd
> restart in the middle of a copy job would forget that the job was
> underway.  Then at pivot, we were indirectly writing new XML in
> reaction to events that occur as we stop and restart the guest CPUs.
> But there was a race: since pivot is an async action, it is possible
> that libvirtd is restarted before the pivot completes, so if XML
> changes during the event, that change was not written.  The original
> blockcopy code cleared out the <mirror> element prior to restarting
> the CPUs, but this is also a race, observed if a user does an async
> pivot and a dumpxml before the event occurs.  Furthermore, this race
> will interfere with active commit in a future patch, because that
> code will rely on the <mirror> element at the time of the qemu event
> to determine whether to inform the user of a normal commit or an
> active commit.
> 
> Fix things by saving state any time we modify live XML, while
> delaying XML disk modifications until after the event completes.  We
> still need a to teach libvirtd restarts to examine all existing
> <mirror> elements to see if the job completed in the meantime (that
> is, if libvirtd misses the event, the updated state still needs to be
> updated in live XML), but that will be a later patch, in part because
> we also need to to start taking advantage of newer qemu's ability to
> keep the job around after completion rather than the current usage
> where the job disappears both on error and on success.
> 
> * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Track XML change
> on disk.
> (qemuDomainBlockJobImpl, qemuDomainBlockPivot): Move job-end XML
> rewrites...
> * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): ...here.
> 
> Signed-off-by: Eric Blake <eblake at redhat.com>
> ---
>  src/qemu/qemu_driver.c  | 121 ++++++++++++++++++++++++++++++------------------
>  src/qemu/qemu_process.c |  53 ++++++++++++++++-----
>  2 files changed, 117 insertions(+), 57 deletions(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index acc9ef0..180333c 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -14911,38 +14911,43 @@ qemuDomainBlockPivot(virConnectPtr conn,
>           virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0))
>          goto cleanup;
> 
> -    /* Attempt the pivot.  */
> +    /* Attempt the pivot.  Record the attempt now, to prevent duplicate
> +     * attempts; but the actual disk change will be made when emitting
> +     * the event.
> +     * XXX On libvirtd restarts, if we missed the qemu event, we need
> +     * to double check what state qemu is in.
> +     * XXX We should be using qemu's rerror flag to make sure the job
> +     * remains alive until we know it's final state.
> +     * XXX If the abort command is synchronous but the qemu event says
> +     * that pivot failed, we need to reflect that failure into the
> +     * overall return value.  */
> +    disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_PIVOT;
>      qemuDomainObjEnterMonitor(driver, vm);
>      ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror->path, format);
>      qemuDomainObjExitMonitor(driver, vm);
> 
> -    if (ret == 0) {
> -        /* XXX We want to revoke security labels and disk lease, as
> -         * well as audit that revocation, before dropping the original
> -         * source.  But it gets tricky if both source and mirror share
> -         * common backing files (we want to only revoke the non-shared
> -         * portion of the chain, and is made more difficult by the
> -         * fact that we aren't tracking the full chain ourselves; so
> -         * for now, we leak the access to the original.  */
> -        virStorageSourceFree(oldsrc);
> -        oldsrc = NULL;
> -    } else {
> +    if (ret < 0) {
>          /* On failure, qemu abandons the mirror, and reverts back to
>           * the source disk (RHEL 6.3 has a bug where the revert could
>           * cause catastrophic failure in qemu, but we don't need to
>           * worry about it here as it is not an upstream qemu problem.  */
>          /* XXX should we be parsing the exact qemu error, or calling
>           * 'query-block', to see what state we really got left in
> -         * before killing the mirroring job?  And just as on the
> -         * success case, there's security labeling to worry about.  */
> +         * before killing the mirroring job?
> +         * XXX We want to revoke security labels and disk lease, as
> +         * well as audit that revocation, before dropping the original
> +         * source.  But it gets tricky if both source and mirror share
> +         * common backing files (we want to only revoke the non-shared
> +         * portion of the chain); so for now, we leak the access to
> +         * the original.  */

Libvirt has now the option to revert labeling on individual images, but
that can be done later.


>          virStorageSourceFree(disk->mirror);
> +        disk->mirror = NULL;
> +        disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT;
>      }
> -
> -    disk->mirror = NULL;
> -    disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT;
> +    if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0)
> +        ret = -1;
> 
>   cleanup:
> -    /* revert to original disk def on failure */
>      if (oldsrc)
>          disk->src = oldsrc;
> 
> @@ -14985,6 +14990,8 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
>      unsigned int baseIndex = 0;
>      char *basePath = NULL;
>      char *backingPath = NULL;
> +    virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
> +    bool save = false;
> 
>      if (!virDomainObjIsActive(vm)) {
>          virReportError(VIR_ERR_OPERATION_INVALID, "%s",
> @@ -15038,19 +15045,30 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
>                         disk->dst);
>          goto endjob;
>      }
> -    if (mode == BLOCK_JOB_ABORT &&
> -        (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) &&
> -        !(async && disk->mirror)) {
> -        virReportError(VIR_ERR_OPERATION_INVALID,
> -                       _("pivot of disk '%s' requires an active copy job"),
> -                       disk->dst);
> -        goto endjob;
> -    }
> +    if (mode == BLOCK_JOB_ABORT) {
> +        if ((flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) &&
> +            !(async && disk->mirror)) {
> +            virReportError(VIR_ERR_OPERATION_INVALID,
> +                           _("pivot of disk '%s' requires an active copy job"),
> +                           disk->dst);
> +            goto endjob;
> +        }
> +        if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_DEFAULT &&
> +            disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_READY) {
> +            virReportError(VIR_ERR_OPERATION_INVALID,
> +                           _("another job on disk '%s' is still being ended"),
> +                           disk->dst);
> +            goto endjob;
> +        }
> 
> -    if (disk->mirror && mode == BLOCK_JOB_ABORT &&
> -        (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) {
> -        ret = qemuDomainBlockPivot(conn, driver, vm, device, disk);
> -        goto waitjob;
> +        if (disk->mirror && (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) {
> +            ret = qemuDomainBlockPivot(conn, driver, vm, device, disk);
> +            goto waitjob;
> +        }
> +        if (disk->mirror) {
> +            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_ABORT;
> +            save = true;
> +        }
>      }
> 
>      if (base &&
> @@ -15089,36 +15107,40 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
>      ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath,
>                                bandwidth, info, mode, async);
>      qemuDomainObjExitMonitor(driver, vm);
> -    if (ret < 0)
> +    if (ret < 0) {
> +        if (mode == BLOCK_JOB_ABORT && disk->mirror)
> +            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT;
>          goto endjob;
> +    }
> 
>      /* Snoop block copy operations, so future cancel operations can
>       * avoid checking if pivot is safe.  */
>      if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror &&
> -        info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY)
> +        info->cur == info->end && !disk->mirrorState) {
>          disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY;
> -
> -    /* A successful block job cancelation stops any mirroring.  */
> -    if (mode == BLOCK_JOB_ABORT && disk->mirror) {
> -        /* XXX We should also revoke security labels and disk lease on
> -         * the mirror, and audit that fact, before dropping things.  */
> -        virStorageSourceFree(disk->mirror);
> -        disk->mirror = NULL;
> -        disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT;
> +        save = true;
>      }
> 
>   waitjob:
> +    /* If we have made changes to XML due to a copy job, make a best
> +     * effort to save it now.  But we can ignore failure, since there
> +     * will be further changes when the event marks completion.  */
> +    if (save)
> +        ignore_value(virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm));
> +
>      /* With synchronous block cancel, we must synthesize an event, and
>       * we silently ignore the ABORT_ASYNC flag.  With asynchronous
> -     * block cancel, the event will come from qemu, but without the
> -     * ABORT_ASYNC flag, we must block to guarantee synchronous
> -     * operation.  We do the waiting while still holding the VM job,
> -     * to prevent newly scheduled block jobs from confusing us.  */
> +     * block cancel, the event will come from qemu and will update the
> +     * XML as appropriate, but without the ABORT_ASYNC flag, we must
> +     * block to guarantee synchronous operation.  We do the waiting
> +     * while still holding the VM job, to prevent newly scheduled
> +     * block jobs from confusing us.  */
>      if (mode == BLOCK_JOB_ABORT) {
>          if (!async) {
>              /* Older qemu that lacked async reporting also lacked
> -             * active commit, so we can hardcode the event to pull.
> -             * We have to generate two variants of the event. */
> +             * blockcopy and active commit, so we can hardcode the
> +             * event to pull, and we know the XML doesn't need
> +             * updating.  We have to generate two event variants.  */
>              int type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL;
>              int status = VIR_DOMAIN_BLOCK_JOB_CANCELED;
>              event = virDomainEventBlockJobNewFromObj(vm, disk->src->path, type,
> @@ -15126,6 +15148,8 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
>              event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type,
>                                                         status);
>          } else if (!(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) {
> +            /* XXX If the event reports failure, we should reflect
> +             * that back into the return status of this API call.  */
>              while (1) {
>                  /* Poll every 50ms */
>                  static struct timespec ts = { .tv_sec = 0,
> @@ -15163,6 +15187,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
>      }
> 
>   cleanup:
> +    virObjectUnref(cfg);
>      VIR_FREE(basePath);
>      VIR_FREE(backingPath);
>      VIR_FREE(device);
> @@ -15394,6 +15419,10 @@ qemuDomainBlockCopy(virDomainObjPtr vm,
>      disk->mirror = mirror;
>      mirror = NULL;
> 
> +    if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0)
> +        VIR_WARN("Unable to save status on vm %s after state change",
> +                 vm->def->name);
> +
>   endjob:
>      if (need_unlink && unlink(dest))
>          VIR_WARN("unable to unlink just-created %s", dest);
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 73ad27d..e770a77 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -1016,6 +1016,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
>      virObjectEventPtr event2 = NULL;
>      const char *path;
>      virDomainDiskDefPtr disk;
> +    virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
> +    bool save = false;
> 
>      virObjectLock(vm);
>      disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias);
> @@ -1027,29 +1029,58 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
>          event = virDomainEventBlockJobNewFromObj(vm, path, type, status);
>          event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type,
>                                                     status);
> -        /* XXX If we completed a block pull or commit, then recompute
> -         * the cached backing chain to match.  Better would be storing
> -         * the chain ourselves rather than reprobing, but this
> -         * requires modifying domain_conf and our XML to fully track
> -         * the chain across libvirtd restarts.  For that matter, if
> -         * qemu gains support for committing the active layer, we have
> -         * to update disk->src.  */
> -        if ((type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL ||
> -             type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT) &&
> -            status == VIR_DOMAIN_BLOCK_JOB_COMPLETED)
> +
> +        /* If we completed a block pull or commit, then update the XML
> +         * to match.  */
> +        if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) {
> +            bool has_mirror = !!disk->mirror;


The control flow is becoming less obvious in this function.

> +
> +            if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_PIVOT) {
> +                /* XXX We want to revoke security labels and disk
> +                 * lease, as well as audit that revocation, before
> +                 * dropping the original source.  But it gets tricky
> +                 * if both source and mirror share common backing
> +                 * files (we want to only revoke the non-shared
> +                 * portion of the chain); so for now, we leak the
> +                 * access to the original.  */
> +                virStorageSourceFree(disk->src);
> +                disk->src = disk->mirror;
> +                disk->mirror = NULL;

If we were here, then

> +            }
> +            if (has_mirror) {

This condition is also true and we try to free the source of the mirror
again, even if we did it above. On the other hand, this is reached even
if the _ABORT job is completed, where this makes sense.

> +                virStorageSourceFree(disk->mirror);
> +                disk->mirror = NULL;
> +                save = disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_DEFAULT;
> +                disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT;
> +            }
> +            /* Recompute the cached backing chain to match our
> +             * updates.  Better would be storing the chain ourselves
> +             * rather than reprobing, but we haven't quite completed
> +             * that conversion to use our XML tracking. */
>              qemuDomainDetermineDiskChain(driver, vm, disk, true);
> -        if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) {
> +        }
> +
> +        if (disk->mirror &&

Now this really is an else section to the above as both paths above
clear disk->mirror.

> +            type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) {
>              if (status == VIR_DOMAIN_BLOCK_JOB_READY) {
>                  disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY;
> +                save = true;
>              } else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED) {
>                  virStorageSourceFree(disk->mirror);
>                  disk->mirror = NULL;
>                  disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT;
> +                save = true;
>              }
>          }
>      }
> 
> +    if (save) {

Save is true if this function was reached due to a mirroring operation
change ...

> +        if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0)
> +            VIR_WARN("Unable to save status on vm %s after block job",
> +                     vm->def->name);
> +    }
>      virObjectUnlock(vm);
> +    virObjectUnref(cfg);
> 
>      if (event)
>          qemuDomainEventQueue(driver, event);
> 


ACK, the function flow isn't obvious but it's correct from my POV

Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 884 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20140729/0b9d943a/attachment-0001.sig>


More information about the libvir-list mailing list