[libvirt] [PATCH 2/2] blockjob: properly track blockcopy xml changes on disk
chen.fan.fnst at cn.fujitsu.com
chen.fan.fnst at cn.fujitsu.com
Tue Jul 22 05:38:59 UTC 2014
On Mon, 2014-07-21 at 22:34 -0600, Eric Blake wrote:
> We were not directly saving the domain XML to file after starting
> or finishing a blockcopy. Without the startup write, a libvirtd
> restart in the middle of a copy job would forget that the job was
> underway. Then at pivot, we were indirectly writing new XML in
> reaction to events that occur as we stop and restart the guest CPUs.
> But there was a race: since pivot is an async action, it is possible
> that the guest is restarted before the pivot completes, so if XML
> changes during the event, that change was not written. The original
> blockcopy code cleared out the <mirror> element prior to restarting
> the CPUs, but this is also a race, observed if a user does an async
> pivot and a dumpxml before the event occurs. Furthermore, this race
> will interfere with active commit, because that code relies on the
> <mirror> element at the time of the qemu event to determine whether
> to inform the user of a normal commit or an active commit.
>
> Fix things by saving state any time we modify live XML, and by
> delaying XML modifications until after the event completes. We
> still need a solution for the case of libvirtd restarting in between
> requesting qemu to do the pivot and the actual event (that is, if
> libvirtd misses the event, but learns the job is complete thanks to
> a block query, then the updated state still needs to be updated in
> live XML), but that will be a later patch, in part because we want
> to start taking advantage of newer qemu's ability to keep the job
> around after completion rather than the current usage where the job
> disappears both on error and on success.
>
> * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Track XML change
> on disk.
> (qemuDomainBlockPivot): Move post-pivot XML rewrites...
> * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): ...here,
> and expand to cover case of persistent vm.
>
> Signed-off-by: Eric Blake <eblake at redhat.com>
> ---
> src/qemu/qemu_driver.c | 46 ++++++++++++++++++---------------
> src/qemu/qemu_process.c | 69 +++++++++++++++++++++++++++++++++++++++++++------
> 2 files changed, 86 insertions(+), 29 deletions(-)
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 368e112..a5eaead 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -14843,39 +14841,41 @@ qemuDomainBlockPivot(virConnectPtr conn,
> virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0))
> goto cleanup;
>
> - /* Attempt the pivot. */
> + /* Attempt the pivot. We will update the domain later when qemu
> + * emits an event, telling us if we succeeded or failed.
> + * XXX On libvirtd restarts, if we missed the qemu event, we need
> + * to double check what state qemu is in.
> + * XXX We should be using qemu's rerror flag to make sure the job
> + * remains alive until we know it's final state.
> + * XXX If the abort command is synchronous but the qemu event is
> + * that the pivot failed, we need to reflect that failure into the
> + * overall return value. */
> qemuDomainObjEnterMonitor(driver, vm);
> ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror->path, format);
> qemuDomainObjExitMonitor(driver, vm);
>
> - if (ret == 0) {
> - /* XXX We want to revoke security labels and disk lease, as
> - * well as audit that revocation, before dropping the original
> - * source. But it gets tricky if both source and mirror share
> - * common backing files (we want to only revoke the non-shared
> - * portion of the chain, and is made more difficult by the
> - * fact that we aren't tracking the full chain ourselves; so
> - * for now, we leak the access to the original. */
> - virStorageSourceFree(oldsrc);
> - oldsrc = NULL;
> - } else {
> + if (ret < 0) {
> /* On failure, qemu abandons the mirror, and reverts back to
> * the source disk (RHEL 6.3 has a bug where the revert could
> * cause catastrophic failure in qemu, but we don't need to
> * worry about it here as it is not an upstream qemu problem. */
> /* XXX should we be parsing the exact qemu error, or calling
> * 'query-block', to see what state we really got left in
> - * before killing the mirroring job? And just as on the
> - * success case, there's security labeling to worry about. */
> + * before killing the mirroring job?
> + * XXX We want to revoke security labels and disk lease, as
> + * well as audit that revocation, before dropping the original
> + * source. But it gets tricky if both source and mirror share
> + * common backing files (we want to only revoke the non-shared
> + * portion of the chain); so for now, we leak the access to
> + * the original. */
> virStorageSourceFree(disk->mirror);
> - }
>
> - disk->mirror = NULL;
> - disk->mirroring = false;
> - disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
> + disk->mirror = NULL;
> + disk->mirroring = false;
> + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
> + }
>
> cleanup:
> - /* revert to original disk def on failure */
> if (oldsrc)
> disk->src = oldsrc;
>
> @@ -15329,6 +15329,10 @@ qemuDomainBlockCopy(virDomainObjPtr vm,
> mirror = NULL;
> disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY;
>
> + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0)
> + VIR_WARN("Unable to save status on vm %s after state change",
> + vm->def->name);
> +
> endjob:
> if (need_unlink && unlink(dest))
> VIR_WARN("unable to unlink just-created %s", dest);
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 65e9faf..3229f81 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -1016,6 +1016,7 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
> virObjectEventPtr event2 = NULL;
> const char *path;
> virDomainDiskDefPtr disk;
> + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
Hello Eric,
The cfg should be unref by virObjectUnref(cfg) at the end.
Thanks,
Chen
>
> virObjectLock(vm);
> disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias);
> @@ -1030,15 +1031,67 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
> event = virDomainEventBlockJobNewFromObj(vm, path, type, status);
> event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type,
> status);
> - /* XXX If we completed a block pull or commit, then recompute
> - * the cached backing chain to match. Better would be storing
> - * the chain ourselves rather than reprobing, but we haven't
> - * quite completed that conversion to use our XML tracking. */
> - if ((type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL ||
> - type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT ||
> - type == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) &&
> - status == VIR_DOMAIN_BLOCK_JOB_COMPLETED)
> +
> + /* If we completed a block pull or commit, then update the XML
> + * to match. */
> + if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) {
> + if (disk->mirror) {
> + virDomainDiskDefPtr persistDisk = NULL;
> +
> + if (vm->newDef) {
> + int indx = virDomainDiskIndexByName(vm->newDef, disk->dst,
> + false);
> + virStorageSourcePtr copy = NULL;
> +
> + if (indx >= 0) {
> + persistDisk = vm->newDef->disks[indx];
> + copy = virStorageSourceCopy(disk->mirror, false);
> + if (virStorageSourceInitChainElement(copy,
> + persistDisk->src,
> + false) < 0) {
> + VIR_WARN("Unable to update persistent definition "
> + "on vm %s after block job",
> + vm->def->name);
> + virStorageSourceFree(copy);
> + copy = NULL;
> + persistDisk = NULL;
> + }
> + }
> + if (copy) {
> + virStorageSourceFree(persistDisk->src);
> + persistDisk->src = copy;
> + }
> + }
> +
> + /* XXX We want to revoke security labels and disk
> + * lease, as well as audit that revocation, before
> + * dropping the original source. But it gets tricky
> + * if both source and mirror share common backing
> + * files (we want to only revoke the non-shared
> + * portion of the chain); so for now, we leak the
> + * access to the original. */
> + virStorageSourceFree(disk->src);
> + disk->src = disk->mirror;
> + disk->mirror = NULL;
> + disk->mirroring = false;
> + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
> +
> + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir,
> + vm) < 0)
> + VIR_WARN("Unable to save status on vm %s after block job",
> + vm->def->name);
> + if (persistDisk && virDomainSaveConfig(cfg->configDir,
> + vm->newDef) < 0)
> + VIR_WARN("Unable to update persistent definition on vm %s "
> + "after block job", vm->def->name);
> + }
> + /* Recompute the cached backing chain to match our
> + * updates. Better would be storing the chain ourselves
> + * rather than reprobing, but we haven't quite completed
> + * that conversion to use our XML tracking. */
> qemuDomainDetermineDiskChain(driver, vm, disk, true);
> + }
> +
> if (disk->mirror &&
> (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY ||
> type == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)) {
More information about the libvir-list
mailing list