[libvirt] [PATCH] qemu: Allow migration to be cancelled at any phase

Jiri Denemark jdenemar at redhat.com
Thu Nov 8 10:49:02 UTC 2012


On Thu, Nov 08, 2012 at 11:00:40 +0100, Michal Privoznik wrote:
> On 07.11.2012 23:23, Jiri Denemark wrote:
> > 
> > AbortJob API was never a synchronous one and I don't think we need to change
> > it. Not to mention that this code unlocks and locks vm without holding an
> > extra reference to it. And even if it did so, it cannot guarantee that the job
> > it's checking in the loop is the same one it tried to cancel. It's quite
> > unlikely but the original job could have finished and another one could have
> > been started while this code was sleeping.
> 
> However, AbortJob is documented as synchronous. If current implementation doesn't
> follow docs it is a bug. On the other hand, I don't recall anybody screaming about
> it so far. But that means nothing, right?

IIRC the implementation was never synchronous in which case I think we may
just fix the documentation to match reality :-)

> >> @@ -1172,6 +1172,12 @@ qemuMigrationUpdateJobStatus(struct qemud_driver *driver,
> >>      }
> >>      priv->job.info.timeElapsed -= priv->job.start;
> >>  
> >> +    if (qemuDomainObjAbortAsyncJobRequested(vm)) {
> >> +        VIR_DEBUG("Migration abort requested. Translating "
> >> +                  "status to MIGRATION_STATUS_CANCELLED");
> >> +        status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED;
> >> +    }
> >> +
> > 
> > This check seems to be to late and that complicates the code later. If we keep
> > qemuDomainAbortJob calling qemuMonitorMigrateCancel in qemuDomainAbortJob, we
> > may count on that and check priv.job.asyncAbort before entering the monitor
> > to tell QEMU to start migrating in qemuMigrationRun. If the abort is not
> > requested by then, it may only happen after we call qemuMonitorMigrateTo* and
> > in that case the migration will be properly aborted by
> > qemuMonitorMigrateCancel.
> > 
> 
> Not really. The issue I've seen was like this:
> 
>    Thread A					Thread B
> 1. start async migration out job
> 2. 						Since job is set, abort it by calling 'migrate_cancel'
> 3.						return to user
> 4. issue 'migrate' on the monitor
> 
> And I think your suggestion makes it just harder to hit this race and not really avoid it.
> However with my code, we are guaranteed that 'migrate_cancel' will have an effect.

Oh right, we actually need to check priv.job.asyncAbort after entering the
monitor (since we may be preempted while waiting for entering the monitor) but
still before calling qemuMonitorMigrateTo* that actually starts the monitor.
When we entered the monitor, the AbortJob must have already been there or it
will come after we leave the monitor.

Jirka




More information about the libvir-list mailing list