[libvirt] [PATCHv2 09/20] qemu: Fix possible race when pausing guest

Eric Blake eblake at redhat.com
Fri Nov 2 17:59:59 UTC 2012


On 11/01/2012 10:22 AM, Peter Krempa wrote:
> When pausing the guest while migration is running (to speed up
> convergence) the virDomainSuspend API checks if the migration job is
> active before entering the job. This could cause a possible race if the
> virDomainSuspend is called while the job is active but ends before the
> Suspend API enters the job (this would require that the migration is
> aborted). This would cause a incorrect event to be emitted.

Not just aborted, but also completed normally.  At any rate, I agree
that we shouldn't be relying on the state of job.asyncJob until we have
obtained the sync job.

> ---
>  src/qemu/qemu_driver.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 7dedb96..98dc441 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -1658,14 +1658,6 @@ static int qemudDomainSuspend(virDomainPtr dom) {
> 
>      priv = vm->privateData;
> 
> -    if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_OUT) {
> -        reason = VIR_DOMAIN_PAUSED_MIGRATION;
> -        eventDetail = VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED;
> -    } else {
> -        reason = VIR_DOMAIN_PAUSED_USER;
> -        eventDetail = VIR_DOMAIN_EVENT_SUSPENDED_PAUSED;
> -    }
> -
>      if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_SUSPEND) < 0)
>          goto cleanup;
> 
> @@ -1675,6 +1667,14 @@ static int qemudDomainSuspend(virDomainPtr dom) {
>          goto endjob;
>      }
> 
> +    if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_OUT) {
> +        reason = VIR_DOMAIN_PAUSED_MIGRATION;
> +        eventDetail = VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED;
> +    } else {
> +        reason = VIR_DOMAIN_PAUSED_USER;
> +        eventDetail = VIR_DOMAIN_EVENT_SUSPENDED_PAUSED;
> +    }
> +

I think we're missing something.  The idea here is that if the user
starts an async job, then they can request pause but not resume until
that async job has completed.  How do we make sure resume is blocked?
By setting reason to something other than VIR_DOMAIN_PAUSED_USER.

The existing code worked for outgoing live migration by setting up the
async job, then if the user requests pause, sets things to the same
state as if the user had requested paused migration in the first place.

So in a similar vein - if the user requests live snapshot, but then
changes their mind and pauses, we should set things to the same state as
if they had requested paused snapshot in the first place.

[time passes, as well as an IRC chat...]

Oh, you did exactly that in patch 10/20!

In that case, this code is fixing a pre-existing bug, in preparation for
the work in patch 10.  ACK as-is.

-- 
Eric Blake   eblake at redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 617 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20121102/ef8ec049/attachment-0001.sig>


More information about the libvir-list mailing list