[libvirt] [PATCH] Allow suspend during live migration

Daniel P. Berrange berrange at redhat.com
Wed Mar 17 11:22:34 UTC 2010


On Mon, Mar 15, 2010 at 02:19:00PM +0100, Jiri Denemark wrote:
> Currently no command can be sent to a qemu process while another job is
> active. This patch adds support for signaling long-running jobs (such as
> migration) so that other threads may request predefined operations to be
> done during such jobs. Two signals are defined so far:
>     - QEMU_JOB_SIGNAL_CANCEL
>     - QEMU_JOB_SIGNAL_SUSPEND
> 
> The first one is used by qemuDomainAbortJob.
> 
> The second one is used by qemudDomainSuspend for suspending a domain
> during migration, which allows for changing live migration into offline
> migration. However, there is a small issue in the way qemudDomainSuspend
> is currently implemented for migrating domains. The API calls returns
> immediately after signaling migration job which means it is asynchronous
> in this specific case.

That is a fairly minor issue, and I'm not sure if we need to worry about
it or not really.  If we did want to fix it, then the way todo it would
be to add a another condition variable to qemuDomainObjPriv, 

  virCond jobSignalCond.

after the qemuDomainSuspend sets the job signal, it should then wait on
the condition variable. The qemuDomainWaitForMigrationComplete() would
signal the condition variable when it had processed it.

> @@ -3560,6 +3572,7 @@ static int qemudDomainSuspend(virDomainPtr dom) {
>      virDomainObjPtr vm;
>      int ret = -1;
>      virDomainEventPtr event = NULL;
> +    qemuDomainObjPrivatePtr priv;
>  
>      qemuDriverLock(driver);
>      vm = virDomainFindByUUID(&driver->domains, dom->uuid);
> @@ -3571,30 +3584,48 @@ static int qemudDomainSuspend(virDomainPtr dom) {
>                          _("no domain with matching uuid '%s'"), uuidstr);
>          goto cleanup;
>      }
> -    if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0)
> -        goto cleanup;
> -
>      if (!virDomainObjIsActive(vm)) {
>          qemuReportError(VIR_ERR_OPERATION_INVALID,
>                          "%s", _("domain is not running"));
> -        goto endjob;
> +        goto cleanup;
>      }
> -    if (vm->state != VIR_DOMAIN_PAUSED) {
> -        qemuDomainObjPrivatePtr priv = vm->privateData;
> -        qemuDomainObjEnterMonitorWithDriver(driver, vm);
> -        if (qemuMonitorStopCPUs(priv->mon) < 0) {
> -            qemuDomainObjExitMonitorWithDriver(driver, vm);
> +
> +    priv = vm->privateData;
> +
> +    if (priv->jobActive == QEMU_JOB_MIGRATION) {
> +        if (vm->state != VIR_DOMAIN_PAUSED) {
> +            VIR_DEBUG("Requesting domain pause on %s",
> +                      vm->def->name);
> +            priv->jobSignals |= QEMU_JOB_SIGNAL_SUSPEND;

If we did the condition variable thing I mentioned, then you'd also
need to check to see if the signal was already set, and raise an 
error, since you don't want to race multiple concurrent suspend
calls on the condition variable

> +        }
> +        ret = 0;
> +        goto cleanup;
> +    } else {
> +        if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0)
> +            goto cleanup;
> +
> +        if (!virDomainObjIsActive(vm)) {
> +            qemuReportError(VIR_ERR_OPERATION_INVALID,
> +                            "%s", _("domain is not running"));
>              goto endjob;
>          }
> -        qemuDomainObjExitMonitorWithDriver(driver, vm);
> -        vm->state = VIR_DOMAIN_PAUSED;
> -        event = virDomainEventNewFromObj(vm,
> -                                         VIR_DOMAIN_EVENT_SUSPENDED,
> -                                         VIR_DOMAIN_EVENT_SUSPENDED_PAUSED);
> +        if (vm->state != VIR_DOMAIN_PAUSED) {
> +            int rc;
> +
> +            qemuDomainObjEnterMonitorWithDriver(driver, vm);
> +            rc = qemuMonitorStopCPUs(priv->mon);
> +            qemuDomainObjExitMonitorWithDriver(driver, vm);
> +            if (rc < 0)
> +                goto endjob;
> +            vm->state = VIR_DOMAIN_PAUSED;
> +            event = virDomainEventNewFromObj(vm,
> +                                             VIR_DOMAIN_EVENT_SUSPENDED,
> +                                             VIR_DOMAIN_EVENT_SUSPENDED_PAUSED);
> +        }
> +        if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0)
> +            goto endjob;
> +        ret = 0;
>      }
> -    if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0)
> -        goto endjob;
> -    ret = 0;
>  
>  endjob:
>      if (qemuDomainObjEndJob(vm) == 0)


ACK to this patch. If you want to solve the async problem with a condition
variable it can be a add-on patch later

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




More information about the libvir-list mailing list