[libvirt] [PATCH v12] support offline migration

li guang lig.fnst at cn.fujitsu.com
Mon Nov 5 03:33:48 UTC 2012


On Fri, 2012-11-02 at 15:32 +0100, Jiri Denemark wrote:
> On Fri, Nov 02, 2012 at 09:30:27 +0800, liguang wrote:
> > The original migration code was not aware of the offline case,
> > so try to support offline migration quietly
> > (without disturbing the original migration) by passing the
> > VIR_MIGRATE_OFFLINE flag to the migration APIs only if
> > the domain is really inactive, so that the
> > migration process is not confused by an offline domain
> > and does not exit unexpectedly.
> > These changes do not take care of the disk images the
> > domain requires, since they can be transferred by
> > other APIs as suggested; therefore VIR_MIGRATE_OFFLINE
> > should not be combined with VIR_MIGRATE_NON_SHARED_*.
> > If you want a persistent migration,
> > you should run "virsh migrate --persistent" yourself.
> > 
> > v12:
> > rebased to resolve a conflict with commit 2f3e2c0c434218a3d656c08779cb98b327170e11,
> > and took in some messages from Doug Goldstein's patch
> > https://www.redhat.com/archives/libvir-list/2012-October/msg00957.html
> > 
> > Signed-off-by: liguang <lig.fnst at cn.fujitsu.com>
> > ---
> >  include/libvirt/libvirt.h.in |    1 +
> >  src/qemu/qemu_driver.c       |   15 +++++++++
> >  src/qemu/qemu_migration.c    |   66 ++++++++++++++++++++++++++++++++++++-----
> >  src/qemu/qemu_migration.h    |    3 +-
> >  tools/virsh-domain.c         |   10 ++++++
> >  5 files changed, 86 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
> > index 2b17cef..a8f641f 100644
> > --- a/include/libvirt/libvirt.h.in
> > +++ b/include/libvirt/libvirt.h.in
> > @@ -1089,6 +1089,7 @@ typedef enum {
> >                                                 * whole migration process; this will be used automatically
> >                                                 * when supported */
> >      VIR_MIGRATE_UNSAFE            = (1 << 9), /* force migration even if it is considered unsafe */
> > +    VIR_MIGRATE_OFFLINE           = (1 << 10), /* offline migrate */
> >  } virDomainMigrateFlags;
> >  
> >  /* Domain migration. */
> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 267bbf1..f5c05e4 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -9759,6 +9759,20 @@ qemuDomainMigrateBegin3(virDomainPtr domain,
> >      }
> >  
> >      if (!virDomainObjIsActive(vm)) {
> > +        if (flags & VIR_MIGRATE_OFFLINE) {
> > +            if (flags & (VIR_MIGRATE_NON_SHARED_DISK|
> > +                         VIR_MIGRATE_NON_SHARED_INC)) {
> > +                virReportError(VIR_ERR_OPERATION_INVALID,
> > +                               "%s", _("offline migration cannot handle non-shared storage"));
> > +                goto endjob;
> > +            }
> > +            if (!(flags & VIR_MIGRATE_PERSIST_DEST)) {
> > +                virReportError(VIR_ERR_OPERATION_INVALID,
> > +                               "%s", _("offline migration must be specified with the persistent flag set"));
> > +                goto endjob;
> > +            }
> > +            goto offline;
> > +        }
> >          virReportError(VIR_ERR_OPERATION_INVALID,
> >                         "%s", _("domain is not running"));
> >          goto endjob;
> > @@ -9771,6 +9785,7 @@ qemuDomainMigrateBegin3(virDomainPtr domain,
> >      if (qemuDomainCheckEjectableMedia(driver, vm, asyncJob) < 0)
> >          goto endjob;
> >  
> > +offline:
> >      if (!(xml = qemuMigrationBegin(driver, vm, xmlin, dname,
> >                                     cookieout, cookieoutlen,
> >                                     flags)))
> 
> I think this label is pretty confusing since the code after it is not just for
> offline migration. Not to mention that this label is used to just skip calling
> qemuDomainCheckEjectableMedia(). I think it would look better as
> 
>     bool offline = !!(flags & VIR_MIGRATE_OFFLINE);
>     ...
>     if (!virDomainObjIsActive(vm)) {
>         if (offline) {
>             if (flags & (VIR_MIGRATE_NON_SHARED_DISK|
>                          VIR_MIGRATE_NON_SHARED_INC)) {
>                 virReportError(VIR_ERR_OPERATION_INVALID,
>                                "%s", _("offline migration cannot handle non-shared storage"));
>                 goto endjob;
>             }
>             if (!(flags & VIR_MIGRATE_PERSIST_DEST)) {
>                 virReportError(VIR_ERR_OPERATION_INVALID,
>                                "%s", _("offline migration must be specified with the persistent flag set"));
>                 goto endjob;
>             }
>         } else {
>             virReportError(VIR_ERR_OPERATION_INVALID,
>                            "%s", _("domain is not running"));
>             goto endjob;
>         }
>     }
> 
>     /* Check if there is any ejected media.
>      * We don't want to require them on the destination.
>      */
> 
>     if (!offline &&
>         qemuDomainCheckEjectableMedia(driver, vm, asyncJob) < 0)
>         goto endjob;
> 
>     if (!(xml = qemuMigrationBegin(driver, vm, xmlin, dname,
>                                    cookieout, cookieoutlen,
>                                    flags)))
>         goto endjob;
> 
> 
> Oh. And this whole code is actually in the wrong place. It should be placed in
> qemuMigrationBegin() so that it's also called in case of peer-to-peer
> migration. And that also applies to qemuDomainCheckEjectableMedia(), but that
> should be moved first as a separate patch since it is a preexisting bug not
> related to offline migration.
> 

Right, I can move this code snippet to qemuMigrationBegin()
and tidy it up while doing so.
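
Roughly, the flag checks discussed above could be factored as below once
they are moved. This is only a sketch under assumptions: the helper name
sketchCheckOfflineFlags and the SKETCH_* constants are made-up stand-ins
(only the OFFLINE bit, 1 << 10, is taken from the patch), and error
reporting is reduced to a message pointer instead of virReportError().

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for illustration only; the real definitions
 * are the virDomainMigrateFlags enum in libvirt.h. */
enum {
    SKETCH_MIGRATE_PERSIST_DEST    = 1 << 3,
    SKETCH_MIGRATE_NON_SHARED_DISK = 1 << 6,
    SKETCH_MIGRATE_NON_SHARED_INC  = 1 << 7,
    SKETCH_MIGRATE_OFFLINE         = 1 << 10,
};

/* Hypothetical helper: returns 0 when migration may begin, -1 otherwise.
 * On failure *errmsg points at the message the patch would report. */
static int
sketchCheckOfflineFlags(bool active, unsigned int flags, const char **errmsg)
{
    *errmsg = NULL;

    if (active)
        return 0;   /* running domain: the normal path applies */

    if (!(flags & SKETCH_MIGRATE_OFFLINE)) {
        *errmsg = "domain is not running";
        return -1;
    }
    if (flags & (SKETCH_MIGRATE_NON_SHARED_DISK |
                 SKETCH_MIGRATE_NON_SHARED_INC)) {
        *errmsg = "offline migration cannot handle non-shared storage";
        return -1;
    }
    if (!(flags & SKETCH_MIGRATE_PERSIST_DEST)) {
        *errmsg = "offline migration must be specified with the persistent flag set";
        return -1;
    }
    return 0;
}
```

With the checks in one place, both the plain and the peer-to-peer entry
points would go through the same validation.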

> > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> > index e0acc20..85fc48b 100644
> > --- a/src/qemu/qemu_migration.c
> > +++ b/src/qemu/qemu_migration.c
> > @@ -72,6 +72,7 @@ enum qemuMigrationCookieFlags {
> >      QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE,
> >      QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT,
> >      QEMU_MIGRATION_COOKIE_FLAG_NETWORK,
> > +    QEMU_MIGRATION_COOKIE_FLAG_OFFLINE,
> >  
> >      QEMU_MIGRATION_COOKIE_FLAG_LAST
> >  };
> > @@ -79,13 +80,14 @@ enum qemuMigrationCookieFlags {
> >  VIR_ENUM_DECL(qemuMigrationCookieFlag);
> >  VIR_ENUM_IMPL(qemuMigrationCookieFlag,
> >                QEMU_MIGRATION_COOKIE_FLAG_LAST,
> > -              "graphics", "lockstate", "persistent", "network");
> > +              "graphics", "lockstate", "persistent", "network", "offline");
> >  
> >  enum qemuMigrationCookieFeatures {
> >      QEMU_MIGRATION_COOKIE_GRAPHICS  = (1 << QEMU_MIGRATION_COOKIE_FLAG_GRAPHICS),
> >      QEMU_MIGRATION_COOKIE_LOCKSTATE = (1 << QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE),
> >      QEMU_MIGRATION_COOKIE_PERSISTENT = (1 << QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT),
> >      QEMU_MIGRATION_COOKIE_NETWORK = (1 << QEMU_MIGRATION_COOKIE_FLAG_NETWORK),
> > +    QEMU_MIGRATION_COOKIE_OFFLINE = (1 << QEMU_MIGRATION_COOKIE_FLAG_OFFLINE),
> >  };
> >  
> >  typedef struct _qemuMigrationCookieGraphics qemuMigrationCookieGraphics;
> > @@ -594,6 +596,9 @@ qemuMigrationCookieXMLFormat(struct qemud_driver *driver,
> >      if ((mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) && mig->network)
> >          qemuMigrationCookieNetworkXMLFormat(buf, mig->network);
> >  
> > +    if (mig->flags & QEMU_MIGRATION_COOKIE_OFFLINE)
> > +        virBufferAsprintf(buf, "  <offline/>\n");
> > +
> >      virBufferAddLit(buf, "</qemu-migration>\n");
> >      return 0;
> >  }
> > @@ -874,6 +879,11 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig,
> >          (!(mig->network = qemuMigrationCookieNetworkXMLParse(ctxt))))
> >          goto error;
> >  
> > +    if ((flags & QEMU_MIGRATION_COOKIE_OFFLINE)) {
> > +        if (virXPathBoolean("count(./offline) > 0", ctxt))
> > +            mig->flags |= QEMU_MIGRATION_COOKIE_OFFLINE;
> > +    }
> > +
> >      return 0;
> >  
> >  error:
> > @@ -938,6 +948,12 @@ qemuMigrationBakeCookie(qemuMigrationCookiePtr mig,
> >          return -1;
> >      }
> >  
> > +    if (flags & QEMU_MIGRATION_COOKIE_OFFLINE) {
> > +        mig->flags |= QEMU_MIGRATION_COOKIE_OFFLINE;
> > +    }
> > +
> > +
> > +
> 
> Too much space here, one empty line is enough :-)
> 

yes, will squash.

> >      if (!(*cookieout = qemuMigrationCookieXMLFormatStr(driver, mig)))
> >          return -1;
> >  
> > @@ -1443,6 +1459,13 @@ char *qemuMigrationBegin(struct qemud_driver *driver,
> >                                  QEMU_MIGRATION_COOKIE_LOCKSTATE) < 0)
> >          goto cleanup;
> >  
> > +    if (flags & VIR_MIGRATE_OFFLINE) {
> > +        if (qemuMigrationBakeCookie(mig, driver, vm,
> > +                                    cookieout, cookieoutlen,
> > +                                    QEMU_MIGRATION_COOKIE_OFFLINE) < 0)
> > +            goto cleanup;
> > +    }
> > +
> >      if (xmlin) {
> >          if (!(def = virDomainDefParseString(driver->caps, xmlin,
> >                                              QEMU_EXPECTED_VIRT_TYPES,
> > @@ -1607,6 +1630,15 @@ qemuMigrationPrepareAny(struct qemud_driver *driver,
> >          goto endjob;
> >      }
> >  
> > +    if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
> > +                                       QEMU_MIGRATION_COOKIE_OFFLINE)))
> > +        return ret;
> > +
> 
> Oops, the return statement will leave a big mess uncleaned. You need to goto
> endjob instead.
> 
> > +    if (mig->flags & QEMU_MIGRATION_COOKIE_OFFLINE) {
> > +        ret = 0;
> > +        goto cleanup;
> > +    }
> > +
> 
> This goto cleanup will result in the warning reported by Daniel. Prepare phase
> needs to call qemuMigrationJobContinue() on successful return. That is, you
> need to do "goto done;" and add new "done:" label just above the following
> code (around line 1690 on current HEAD):
> 

Good! That is why the warning was emitted, thanks!

> /* add "done:" here */
>      /* We keep the job active across API calls until the finish() call.
>       * This prevents any other APIs being invoked while incoming
>       * migration is taking place.
>       */
>      if (!qemuMigrationJobContinue(vm)) {
>          vm = NULL;
>          virReportError(VIR_ERR_OPERATION_FAILED,
>                         "%s", _("domain disappeared"));
>          goto cleanup;
>      }
>  
>      ret = 0;
> 
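
The control flow being suggested can be sketched in isolation as below.
This is a simplified model, not the libvirt code: struct sketchJob,
sketchJobContinue() and sketchPrepare() are hypothetical names, and
sketchJobContinue() merely stands in for qemuMigrationJobContinue().
The point is that the offline path jumps to "done" so the job is still
continued, instead of jumping straight to "cleanup".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-domain job state. */
struct sketchJob {
    bool alive;        /* does the domain object still exist? */
    int continued;     /* how many times the job was continued */
    int qemuStarted;   /* how many times the online-only work ran */
};

/* Stand-in for qemuMigrationJobContinue(): false if the domain is gone. */
static bool
sketchJobContinue(struct sketchJob *job)
{
    job->continued++;
    return job->alive;
}

/* Prepare phase: offline migration skips the QEMU start-up but must
 * still keep the job active across API calls. */
static int
sketchPrepare(struct sketchJob *job, bool offline)
{
    int ret = -1;

    if (offline)
        goto done;

    /* online-only work, e.g. starting QEMU with -incoming */
    job->qemuStarted++;

done:
    if (!sketchJobContinue(job))
        goto cleanup;   /* domain disappeared */

    ret = 0;

cleanup:
    return ret;
}
```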
> 
> >      /* Start the QEMU daemon, with the same command-line arguments plus
> >       * -incoming $migrateFrom
> >       */
> > @@ -2150,6 +2182,9 @@ qemuMigrationRun(struct qemud_driver *driver,
> >          return -1;
> >      }
> >  
> > +    if (flags & VIR_MIGRATE_OFFLINE)
> > +        return 0;
> > +
> >      if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
> >                                         QEMU_MIGRATION_COOKIE_GRAPHICS)))
> >          goto cleanup;
> 
> This check seems to be too deep. The whole point of it is to skip transferring

No, otherwise we would have to do much more work to clean up the offline path.

> migration data (since there's no qemu running anyway) so why doing all the
> preparation work for it? I think Perform phase should be completely skipped
> for offline migrations. Or do we actually need to do something during Perform
> phase? It seems we don't since it is completely skipped during peer-to-peer
> migration (not that it would actually work with this patch).
> 
> > @@ -2665,7 +2700,12 @@ static int doPeer2PeerMigrate3(struct qemud_driver *driver,
> >               uri, &uri_out, flags, dname, resource, dom_xml);
> >          qemuDomainObjExitRemoteWithDriver(driver, vm);
> >      }
> > +
> >      VIR_FREE(dom_xml);
> > +
> > +    if (flags & VIR_MIGRATE_OFFLINE)
> > +        goto cleanup;
> > +
> >      if (ret == -1)
> >          goto cleanup;
> >  
> 
> This will skip not only Perform but also Finish phase in peer-to-peer
> migration. You want to jump to finish label and do that *after* the check for
> ret == -1.
> 
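
The ordering described above can be modelled as follows. This is a
sketch under assumptions, not the body of doPeer2PeerMigrate3(): the
function sketchPeer2Peer() and the SKETCH_PHASE_* bits are hypothetical,
and each phase is reduced to setting a bit so the order of the checks is
visible. A failed Prepare goes to cleanup first; only then does the
offline case skip Perform and jump to finish.

```c
#include <stdbool.h>

/* Hypothetical bits recording which phases ran. */
enum {
    SKETCH_PHASE_PREPARE = 1 << 0,
    SKETCH_PHASE_PERFORM = 1 << 1,
    SKETCH_PHASE_FINISH  = 1 << 2,
};

/* prepareRet models the result of the remote Prepare call. */
static int
sketchPeer2Peer(int prepareRet, bool offline, unsigned int *phases)
{
    int ret = -1;

    *phases = SKETCH_PHASE_PREPARE;

    /* A failed Prepare must be caught before anything else. */
    if (prepareRet == -1)
        goto cleanup;

    /* Offline: nothing to transfer, but Finish still has to run. */
    if (offline)
        goto finish;

    *phases |= SKETCH_PHASE_PERFORM;

finish:
    *phases |= SKETCH_PHASE_FINISH;
    ret = 0;

cleanup:
    return ret;
}
```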
> > @@ -2771,7 +2811,7 @@ finish:
> >                   vm->def->name);
> >  
> >   cleanup:
> > -    if (ddomain) {
> > +    if (ddomain || (flags & VIR_MIGRATE_OFFLINE)) {
> >          virObjectUnref(ddomain);
> >          ret = 0;
> >      } else {
> 
> This is weird. It means that we don't care if Finish succeeded or not during
> offline migration even though Finish is where the domain configuration gets
> stored on disk.

My purpose is to call virObjectUnref(ddomain);
the "ret = 0" is not only for offline migration.

> 
> > @@ -2848,7 +2888,7 @@ static int doPeer2PeerMigrate(struct qemud_driver *driver,
> >      }
> >  
> >      /* domain may have been stopped while we were talking to remote daemon */
> > -    if (!virDomainObjIsActive(vm)) {
> > +    if (!virDomainObjIsActive(vm) && !(flags & VIR_MIGRATE_OFFLINE)) {
> >          virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> >                         _("guest unexpectedly quit"));
> >          goto cleanup;
> > @@ -2911,7 +2951,7 @@ qemuMigrationPerformJob(struct qemud_driver *driver,
> >      if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
> >          goto cleanup;
> >  
> > -    if (!virDomainObjIsActive(vm)) {
> > +    if (!virDomainObjIsActive(vm) && !(flags & VIR_MIGRATE_OFFLINE)) {
> >          virReportError(VIR_ERR_OPERATION_INVALID,
> >                         "%s", _("domain is not running"));
> >          goto endjob;
> > @@ -3236,6 +3276,8 @@ qemuMigrationFinish(struct qemud_driver *driver,
> >       */
> >      if (retcode == 0) {
> >          if (!virDomainObjIsActive(vm)) {
> > +            if (flags & VIR_MIGRATE_OFFLINE)
> > +                goto offline;
> >              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> >                             _("guest unexpectedly quit"));
> >              goto endjob;
> > @@ -3255,6 +3297,7 @@ qemuMigrationFinish(struct qemud_driver *driver,
> >              if (qemuDomainMigrateOPDRelocate(driver, vm, mig) < 0)
> >                  VIR_WARN("unable to provide network data for relocation");
> >  
> > +    offline:
> >          if (flags & VIR_MIGRATE_PERSIST_DEST) {
> >              virDomainDefPtr vmdef;
> >              if (vm->persistent)
> 
> Again, pretty confusing usage of offline label.
> 
> > @@ -3302,7 +3345,7 @@ qemuMigrationFinish(struct qemud_driver *driver,
> >              event = NULL;
> >          }
> >  
> > -        if (!(flags & VIR_MIGRATE_PAUSED)) {
> > +        if (!(flags & VIR_MIGRATE_PAUSED) && !(flags & VIR_MIGRATE_OFFLINE)) {
> >              /* run 'cont' on the destination, which allows migration on qemu
> >               * >= 0.10.6 to work properly.  This isn't strictly necessary on
> >               * older qemu's, but it also doesn't hurt anything there
> > @@ -3351,9 +3394,11 @@ qemuMigrationFinish(struct qemud_driver *driver,
> >                                               VIR_DOMAIN_EVENT_SUSPENDED,
> >                                               VIR_DOMAIN_EVENT_SUSPENDED_PAUSED);
> >          }
> > -        if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0) {
> > -            VIR_WARN("Failed to save status on vm %s", vm->def->name);
> > -            goto endjob;
> > +        if (virDomainObjIsActive(vm)) {
> > +            if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0) {
> > +                VIR_WARN("Failed to save status on vm %s", vm->def->name);
> > +                goto endjob;
> > +            }
> >          }
> >  
> >          /* Guest is successfully running, so cancel previous auto destroy */
> > @@ -3373,6 +3418,7 @@ qemuMigrationFinish(struct qemud_driver *driver,
> >  endjob:
> >      if (qemuMigrationJobFinish(driver, vm) == 0) {
> >          vm = NULL;
> > +    } else if (flags & VIR_MIGRATE_OFFLINE) {
> >      } else if (!vm->persistent && !virDomainObjIsActive(vm)) {
> >          qemuDomainRemoveInactive(driver, vm);
> >          vm = NULL;
> 
> This is wrong since vm->persistent will always be true for offline
> migration. Unless something failed early in which case we actually want to
> remove the domain. The current code should work without any modification.

Right, it is unnecessary now; I will remove it.

> 
> > @@ -3420,6 +3466,9 @@ int qemuMigrationConfirm(struct qemud_driver *driver,
> >      if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, 0)))
> >          return -1;
> >  
> > +    if (flags & VIR_MIGRATE_OFFLINE)
> > +        goto offline;
> > +
> >      /* Did the migration go as planned?  If yes, kill off the
> >       * domain object, but if no, resume CPUs
> >       */
> > @@ -3455,6 +3504,7 @@ int qemuMigrationConfirm(struct qemud_driver *driver,
> >          }
> >      }
> >  
> > +offline:
> >      qemuMigrationCookieFree(mig);
> >      rv = 0;
> >  
> 
> Confusing label again, I'd use "done" or something similar.
> 
> We're almost there, just some unusual code paths (other than successful
> non-p2p migration) need fixing. Moreover, I think we should explicitly fail v2
> migration with OFFLINE flag to avoid having to touch that path as well. And

Really? I think we will no longer fall into the V2 migration path, because we
always check for V3 first, and that check returns true (qemu driver).

> since you need to do changes to doPeer2PeerMigrate3 you should do similar
> changes to migration APIs in libvirt.c (since they are very similar) which
> should let you avoid changing some other parts.
> 
> Jirka


Thanks!

-- 
liguang    lig.fnst at cn.fujitsu.com
FNST linux kernel team




