[libvirt PATCH 54/80] qemu: Implement VIR_MIGRATE_POSTCOPY_RESUME for Begin phase

Peter Krempa pkrempa at redhat.com
Wed May 18 13:59:46 UTC 2022


On Wed, May 18, 2022 at 15:40:18 +0200, Jiri Denemark wrote:
> On Thu, May 12, 2022 at 14:14:15 +0200, Peter Krempa wrote:
> > On Tue, May 10, 2022 at 17:21:15 +0200, Jiri Denemark wrote:
> > > Mostly we just need to check whether the domain is in a failed post-copy
> > > migration that can be resumed.
> > > 
> > > Signed-off-by: Jiri Denemark <jdenemar at redhat.com>
> > > ---
> > >  src/qemu/qemu_migration.c | 143 ++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 143 insertions(+)
> > > 
> > > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> > > index c111dd8686..99b1d4b88b 100644
> > > --- a/src/qemu/qemu_migration.c
> > > +++ b/src/qemu/qemu_migration.c
> > > @@ -2717,6 +2717,143 @@ qemuMigrationSrcBeginPhase(virQEMUDriver *driver,
> > >                                      flags);
> > >  }
> > >  
> > > +
> > > +static bool
> > > +qemuMigrationAnyCanResume(virDomainObj *vm,
> > > +                          virDomainAsyncJob job,
> > > +                          unsigned long flags,
> > > +                          qemuMigrationJobPhase expectedPhase)
> > > +{
> > > +    qemuDomainObjPrivate *priv = vm->privateData;
> > > +
> > > +    VIR_DEBUG("vm=%p, job=%s, flags=0x%lx, expectedPhase=%s",
> > > +              vm, virDomainAsyncJobTypeToString(job), flags,
> > > +              qemuDomainAsyncJobPhaseToString(VIR_ASYNC_JOB_MIGRATION_OUT,
> > > +                                              expectedPhase));
> > > +
> > > +    if (!(flags & VIR_MIGRATE_POSTCOPY)) {
> > > +        virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s",
> > > +                       _("resuming failed post-copy migration requires "
> > > +                         "post-copy to be enabled"));
> > 
> > Error messages shall not have line breaks.
> > 
> > Also ... requiring this flag seems to be a bit pointless and ...
> > cosmetic at best. Since the previous migration would need to be started
> > with the post-copy flag already which is checked below.
> 
> Right, the previous migration must have been started as post-copy, but
> resuming it is post-copy too. We could have implied the flag based on
> POSTCOPY_RESUME (which is ugly) or all relevant places checking POSTCOPY
> would need to check POSTCOPY_RESUME too (which is ugly as well). And
> with the expected use case (repeating the failed migration command with
> all the flags and arguments and adding an extra POSTCOPY_RESUME flag) in
> mind requiring POSTCOPY to be used with POSTCOPY_RESUME looks like the
> the best option to me.

I was kind of referring to the fact that it's implied that postcopy is
being resumed with a POSTCOPY_RESUME flag and you need to check anyways
that the migration is indeed postcopy, so this feels redundant, but I
think it's okay.

> > 
> > > +        return false;
> > > +    }
> > > +
> > > +    /* This should never happen since POSTCOPY_RESUME is newer than
> > > +     * CHANGE_PROTECTION, but let's check it anyway in case we're talking to
> > > +     * a weired client.
> > > +     */
> > > +    if (job == VIR_ASYNC_JOB_MIGRATION_OUT &&
> > > +        expectedPhase < QEMU_MIGRATION_PHASE_PERFORM_RESUME &&
> > > +        !(flags & VIR_MIGRATE_CHANGE_PROTECTION)) {
> > > +        virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s",
> > > +                       _("resuming failed post-copy migration requires "
> > > +                         "change protection"));
> > 
> > Same here.
> > 
> > > +        return NULL;
> > > +    }
> > 
> > 
> > [...]
> > 
> > > +static char *
> > > +qemuMigrationSrcBeginResume(virQEMUDriver *driver,
> > > +                            virDomainObj *vm,
> > > +                            const char *xmlin,
> > > +                            char **cookieout,
> > > +                            int *cookieoutlen,
> > > +                            unsigned long flags)
> > > +{
> > > +    virDomainJobStatus status;
> > > +
> > > +    if (qemuMigrationAnyRefreshStatus(driver, vm, VIR_ASYNC_JOB_MIGRATION_OUT,
> > > +                                      &status) < 0)
> > > +        return NULL;
> > > +
> > > +    if (status != VIR_DOMAIN_JOB_STATUS_POSTCOPY_PAUSED) {
> > > +        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
> > > +                       _("QEMU reports migration is still running"));
> > > +        return NULL;
> > > +    }
> > > +
> > > +    return qemuMigrationSrcBeginXML(driver, vm, xmlin,
> > > +                                    cookieout, cookieoutlen, 0, NULL, 0, flags);
> > 
> > Certain steps here around the non-shared-storage migration are redundant
> > at this point and IMO should be skipped.
> 
> Which is what happens in qemuMigrationSrcBegin:
> 
>     if (flags & VIR_MIGRATE_POSTCOPY_RESUME) {
>         xml = qemuMigrationSrcBeginResumePhase(...);
>         goto cleanup;
>     }
> 
>     ...
> 
> Or did I miss anything?

In 'qemuMigrationSrcBeginXML'
qemuMigrationSrcBeginPhaseBlockDirtyBitmaps is called which does a bunch
of setup for bitmap migration which is pointless when recovering as
storage is copied already if the CPUs are running on the destination.



More information about the libvir-list mailing list