[libvirt] [PATCH v2 6/8] Implemented post-copy migration logic in qemu

Cristian Klein cristian.klein at cs.umu.se
Wed Oct 8 15:09:45 UTC 2014


On 07 Oct 2014, at 23:02 , Jiri Denemark <jdenemar at redhat.com> wrote:

> On Tue, Sep 30, 2014 at 16:39:27 +0200, Cristian Klein wrote:
>> Perform phase stops once migration switched to post-copy.
>> Confirm phase waits for post-copy to finish before killing the VM.
>> 
>> Signed-off-by: Cristian Klein <cristian.klein at cs.umu.se>
>> ---
>> src/qemu/qemu_driver.c    |  8 ++++++++
>> src/qemu/qemu_migration.c | 25 ++++++++++++++++++++++---
>> 2 files changed, 30 insertions(+), 3 deletions(-)
>> 
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index e873d45..3fe2216 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -10942,6 +10942,14 @@ qemuDomainMigratePrepare2(virConnectPtr dconn,
>> 
>>     virCheckFlags(QEMU_MIGRATION_FLAGS, -1);
>> 
>> +    if (flags & VIR_MIGRATE_POSTCOPY) {
>> +        /* post-copy migration does not work with Sequence v2 */
>> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> +                       _("Post-copy migration requested but not "
>> +                         "supported by v2 protocol"));
>> +        goto cleanup;
>> +    }
>> +
> 
> This code should be unreachable. If both source and destination daemons
> support VIR_MIGRATE_POSTCOPY, they support v3 protocol as well. And a
> client new enough to specify VIR_MIGRATE_POSTCOPY will also support v3
> migration protocol. However, it doesn't hurt to check this to be safe.
> 
>>     if (flags & VIR_MIGRATE_TUNNELLED) {
>>         /* this is a logical error; we never should have gotten here with
>>          * VIR_MIGRATE_TUNNELLED set
>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>> index 4a36946..436b701 100644
>> --- a/src/qemu/qemu_migration.c
>> +++ b/src/qemu/qemu_migration.c
>> @@ -2039,6 +2039,11 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver,
>>         ret = 0;
>>         break;
>> 
>> +    case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE:
>> +        jobInfo->type = VIR_DOMAIN_JOB_PHASE1_COMPLETED;
>> +        ret = 0;
>> +        break;
>> +
> 
> This will need to be dropped after 5/8 is removed. However, it reminds
> me...
> 
>    enum {
>        QEMU_MONITOR_MIGRATION_STATUS_INACTIVE,
>        QEMU_MONITOR_MIGRATION_STATUS_ACTIVE,
>        QEMU_MONITOR_MIGRATION_STATUS_COMPLETED,
>        QEMU_MONITOR_MIGRATION_STATUS_ERROR,
>        QEMU_MONITOR_MIGRATION_STATUS_CANCELLED,
>        QEMU_MONITOR_MIGRATION_STATUS_SETUP,
>        QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE,
> 
>        QEMU_MONITOR_MIGRATION_STATUS_LAST
>    };
> 
> in qemu_monitor.h needs to be turned into a proper typedef so that
> 
>    switch (status.status) {
> 
> line in qemuMigrationUpdateJobStatus may be changed to explicitly
> mention the enum so that the compiler may report a warning whenever we
> add new status but forgot to handle it in this switch.

What about `QEMU_MONITOR_MIGRATION_STATUS_LAST`? Should this be included in a switch with an assertion that that code should never be reached?

> Which means the
> new state will need to be handled in the same patch it was introduced,
> i.e, in 3/8.
> 
>>     case QEMU_MONITOR_MIGRATION_STATUS_INACTIVE:
>>         jobInfo->type = VIR_DOMAIN_JOB_NONE;
>>         virReportError(VIR_ERR_OPERATION_FAILED,
>> @@ -2077,6 +2082,7 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
>>     qemuDomainJobInfoPtr jobInfo = priv->job.current;
>>     const char *job;
>>     int pauseReason;
>> +    bool inPhase2 = (jobInfo->type == VIR_DOMAIN_JOB_PHASE1_COMPLETED);
> 
> I think it would be cleaner to pass this info in a new parameter for
> qemuMigrationWaitForCompletion instead of doing a hidden black magic
> here.

I agree with you, I wasn’t sure about with this way of coding things either. Is it okey to ask developers to always create a new intermediate parameter to call this function, so as to make its usage easier to read? E.g.,

“””
    bool return_on_postcopy_active = false;
    rv = qemuMigrationWaitForCompletion(driver, vm,
                                        QEMU_ASYNC_JOB_MIGRATION_OUT,
                                        conn, abort_on_error,
                                        return_on_postcopy_active);
“”" 

> 
>> 
>>     switch (priv->job.asyncJob) {
>>     case QEMU_ASYNC_JOB_MIGRATION_OUT:
>> @@ -2092,9 +2098,11 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
>>         job = _("job");
>>     }
>> 
>> -    jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED;
>> +    if (!inPhase2)
>> +        jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED;
>> 
>> -    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
>> +    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED ||
>> +           (inPhase2 && jobInfo->type == VIR_DOMAIN_JOB_PHASE1_COMPLETED)) {
> 
> Just use jobInfo->status.status directly.
> 
>>         /* Poll every 50ms for progress & to allow cancellation */
>>         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };
>> 
>> @@ -2123,7 +2131,8 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
>>         virObjectLock(vm);
>>     }
>> 
>> -    if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) {
>> +    if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED ||
>> +        jobInfo->type == VIR_DOMAIN_JOB_PHASE1_COMPLETED) {
> 
> This shouldn't be needed when 5/8 is dropped.
> 
>>         qemuDomainJobInfoUpdateDowntime(jobInfo);
>>         VIR_FREE(priv->job.completed);
>>         if (VIR_ALLOC(priv->job.completed) == 0)
>> @@ -3149,6 +3158,16 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver,
>> 
>>     virCheckFlags(QEMU_MIGRATION_FLAGS, -1);
>> 
>> +    /* Wait for post-copy to complete */
>> +    if (flags & VIR_MIGRATE_POSTCOPY) {
>> +        bool abort_on_error = !!(flags & VIR_MIGRATE_ABORT_ON_ERROR);
>> +        rv = qemuMigrationWaitForCompletion(driver, vm,
>> +                                            QEMU_ASYNC_JOB_MIGRATION_OUT,
>> +                                            conn, abort_on_error);
>> +        if (rv < 0)
>> +            goto cleanup;
>> +    }
>> +
> 
> This has to be done after setting the phase. We may need to add a new so
> that we can recover from post-copy migrations when libvirtd is
> restarted.

Correct me if I’m wrong, but isn’t the phase already set to `QEMU_MIGRATION_PHASE_CONFIRM3` by `qemuMigrationConfirm`, which calls `qemuMigrationConfirmPhase`?

Also, I’m not sure we need another phase. If libvirtd is restarted, it should continue with phase CONFIRM3, wait for post-copy migration to finish or observe that the VM has finished post-copy migration, and kill it. Am I missing something?

> 
>>     qemuMigrationJobSetPhase(driver, vm,
>>                              retcode == 0
>>                              ? QEMU_MIGRATION_PHASE_CONFIRM3

Cristian




More information about the libvir-list mailing list