[libvirt] [PATCH] qemu: Propagate storage errors on migration

Michal Privoznik mprivozn at redhat.com
Tue Jan 20 10:22:38 UTC 2015


On 19.01.2015 22:58, John Ferlan wrote:
> 
> 
> On 01/09/2015 07:22 AM, Michal Privoznik wrote:
>> There's one bug that got more visible with my patches that
>> automatically precreate storage on migration. The problem is, if
>> there's not enough disk space on destination, we successfully finish
>> the migration instead of failing.
>>
>> Firstly, we instruct qemu to copy the storage. And as it copies the
>> data, one write() will return -ENOSPC, eventually, to which qemu
>> reacts by emitting BLOCK_JOB_ERROR. The event is successfully ignored
>> by us. Then, since the block job has finished, BLOCK_JOB_COMPLETED
>> event is emitted too.
>>
>> Secondly, we are not checking for the block job completion as we
>> should. Currently, we do the check by issuing 'query-block-jobs' and
>> then looking in the command's output for the job we are interested in.
>> If course, we don't fail if the job is not there, in which case the
> 
> s/If/Of
> 
>> number of total bytes to be transferred and bytes already transferred,
>> well they equal both to zero. This is actually the line causing the
> 
> s/well they equal both to zero/will both be equal to zero.
> 
>> bug as it sees both numbers equal to each other and gives green light
>> to the actual migration.
>>
>> The fix consist of two parts:
>>
>> In the first, code handling BLOCK_JOB_ERROR is added. It's a monitor
>> event and we should have handler for that anyway.
>>
>> In the second part, the code doing drive mirroring is rewritten so it
>> waits for the events instead of querying on the monitor repeatedly.
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>  src/conf/domain_conf.c       |  3 ++-
>>  src/conf/domain_conf.h       |  1 +
>>  src/qemu/qemu_migration.c    | 25 +++++++------------------
>>  src/qemu/qemu_monitor_json.c | 10 ++++++++++
>>  src/qemu/qemu_process.c      | 14 ++++++++++++++
>>  5 files changed, 34 insertions(+), 19 deletions(-)
>>
> 
> The concept seems reasonable - of course the code has change since you
> first sent this, so a rebase will be necessary...
> 
>> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>> index d1a483a..751a9b5 100644
>> --- a/src/conf/domain_conf.c
>> +++ b/src/conf/domain_conf.c
>> @@ -787,7 +787,8 @@ VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST,
>>                "none",
>>                "yes",
>>                "abort",
>> -              "pivot")
>> +              "pivot",
>> +              "error")
>>  
>>  VIR_ENUM_IMPL(virDomainLoader,
>>                VIR_DOMAIN_LOADER_TYPE_LAST,
>> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
>> index ac1f4f8..f18fa80 100644
>> --- a/src/conf/domain_conf.h
>> +++ b/src/conf/domain_conf.h
>> @@ -654,6 +654,7 @@ typedef enum {
>>      VIR_DOMAIN_DISK_MIRROR_STATE_READY, /* Job in second phase */
>>      VIR_DOMAIN_DISK_MIRROR_STATE_ABORT, /* Job aborted, waiting for event */
>>      VIR_DOMAIN_DISK_MIRROR_STATE_PIVOT, /* Job pivoted, waiting for event */
>> +    VIR_DOMAIN_DISK_MIRROR_STATE_ERROR, /* Job failed */
>>  
>>      VIR_DOMAIN_DISK_MIRROR_STATE_LAST
>>  } virDomainDiskMirrorState;
>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>> index 77e0b35..e2309ea 100644
>> --- a/src/qemu/qemu_migration.c
>> +++ b/src/qemu/qemu_migration.c
>> @@ -1743,7 +1743,6 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
>>  
>>      for (i = 0; i < vm->def->ndisks; i++) {
>>          virDomainDiskDefPtr disk = vm->def->disks[i];
>> -        virDomainBlockJobInfo info;
>>  
>>          /* skip shared, RO and source-less disks */
>>          if (disk->src->shared || disk->src->readonly ||
>> @@ -1768,6 +1767,7 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
>>          if (mon_ret < 0)
>>              goto error;
>>  
>> +        disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY;
> 
> I assume this is how we "ensure" we get into the else of
> qemuProcessHandleBlockJob. 

Exactly.
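
For reference, a minimal self-contained model of that wiring -- stub enums
and a stub disk struct that only loosely mirror the names in the patch, not
libvirt code: the job type is recorded in qemuMigrationDriveMirror() right
after drive-mirror is issued, and the new else branch in
qemuProcessHandleBlockJob() only touches mirrorState for disks whose
mirrorJob says "copy".

#include <stdbool.h>
#include <stdio.h>

/* Stub types; the real ones live in domain_conf.h / qemu_process.c. */
typedef enum { JOB_UNKNOWN, JOB_COPY } MirrorJob;
typedef enum { MIRROR_NONE, MIRROR_READY, MIRROR_ERROR } MirrorState;
typedef enum { EV_READY, EV_COMPLETED, EV_CANCELED, EV_FAILED } JobStatus;

typedef struct {
    MirrorJob   mirrorJob;    /* what kind of block job we started */
    MirrorState mirrorState;  /* what the migration wait loop polls */
} DiskStub;

/* Models the qemuMigrationDriveMirror() hunk: record the job type as soon
 * as drive-mirror has been issued, before waiting for events. */
static void
startDriveMirror(DiskStub *disk)
{
    disk->mirrorJob = JOB_COPY;
}

/* Models only the new else branch of qemuProcessHandleBlockJob(): events
 * for disks not marked as a copy job are left alone. */
static void
handleBlockJobEvent(DiskStub *disk, JobStatus status)
{
    if (disk->mirrorJob != JOB_COPY)
        return;

    switch (status) {
    case EV_READY:
    case EV_COMPLETED:
        disk->mirrorState = MIRROR_READY;
        break;
    case EV_CANCELED:
    case EV_FAILED:
        disk->mirrorState = MIRROR_ERROR;
        break;
    }
}

int
main(void)
{
    DiskStub disk = { JOB_UNKNOWN, MIRROR_NONE };

    startDriveMirror(&disk);
    handleBlockJobEvent(&disk, EV_FAILED);  /* e.g. BLOCK_JOB_ERROR on ENOSPC */
    printf("mirror failed: %d\n", disk.mirrorState == MIRROR_ERROR);
    return 0;
}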

> Is setting this outside the context of this
> bugfix fixing a bug?  That is should it have been set all along? Curious
> mostly.

Well, it wouldn't hurt anything if it was set. On the other hand, it
wouldn't help anything either.

> 
>>          lastGood = i;
>>  
>>          /* wait for completion */
>> @@ -1775,38 +1775,27 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
>>              /* Poll every 500ms for progress & to allow cancellation */
>>              struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000ull };
>>  
>> -            memset(&info, 0, sizeof(info));
>> -
>> -            if (qemuDomainObjEnterMonitorAsync(driver, vm,
>> -                                               QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
>> -                goto error;
>>              if (priv->job.asyncAbort) {
>>                  /* explicitly do this *after* we entered the monitor,
>>                   * as this is a critical section so we are guaranteed
>>                   * priv->job.asyncAbort will not change */
> 
> So, since we're not entering the monitor any more - does the text need
> to change?

Oh, right.

> Also, is it 'safe' to look at priv->job.current->type?  Or
> even priv->job.asyncAbort since we won't have a monitor lock?

I think it is. If you look at the qemuDomainAbortJob() impl, there's
qemuDomainObjAbortAsyncJob() called with only the @vm lock held, outside of
Enter/ExitMonitor(). And qemuDomainObjAbortAsyncJob() is just a wrapper
over direct access to priv->job.asyncAbort.
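
To illustrate the pattern (stub types only -- the pthread mutex below stands
in for the @vm object lock, and DomainObjStub for the relevant piece of
qemuDomainObjPrivate): the abort API just flips a bool under the object
lock, and the mirror wait loop needs nothing but that same lock to read it,
so no Enter/ExitMonitor() is required.

/* build with: gcc -Wall -pthread abort_flag.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t lock;   /* plays the role of the @vm object lock */
    bool asyncAbort;        /* set by the abort path, read by the wait loop */
} DomainObjStub;

/* Rough equivalent of qemuDomainObjAbortAsyncJob(): caller holds the
 * object lock, no monitor is entered. */
static void
abortAsyncJob(DomainObjStub *vm)
{
    vm->asyncAbort = true;
}

/* One iteration of the rewritten wait loop: checking the flag only needs
 * the object lock, which is why the Enter/ExitMonitor pair could go away. */
static bool
abortRequested(DomainObjStub *vm)
{
    bool aborted;

    pthread_mutex_lock(&vm->lock);
    aborted = vm->asyncAbort;
    pthread_mutex_unlock(&vm->lock);
    return aborted;
}

int
main(void)
{
    DomainObjStub vm = { PTHREAD_MUTEX_INITIALIZER, false };

    pthread_mutex_lock(&vm.lock);
    abortAsyncJob(&vm);         /* what qemuDomainAbortJob() ends up doing */
    pthread_mutex_unlock(&vm.lock);

    printf("abort requested: %d\n", abortRequested(&vm));
    return 0;
}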

> 
> Considering the recent series on ExitMonitor error checks vis-a-vis the
> vm dying during some monitor interaction could this end up being an
> infinite loop?  That is would the necessary mirrorState be set in the
> case where the vm dies?

Interesting. I don't think that can happen. If the vm dies,
disk->mirrorState should end up in the error state.

> 
>> -                qemuDomainObjExitMonitor(driver, vm);
>>                  priv->job.current->type = VIR_DOMAIN_JOB_CANCELLED;
>>                  virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"),
>>                                 qemuDomainAsyncJobTypeToString(priv->job.asyncJob),
>>                                 _("canceled by client"));
> 
> s/canceled/cancelled
> 
> (I see there's also another place in qemuMigrationUpdateJobStatus as
> well as libvirt-domain and virsh* modules, sigh)
> 
>>                  goto error;
>>              }
>> -            mon_ret = qemuMonitorBlockJobInfo(priv->mon, diskAlias, &info,
>> -                                              NULL);
>> -            qemuDomainObjExitMonitor(driver, vm);
>>  
>> -            if (mon_ret < 0)
>> -                goto error;
>> -
>> -            if (info.cur == info.end) {
>> +            if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_READY) {
>>                  VIR_DEBUG("Drive mirroring of '%s' completed", diskAlias);
>>                  break;
>> +            } else if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_ERROR) {
>> +                virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"),
>> +                               qemuDomainAsyncJobTypeToString(priv->job.asyncJob),
>> +                               _("canceled by destination"));
> 
> s/canceled/cancelled
> 
>> +                goto error;
>>              }
>>  
>> -            /* XXX Frankly speaking, we should listen to the events,
>> -             * instead of doing this. But this works for now and we
>> -             * are doing something similar in migration itself anyway */
>> -
>>              virObjectUnlock(vm);
>>  
>>              nanosleep(&ts, NULL);
>> diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
>> index e567aa7..5d6bbca 100644
>> --- a/src/qemu/qemu_monitor_json.c
>> +++ b/src/qemu/qemu_monitor_json.c
>> @@ -76,6 +76,7 @@ static void qemuMonitorJSONHandlePMWakeup(qemuMonitorPtr mon, virJSONValuePtr da
>>  static void qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, virJSONValuePtr data);
>>  static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data);
>>  static void qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, virJSONValuePtr data);
>> +static void qemuMonitorJSONHandleBlockJobError(qemuMonitorPtr mon, virJSONValuePtr data);
>>  static void qemuMonitorJSONHandleBlockJobReady(qemuMonitorPtr mon, virJSONValuePtr data);
>>  static void qemuMonitorJSONHandleBalloonChange(qemuMonitorPtr mon, virJSONValuePtr data);
>>  static void qemuMonitorJSONHandlePMSuspendDisk(qemuMonitorPtr mon, virJSONValuePtr data);
>> @@ -94,6 +95,7 @@ static qemuEventHandler eventHandlers[] = {
>>      { "BLOCK_IO_ERROR", qemuMonitorJSONHandleIOError, },
>>      { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, },
>>      { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, },
>> +    { "BLOCK_JOB_ERROR", qemuMonitorJSONHandleBlockJobError, },
>>      { "BLOCK_JOB_READY", qemuMonitorJSONHandleBlockJobReady, },
>>      { "DEVICE_DELETED", qemuMonitorJSONHandleDeviceDeleted, },
>>      { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, },
>> @@ -842,6 +844,14 @@ qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon,
>>  }
>>  
>>  static void
>> +qemuMonitorJSONHandleBlockJobError(qemuMonitorPtr mon,
>> +                                   virJSONValuePtr data)
>> +{
>> +    qemuMonitorJSONHandleBlockJobImpl(mon, data,
>> +                                      VIR_DOMAIN_BLOCK_JOB_FAILED);
>> +}
>> +
>> +static void
>>  qemuMonitorJSONHandleBlockJobReady(qemuMonitorPtr mon,
>>                                     virJSONValuePtr data)
>>  {
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index c18204b..b29ecf1 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -1110,6 +1110,20 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
>>                  disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
>>                  save = true;
>>              }
>> +        } else if (disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) {
>> +            switch ((virConnectDomainEventBlockJobStatus) status) {
>> +            case VIR_DOMAIN_BLOCK_JOB_READY:
>> +            case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
> 
> "theoretically" if status == COMPLETED we shouldn't be here either since
> that the first "if" checks that...
> 
>> +                disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
>> +                break;
>> +            case VIR_DOMAIN_BLOCK_JOB_CANCELED:
>> +            case VIR_DOMAIN_BLOCK_JOB_FAILED:
>> +                disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_ERROR;
>> +                break;
>> +            case VIR_DOMAIN_BLOCK_JOB_LAST:
>> +                VIR_DEBUG("should not get here");
>> +                break;
> 
> Not that we'd get here, but what effect is not setting mirrorState
> 
> Is a switch overkill in this instance?

I bet the compiler will evaluate what's best and generate efficient code
either way.
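
For what it's worth, there is a second reason the full switch over the enum
tends to be preferred in cases like this, independent of generated code: with
no default label, gcc's -Wswitch reports any enumerator that is not handled,
so a status value added later cannot be silently ignored the way it would be
in an if/else chain. A tiny standalone illustration (hypothetical enum,
nothing libvirt-specific):

/* build with: gcc -Wall -c switch_demo.c
 * -Wswitch warns that EV_PAUSED is not handled below. */
typedef enum { EV_READY, EV_FAILED, EV_PAUSED } JobStatus;

const char *
statusToString(JobStatus status)
{
    switch (status) {
    case EV_READY:  return "ready";
    case EV_FAILED: return "failed";
    /* EV_PAUSED intentionally missing to trigger the warning */
    }
    return "unknown";
}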

Michal