[libvirt] [RFC PATCH] Add new migration flag VIR_MIGRATE_DRY_RUN

Jim Fehlig jfehlig at suse.com
Mon Nov 5 22:07:08 UTC 2018


On 11/5/18 1:46 AM, Michal Privoznik wrote:
> On 11/02/2018 11:34 PM, Jim Fehlig wrote:
>> A dry run can be used as a best-effort check that a migration command
>> will succeed. The destination host will be checked to see if it can
>> accommodate the resources required by the domain. DRY_RUN will fail if
>> the destination host is not capable of running the domain. Although a
>> subsequent migration will likely succeed, the success of DRY_RUN does not
>> ensure a future migration will succeed. Resources on the destination host
>> could become unavailable between a DRY_RUN and actual migration.
>>
>> Signed-off-by: Jim Fehlig <jfehlig at suse.com>
>> ---
>>
>> If it is agreed this is useful, my thought was to use the begin and
>> prepare phases of migration to implement it. qemuMigrationDstPrepareAny()
>> already does a lot of the heavy lifting wrt checking the host can
>> accommodate the domain. Some of it, and the remaining migration phases,
>> can be short-circuited in the case of dry run.
>>
>> One interesting wrinkle I've observed is the check for cpu compatibility.
>> AFAICT qemu is actually invoked on the dst, "filtered-features" of the cpu
>> are requested via qmp, and results are checked against cpu in domain config.
>> If cpu on dst is insufficient, migration fails in the prepare phase with
>> something like "guest CPU doesn't match specification: missing features: z y z".
>> I was hoping to avoid launching qemu in the case of dry run, but that may
>> be unavoidable if we'd like a dependable dry run result.
>>
>> Thanks for considering the idea!
>>
>> (BTW, if it is considered useful I will follow up with a V1 series that
>> includes this patch and an impl for the qemu driver.)
>>
>>   include/libvirt/libvirt-domain.h | 12 ++++++++++++
>>   src/qemu/qemu_migration.h        |  3 ++-
>>   tools/virsh-domain.c             |  7 +++++++
>>   tools/virsh.pod                  | 10 +++++++++-
>>   4 files changed, 30 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
>> index fdd2d6b8ea..6d52f6ce50 100644
>> --- a/include/libvirt/libvirt-domain.h
>> +++ b/include/libvirt/libvirt-domain.h
>> @@ -830,6 +830,18 @@ typedef enum {
>>        */
>>       VIR_MIGRATE_TLS               = (1 << 16),
>>   
>> +    /* Setting the VIR_MIGRATE_DRY_RUN flag will cause libvirt to make a
>> +     * best-effort attempt to check if migration will succeed. The destination
>> +     * host will be checked to see if it can accommodate the resources required
>> +     * by the domain. For example are the network, disk, memory, and CPU
> 
> While this is an honourable goal to achieve I don't think we can
> guarantee it (without running qemu). At least in qemu world.

I don't think it can be guaranteed even if qemu is run. That's why the rest of 
the comment warns about relying on dry run's success. Dry run succeeding should 
give the user warm fuzzies, but it can't guarantee success of a future migration.

> For instance, libvirt doesn't check if there's enough memory (neither
> regular nor hugepages) when a domain is started/migrated. We just run qemu
> and let it fail. However, for network, CPU and hostdev we do run checks so
> these might work. Disks are in a grey area - we check their presence but not
> their labels. And if the domain is relabel=no then the only way to learn if
> qemu would succeed is to run it.

I'll have to check, but I think starting qemu for a dry run is a no-go if host
resources are actually consumed, e.g. if host memory is given to the dry-run
qemu and is then not available for non-dry-run instances.

> But I don't see much problem with starting qemu in paused state. I mean,
> we can get through Prepare phase but never actually reach Perform stage.
> The API/flag would return success if Prepare succeeded.

Yep, my thought exactly, along with doing less preparation in the prepare phase.
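
A minimal sketch of that short-circuit, only to illustrate the control flow.
Every name below is hypothetical and none of it is libvirt code. The single
dst_can_host_domain field stands in for whatever checks the prepare phase
would run, and the phase list is simplified.

```c
#include <stdbool.h>

/* Hypothetical phase identifiers for this sketch; the real migration
 * protocol in libvirt is more involved. */
typedef enum {
    SKETCH_PHASE_NONE = 0,
    SKETCH_PHASE_BEGIN,
    SKETCH_PHASE_PREPARE,
    SKETCH_PHASE_PERFORM,
    SKETCH_PHASE_FINISH,
    SKETCH_PHASE_CONFIRM
} sketch_phase;

typedef struct {
    bool dst_can_host_domain;  /* stand-in for the checks done in Prepare */
    sketch_phase last_phase;   /* last phase that was entered */
} sketch_migration;

/* Returns 0 on success, -1 on failure. With dry_run set, the sketch
 * stops after Prepare and never enters Perform. */
int
sketch_migrate(sketch_migration *mig, bool dry_run)
{
    mig->last_phase = SKETCH_PHASE_BEGIN;

    mig->last_phase = SKETCH_PHASE_PREPARE;
    if (!mig->dst_can_host_domain)
        return -1;              /* fails the same way with or without dry run */

    if (dry_run)
        return 0;               /* Prepare succeeded: report success and stop */

    /* Remaining phases are collapsed here; a real migration can still
     * fail in any of them. */
    mig->last_phase = SKETCH_PHASE_PERFORM;
    mig->last_phase = SKETCH_PHASE_FINISH;
    mig->last_phase = SKETCH_PHASE_CONFIRM;
    return 0;
}
```

In this sketch a dry run and a real migration fail at the same point when the
destination is unsuitable, and a successful dry run leaves last_phase at
Prepare.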

> I bet it's easier to check if migration would succeed in xen world, or?

I suppose so, if only because it supports fewer options. E.g. there's only
one type of cpu for Xen PV domains.

> The other thing is how are apps expected to use this? I mean, if an app
> wants to work without admin intervention then it would need to learn how
> to fix any possible error (missing disk, perms issue, missing hostdev,
> etc.). This is not a trivial task IMO.

That's the case today if an actual migration fails. Dry run simply allows
checking the possible success of migration without actually performing it. Admin
intervention can occur before there is any attempt to perform a doomed migration
(which in the worst case can result in the domain not running on src or dst).
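
A rough sketch of how an app might use such a check, assuming a migrate
callback that accepts a dry-run flag bit. The flag values, type names, and the
fake driver below are all hypothetical and exist only for illustration; they
are not libvirt API and not the values from the patch.

```c
/* Hypothetical flag bits for this sketch only; not libvirt's values. */
#define SKETCH_MIGRATE_LIVE    (1u << 0)
#define SKETCH_MIGRATE_DRY_RUN (1u << 1)

/* Stand-in for a migrate call: returns 0 on success, -1 on failure. */
typedef int (*sketch_migrate_fn)(unsigned int flags, void *opaque);

typedef enum {
    SKETCH_MIGRATED,          /* dry run and real migration both succeeded */
    SKETCH_NEEDS_ADMIN,       /* dry run failed; nothing was attempted */
    SKETCH_MIGRATION_FAILED   /* dry run passed, real migration still failed */
} sketch_outcome;

/* Run a dry run first and only attempt the real migration if it passes. */
sketch_outcome
sketch_checked_migrate(sketch_migrate_fn migrate, unsigned int flags,
                       void *opaque)
{
    if (migrate(flags | SKETCH_MIGRATE_DRY_RUN, opaque) < 0)
        return SKETCH_NEEDS_ADMIN;

    /* A passing dry run is best effort only, so this can still fail. */
    if (migrate(flags, opaque) < 0)
        return SKETCH_MIGRATION_FAILED;

    return SKETCH_MIGRATED;
}

/* Fake driver used to exercise the helper above. */
typedef struct {
    int dst_ok;          /* nonzero if the destination can host the domain */
    int real_attempts;   /* number of non-dry-run migrations attempted */
} sketch_fake_host;

int
sketch_fake_migrate(unsigned int flags, void *opaque)
{
    sketch_fake_host *host = opaque;

    if (!(flags & SKETCH_MIGRATE_DRY_RUN))
        host->real_attempts++;
    return host->dst_ok ? 0 : -1;
}
```

With the fake driver, an unsuitable destination yields SKETCH_NEEDS_ADMIN and
real_attempts stays at zero, which is the point of the flag: the doomed
migration is never started.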

Regards,
Jim



