[Questions] non-shared disk migration: jobs abort and bandwidth

Thu Jun 9 06:52:13 UTC 2022

On Wed, Jun 8, 2022 at 6:49 PM Peter Krempa <pkrempa at redhat.com> wrote:

> On Wed, Jun 08, 2022 at 17:32:57 +0800, Han Han wrote:
> > Hi developers,
> > Recently I have been researching migration with non-shared disks (flags
> > VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC).
> > As we know, non-shared disk migration uses block jobs to copy the
> > disk image from the source host to the destination host. So here are my
> > questions about non-shared disk migration:
> > q1. For the API virDomainMigrate3 with the bandwidth parameter, does it
> > set the bandwidth of the block jobs?
> > q2. Does the API virDomainMigrateSetMaxSpeed set the bandwidth of the
> > block jobs?
> > q3. Can the domain job abort API virDomainAbortJob stop the block job of
> > a non-shared disk migration?
> > q4. Can the block job bandwidth API virDomainBlockJobSetSpeed set the
> > bandwidth of the block job of a non-shared disk migration?
> > q5. Can the block job abort API virDomainBlockJobAbort stop the block job
> > of a non-shared disk migration?
> >
> > Here are my test results with libvirt-8.4.0-1.el9.x86_64 and
> > qemu-kvm-7.0.0-4.el9.x86_64:
> > q1: The bandwidth limit of virDomainMigrate3 takes effect on the block job:
> > ➜  ~ virsh migrate OVMF qemu+ssh://root@hhan-rhel9--1/system --live --p2p \
> >     --tls --tls-destination hhan-rhel9--1 --copy-storage-all \
> >     --disks-uri tcp://hhan-rhel9--1:49156 --bandwidth 2
> > ➜  ~ virsh blockjob OVMF vda
> > Block Copy: [  0 %]    Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)
>
> This is expected and desired.
>
> > q2: virDomainMigrateSetMaxSpeed doesn't change the bandwidth of the
> > block jobs:
> > ➜  ~ virsh migrate-setspeed OVMF 8
> >
> > ➜  ~ virsh blockjob OVMF vda
> > Block Copy: [  9 %]    Bandwidth limit: 2097152 bytes/s (2.000 MiB/s)
>
> This is a bug though. Since we want the global migration speed setting to
> apply to disks too, setting the migration speed should also apply to the
> disk migration streams.
>
I filed a bug for this: https://bugzilla.redhat.com/show_bug.cgi?id=2095093
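For reference on the numbers in q1 and q2: `virsh migrate --bandwidth` and `virsh migrate-setspeed` take the value in MiB/s, while `virsh blockjob` reports the limit in bytes/s. A minimal sketch of that conversion, assuming 1 MiB = 1024 * 1024 bytes as the blockjob output above suggests (the helper name is hypothetical, not a libvirt API):

```python
# Minimal sketch (hypothetical helper, not part of libvirt): convert a
# migration bandwidth given in MiB/s, as taken by `virsh migrate --bandwidth`
# and `virsh migrate-setspeed`, to the bytes/s figure that `virsh blockjob`
# prints. Assumes 1 MiB = 1024 * 1024 bytes.
MIB = 1024 * 1024


def mib_per_s_to_bytes_per_s(bandwidth_mib: int) -> int:
    """Return the bandwidth in bytes/s for a value given in MiB/s."""
    if bandwidth_mib < 0:
        raise ValueError("bandwidth must be non-negative")
    return bandwidth_mib * MIB


if __name__ == "__main__":
    # --bandwidth 2 from q1: matches "Bandwidth limit: 2097152 bytes/s"
    print(mib_per_s_to_bytes_per_s(2))
    # migrate-setspeed 8 from q2: presumably what the block job would report
    # if the migration speed were also applied to the disk streams
    print(mib_per_s_to_bytes_per_s(8))
```

So after `virsh migrate-setspeed OVMF 8`, the block job would presumably show 8388608 bytes/s rather than the unchanged 2097152 bytes/s seen in q2.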

>
> > q3: virDomainAbortJob can stop the block job of a non-shared disk
> > migration:
> > ➜  ~ virsh migrate OVMF qemu+ssh://root@hhan-rhel9--1/system --live --p2p \
> >     --tls --tls-destination hhan-rhel9--1 --copy-storage-all \
> >     --disks-uri tcp://hhan-rhel9--1:49156 --bandwidth 2
> > Then start virsh event in another terminal:
> > ➜  ~ virsh event --loop --all
> >
> > Abort the domain job:
> > ➜  ~ virsh domjobabort OVMF
> >
> > The error "error: operation aborted: migration out: canceled by client"
> > appears in the terminal running "virsh migrate", and the terminal running
> > "virsh event" shows that the block job failed:
> > event 'block-job' for domain 'OVMF': Block Copy for /var/lib/libvirt/images/OVMF.qcow2 failed
> > event 'block-job-2' for domain 'OVMF': Block Copy for vda failed
>
> This is again expected: the blockjobs are started by the migration, so
> when you cancel the migration we also need to cancel the blockjobs.
>
> > q4: The block job bandwidth of a non-shared disk migration cannot be set
> > by virDomainBlockJobSetSpeed:
> > ➜  ~ virsh blockjob OVMF vda --bandwidth 10
> > error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
>
> This is okay, but we could take it a sa feature request to allow tuning
> of the individual blockjobs.
>
If tuning of the individual blockjobs were supported, it would be hard to
tell whether the bandwidth returned by virDomainMigrateGetMaxSpeed is the
speed of the VM migration or the speed of the blockjobs. By contrast, the
bandwidth set by virDomainMigrateSetMaxSpeed is meant to cover both.

I am not sure whether there is such a use case: the VM migration data is
transported via subnet A while the block data is transported via subnet B,
so different bandwidths may need to be set for the different subnets.
If all the data is transported via the same network interface, we can just
keep it as it is now.

BTW, what is the meaning of "sa feature"?

>
> > q5: The block job of a non-shared disk migration cannot be aborted by
> > virDomainBlockJobAbort:
> > ➜  ~ virsh blockjob OVMF vda --abort
> > error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
>
> This is expected. Same as above, we don't want to allow users to
> control this. In contrast to 'q4', I'd refuse an RFE to allow cancelling
> of individual jobs.
>
> > Are the results above expected?
> > Here are my personal thoughts:
> > For the bandwidth in q1 and q2, both are documented as the migration
> > bandwidth
> > (https://gitlab.com/libvirt/libvirt/-/blob/master/include/libvirt/libvirt-domain.h#L1165,
> > https://gitlab.com/libvirt/libvirt/-/blob/master/src/libvirt-domain.c#L9696),
> > but one applies to the block jobs while the other doesn't. So we should
> > make the documentation clear about whether each one is the bandwidth of
> > the VM migration or the bandwidth of the migration including block jobs.
> > What's more, we could add a flag to virDomainMigrateMaxSpeedFlags to
> > support setting the bandwidth of the block jobs in migration.
> > For q4 and q5, if we will not support changing the block job of a
> > non-shared disk migration through the block job APIs, we should note that
> > in the migration doc or the block job doc, to point out the difference
> > between this type of block job and the others.
>
>