[libvirt] [PATCH 00/11] Post-Copy Live Migration Support

Cristian KLEIN cristiklein at gmail.com
Thu Dec 4 10:09:40 UTC 2014


On 2014-12-04 10:40, Laine Stump wrote:
> On 12/01/2014 10:59 AM, Cristian Klein wrote:
>> Qemu currently implements pre-copy live migration. VM memory pages are
>> first copied from the source hypervisor to the destination, potentially
>> multiple times as pages get dirtied during transfer, then VCPU state
>> is migrated. Unfortunately, if the VM dirties memory faster than the
>> network can transfer it, pre-copy can never finish. `virsh` currently
>> includes an option to suspend a VM after a timeout, so that migration
>> can finish, but at the expense of downtime.
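>>
>> (For reference, that looks roughly as follows; the guest and host
>> names are placeholders:
>>
>>   virsh migrate --live --timeout 60 guest1 qemu+ssh://dst-host/system
>>
>> where the guest is suspended if live migration has not converged
>> within 60 seconds.)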
>>
>> A future version of qemu will implement post-copy live migration. The
>> VCPU state is first migrated to the destination hypervisor, then
>> memory pages are pulled from the source hypervisor. Post-copy has the
>> potential to migrate with zero downtime even when the VM dirties pages
>> quickly, and with minimal performance impact. On the other hand, while
>> post-copy is in progress, any network failure would render the VM
>> unusable, as its memory is partitioned between the source and
>> destination hypervisor. Therefore, post-copy should only be used when
>> necessary.
>>
>> Post-copy migration in qemu will work as follows:
>> (1) The `x-postcopy-ram` migration capability needs to be set.
>> (2) Migration is started.
>> (3) When the user decides to switch over, post-copy migration is
>> activated by sending the `migrate-start-postcopy` command.
>> (4) Qemu acknowledges by setting migration status to `postcopy-active`.
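>>
>> On the QMP wire, that sequence would look roughly as follows (the
>> destination URI is a placeholder):
>>
>>   { "execute": "migrate-set-capabilities", "arguments": { "capabilities":
>>       [ { "capability": "x-postcopy-ram", "state": true } ] } }
>>   { "execute": "migrate", "arguments": { "uri": "tcp:dst-host:4444" } }
>>   ... later, when the user decides to switch ...
>>   { "execute": "migrate-start-postcopy" }
>>   { "execute": "query-migrate" }
>>   { "return": { "status": "postcopy-active", ... } }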
>
> (there are probably inaccuracies and misstatements in the following, but
> the topic does need consideration, and this seemed like a good place to
> bring it up while it's fresh in my mind...)
>
> I happened to be thinking about post-copy migration vs. guest networking
> over the weekend, and realized a potential problem related to starting
> the destination domain so quickly after it is created - if the guest is
> connected to the network via a host bridge that has STP enabled and a
> non-zero forwarding delay, the guest's network traffic could be
> interrupted until the delay timer has counted down. This points out a
> couple of things:
>
> 1) the "migrate-start-postcopy" command needs to be either sent or
> acknowledged (I'm not sure which coincides more closely with the
> stopping of the source domain and the starting of the destination
> domain) only after the destination domain's tap devices have existed
> and been connected to the bridge long enough to be able to forward
> traffic.
>
> 2) libvirt needs to have a more formal separation of the following
> tasks (sketched as C prototypes below):
>
>      * allocate resources for a network device
>        (i.e. networkAllocateActualDevice())
>      * create a network device (create and ifup the tap device, which
>        would start timers counting down; in the case of macvtap, the
>        device should be created, but not ifup'ed)
>      * activate a network device (for a tap device, send a gratuitous
>        ARP request and update the bridge's FDB for the guest's MAC
>        address; for macvtap, ifup the device)
>
> It should also have the reverse of all these operations:
>
>      * deactivate (remove FDB entries for tap, ifdown for macvtap)
>      * destroy (delete the tap/macvtap device)
>      * free (networkReleaseActualDevice())
>
> Additionally, for completeness we need "notify", which is done for
> each guest interface any time libvirtd is restarted (this already
> exists as networkNotifyActualDevice()); it just recreates libvirtd's
> tables of which host interfaces are in use by guests.
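>
> Sketching those six operations as C prototypes (the allocate/free pair
> already exists; the four names in the middle are hypothetical, and the
> signatures are illustrative only):
>
>     /* reserve a host interface for the guest (exists today) */
>     int networkAllocateActualDevice(virDomainNetDefPtr iface);
>
>     /* create the tap/macvtap device; ifup the tap device so that any
>      * STP forwarding-delay timers start counting down */
>     int networkCreateActualDevice(virDomainNetDefPtr iface);
>
>     /* make the device usable by the guest: gratuitous ARP and FDB
>      * update for tap, ifup for macvtap */
>     int networkActivateActualDevice(virDomainNetDefPtr iface);
>
>     /* reverse of activate: remove FDB entries, ifdown macvtap */
>     int networkDeactivateActualDevice(virDomainNetDefPtr iface);
>
>     /* delete the tap/macvtap device */
>     int networkDestroyActualDevice(virDomainNetDefPtr iface);
>
>     /* release the reservation (exists today) */
>     int networkReleaseActualDevice(virDomainNetDefPtr iface);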
>
> Currently, libvirt does create and activate simultaneously (qemu also
> sends a gratuitous ARP request at some point, although I haven't
> checked whether that happens when qemu starts or when the guest CPUs
> are started), and deactivate, destroy, and free all happen at pretty
> much the same time as well. The former leads to problems like this one
> reported by dgilbert:
>
>    https://bugzilla.redhat.com/show_bug.cgi?id=1081461
>
> This is just one of several possible variations of "some parts of the
> network have incorrect information about where MAC X is currently
> located"; when you mix in post-copy migration, and manual handling of
> the bridge FDB
> (https://www.redhat.com/archives/libvir-list/2014-December/msg00173.html),
> there are many opportunities for failure!
>
> Back to my list of operations - to make migration work smoothly,
> allocate and create should be done prior to starting the qemu process,
> but activate shouldn't be done until just before the CPUs are turned on
> (and ideally, *that* shouldn't happen until the connection to the device
> is ready to forward traffic). Likewise, deactivate should be called as
> soon as the CPUs are paused, while destroy/free should be done after
> qemu is terminated. This way, the guest's MAC will only be in one
> bridge's FDB at any given time, and it will be the FDB of the bridge
> attached to the currently running instance.
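>
> In terms of the operations sketched above, the desired ordering across
> a migration would be roughly:
>
>     source                      destination
>     ------                      -----------
>                                 allocate, create   (before qemu starts)
>     CPUs paused
>     deactivate
>                                 activate           (just before CPUs start)
>                                 CPUs started
>     qemu terminated
>     destroy, free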
>
> Does anybody else have any thoughts/ideas on this subject? Cleaning up
> the hypervisor drivers' use of network devices has been on my mind for a
> long time, and it may be time to finally take action.

Hi Laine,

I am not sufficiently familiar with libvirt's internals to contribute 
much to the discussion, but I'll try to share my experience with how 
networking behaves with post-copy migration.

First of all, I would strongly recommend disabling STP on bridges that 
are involved in post-copy migration. The forwarding delay that STP 
imposes adds downtime, which pretty much defeats the point of post-copy 
live migration.
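
For a libvirt-managed bridge, this can be expressed in the network XML
(the network and bridge names below are placeholders):

  <network>
    <name>migration-net</name>
    <bridge name='virbr1' stp='off' delay='0'/>
  </network>

For a bridge managed outside of libvirt, `brctl stp br0 off` (or at
least `brctl setfd br0 0` to zero the forwarding delay) achieves the
same.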

Second, I observed that qemu announces itself (it sends a burst of 
self-announce packets) when the CPUs are resumed on the destination. 
Hence, at least from the outside, it seems that the bridge FDBs are 
updated correctly.

Cristian.



