[libvirt] [PATCH 00/11] Post-Copy Live Migration Support

Laine Stump laine at laine.org
Thu Dec 4 17:45:12 UTC 2014


On 12/04/2014 05:09 AM, Cristian KLEIN wrote:
> On 2014-12-04 10:40, Laine Stump wrote:
>> Currently, libvirt does create and activate simultaneously (and also
>> qemu does a gratuitous ARP request at some point, although I haven't
>> checked if it happens when qemu starts or when the guest CPUs are
>> started), and deactivate, destroy, and free all happen at pretty much
>> the same time as well. The former leads to problems like this one
>> reported by dgilbert:
>>
>>    https://bugzilla.redhat.com/show_bug.cgi?id=1081461
>>
>> This is just one of several possible variations of "some parts of the
>> network have incorrect information about where MAC X is currently
>> located"; when you mix in post-copy migration and manual handling of
>> the bridge FDB
>> (https://www.redhat.com/archives/libvir-list/2014-December/msg00173.html),
>> there are many opportunities for failure!

(BTW, sorry for interjecting this into your migration patches; after
this, I'll start a new thread if I have more to say.)

Another problem I've discovered, caused by the haphazard ordering of
netdev initialization: the network "plugged" hook is called before the
tap device has been created, so the XML given to the hook will contain,
at best, an out-of-date tap device name (and the tap device won't yet
exist for the hook to manipulate anyway).


> First of all, I would strongly recommend disabling STP on bridges that
> are involved in post-copy migration. STP adds too much downtime, which
> goes pretty much against the benefits of post-copy live migration.

... as long as STP isn't required to avoid forwarding loops, and as long
as the admin knows enough to disable it. But we should either explicitly
forbid it (by logging an error when we encounter it, or at least by
documenting that it doesn't work) or do whatever we can to make it
operate properly in those cases.
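As a rough illustration of the "log an error" option, here is a minimal sketch of how a pre-migration check might detect STP on a Linux host bridge. It assumes the kernel's sysfs attribute /sys/class/net/<bridge>/bridge/stp_state (0 = STP off, 1 = kernel STP, 2 = user-space STP). The helpers bridge_stp_enabled() and check_bridge_for_postcopy() are hypothetical names for illustration only; they are not existing libvirt functions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of libvirt): report whether STP is
 * enabled on a Linux bridge by reading its sysfs "stp_state" attribute,
 * e.g. /sys/class/net/br0/bridge/stp_state. The kernel reports 0 when
 * STP is disabled and a nonzero value (1 = kernel STP, 2 = user-space
 * STP) otherwise.
 *
 * Returns 1 if STP is enabled, 0 if disabled, -1 on read/parse failure.
 */
static int bridge_stp_enabled(const char *stp_state_path)
{
    FILE *fp;
    int state;
    int rc;

    assert(stp_state_path != NULL);

    fp = fopen(stp_state_path, "r");
    if (fp == NULL)
        return -1;

    rc = fscanf(fp, "%d", &state);
    fclose(fp);
    if (rc != 1)
        return -1;

    return state != 0;
}

/*
 * Hypothetical policy check: refuse (and log an error) before a
 * post-copy migration if the bridge has STP enabled, since STP's
 * forwarding delay would add downtime. Returns 0 if OK, -1 otherwise.
 */
static int check_bridge_for_postcopy(const char *bridge,
                                     const char *stp_state_path)
{
    int enabled = bridge_stp_enabled(stp_state_path);

    if (enabled < 0) {
        fprintf(stderr, "cannot read STP state of bridge '%s'\n", bridge);
        return -1;
    }
    if (enabled) {
        fprintf(stderr,
                "bridge '%s' has STP enabled, which is not supported "
                "with post-copy migration\n", bridge);
        return -1;
    }
    return 0;
}
```

Whether such a check should be a hard error or just a warning is exactly the policy question above; the sketch only shows that detecting the condition is cheap.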

> Second, I observed that qemu announces itself when the CPUs are
> resumed on the destination.

Good to know.

> Hence, at least from outside, it seems like the FDB are updated correctly.

For a host bridge that relies on the kernel's built-in flooding/learning
to populate the FDB, that is likely the case. With the current state of
the code, things aren't so rosy for macvtap, nor for my new
libvirt-managed FDB updates. I need to work on that. (Mainly I posted to
this thread to see what other problems people may have encountered in
this area, to make sure all of them get handled, and to see if anyone
thought my suggested changes were crack-based.)




More information about the libvir-list mailing list