Network Stall when doing live migration

W Kern wkmail at bneit.com
Tue Sep 5 18:29:58 UTC 2023


Greetings.

I have a testbed of two stock Ubuntu 22.04 LTS libvirt hosts using 
shared storage (MooseFS in this case, because it was readily available).

I have configured the MooseFS mount as a 'dir' pool on each machine, and 
both use the same mount point, /MFS, via the MFS fusemount.

I am using an OVS bridge on each server to provide a live IP to the VMs. 
Each OVS installation is assigned to its own ethernet card and the two 
machines are on the same Cisco switch.

The Cisco switch is set up as a trunk with a VLAN, and virsh connects the 
VM to the OVS instance with that VLAN tag.

I can install and boot up individual VMs on each Host with no problem.

I can migrate domains offline (--offline) from one host to the other 
using virsh migrate.

Note the --unsafe flag, which seems to be required and which prevents me 
from using Cockpit for migration.


virsh migrate U22-TEST qemu+ssh://x.x.x.126/system --unsafe --offline \
    --persistent --undefinesource --abort-on-error

then

virsh start U22-TEST

So that works fine.

So I am now trying a live migration using

virsh migrate U22-TEST qemu+ssh://x.x.x.126/system --unsafe --live \
    --verbose --persistent --undefinesource --abort-on-error

This works as well. I see the migration percentage climbing, and at 
100% the switchover occurs, with the VM down on the source and up on the 
second host. virsh console works at that point.

However, there is always a 2-3 minute period after the VM migrates (i.e. 
comes up on the destination host) when the networking is dead.

After the 3 minute wait, the VM suddenly responds to a ping, ports are 
open etc.  Most of the time any SSH connections have timed out by then.
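To put a number on that stall, something like the following could be run from a third machine on the same VLAN while the migration is in progress. It is only a rough sketch: probe_guest and longest_outage are hypothetical helper names, the guest address is a placeholder, and it assumes the Linux iputils ping (where -W is a timeout in seconds).

```shell
#!/bin/sh
# Hypothetical helper: ping the guest COUNT times, about once per second,
# and print one "<epoch-seconds> up|down" line per probe.
probe_guest() {
    guest_ip="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        if ping -c 1 -W 1 "$guest_ip" >/dev/null 2>&1; then
            echo "$(date +%s) up"
        else
            echo "$(date +%s) down"
        fi
        i=$((i + 1))
        sleep 1
    done
}

# Read a probe log on stdin and print the longest outage in seconds,
# measured from the first "down" probe to the next "up" probe.
longest_outage() {
    awk '
        $2 == "down" { if (start == "") start = $1; last = $1; next }
        $2 == "up"   { if (start != "") { d = $1 - start; if (d > max) max = d; start = "" } }
        END {
            # Log ended while still down: count up to the last failed probe.
            if (start != "") { d = last - start + 1; if (d > max) max = d }
            print max + 0
        }
    '
}
```

For example, `probe_guest <guest-ip> 300 > probe.log` started just before the virsh migrate command, followed by `longest_outage < probe.log`, would give the length of the dead period in seconds.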

I assume this is some sort of ARP issue, but where? Libvirt, OVS, or the 
Cisco switch?

Is there some sort of additional step, flag, or even IOS config 
suggestion that I can use to limit the network downtime?
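One way to narrow down the "where" might be to compare what each OVS bridge has learned for the guest's MAC address right after the migration. The sketch below is a guess at a diagnostic, not a known fix: fdb_lookup is a hypothetical helper, and it assumes the usual column layout of `ovs-appctl fdb/show` output (port, VLAN, MAC, Age).

```shell
#!/bin/sh
# Hypothetical helper: filter `ovs-appctl fdb/show <bridge>` output on stdin
# and print "<port> <vlan> <age>" for the given MAC address.
# Assumes the columns are: port, VLAN, MAC, Age.
fdb_lookup() {
    # Normalise the MAC to lower case so the comparison is case-insensitive.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" 'tolower($3) == mac { print $1, $2, $4 }'
}
```

Run on both hosts as, say, `ovs-appctl fdb/show <bridge> | fdb_lookup <guest-mac>` (bridge name and MAC are placeholders). If the source host's bridge still shows the guest behind a local VM port during the dead period, that would point at OVS; if both bridges look right, the switch's own MAC table would be the next place to check.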

As a minor secondary issue, is there some additional XML flag (<shared>) I 
can pass to the storage pool XML to indicate that it really is shared 
media and doesn't need the --unsafe flag?


-wk


