[libvirt] [RFC]KVM VMs stay in pause state too long in finish phase during migration

Wang Rui moon.wangrui at huawei.com
Thu Jul 3 12:20:28 UTC 2014


Hi,
I started a VM in a KVM environment (libvirt 1.2.6, qemu 1.5.1) and
found that the startup thread holds the vm lock for too long.
This can leave other VMs paused (on both source and destination) for too long
during migration.

Steps to Reproduce:
1. Define and start three VMs (VMA, VMB, VMC) on the source host,
   each with 16 NICs. XML configuration for each NIC:

    <interface type='bridge'>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>

2. Migrate the three VMs from the source host to the destination host concurrently.
3. On the destination host, the three VMs may perform the following operations:
   1) VMA: qemuProcessStart() (holds the vm lock until it returns) -> .. ->
      virNetDevTapCreateInBridgePort(). This function takes about 0.28s per NIC,
      so 16 NICs take about 4s.
      The following log shows the time spent creating NICs on my host. The
      time needed to create NICs seems to vary between hosts.

    2014-07-03 08:40:41.283+0000: 47007: info : remoteDispatchAuthList:2781 : Bypass polkit auth for privileged client pid:47635,uid:0
    2014-07-03 08:40:41.285+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:41.560+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:41.852+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:42.144+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:42.464+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:42.756+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:43.076+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:43.372+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:43.680+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:43.972+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:44.268+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:44.560+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:44.720+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:44.804+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:44.888+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:45.040+0000: 47009: info : virNetDevProbeVnetHdr:203 : Enabling IFF_VNET_HDR
    2014-07-03 08:40:45.369+0000: 47009: warning : qemuDomainObjTaint:1670 : Domain id=7 name='suse11sp3_test_7' uuid=94c4ac0b-3a6a-41aa-b16c-80aa7adbc6b8 is tainted: high-privileges
    2014-07-03 08:40:45.708+0000: 47009: info : virSecurityDACSetOwnership:227 : Setting DAC user and group on '/home/wcj/DTS/suse11sp3_test_7' to '0:0'

   2) VMB: qemuMigrationPrepareAny() -> virDomainObjListAdd(). This function acquires
      the driver->doms lock, and then virHashSearch() waits for VMA's vm lock, which is held
      by VMA's qemuProcessStart() thread.
   3) VMC: qemuDomainMigrateFinish3() -> virDomainObjListFindByName().
      This operation waits for the driver->doms lock, which is held by VMB's
      qemuMigrationPrepareAny() thread.
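
The chain of waits in 1) to 3) can be reproduced in miniature. The sketch
below is not libvirt code: it is a Python simulation in which vm_a_lock and
doms_lock are hypothetical stand-ins for VMA's vm lock and the driver->doms
lock, the function names are made up, and the per-NIC delay is scaled down
from 0.28s to 0.01s so that it runs quickly.

```python
import threading
import time

# Hypothetical stand-ins for the two locks named in the report.
vm_a_lock = threading.Lock()   # plays the role of VMA's vm lock
doms_lock = threading.Lock()   # plays the role of the driver->doms lock

NIC_COUNT = 16
PER_NIC_SECONDS = 0.01         # scaled down from the ~0.28s measured above

waited = {}
vma_has_lock = threading.Event()
vmb_has_doms = threading.Event()

def vma_start():
    # Like qemuProcessStart(): keeps the vm lock while every NIC is created.
    with vm_a_lock:
        vma_has_lock.set()
        for _ in range(NIC_COUNT):
            time.sleep(PER_NIC_SECONDS)   # stands in for tap device creation

def vmb_prepare():
    # Like virDomainObjListAdd(): takes the doms lock, then needs VMA's vm lock.
    vma_has_lock.wait()
    with doms_lock:
        vmb_has_doms.set()
        with vm_a_lock:
            pass

def vmc_finish():
    # Like virDomainObjListFindByName(): needs only the doms lock.
    vmb_has_doms.wait()
    t0 = time.monotonic()
    with doms_lock:
        waited["vmc"] = time.monotonic() - t0

threads = [threading.Thread(target=f)
           for f in (vma_start, vmb_prepare, vmc_finish)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"VMC waited {waited['vmc']:.3f}s for the doms lock")
```

VMC never touches VMA's lock, yet its wait is close to the full time VMA
spends creating NICs, because VMB holds the list lock while blocked.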

   As VMC is in the finish phase, it is paused on both the source host and the destination host.
   In the worst case, VMC may stay paused for about 4s during migration,
   and the pause time increases if we migrate more VMs concurrently.

   virNetDevTapCreateInBridgePort(), which runs with the vm lock held, takes a fairly long time.
   IMHO it would be nice to do some lock optimization for this case. (I think it makes
   little sense to optimize the time of creating the net device, because even if
   the time is reduced to 0.1s, a vm with 16 NICs would still hold the vm lock for more
   than 1.6s.)
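
The figures above can be checked against the log. The Python sketch below
takes the 16 "Enabling IFF_VNET_HDR" timestamps from the log and computes the
gaps between them. It assumes that each of those lines corresponds to one NIC
being set up, which the log itself does not state. lock_hold_seconds() is a
hypothetical helper that only multiplies the NIC count by a per-NIC cost.

```python
from datetime import datetime

# Time-of-day part of the 16 "Enabling IFF_VNET_HDR" log lines (2014-07-03).
stamps = """
08:40:41.285 08:40:41.560 08:40:41.852 08:40:42.144
08:40:42.464 08:40:42.756 08:40:43.076 08:40:43.372
08:40:43.680 08:40:43.972 08:40:44.268 08:40:44.560
08:40:44.720 08:40:44.804 08:40:44.888 08:40:45.040
""".split()

times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]

# Gap between consecutive lines, assumed to be the cost of one NIC.
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
span = (times[-1] - times[0]).total_seconds()

def lock_hold_seconds(nics, per_nic_seconds):
    """Lower bound on the vm lock hold time spent on NIC creation alone."""
    return nics * per_nic_seconds

print(f"lines: {len(times)}, first-to-last span: {span:.3f}s")
print(f"largest gap: {max(gaps):.3f}s, smallest gap: {min(gaps):.3f}s")
for per_nic in (0.28, 0.1):
    print(f"16 NICs at {per_nic}s each: {lock_hold_seconds(16, per_nic):.2f}s")
```

Even at the reduced cost of 0.1s per NIC the hold time grows linearly with
the number of NICs, which is why shortening the lock hold looks more useful
than speeding up device creation.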

   Any ideas?



