[libvirt] [PATCH 3/5] Push nwfilter update locking up to top level

Stefan Berger stefanb at linux.vnet.ibm.com
Thu Jan 23 21:13:57 UTC 2014


On 01/23/2014 07:37 AM, Daniel P. Berrange wrote:
> The NWFilter code has as a deadlock race condition between
> the virNWFilter{Define,Undefine} APIs and starting of guest
> VMs due to mis-matched lock ordering.
>
> In the virNWFilter{Define,Undefine} codepaths the lock ordering
> is
>
>    1. nwfilter driver lock
>    2. virt driver lock
>    3. nwfilter update lock
>    4. domain object lock
>
> In the VM guest startup paths the lock ordering is
>
>    1. virt driver lock
>    2. domain object lock
>    3. nwfilter update lock
>
> As can be seen the domain object and nwfilter update locks are
> not acquired in a consistent order.
>
> The fix used is to push the nwfilter update lock upto the top
> level resulting in a lock ordering for virNWFilter{Define,Undefine}
> of
>
>    1. nwfilter driver lock
>    2. nwfilter update lock
>    3. virt driver lock
>    4. domain object lock
>
> and VM start using
>
>    1. nwfilter update lock
>    2. virt driver lock
>    3. domain object lock

Makes sense... there are other paths as well:

- SIGHUP or restart of libvirt that recreates all filters
- late instantiation of filters following detection of VM's IP address


> This has the effect of serializing VM startup once again, even if
> no nwfilters are applied to the guest.
>
> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> ---
>   src/conf/nwfilter_conf.c               | 6 ------
>   src/lxc/lxc_driver.c                   | 6 ++++++
>   src/nwfilter/nwfilter_driver.c         | 8 ++++----
>   src/nwfilter/nwfilter_gentech_driver.c | 6 ------
>   src/qemu/qemu_driver.c                 | 6 ++++++
>   src/uml/uml_driver.c                   | 4 ++++
>   6 files changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/src/conf/nwfilter_conf.c b/src/conf/nwfilter_conf.c
> index 6db8ea9..e712ca5 100644
> --- a/src/conf/nwfilter_conf.c
> +++ b/src/conf/nwfilter_conf.c
> @@ -2990,14 +2990,12 @@ virNWFilterObjAssignDef(virNWFilterObjListPtr nwfilters,
>           return NULL;
>       }
>
> -    virNWFilterLockFilterUpdates();
>
>       if ((nwfilter = virNWFilterObjFindByName(nwfilters, def->name))) {
>
>           if (virNWFilterDefEqual(def, nwfilter->def, false)) {
>               virNWFilterDefFree(nwfilter->def);
>               nwfilter->def = def;
> -            virNWFilterUnlockFilterUpdates();
>               return nwfilter;
>           }


I think you should add a comment to this function that it must be called 
with this lock held.
With the removal of the locking from this function here you have to do 
the locking in the callers. Callers of virNWFilterObjAssignDef are:

src/conf/nwfilter_conf.c::virNWFilterObjLoad : I don't see this function 
grabbing the lock. I think this is missing.
src/conf/nwfilter_driver.c::nwfilterDefineXML : ok, found it below




> diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
> index e319234..b1f8a89 100644
> --- a/src/lxc/lxc_driver.c
> +++ b/src/lxc/lxc_driver.c
> @@ -1015,6 +1015,8 @@ static int lxcDomainCreateWithFiles(virDomainPtr dom,
>
>       virCheckFlags(VIR_DOMAIN_START_AUTODESTROY, -1);
>
> +    virNWFilterLockFilterUpdates();
> +
>       if (!(vm = lxcDomObjFromDomain(dom)))
>           goto cleanup;
>
> @@ -1053,6 +1055,7 @@ cleanup:
>       if (event)
>           virObjectEventStateQueue(driver->domainEventState, event);
>       virObjectUnref(cfg);
> +    virNWFilterUnlockFilterUpdates();
>       return ret;
>   }
>
> @@ -1109,6 +1112,8 @@ lxcDomainCreateXMLWithFiles(virConnectPtr conn,
>
>       virCheckFlags(VIR_DOMAIN_START_AUTODESTROY, NULL);
>
> +    virNWFilterLockFilterUpdates();
> +
>       if (!(caps = virLXCDriverGetCapabilities(driver, false)))
>           goto cleanup;
>
> @@ -1164,6 +1169,7 @@ cleanup:
>           virObjectEventStateQueue(driver->domainEventState, event);
>       virObjectUnref(caps);
>       virObjectUnref(cfg);
> +    virNWFilterUnlockFilterUpdates();
>       return dom;
>   }

Probably these calls are correctly placed since all other creation 
functions funnel into them.Same for UML and QEMU drivers.


> diff --git a/src/nwfilter/nwfilter_gentech_driver.c b/src/nwfilter/nwfilter_gentech_driver.c
> index 89913cf8..f0e78ed 100644
> --- a/src/nwfilter/nwfilter_gentech_driver.c
> +++ b/src/nwfilter/nwfilter_gentech_driver.c
> @@ -935,7 +935,6 @@ _virNWFilterInstantiateFilter(virNWFilterDriverStatePtr driver,
>       int ifindex;
>       int rc;
>
> -    virNWFilterLockFilterUpdates();
>
>       /* after grabbing the filter update lock check for the interface; if
>          it's not there anymore its filters will be or are being removed
> @@ -964,7 +963,6 @@ _virNWFilterInstantiateFilter(virNWFilterDriverStatePtr driver,
>                                           foundNewFilter);
>
>   cleanup:
> -    virNWFilterUnlockFilterUpdates();
>
>       return rc;
>   }

This function is called by virNWFilterInstantiateFilter and 
virNWFilterUpdateInstantiateFilter. So, in the former case this is 
called when a VM is started, in the latter case if VMs' filters are 
updated while the VM is running. I think you are covering the VM 
creation case with the above calls for lxc and further below with the 
changes to the qemu driver and the uml driver.

There is at least one other part that needs to be covered:

src/conf/nwfilter_conf.c::virNWFilterInstFiltersOnAllVMs :: kicked off 
by nwfilterStateReload upon SIGHUP. We need a lock there now.

src/conf/nwfilter_conf.c::virNWFilterTriggerVMFilterRebuild::
     - called by virNWFilterTestUnassignDef:
         - called by src/nwfilter/nwfilter/nwfilterUndefine:: You seem 
to just reorder some locks there; now we need a (writer) lock there
     - called by virNWFilterObjAssignDef: which must be called with lock 
called following above reasoning




> @@ -984,8 +982,6 @@ virNWFilterInstantiateFilterLate(virNWFilterDriverStatePtr driver,
>       int rc;
>       bool foundNewFilter = false;
>
> -    virNWFilterLockFilterUpdates();
> -
>       rc = __virNWFilterInstantiateFilter(driver,
>                                           vmuuid,
>                                           true,
> @@ -1009,8 +1005,6 @@ virNWFilterInstantiateFilterLate(virNWFilterDriverStatePtr driver,
>           }
>       }
>
> -    virNWFilterUnlockFilterUpdates();
> -
>       return rc;
>   }

This function here is called by
src/nwfilter/nwfilter_learnip.c::learnIPAddressThread
src/nwfilter/nwfilter_dhcpsnoop.c::virNWFilterSnoopIPLeaseInstallRule

They instantiate the filters once a VM's IP address has been detected. 
So this is where the *Late() comes from.

If you remove the locking from here, you have to lock it there. 
Considering what you do layer, I would keep the lock here and convert 
this into a reader lock layer on.


The farther away we do this from the calls where the actual action is 
happening, the more tricky this becomes. I would almost recommend to 
plant an assert into those far way functions that tests whether the lock 
has been grabbed.

Regards,
Stefan




More information about the libvir-list mailing list