[libvirt PATCH 0/5] nwfilter: fix deadlock with binding create/delete

Daniel P. Berrangé berrange at redhat.com
Fri Feb 25 15:30:48 UTC 2022


Historically the nwfilter driver didn't keep track of any
instances of filters. When it needed to rebuild filters it
called back to the virt drivers to ask them to re-create
the filters for their VMs. This lead to some complex locking
requirements where the virt driver needed to acquire locks
on the nwfilter driver before creating/deleting filter
instances.

Somewhat recently we introduced the virNWFilterBinding
concept which allowed the nwfilter driver to keep track of
all filter instances aka bindings. Thus it could rebuild
them without talking to the virt drivers. We never updated
the locking model when doing this though, so we were still
reliant on the virt drivers acquiring locks before creating
or deleting virNWFilterBinding objects.

When we started using the modular daemons though, the locks
acquired by the virt drivers no longer protected the nwfilter
driver as they were separate processes. Thus the race condition
fixed back in

  commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb
  Author: Daniel P. Berrangé <berrange at redhat.com>
  Date:   Wed Jan 22 17:28:29 2014 +0000

    Push nwfilter update locking up to top level

has now re-appeared when running modular daemons. In fact
it is possible to trigger it even in libvirtd if you ignore
the virt drivers and directly call the virNWFilterBinding
APIs, though that won't hit users in practice.

The solution is surprisingly simple. Since the nwfilter
driver keeps track of virNWFilterBinding objects,  we
can simply put locking calls around the Create/Delete
methods for those objects, and remove the locking from
the virt drivers.

The only slight difference is that previous the locking
serialized the entire VM startup/shutdown sequence, while
the new locking only serializes individual NICs. This
should not matter in practice, since there's no shared
state between filters for NICs on the same VM.

This is a minimal fix suitable for the freeze. There is
scope for some significant improvements in locking which
I'm still working on. In particular we currently have a
terrible writer starvation problem with the RWLock that
protects the filter create/delete/redefine operations
that has existed forever. Repeatedly creating/deleting
virNWFilterBinding objects can prevent a virNWFilterDefineXML
call from ever running.

Daniel P. Berrangé (5):
  nwfilter: stop using recursive mutex for IP learning
  nwfilter: hold filter update lock when creating/deleting bindings
  qemu,lxc: remove use to nwfilter update lock
  nwfilter: remove decl of virNWFilterCreateVarHashmap
  nwfilter: make some gentech driver methods static

 src/lxc/lxc_driver.c                   |  6 ------
 src/nwfilter/nwfilter_driver.c         |  5 +++++
 src/nwfilter/nwfilter_gentech_driver.c |  4 ++--
 src/nwfilter/nwfilter_gentech_driver.h |  8 --------
 src/nwfilter/nwfilter_learnipaddr.c    |  2 +-
 src/qemu/qemu_driver.c                 | 18 ------------------
 src/qemu/qemu_migration.c              |  3 ---
 src/qemu/qemu_process.c                |  4 ----
 8 files changed, 8 insertions(+), 42 deletions(-)

-- 
2.35.1





More information about the libvir-list mailing list