[libvirt] [PATCH] qemu: Unlock NW filter lock earlier when not necessary

Daniel P. Berrange berrange at redhat.com
Thu Apr 30 12:47:23 UTC 2015


On Thu, Apr 30, 2015 at 02:38:21PM +0200, Michal Privoznik wrote:
> On 30.04.2015 14:04, Erik Skultety wrote:
> > The lock is dropped always at the end of an API, but for example when
> > attaching devices, there's no point in having the NW filter locked if the
> > device being attached isn't a network interface. It's always a nice
> > practice to drop unnecessary locks as soon as possible.
> > ---
> >  src/qemu/qemu_driver.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > BZ 1181074 reported the daemon to hang (not completely true btw) after trying
> > to attach /dev/ttyX to a running domain. When going through the code I realized
> > we acquire a read lock for NW filter, although the device itself is not a network
> > interface and then releasing it at the end. First clue was that this lock which
> > won't be unlocked as the thread is blocked in reading header of /dev/ttyX causes
> > the freeze. As it turned out, the problem resides in calling virsh list afterwards
> > (Peter already does have some patches prepared), but I think the NW filter might
> > be released earlier in this case with no harm done.
> > 
> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 74dcb0a..04887e0 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -8764,6 +8764,7 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const char *xml,
> >      virDomainDeviceDefPtr dev = NULL, dev_copy = NULL;
> >      int ret = -1;
> >      unsigned int affect, parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE;
> > +    bool nwfilter_locked = false;
> >      virQEMUCapsPtr qemuCaps = NULL;
> >      qemuDomainObjPrivatePtr priv;
> >      virQEMUDriverConfigPtr cfg = NULL;
> > @@ -8773,6 +8774,7 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const char *xml,
> >                    VIR_DOMAIN_AFFECT_CONFIG, -1);
> >  
> >      virNWFilterReadLockFilterUpdates();
> > +    nwfilter_locked = true;
> 
> Wait a while. Whenever a programmer needs to know if a lock is locked,
> something is wrong with the design. That's why POSIX doesn't define such
> function. Then, this is a Read lock, so it won't hold back other
> readers, only writers. I don't think there's a real bottle neck unless
> you are doing a testcase and attaching thousands of devices and defining
> another thousands of NWFilters.
> 
> What I'm more curious is - why the heck does this lock needs to be on
> this level? It's very high level. We lock the lock even when we don't
> know yet if the XML is right, and what device type we will work on.
> Can't the lock be moved to lower layers?

We have to be careful of the lock ordering rules - see this note
in the commit where I refactoring nwfilter locking:

commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb
Author: Daniel P. Berrange <berrange at redhat.com>
Date:   Wed Jan 22 17:28:29 2014 +0000

    Push nwfilter update locking up to top level
    
    The NWFilter code has as a deadlock race condition between
    the virNWFilter{Define,Undefine} APIs and starting of guest
    VMs due to mis-matched lock ordering.
    
    In the virNWFilter{Define,Undefine} codepaths the lock ordering
    is
    
      1. nwfilter driver lock
      2. virt driver lock
      3. nwfilter update lock
      4. domain object lock
    
    In the VM guest startup paths the lock ordering is
    
      1. virt driver lock
      2. domain object lock
      3. nwfilter update lock
    
    As can be seen the domain object and nwfilter update locks are
    not acquired in a consistent order.
    
    The fix used is to push the nwfilter update lock upto the top
    level resulting in a lock ordering for virNWFilter{Define,Undefine}
    of
    
      1. nwfilter driver lock
      2. nwfilter update lock
      3. virt driver lock
      4. domain object lock
    
    and VM start using
    
      1. nwfilter update lock
      2. virt driver lock
      3. domain object lock
    
    This has the effect of serializing VM startup once again, even if
    no nwfilters are applied to the guest. There is also the possibility
    of deadlock due to a call graph loop via virNWFilterInstantiate
    and virNWFilterInstantiateFilterLate.
    
    These two problems mean the lock must be turned into a read/write
    lock instead of a plain mutex at the same time. The lock is used to
    serialize changes to the "driver->nwfilters" hash, so the write lock
    only needs to be held by the define/undefine methods. All other
    methods can rely on a read lock which allows good concurrency.


Though some things have changed due to removal of the virt driver lock
in QEMU, we stil have to respect the ordering of the nwfilter vs domain
object lock

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|




More information about the libvir-list mailing list