[libvirt] Ongoing work on lock contention in qemu driver?

Mon Jun 3 11:05:27 UTC 2013

On Fri, May 24, 2013 at 11:37:04AM -0400, Peter Feiner wrote:
> On Wed, May 22, 2013 at 7:31 PM, Peter Feiner <peter at gridcentric.ca> wrote:
> > Since some security driver operations are costly, I think it's
> > worthwhile to reduce the scope of the security manager lock or
> > increase the granularity by introducing more locks. After a cursory
> > look, the security manager lock seems to have a much broader scope
> > than necessary. The system / library calls underlying the security
> > drivers are all thread safe (e.g., defining apparmor security profiles
> > or chowning disk files), so a global lock isn't strictly necessary.
> > Moreover, since most virSecurity calls are made whilst a virDomainObj
> > lock is held and the security calls are generally domain specific,
> > *most* of the security calls are probably thread safe in the absence
> > of the global security manager lock. Obviously some work will have to
> > be done to see where the security lock actually matters and some
> > finer-grained locks will have to be introduced to handle these
> > situations.
> 
> To verify that this is worthwhile, I disabled the apparmor driver
> entirely. My 20 VM creation test ran about 10s faster (down from 35s
> to 25s).
> 
> After giving this approach a little more thought, I think an
> incremental series of patches is a good way to go. The responsibility
> of locking could be pushed down into the security drivers. At first,
> all of the drivers would lock where their managers' locked. Then each
> driver could be updated to do more fine-grained locking. I'm going to
> work on a patch to push the locking down into the drivers, then I'm
> going to work on a patch for better locking in the apparmor driver.

Yep, that sounds like a sane approach to me. Previously the security
drivers had no locking at all, since they were relying on the global
lock at the QEMU driver level. When I introduced the lock into the
security manager module, I was pessimistic and used coarse locking.
As you say, we can clearly relax this somewhat, if we have the locking
in the individual security drivers.

> > I also think it's worthwhile to eliminate locking from the the
> > virDomainObjList lookups and traversals. Since virDomainObjLists are
> > accessed in a bunch of places, I think it's a good defensive idea to
> > decouple the performance of these accesses from virDomainObj locks,
> > which are held during potentially long-running operations like domain
> > creation. An easy way to divorce virDomainObjListSearchName from the
> > virDomainObj lock would be to keep a copy of the domain names in the
> > virDomainObjList and protect that list with the virDomainObjList lock.
> 
> After removing the security driver contention, this was still a
> substantial bottleneck: virConnectDefineXML could still take a few
> seconds. I removed the contention by keeping a copy of the domain
> definition's name in the domain object. Since the name is immutable
> and the domain object is protected by the list lock, the list
> traversal can read the name without taking any additional locks. This
> patch reduced virConnectDefineXML to tens of milliseconds.

Yep, I had a patch to add a security hash table to the domain object
list, hashing based on name, but I lost the code when a disk died.
I didn't find it made any difference, but agree we should just do it
anyway, since it'll almost certainly be a problem in some scenarios.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|