[libvirt] Ongoing work on lock contention in qemu driver?

Daniel P. Berrange berrange at redhat.com
Thu May 16 17:56:14 UTC 2013


On Thu, May 16, 2013 at 06:18:57PM +0100, Daniel P. Berrange wrote:
> On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
> > > How many CPU cores are you testing on? That's a good improvement,
> > > but I'd expect the improvement to be greater as the number of cores grows.
> > 
> > I'm testing on 12 cores x 2 HT per core. As I'm working on teasing out
> > software bottlenecks, I'm intentionally running fewer tasks (20 parallel
> > creations) than the number of logical cores (24). The memory, disk and
> > network are also well over provisioned.
> > 
> > > Also, did you tune /etc/libvirt/libvirtd.conf at all? By default we
> > > limit a single connection to only 5 RPC calls. Beyond that calls
> > > queue up, even if libvirtd is otherwise idle. OpenStack uses a
> > > single connection for everything so will hit this. I suspect this
> > > would be why virConnectGetLibVersion would appear to be slow. That
> > > API does absolutely nothing of any consequence, so the only reason
> > > I'd expect that to be slow is if you're hitting a libvirtd RPC
> > > limit causing the API to be queued up.
> > 
> > I hadn't tuned libvirtd.conf at all. I have just increased
> > max_{clients,workers,requests,client_requests} to 50 and repeated my
> > experiment. As you expected, virConnectGetLibVersion is now very fast.
> > Unfortunately, the median VM creation time didn't change.
> > 
> > > I'm not actively doing anything in this area. Mostly because I've got no
> > > clear data on where any remaining bottlenecks are.
> > 
> > Unless there are other parameters to tweak, I believe I'm still hitting a
> > bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt
> > calls are
> > 
> > virConnectDefineXML*: 13ms vs 4.5s
> > virDomainCreateWithFlags*: 1.8s vs 20s
> > 
> > * I had said that virConnectDefineXML wasn't serialized in my first email. I
> >   based that observation on a single trace I looked at :-) In the average case,
> >   virConnectDefineXML is affected by a bottleneck.
> 
> virConnectDefineXML would at least hit the possible bottleneck on
> the virDomainObjListAddLocked method. In fact that's pretty much
> the only contended lock I'd expect it to hit. Nothing else that
> it runs has any serious locking involved.
> 
> > Note that when I took these measurements, I also monitored CPU & disk
> > utilization. During the 20 VM test, both CPU & disk were well below 100%
> > for 97% of the test (i.e., 60s test duration, measured utilization with
> > atop using a 2 second interval, CPU was pegged for 2s).
> > 
> > > One theory I had was that the virDomainObjListSearchName method could
> > > be a bottleneck, because that acquires a lock on every single VM. This
> > > is invoked when starting a VM, when we call virDomainObjListAddLocked.
> > > I tried removing this locking though & didn't see any performance
> > > benefit, so never pursued this further.  Before trying things like
> > > this again, I think we'd need to find a way to actually identify where
> > > the true bottlenecks are, rather than guesswork.
> > 
> > Testing your hypothesis would be straightforward. I'll add some
> > instrumentation to measure the time spent waiting for the locks and repeat
> > my 20 VM experiment. Or, if there's some systematic lock profiling in
> > place, then I can turn that on and report the results.
> 
> There's no lock profiling support built in to libvirt. I'm not sure
> of the best way to introduce such support without it impacting the very
> thing we're trying to test.  Suggestions welcome.
> 
> Perhaps a systemtap script would do a reasonable job at it though.
> eg record any stack traces associated with long futex_wait() system
> calls or something like that.

Oh, someone has already written such a systemtap script:

http://sourceware.org/systemtap/examples/process/mutex-contention.stp

I think that is preferable to trying to embed special code in
libvirt for this task.
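
For comparison, embedded instrumentation of the kind discussed above
boils down to timing the blocking path of each lock acquisition. A
minimal sketch of the idea follows, in Python for brevity rather than
libvirt's C. TimedLock and run_demo are hypothetical names, not libvirt
code, and the 20 workers only mimic the 20 parallel creations.

```python
import threading
import time


class TimedLock:
    """Hypothetical wrapper: records how long callers block on a lock."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.acquisitions = 0   # total successful acquisitions
        self.contended = 0      # acquisitions that had to wait
        self.total_wait = 0.0   # seconds spent blocked, summed
        self.max_wait = 0.0     # longest single wait in seconds

    def __enter__(self):
        waited = 0.0
        # Fast path: an uncontended acquire pays for no clock reads.
        if not self._lock.acquire(blocking=False):
            start = time.monotonic()
            self._lock.acquire()
            waited = time.monotonic() - start
            self.contended += 1
        # The lock is held here, so the counters are updated serially.
        self.acquisitions += 1
        self.total_wait += waited
        self.max_wait = max(self.max_wait, waited)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def run_demo(workers=20, hold=0.01):
    """Start `workers` threads that each hold one shared lock for `hold` s."""
    lock = TimedLock("domain-list")

    def worker():
        with lock:
            time.sleep(hold)  # stand-in for work done under the lock

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock


if __name__ == "__main__":
    stats = run_demo()
    print("%s: %d acquisitions, %d contended, total wait %.3fs, max %.3fs"
          % (stats.name, stats.acquisitions, stats.contended,
             stats.total_wait, stats.max_wait))
```

This shows the drawback mentioned earlier: the clock reads sit on the
contended path and can perturb what is being measured, whereas a
systemtap probe leaves the code under test unmodified.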

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



