[augeas-devel] improving performance of aug_get() and aug_match() with large datasets

Dominic Cleal dcleal at redhat.com
Thu Sep 24 09:27:34 UTC 2015


On 22/09/15 20:18, Laine Stump wrote:
> It was bound to happen eventually. Someone created a host with 514 vlan 
> interfaces each connected to a host bridge, then started up 
> virt-manager. virt-manager likes to learn the status of all the network 
> interfaces on a host by calling libvirt (the equivalent of "virsh 
> iface-list --all" followed by "virsh iface-dumpxml bobloblaw" for each 
> interface). libvirt makes some calls to the netcf library, which queries 
> the interface config on disk using augeas (what amounts to aug_get() and 
> aug_match() calls). Too bad that when you have 514 vlan+bridge combos, 
> this operation takes ~20 minutes on good hardware!
> 
> I looked into the libvirt part of it and there were some obvious 
> inefficiencies (the function netcfConnectListAllInterfaces() ends up 
> calling ncf_if_mac_string() and ncf_if_name() multiple times for each 
> interface, when it could 1) call ncf_if_mac_string() once, and 2) never 
> call ncf_if_name() at all), but even fixing those only eliminates about 
> 20% of the total time. I then looked at removing all of the ncf_* calls 
> in the libvirt function (after the first call to receive a simple list 
> of interfaces) and found that we're still left with about 40% of the 
> total time. So there is a lot that can be done in libvirt, but 40% of 
> the time is still spent in netcf, with the majority of that in calls to 
> aug_get() and aug_match().
> 
> I have two questions based on this:
> 
> 1) has anyone thought about/looked into optimizing/changing the data 
> structure used to store nodes in augeas to scale better with larger 
> datasets (execution time seems to increase at > linear)?

Yes, I've seen something similar before - it was reported to us in the
context of a Puppet provider working on a huge file with many Nagios
service definitions.  When many nodes share the same name and differ
only by index (e.g. service[1], service[2]), Augeas is extremely slow
to traverse paths with a high index value.
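
To illustrate why high index values could hurt, here is a toy model in
Python.  It is not Augeas code: it simply assumes that label[k] is
resolved by scanning the siblings from the start and counting matches,
and the names find_indexed() and total_steps() are made up for the
example.  Under that assumption, visiting every entry once by index
costs roughly n^2/2 comparisons, which would match the worse-than-linear
growth described above.

```python
def find_indexed(children, label, index):
    """Resolve label[index] (1-based) by a linear scan over the siblings.

    Returns (node, steps), where steps is the number of siblings visited.
    This is an assumed model, not how Augeas is actually implemented.
    """
    seen = 0
    steps = 0
    for child in children:
        steps += 1
        if child == label:
            seen += 1
            if seen == index:
                return child, steps
    return None, steps


def total_steps(n, label="service"):
    """Siblings visited when looking up label[1] .. label[n] in turn."""
    children = [label] * n
    return sum(find_indexed(children, label, i)[1] for i in range(1, n + 1))


if __name__ == "__main__":
    # 514 is the interface count from the original report.
    for n in (10, 100, 514):
        print(n, total_steps(n))
```

In this model, doubling the number of same-named siblings roughly
quadruples the work.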

I spent a while profiling it and found a couple of very inefficient
memory operations - here's my branch:
https://github.com/hercules-team/augeas/compare/master...domcleal:ns-filter-perf3

The problem is that it broke a couple of tests and I ran out of time
before I could find the root cause.

I'd love to know if that speeds up your aug_* calls, possibly at the
expense of a few things breaking.  I don't know when I'll get to revisit
it, but may try again soon if it's useful.

The other really simple optimisation was to change the lens itself from
an indexed set of entries to `seq`, which labels the entries 1, 2,
3...n.  Since every entry then has a different name, the tree was very
efficient to traverse - at the expense of breaking compatibility for
lens users.

> 2) I recall that a long time ago augeas put in code to re-read/parse 
> files only if they had been modified. netcf (and thus libvirt) could 
> take advantage of this info if it was available in the augeas API - the 
> first time it retrieved the info for an interface it would take a hit, 
> but all subsequent times could be much quicker.

The mtime of the file is stored under /augeas, e.g.
/augeas/files/etc/hosts/mtime = "1418632339".  IIRC, it's compared when
aug_load() is called, but you could also read it yourself easily (e.g.
with aug_get()).
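
As a rough sketch of the caching idea, a caller could keep its parsed
result per file and only redo the work when the mtime changes.  The
Python below is an illustration only: the MtimeCache class and its
parse callback are hypothetical, and it reads the mtime with os.stat()
rather than from the /augeas/files/.../mtime node, so that it runs
without Augeas installed.

```python
import os


class MtimeCache:
    """Cache a parsed value per file, re-parsing only when mtime changes."""

    def __init__(self, parse):
        self._parse = parse      # hypothetical callback: path -> parsed value
        self._entries = {}       # path -> (mtime in ns, parsed value)

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]     # file unchanged: skip the expensive parse
        value = self._parse(path)
        self._entries[path] = (mtime, value)
        return value
```

The first get() for a file takes the full hit; later calls cost one
stat() as long as the file is untouched.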

-- 
Dominic Cleal
Red Hat Engineering
