[augeas-devel] improving performance of aug_get() and aug_match() with large datasets

Laine Stump laine at redhat.com
Thu Oct 1 18:44:55 UTC 2015


On 09/22/2015 03:18 PM, Laine Stump wrote:
> It was bound to happen eventually. Someone created a host with 514 
> vlan interfaces each connected to a host bridge, then started up 
> virt-manager. [blah blah boring blah removed]
To update those not included in a separate thread on the topic in 
netcf-devel (I'll try to keep all discussion here from now on):

Dan Berrange pointed out that netcf was calling aug_load() on each entry 
to a public netcf API, and libvirt was calling netcf APIs multiple times 
for each interface. Even though aug_load() checks the mtime of files it 
has already loaded, and avoids re-loading those that haven't been 
modified (in this case none have been modified), it turns out that just 
doing a stat() of 1100 files takes a significant amount of time. So I 
modified netcf to only call aug_load() to do this check if it has been 
at least 1 second since the last time it was called. This made a very 
large improvement, especially when running the upstream versions of all 
involved packages (virt-manager --> libvirt --> netcf --> augeas). But 
when running the versions that are included in RHEL6, it wasn't so rosy. 
A test setup of 514 bridge+vlan interfaces which took around 30 minutes 
(!!) to complete a full startup of virt-manager (which calls 
netcf/augeas to list all interfaces, then get the XML config for them) 
now takes 13 minutes with netcf modified to call aug_load() only once 
per second. (The same operation takes "only" 8 minutes using all 
upstream code.)
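
For anyone who wants to picture the throttling described above, here is a
minimal sketch in C. It is not the actual netcf patch: the struct, the
function names, and the callback indirection are all made up for
illustration, and count_refresh() merely stands in for whatever wraps the
real aug_load() call.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not part of netcf or augeas: remembers when the
 * expensive refresh last ran. */
struct reload_throttle {
    time_t last_run;      /* time of the last refresh, valid if has_run */
    int    has_run;       /* 0 until the first refresh has happened */
    int    min_interval;  /* minimum number of seconds between refreshes */
};

/* Run refresh(ctx) unless it already ran less than min_interval seconds
 * before 'now'. Returns refresh()'s result, or 0 if the call was skipped. */
static int throttled_reload(struct reload_throttle *t, time_t now,
                            int (*refresh)(void *ctx), void *ctx)
{
    assert(t != NULL && refresh != NULL);

    if (t->has_run && difftime(now, t->last_run) < t->min_interval)
        return 0;   /* too soon: keep using the tree that is already loaded */

    t->last_run = now;
    t->has_run = 1;
    return refresh(ctx);
}

/* Stand-in for the real refresh (a wrapper around aug_load() in practice):
 * it only counts how often it was invoked. */
static int count_refresh(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

In real use the caller would presumably pass time(NULL) as 'now' and a
small wrapper that calls aug_load() on the shared augeas handle.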

But 13 (or even 8) minutes is still a very long time, so I played around 
a bit in gdb and found that most of the time now seems to be spent in 
one call to aug_match():


   r = aug_match(aug, path, "/files/etc/sysconfig/network-scripts/*[ "
       "DEVICE = 'br1' or BRIDGE = 'br1' or MASTER = 'br1' or "
       "MASTER = ../*[BRIDGE = 'br1']/DEVICE ]/DEVICE");

(this is the result of a call to netcf's aug_fmt_match() in the netcf 
function aug_get_xml_for_nif())

When I step over that call to aug_match(), there is a very noticeable 
pause before the gdb prompt comes back. In contrast, continuing from 
that point all the way through virt-manager's "get all interfaces" loop 
back to the next call to aug_get_xml_for_nif() seems to happen 
instantly, even though that stretch includes several other calls to 
aug_match() with much simpler search expressions.

So apparently doing a match against all ifcfg files based on this 
complex match expression is really slowing us down. Any ideas on how to 
either make this expression simpler, or alternatively how to get augeas 
to do the search more quickly?


> I have two questions based on this:
>
> 1) has anyone thought about/looked into optimizing/changing the data 
> structure used to store nodes in augeas to scale better with larger 
> datasets (execution time seems to increase at > linear)?
>
> 2) I recall that a long time ago augeas put in code to re-read/parse 
> files only if they had been modified. netcf (and thus libvirt) could 
> take advantage of this info if it was available in the augeas API - 
> the first time it retrieved the info for an interface it would take a 
> hit, but all subsequent times could be much quicker.

About this one - I'm wondering how well it would work out for augeas to 
use inotify to learn about modifications to files (including the 
directory that the ifcfg files live in, in case a new file is created). 
It works okay for netcf to avoid calling aug_load() (as mentioned 
above), but it does make me a bit uncomfortable that we sometimes have a 
mistaken view of the config.
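
As a rough sketch of the inotify idea, assuming Linux and nothing about how
augeas would actually integrate it: watch the directory, and only redo the
expensive reload when the kernel has reported a change. The function names
watch_config_dir() and config_dir_changed() are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching 'dir' for files being created, deleted, renamed, or
 * rewritten. Returns a non-blocking inotify fd, or -1 on failure. */
static int watch_config_dir(const char *dir)
{
    int fd;

    assert(dir != NULL);

    fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0) {
        perror("inotify_init1");
        return -1;
    }
    if (inotify_add_watch(fd, dir,
                          IN_CREATE | IN_DELETE | IN_CLOSE_WRITE
                          | IN_MOVED_FROM | IN_MOVED_TO) < 0) {
        perror("inotify_add_watch");
        close(fd);
        return -1;
    }
    return fd;
}

/* Drain all pending events without blocking. Returns 1 if anything changed
 * since the previous call, 0 if nothing did, -1 on error. */
static int config_dir_changed(int fd)
{
    char buf[4096];
    int changed = 0;

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));

        if (len > 0) {
            changed = 1;        /* at least one event was queued */
            continue;           /* keep draining the queue */
        }
        if (len < 0 && errno == EINTR)
            continue;
        if (len < 0 && errno == EAGAIN)
            return changed;     /* queue is empty now */
        return -1;
    }
}
```

A caller could then skip the reload entirely while config_dir_changed()
returns 0, which would avoid both the stat() storm and the one-second
window of possibly stale data.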



