[augeas-devel] improving performance of aug_get() and aug_match() with large datasets

Laine Stump laine at redhat.com
Fri Oct 2 18:50:14 UTC 2015


On 10/02/2015 02:32 PM, David Lutterkort wrote:
> On Thu, Oct 1, 2015 at 11:44 AM, Laine Stump <laine at redhat.com
> <mailto:laine at redhat.com>> wrote:
>
>     On 09/22/2015 03:18 PM, Laine Stump wrote:
>
>         It was bound to happen eventually. Someone created a host with
>         514 vlan interfaces each connected to a host bridge, then
>         started up virt-manager. [blah blah boring blah removed]
>
>     To update those not included in a separate thread on the topic in
>     netcf-devel (I'll try to keep all discussion here from now on):
>
>     Dan Berrange pointed out that netcf was calling aug_load() on each
>     entry to a public netcf API, and libvirt was calling netcf APIs
>     multiple times for each interface. Even though aug_load() checks the
>     mtime of files it has already loaded, and avoids re-loading those
>     that haven't been modified (in this case none have been modified),
>     it turns out that just doing a stat() of 1100 files takes a
>     significant amount of time. So I modified netcf to only call
>     aug_load() to do this check if it has been at least 1 second since
>     the last time it was called. This made a very large improvement,
>     especially when running the upstream versions of all involved
>     packages (virt-manager --> libvirt --> netcf --> augeas). But when
>     running the versions that are included in RHEL6, it wasn't so rosy.
>     A test setup of 514 bridge+vlan interfaces which took around 30
>     minutes (!!) to complete a full startup of virt-manager (which calls
>     netcf/augeas to list all interfaces, then get the XML config for
>     them) now takes 13 minutes with netcf modified to call aug_load()
>     only once per second. (The same operation takes "only" 8 minutes
>     using all upstream code.)
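A minimal C sketch of the once-per-second throttle described above. This is not netcf's actual code. The `reload_throttle` struct and `throttle_should_reload()` helper are hypothetical names, and the real aug_load() call is left to the caller so that the sketch compiles without libaugeas.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: remembers when aug_load() last ran. */
struct reload_throttle {
    time_t last_load;     /* 0 means "never loaded yet" */
    double min_interval;  /* minimum seconds between reload checks, e.g. 1.0 */
};

/* Return true if the caller should call aug_load() now, i.e. if no
 * reload has happened yet or at least min_interval seconds have passed
 * since the last one. Records `now` as the last load time when it
 * returns true. */
static bool throttle_should_reload(struct reload_throttle *t, time_t now)
{
    if (t->last_load == 0 || difftime(now, t->last_load) >= t->min_interval) {
        t->last_load = now;
        return true;
    }
    return false;
}
```

A public entry point would then do `if (throttle_should_reload(&t, time(NULL))) aug_load(aug);` before reading the tree. The price of this design is that a change made on disk within the interval goes unnoticed until the next check.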
>
>     But 13 (or even 8) minutes is still a very long time, so I played
>     around a bit in gdb and found that most of the time now seems to be
>     spent in one call to aug_match():
>
>
>        r = aug_match(aug, path, "/files/etc/sysconfig/network-scripts/*[
>     DEVICE = 'br1' or BRIDGE = 'br1' or MASTER = 'br1' or MASTER =
>     ../*[BRIDGE = 'br1']/DEVICE ]/DEVICE");
>
>     (this is the result of a call to netcf's aug_fmt_match() in the
>     netcf function aug_get_xml_for_nif())
>
>     When I step over that call to aug_match(), there is a very
>     noticeable pause before the gdb prompt comes back, while continuing
>     from that point all the way through virt-manager's "get all
>     interfaces" loop back to the next call to aug_get_xml_for_nif()
>     (including several other calls to aug_match() that have much simpler
>     search expressions) seems to happen instantly.
>
>     So apparently doing a match against all ifcfg files based on this
>     complex match expression is really slowing us down. Any ideas on how
>     to either make this expression simpler, or alternately how to get
>     augeas doing the search more quickly?
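One plausible reason this particular expression is expensive is the clause `MASTER = ../*[BRIDGE = 'br1']/DEVICE`. It is relative to each candidate node, so a straightforward evaluator recomputes it once per ifcfg file and visits all n sibling files each time, for about n^2 visits in total. With roughly 1,000 ifcfg files (514 vlan+bridge pairs), that is on the order of a million sibling visits per aug_match() call, repeated for every interface virt-manager asks about. The C sketch below models only that cost. It assumes, without having checked augeas's path-expression code, that the inner path is not cached between candidates. It also ignores any short-circuiting of `or` and uses a hypothetical flattened `struct ifcfg` in place of the real augeas tree. The hoisted variant computes the inner node set once. That is one way either the evaluator or the caller (by issuing two simpler aug_match() queries) might avoid the quadratic cost.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical, flattened stand-in for one ifcfg-* file: only the keys
// the expression looks at. NULL means the key is absent.
struct ifcfg {
    const char *device;
    const char *bridge;
    const char *master;
};

static int eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

// Naive evaluation: the relative clause MASTER = ../*[BRIDGE = br]/DEVICE
// is recomputed for every candidate i, scanning all n siblings each time.
// *visits counts sibling visits; the return value is the number of matches.
static size_t match_naive(const struct ifcfg *f, size_t n, const char *br,
                          size_t *visits)
{
    size_t matches = 0;

    *visits = 0;
    for (size_t i = 0; i < n; i++) {
        int hit = eq(f[i].device, br) || eq(f[i].bridge, br) ||
                  eq(f[i].master, br);
        for (size_t j = 0; j < n; j++) {   // no short-circuit: worst case
            (*visits)++;
            if (eq(f[j].bridge, br) && eq(f[i].master, f[j].device))
                hit = 1;
        }
        if (hit)
            matches++;
    }
    return matches;
}

// Hoisted evaluation: collect the DEVICE of every file whose BRIDGE is br
// once, then compare each candidate's MASTER against that short list.
// Returns (size_t)-1 if the scratch allocation fails.
static size_t match_hoisted(const struct ifcfg *f, size_t n, const char *br,
                            size_t *visits)
{
    const char **bridged = malloc((n ? n : 1) * sizeof *bridged);
    size_t nb = 0, matches = 0;

    *visits = 0;
    if (bridged == NULL)
        return (size_t)-1;
    for (size_t j = 0; j < n; j++) {       // single pass over the siblings
        (*visits)++;
        if (eq(f[j].bridge, br) && f[j].device != NULL)
            bridged[nb++] = f[j].device;
    }
    for (size_t i = 0; i < n; i++) {
        int hit = eq(f[i].device, br) || eq(f[i].bridge, br) ||
                  eq(f[i].master, br);
        for (size_t k = 0; k < nb; k++) {
            (*visits)++;
            if (eq(f[i].master, bridged[k]))
                hit = 1;
        }
        if (hit)
            matches++;
    }
    free(bridged);
    return matches;
}
```

Both functions return the same matches. Their visit counts are n^2 and n(1 + nb), where nb is the number of files bridged to br. The naive version is therefore about n / (1 + nb) times slower, and that gap widens as interfaces are added.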
>
>
> Was that with the performance stuff I did a few days ago? (You'd need
> Augeas HEAD for that.)

No, I am running the augeas that comes with Fedora 22 (1.4.0-1) (or 
alternately, the one that comes with RHEL6.7 - an ancient 1.0.0). Let me 
see if I can successfully make augeas rpms from upstream (in the middle 
of "make distcheck" right now) and see if there's a difference with the 
latest code.

>
> Alternatively, can you send me your /etc/sysconfig/network-scripts ?

I actually have a set of scripts that create and destroy as many 
vlan+bridge pairs as I want, based on a 3 line .config file in the same 
directory. I'll tar that up and send it separately (I figure nobody 
would appreciate a binary attachment sent to the list :-)

> (Fair warning: I will have no time to look into this next week)
>
>
>         I have two questions based on this:
>
>         1) has anyone thought about/looked into optimizing/changing the
>         data structure used to store nodes in augeas to scale better
>         with larger datasets (execution time seems to grow faster than linearly)?
>
>
> From what Dominic turned up, the problem doesn't seem to be so much the
> data structure for the tree, as the fact that there was some O(n^2)
> behavior in building intermediate data structures.
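The following is a generic C illustration of how building an intermediate node set can become quadratic. It is not the code Dominic examined, which isn't shown here, and all struct and function names are hypothetical. If every insertion scans the existing members to reject duplicates, adding n nodes costs about n^2/2 comparisons. Keeping an open-addressing hash index beside the array makes the duplicate check O(1) on average.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Make room for one more element in a growable pointer array. */
static bool reserve_one(const void ***arr, size_t used, size_t *size)
{
    if (used < *size)
        return true;
    size_t n = *size ? 2 * *size : 16;
    const void **p = realloc(*arr, n * sizeof *p);
    if (p == NULL)
        return false;
    *arr = p;
    *size = n;
    return true;
}

/* Hypothetical node set with a linear duplicate check. */
struct lin_nodeset {
    const void **nodes;
    size_t used, size;
    size_t compares;   /* instrumentation: duplicate-check comparisons */
};

/* Each add scans all current members, so n adds cost O(n^2). */
static bool lin_add(struct lin_nodeset *ns, const void *node)
{
    for (size_t i = 0; i < ns->used; i++) {
        ns->compares++;
        if (ns->nodes[i] == node)
            return true;                 /* already a member */
    }
    if (!reserve_one(&ns->nodes, ns->used, &ns->size))
        return false;
    ns->nodes[ns->used++] = node;
    return true;
}

/* Same set plus a hash index kept at load factor <= 1/2. */
struct hash_nodeset {
    const void **nodes;   /* members in insertion order */
    size_t used, size;
    const void **slots;   /* hash index; nslots is 0 or a power of two */
    size_t nslots;
    size_t compares;
};

/* Pointer hash using the 64-bit finalizer from MurmurHash3. */
static size_t ptr_hash(const void *p)
{
    uint64_t x = (uint64_t)(uintptr_t)p;
    x ^= x >> 33;
    x *= UINT64_C(0xff51afd7ed558ccd);
    x ^= x >> 33;
    return (size_t)x;
}

/* Linear probing; returns false if p is already in the table. */
static bool slot_put(const void **slots, size_t nslots, const void *p,
                     size_t *compares)
{
    size_t i = ptr_hash(p) & (nslots - 1);
    while (slots[i] != NULL) {
        (*compares)++;
        if (slots[i] == p)
            return false;
        i = (i + 1) & (nslots - 1);
    }
    slots[i] = p;
    return true;
}

static bool hash_add(struct hash_nodeset *ns, const void *node)
{
    if (2 * (ns->used + 1) > ns->nslots) {        /* grow and rehash */
        size_t n = ns->nslots ? 2 * ns->nslots : 32;
        const void **s = calloc(n, sizeof *s);
        if (s == NULL)
            return false;
        size_t ignored = 0;
        for (size_t k = 0; k < ns->used; k++)
            slot_put(s, n, ns->nodes[k], &ignored);
        free(ns->slots);
        ns->slots = s;
        ns->nslots = n;
    }
    if (!reserve_one(&ns->nodes, ns->used, &ns->size))
        return false;
    if (slot_put(ns->slots, ns->nslots, node, &ns->compares))
        ns->nodes[ns->used++] = node;     /* new member */
    return true;
}
```

Both versions keep members in insertion order, which matters if the result must preserve document order. The hash index only changes how duplicates are detected.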
>
>         2) I recall that a long time ago augeas put in code to
>         re-read/parse files only if they had been modified. netcf (and
>         thus libvirt) could take advantage of this info if it was
>         available in the augeas API - the first time it retrieved the
>         info for an interface it would take a hit, but all subsequent
>         times could be much quicker.
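The per-file mtime check that aug_load() already performs could, in principle, be exposed to callers as a "has anything changed?" query. Below is a minimal sketch of the per-file part. It assumes a POSIX system with nanosecond `st_mtim`. `struct file_stamp` and `file_changed()` are hypothetical names, not existing augeas API, and comparing the file size as well as the mtime is an extra safeguard added here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical cache entry for one file. */
struct file_stamp {
    struct timespec mtime;  /* st_mtim at the last check */
    off_t size;             /* st_size at the last check */
    bool valid;             /* false until the first successful check */
};

/* Returns 1 if path changed since *stamp was recorded (or on the first
 * call), 0 if unchanged, -1 if stat() failed. Updates *stamp on success. */
static int file_changed(const char *path, struct file_stamp *stamp)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    bool same = stamp->valid &&
                st.st_mtim.tv_sec == stamp->mtime.tv_sec &&
                st.st_mtim.tv_nsec == stamp->mtime.tv_nsec &&
                st.st_size == stamp->size;
    stamp->mtime = st.st_mtim;
    stamp->size = st.st_size;
    stamp->valid = true;
    return same ? 0 : 1;
}
```

This still calls stat() on every file, which is the cost already noted for ~1100 files. Such an API would let netcf skip regenerating results for unchanged interfaces, but it would not remove the stat() calls themselves.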
>
>
>     About this one - I'm wondering how well it would work out for augeas
>     to use inotify to learn about modifications to files (including the
>     directory that the ifcfg files live in, in case a new file is
>     created). It works okay for netcf to avoid calling aug_load() (as
>     mentioned above), but it does make me a bit uncomfortable that we
>     sometimes have a mistaken view of the config.
>
>
> It would definitely be a possibility - we would still need to queue
> notifications from inotify and only act on them when the user calls
> aug_load to avoid things like destroying changes the user made; IOW, it
> still needs to stay predictable when the tree changes based on changes
> in the FS. It's been a while since I've looked at inotify, but I think
> it would also introduce a Linux dependency; we could work around that by
> only using it where available, and falling back to today's behavior.
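Below is a Linux-only C sketch of the queue-then-apply behavior described above. The kernel itself queues inotify events on a non-blocking descriptor, and nothing is read until the caller's load step drains that queue. Watching the directory rather than each file also reports newly created ifcfg files. `struct dir_watch`, `dir_watch_open()` and `dir_watch_drain()` are hypothetical names; the inotify calls are the standard Linux API. On non-Linux systems this code would have to be compiled out (for example behind `#ifdef __linux__`). There, or whenever `dir_watch_open()` returns -1, the caller would fall back to today's stat()-based check.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical watcher around one inotify descriptor. */
struct dir_watch {
    int fd;
};

/* Start queueing events for dir. Returns 0 on success, or -1 if inotify
 * is unavailable (the caller then keeps using stat()-based checks). */
static int dir_watch_open(struct dir_watch *w, const char *dir)
{
    w->fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (w->fd < 0)
        return -1;
    if (inotify_add_watch(w->fd, dir,
                          IN_CREATE | IN_DELETE | IN_CLOSE_WRITE |
                          IN_MOVED_FROM | IN_MOVED_TO) < 0) {
        close(w->fd);
        w->fd = -1;
        return -1;
    }
    return 0;
}

/* Called only from the load step: drain every queued event and return
 * how many arrived, or -1 on error or queue overflow (events were lost,
 * so the caller should treat every file as possibly changed). */
static long dir_watch_drain(struct dir_watch *w)
{
    _Alignas(struct inotify_event) char buf[4096];
    long events = 0;

    for (;;) {
        ssize_t len = read(w->fd, buf, sizeof buf);
        if (len < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN)    /* queue is empty */
                return events;
            return -1;
        }
        if (len == 0)
            return events;
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->mask & IN_Q_OVERFLOW)
                return -1;
            events++;
            p += sizeof *ev + ev->len;
        }
    }
}
```

The kernel queue is bounded (see /proc/sys/fs/inotify/max_queued_events), which is why overflow is reported as "rescan everything". A non-zero count only says that something in the directory changed. A fuller implementation would look at each event's `name` field to decide which files to reparse.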



