[augeas-devel] improving performance of aug_get() and aug_match() with large datasets
Laine Stump
laine at redhat.com
Fri Oct 2 18:50:14 UTC 2015
On 10/02/2015 02:32 PM, David Lutterkort wrote:
> On Thu, Oct 1, 2015 at 11:44 AM, Laine Stump <laine at redhat.com> wrote:
>
> On 09/22/2015 03:18 PM, Laine Stump wrote:
>
> It was bound to happen eventually. Someone created a host with
> 514 vlan interfaces each connected to a host bridge, then
> started up virt-manager. [blah blah boring blah removed]
>
> To update those not included in a separate thread on the topic in
> netcf-devel (I'll try to keep all discussion here from now on):
>
> Dan Berrange pointed out that netcf was calling aug_load() on each
> entry to a public netcf API, and libvirt was calling netcf APIs
> multiple times for each interface. Even though aug_load() checks the
> mtime of files it has already loaded, and avoids re-loading those
> that haven't been modified (in this case none have been modified),
> it turns out that just doing a stat() of 1100 files takes a
> significant amount of time. So I modified netcf to only call
> aug_load() to do this check if it has been at least 1 second since
> the last time it was called. This made a very large improvement,
> especially when running the upstream versions of all involved
> packages (virt-manager --> libvirt --> netcf --> augeas). But when
> running the versions that are included in RHEL6, it wasn't so rosy.
> A test setup of 514 bridge+vlan interfaces which took around 30
> minutes (!!) to complete a full startup of virt-manager (which calls
> netcf/augeas to list all interfaces, then get the XML config for
> them) now takes 13 minutes with netcf modified to call aug_load()
> only once per second. (the same operation takes "only" 8 minutes
> using all upstream code).
>
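The once-per-second throttle described above can be sketched roughly as follows. This is only an illustration, not the actual netcf change: struct reload_throttle, throttled_reload() and count_calls() are hypothetical names, and the expensive step (aug_load() in netcf's case) is represented by a callback so the sketch does not depend on augeas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper; not part of netcf or augeas. */
struct reload_throttle {
    time_t last_check;   /* time of the last real reload check */
    bool   checked_once; /* false until the first check has run */
    double min_interval; /* seconds that must pass between checks */
};

/* Stands in for the expensive step (aug_load() in netcf's case). */
typedef int (*reload_fn)(void *opaque);

/*
 * Run reload() only if at least min_interval seconds have passed since
 * the previous run. Returns reload()'s result when it runs and 0 when
 * the call is skipped. The caller passes "now" so that it controls the
 * clock (a real caller would pass time(NULL)).
 */
static int throttled_reload(struct reload_throttle *t, time_t now,
                            reload_fn reload, void *opaque)
{
    assert(t != NULL && reload != NULL);

    if (t->checked_once && difftime(now, t->last_check) < t->min_interval)
        return 0; /* too soon: trust the tree already in memory */

    t->last_check = now;
    t->checked_once = true;
    return reload(opaque);
}

/* Example callback: counts how often the expensive step really ran. */
static int count_calls(void *opaque)
{
    ++*(int *)opaque;
    return 0;
}
```

The trade-off is the one raised later in this thread: for up to min_interval seconds the caller may be working from a stale view of the config.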
> But 13 (or even 8) minutes is still a very long time, so I played
> around a bit in gdb and found that most of the time now seems to be
> spent in one call to aug_match():
>
>
> r = aug_match(aug, path, "/files/etc/sysconfig/network-scripts/*[
> DEVICE = 'br1' or BRIDGE = 'br1' or MASTER = 'br1' or MASTER =
> ../*[BRIDGE = 'br1']/DEVICE ]/DEVICE");
>
> (this is the result of a call to netcf's aug_fmt_match() in the
> netcf function aug_get_xml_for_nif())
>
> When I step over that call to aug_match(), there is a very
> noticeable pause before the gdb prompt comes back, while continuing
> from that point all the way through virt-manager's "get all
> interfaces" loop back to the next call to aug_get_xml_for_nif()
> (including several other calls to aug_match() that have much simpler
> search expressions) seems to happen instantly.
>
> So apparently doing a match against all ifcfg files based on this
> complex match expression is really slowing us down. Any ideas on how
> to either make this expression simpler, or alternately how to get
> augeas doing the search more quickly?
>
>
> Was that with the performance stuff I did a few days ago? (You'd need
> Augeas HEAD for that)
No, I am running the augeas that comes with Fedora 22 (1.4.0-1) (or
alternately, the one that comes with RHEL6.7 - an ancient 1.0.0). Let me
see if I can successfully make augeas rpms from upstream (in the middle
of "make distcheck" right now) and see if there's a difference with the
latest code.
>
> Alternatively, can you send me your /etc/sysconfig/network-scripts?
I actually have a set of scripts that create and destroy as many
vlan+bridge pairs as I want, based on a 3 line .config file in the same
directory. I'll tar that up and send it separately (I figure nobody
would appreciate a binary attachment sent to the list :-)
> (Fair warning: I will have no time to look into this next week)
>
>
> I have two questions based on this:
>
> 1) has anyone thought about/looked into optimizing/changing the
> data structure used to store nodes in augeas to scale better
> with larger datasets (execution time seems to grow faster than linearly)?
>
>
> From what Dominic turned up, the problem doesn't seem to be so much the
> data structure for the tree, as the fact that there was some O(n^2)
> behavior in building intermediate data structures.
>
> 2) I recall that a long time ago augeas put in code to
> re-read/parse files only if they had been modified. netcf (and
> thus libvirt) could take advantage of this info if it was
> available in the augeas API - the first time it retrieved the
> info for an interface it would take a hit, but all subsequent
> times could be much quicker.
>
>
> About this one - I'm wondering how well it would work out for augeas
> to use inotify to learn about modifications to files (including the
> directory that the ifcfg files live in, in case a new file is
> created). It works okay for netcf to avoid calling aug_load() (as
> mentioned above), but it does make me a bit uncomfortable that we
> sometimes have a mistaken view of the config.
>
>
> It would definitely be a possibility - we would still need to queue
> notifications from inotify and only act on them when the user calls
> aug_load to avoid things like destroying changes the user made; IOW, it
> still needs to stay predictable when the tree changes based on changes
> in the FS. It's been a while since I've looked at inotify, but I think
> it would also introduce a Linux dependency; we could work around that by
> only using it where available, and falling back to today's behavior.
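The "queue notifications, act on them only in aug_load" idea could look something like the sketch below. It is not augeas code and does not call inotify itself: note_change() stands for whatever would be called for each file event, apply_pending() for the step that would run inside the explicit load call, and all names and limits here are hypothetical. When the queue overflows it falls back to a full check of every file, which is meant to mirror today's behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING  64   /* arbitrary limits chosen for the sketch */
#define MAX_PATH_LEN 256

/* Changes seen since the last explicit load; zero-initialize before use. */
struct pending_changes {
    char   paths[MAX_PENDING][MAX_PATH_LEN];
    size_t count;
    bool   overflow; /* lost track of changes: must check every file */
};

/* Record a change. This never touches the tree, so the tree stays
 * predictable between explicit loads. */
static void note_change(struct pending_changes *q, const char *path)
{
    assert(q != NULL && path != NULL);

    for (size_t i = 0; i < q->count; i++)
        if (strcmp(q->paths[i], path) == 0)
            return; /* already queued */

    size_t len = strlen(path);
    if (q->count == MAX_PENDING || len >= MAX_PATH_LEN) {
        q->overflow = true;
        return;
    }
    memcpy(q->paths[q->count++], path, len + 1);
}

typedef void (*reload_file_fn)(const char *path, void *opaque);
typedef void (*reload_full_fn)(void *opaque);

/*
 * Apply queued changes; intended to run only from the explicit load call.
 * Returns the number of files reloaded, or -1 if it fell back to a
 * full check. The queue is empty afterwards.
 */
static int apply_pending(struct pending_changes *q, reload_file_fn one,
                         reload_full_fn full, void *opaque)
{
    assert(q != NULL && one != NULL && full != NULL);
    int n = -1;

    if (q->overflow) {
        full(opaque);
    } else {
        for (size_t i = 0; i < q->count; i++)
            one(q->paths[i], opaque);
        n = (int)q->count;
    }
    q->count = 0;
    q->overflow = false;
    return n;
}

/* Example callbacks that only count what would have been reloaded. */
struct reload_counts { int files; int full; };

static void count_file(const char *path, void *opaque)
{
    (void)path;
    ((struct reload_counts *)opaque)->files++;
}

static void count_full(void *opaque)
{
    ((struct reload_counts *)opaque)->full++;
}
```

On Linux the events would presumably come from reading an inotify descriptor, with the kernel's own queue-overflow event mapped onto the overflow flag; on other platforms the flag could simply be left set so that every load does the full check.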