[augeas-devel] improving performance of aug_get() and aug_match() with large datasets

David Lutterkort lutter at watzmann.net
Sat Oct 3 22:05:01 UTC 2015


On Thu, Oct 1, 2015 at 11:44 AM, Laine Stump <laine at redhat.com> wrote:

> But 13 (or even 8) minutes is still a very long time, so I played around a
> bit in gdb and found that most of the time now seems to be spent in one
> call to aug_match():
>
>
>   r = aug_match(aug, path, "/files/etc/sysconfig/network-scripts/*[ DEVICE
> = 'br1' or BRIDGE = 'br1' or MASTER = 'br1' or MASTER = ../*[BRIDGE =
> 'br1']/DEVICE ]/DEVICE");
>

Whoever wrote that code must have thought they were incredibly clever with
this query ;)

There are a few ways in which I think this can be sped up. For one, rather
than use 'or', we can build an intermediate nodeset for the first three
terms by matching

(1) /files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) =
'br1']/DEVICE

The last term in that 'or' is very expensive since it constitutes a nested
loop, with "/files/etc/sysconfig/network-scripts/*" being the outer loop
("for each ifcfg file") and "../*[BRIDGE = 'br1']/DEVICE" being the inner
loop ("for each ifcfg file see if it is a BRIDGE and return its DEVICE").
That can be made a little more targeted by using

(2) /files/etc/sysconfig/network-scripts/*/MASTER[ . = ../*[BRIDGE =
'br1']/DEVICE ]

so that we only trigger the inner loop for ifcfg files that actually have a
MASTER entry. This helps if you don't have bonds; if there are any bonds on
the system, I suspect the query will still be very expensive.
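
To make the nested-loop argument concrete, here is a toy Python model. It is
not Augeas code and says nothing about how Augeas evaluates path expressions
internally; it only counts inner-loop visits. The file layout (one VLAN
device plus one bridge per VLAN), the helper names, and the assumption that
the last 'or' term runs its inner scan for every file are all made up for
illustration.

```python
# Toy model of the nested loop, NOT Augeas internals: each ifcfg file is a
# dict of its keys, and we only count how many inner-loop visits happen.


def make_files(num_vlans):
    """Hypothetical layout: one VLAN device plus one bridge per VLAN."""
    files = []
    for i in range(num_vlans):
        files.append({"DEVICE": f"p14p1.{i}", "BRIDGE": f"brvlan{i}"})
        files.append({"DEVICE": f"brvlan{i}"})
    return files


def inner_visits_original(files):
    """Assumes the last 'or' term runs its inner scan for every file."""
    return len(files) * len(files)


def inner_visits_targeted(files):
    """Query (2): only files that have a MASTER entry run the inner scan."""
    with_master = sum(1 for f in files if "MASTER" in f)
    return with_master * len(files)


files = make_files(514)
print(inner_visits_original(files), inner_visits_targeted(files))
```

With no MASTER entries the targeted count drops to zero, and every bond
slave added to the list brings back one full inner scan, which is why the
query is likely to stay expensive on systems with bonds.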

Making these two changes brings the time for the aug_match call down from
680ms to ~ 40ms on my machine, using NUMVLANS=514. The query that I ran for
the latter was ('|' produces the union of two nodesets):

(3) (/files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) =
'brvlan42']|/files/etc/sysconfig/network-scripts/*/MASTER[ . = ../*[BRIDGE
= 'brvlan42']/DEVICE ])/DEVICE
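
A caller that needs query (3) for an arbitrary bridge could assemble the
string along these lines. This is only a sketch in Python: the function name
and the SCRIPTS constant are hypothetical, the path expression is copied
as-is from query (3) (so any mistake in that query carries over), and
rejecting names that contain a single quote is a simplifying assumption
rather than proper escaping.

```python
SCRIPTS = "/files/etc/sysconfig/network-scripts"


def union_query(bridge):
    """Build query (3) for the given bridge name (hypothetical helper)."""
    # Simplifying assumption: refuse names we cannot quote safely.
    if "'" in bridge:
        raise ValueError("bridge name must not contain a single quote")
    # First three 'or' terms folded into one intermediate nodeset.
    direct = f"{SCRIPTS}/*[(DEVICE|BRIDGE|MASTER) = '{bridge}']"
    # Inner loop only for nodes that actually are MASTER entries.
    via_master = f"{SCRIPTS}/*/MASTER[ . = ../*[BRIDGE = '{bridge}']/DEVICE ]"
    # '|' is the union of the two nodesets.
    return f"({direct}|{via_master})/DEVICE"


print(union_query("brvlan42"))
```

The resulting string is what would be handed to aug_match or to the 'match'
command in augtool.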

Even better would be if we knew whether we need the whole MASTER business -
my recollection of this is dim, but I believe this query tries to find the
bond device for which the bridge is a slave. It might be faster for netcf
to run a query for that separately and then instead of query (2) do
something like

(4) /files/etc/sysconfig/network-scripts/*/MASTER[ . = '$master1' or . =
'$master2' ...]/DEVICE

Mocking this up with a query that assumes there are 'bond0' and 'bond1' on
the system brings the time for the query from ~ 40ms to 4ms on my machine.
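
The two-step idea could look roughly like the Python sketch below: the list
of masters is assumed to come from a separate, earlier query, and the
function only builds the final path expression. The names two_step_query,
quote and SCRIPTS are hypothetical, the query text follows the mocked-up
form with 'bond0' and 'bond1', and dropping the MASTER branch entirely when
no masters were found is an assumption about what the caller would want.

```python
SCRIPTS = "/files/etc/sysconfig/network-scripts"


def quote(value):
    """Wrap a value in single quotes; names with quotes are rejected."""
    if "'" in value:
        raise ValueError("value must not contain a single quote")
    return f"'{value}'"


def two_step_query(bridge, masters):
    """Build the final query from a bridge name and known master names."""
    direct = f"{SCRIPTS}/*[(DEVICE|BRIDGE|MASTER) = {quote(bridge)}]"
    if not masters:
        # Assumption: with no masters there is nothing to union in.
        return f"{direct}/DEVICE"
    # One '. = ...' comparison per master, joined with 'or'.
    terms = " or ".join(f". = {quote(m)}" for m in masters)
    by_master = f"{SCRIPTS}/*/MASTER[ {terms} ]"
    return f"({direct}|{by_master})/DEVICE"


print(two_step_query("brvlan42", ["bond0", "bond1"]))
```

The point of the split is that the predicate only compares against literal
strings, so no inner scan over all ifcfg files is needed.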

Be warned: my memory of ifcfg-* files especially around bonding is kinda
hazy, and I might have screwed up these queries ...

Attached is a file of Augeas commands that I ran through 'augtool -e -r
/var/tmp/bridges-root'; all timings were from changing aug_match to print
the time taken from just after calling api_entry() to just before
api_exit() against current HEAD.
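
For comparing runs, the "Time: ...ms" lines that the patched aug_match
prints can be pulled out of a saved transcript and turned into speedup
factors. The Python sketch below assumes exactly that line format; the
function name is hypothetical and the sample values are the ones from the
transcript attached to this message.

```python
import re


def speedups(transcript):
    """Return each 'Time: Nms' value as a speedup over the first one."""
    times = [int(ms) for ms in re.findall(r"^Time: (\d+)ms$", transcript, re.MULTILINE)]
    if not times:
        raise ValueError("no 'Time: ...ms' lines found")
    baseline = times[0]
    return [baseline / t for t in times]


sample = "Time: 661ms\nTime: 36ms\nTime: 4ms\n"
for factor in speedups(sample):
    print(f"{factor:.1f}x")
```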

David
-------------- next part --------------
# Original
match /files/etc/sysconfig/network-scripts/*[ DEVICE = 'brvlan42' or BRIDGE = 'brvlan42' or MASTER = 'brvlan42' or MASTER = ../*[BRIDGE = 'brvlan42']/DEVICE ]/DEVICE
#
#
# Turn first three 'or' terms into an intermediate nodeset, and trigger
# the inner loop for MASTER only if there actually are bonds
match (/files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) = 'brvlan42']|/files/etc/sysconfig/network-scripts/*/MASTER[ . = ../*[BRIDGE = 'brvlan42']/DEVICE ])/DEVICE
#
#
# Assuming we have two bonds on the system
match (/files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) = 'brvlan42']|/files/etc/sysconfig/network-scripts/*/MASTER[ . = 'bond0' or . = 'bond1' ])/DEVICE
-------------- next part --------------
citron:augeas (master)>./src/try -e -r /var/tmp/bridges-for-vlans/root/
augtool> # Original
augtool> match /files/etc/sysconfig/network-scripts/*[ DEVICE = 'brvlan42' or BRIDGE = 'brvlan42' or MASTER = 'brvlan42' or MASTER = ../*[BRIDGE = 'brvlan42']/DEVICE ]/DEVICE
aug_match(/files/etc/sysconfig/network-scripts/*[ DEVICE = 'brvlan42' or BRIDGE = 'brvlan42' or MASTER = 'brvlan42' or MASTER = ../*[BRIDGE = 'brvlan42']/DEVICE ]/DEVICE) = 2
Time: 661ms
/files/etc/sysconfig/network-scripts/ifcfg-p14p1.42/DEVICE = p14p1.42
/files/etc/sysconfig/network-scripts/ifcfg-brvlan42/DEVICE = brvlan42
augtool> #
augtool> #
augtool> # Turn first three 'or' terms into an intermediate nodeset, and trigger
augtool> # the inner loop for MASTER only if there actually are bonds
augtool> match (/files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) = 'brvlan42']|/files/etc/sysconfig/network-scripts/*/MASTER[ . = ../*[BRIDGE = 'brvlan42']/DEVICE ])/DEVICE
aug_match((/files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) = 'brvlan42']|/files/etc/sysconfig/network-scripts/*/MASTER[ . = ../*[BRIDGE = 'brvlan42']/DEVICE ])/DEVICE) = 2
Time: 36ms
/files/etc/sysconfig/network-scripts/ifcfg-p14p1.42/DEVICE = p14p1.42
/files/etc/sysconfig/network-scripts/ifcfg-brvlan42/DEVICE = brvlan42
augtool> #
augtool> #
augtool> # Assuming we have two bonds on the system
augtool> match (/files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) = 'brvlan42']|/files/etc/sysconfig/network-scripts/*/MASTER[ . = 'bond0' or . = 'bond1' ])/DEVICE
aug_match((/files/etc/sysconfig/network-scripts/*[(DEVICE|BRIDGE|MASTER) = 'brvlan42']|/files/etc/sysconfig/network-scripts/*/MASTER[ . = 'bond0' or . = 'bond1' ])/DEVICE) = 2
Time: 4ms
/files/etc/sysconfig/network-scripts/ifcfg-p14p1.42/DEVICE = p14p1.42
/files/etc/sysconfig/network-scripts/ifcfg-brvlan42/DEVICE = brvlan42
augtool>

