[augeas-devel] improving performance of aug_get() and aug_match() with large datasets

Laine Stump laine at redhat.com
Tue Sep 22 19:18:14 UTC 2015


It was bound to happen eventually. Someone created a host with 514 VLAN 
interfaces, each connected to a host bridge, and then started up 
virt-manager. virt-manager likes to learn the status of all the network 
interfaces on a host by calling libvirt (the equivalent of "virsh 
iface-list --all" followed by "virsh iface-dumpxml bobloblaw" for each 
interface). libvirt in turn calls the netcf library, which queries 
the interface config on disk using augeas (what amounts to aug_get() and 
aug_match() calls). Too bad that when you have 514 VLAN+bridge combos, 
this operation takes ~20 minutes on good hardware!

I looked into the libvirt part of it first, and there were some obvious 
inefficiencies: the function netcfConnectListAllInterfaces() ends up 
calling ncf_if_mac_string() and ncf_if_name() multiple times for each 
interface, when it could 1) call ncf_if_mac_string() just once, and 
2) never call ncf_if_name() at all. But even fixing those eliminates 
only about 20% of the total time. I then tried removing all of the ncf_* 
calls from that libvirt function (except the first one, which retrieves 
a simple list of interfaces) and found that about 40% of the total time 
remained. So there is a lot that can be done in libvirt, but about 40% 
of the time is still spent in netcf, mostly in calls to aug_get() and 
aug_match().

I have two questions based on this:

1) Has anyone thought about or looked into optimizing or changing the 
data structure augeas uses to store nodes, so that it scales better with 
larger datasets? (Execution time seems to grow faster than linearly.)

2) I recall that a long time ago augeas added code to re-read/re-parse 
files only if they had been modified. netcf (and thus libvirt) could 
take advantage of this if the augeas API exposed whether a file had 
changed since it was last read: the first time netcf retrieved the info 
for an interface it would take a hit, but all subsequent retrievals 
could be much quicker.
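On question 1, one plausible (unverified) source of superlinear growth is resolving many sibling labels by walking a linked list of children: looking up each of N siblings in turn costs N(N+1)/2 label comparisons in total. The sketch below compares that against a small open-addressing hash index over the same siblings. The node layout, the "ifcfg-vlanK" labels, and all helpers are hypothetical; it only loosely models a tree whose children are chained through a `next` pointer and does not reproduce augeas's real tree structure or its path-expression evaluator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified node: siblings chained in a singly linked list. */
struct node {
    char label[32];
    struct node *next;          /* next sibling */
};

static unsigned long comparisons = 0;

/* Linear child lookup: walk the siblings until the label matches. */
static struct node *find_child_linear(struct node *first, const char *label)
{
    for (struct node *n = first; n != NULL; n = n->next) {
        comparisons++;
        if (strcmp(n->label, label) == 0)
            return n;
    }
    return NULL;
}

/* Open-addressing hash index (linear probing) over the same siblings. */
#define NBUCKETS 4096u          /* power of two, must exceed child count */
static struct node *buckets[NBUCKETS];

static uint32_t hash_label(const char *s)
{
    uint32_t h = 2166136261u;   /* FNV-1a */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

static void index_insert(struct node *n)
{
    uint32_t i = hash_label(n->label) & (NBUCKETS - 1);
    while (buckets[i] != NULL)
        i = (i + 1) & (NBUCKETS - 1);
    buckets[i] = n;
}

static struct node *find_child_hashed(const char *label)
{
    uint32_t i = hash_label(label) & (NBUCKETS - 1);
    while (buckets[i] != NULL) {
        comparisons++;
        if (strcmp(buckets[i]->label, label) == 0)
            return buckets[i];
        i = (i + 1) & (NBUCKETS - 1);
    }
    return NULL;
}

/* Build n siblings, look each one up once, and return the total number
 * of label comparisons performed (0 if allocation fails). */
static unsigned long lookup_all(size_t n, int use_hash)
{
    assert(n < NBUCKETS);
    struct node *nodes = calloc(n, sizeof(*nodes));
    if (nodes == NULL)
        return 0;
    memset(buckets, 0, sizeof(buckets));
    for (size_t i = 0; i < n; i++) {
        snprintf(nodes[i].label, sizeof(nodes[i].label), "ifcfg-vlan%zu", i);
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
        if (use_hash)
            index_insert(&nodes[i]);
    }
    comparisons = 0;
    for (size_t i = 0; i < n; i++) {
        struct node *hit = use_hash ? find_child_hashed(nodes[i].label)
                                    : find_child_linear(nodes, nodes[i].label);
        assert(hit == &nodes[i]);
        (void)hit;
    }
    free(nodes);
    return comparisons;
}
```

For example, with 1028 siblings (assuming one config entry each for 514 VLANs and 514 bridges), `lookup_all(1028, 0)` performs 528,906 comparisons, while the hashed variant stays within a small constant factor of 1028.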
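On question 2, until something like this is exposed through the augeas API, a caller could approximate it by remembering each file's modification time and size at the last parse and skipping the re-parse when neither has changed. This is a hypothetical sketch: `struct file_cache`, `cache_init()`, and `cache_refresh()` are made-up names, the parse step is only a placeholder counter, and it is not an existing augeas or netcf API. It assumes the POSIX.1-2008 `st_mtim` field of `struct stat`, as found on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical per-file cache entry kept by the caller (e.g. netcf). */
struct file_cache {
    char path[256];
    bool valid;             /* false until the first successful parse */
    struct timespec mtime;  /* st_mtim recorded at the last parse */
    off_t size;             /* st_size recorded at the last parse */
    int parse_count;        /* placeholder: counts the expensive re-parses */
};

static void cache_init(struct file_cache *c, const char *path)
{
    assert(c != NULL && path != NULL);
    memset(c, 0, sizeof(*c));
    snprintf(c->path, sizeof(c->path), "%s", path);
}

static bool same_timespec(struct timespec a, struct timespec b)
{
    return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

/* Re-parse only if the file's mtime or size changed since the last parse.
 * Returns 1 if it re-parsed, 0 if cached data was reused, -1 on stat()
 * failure. */
static int cache_refresh(struct file_cache *c)
{
    struct stat st;
    if (stat(c->path, &st) < 0)
        return -1;
    if (c->valid && same_timespec(st.st_mtim, c->mtime) &&
        st.st_size == c->size)
        return 0;               /* unchanged: reuse previously parsed data */
    /* Placeholder for the expensive step, e.g. reloading the file. */
    c->parse_count++;
    c->mtime = st.st_mtim;
    c->size = st.st_size;
    c->valid = true;
    return 1;
}
```

Comparing the size as well as the nanosecond mtime narrows, but does not close, the window in which a same-size rewrite within one timestamp tick would go unnoticed.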

