[augeas-devel] [PATCH 0 of 4] Some performance improvements

David Lutterkort dlutter at redhat.com
Fri Aug 8 22:46:22 UTC 2008


As twoerner and raphink noticed on IRC, Augeas is pretty slow when
processing large files, e.g. an /etc/hosts with 10000 lines.

These patches address some of the slowness by eliminating some quadratic
behavior and reducing general overhead in a few places.

To try them, I ran two tests: 'augtool quit', and 'set
/files/etc/hosts/10/alias[1] newalias' followed by 'save'. The first tests
the speed of parsing, the second the speed of a complete roundtrip,
including writing a changed file out.

Each test was done on /etc/hosts files of varying sizes; the first column
in the tables below is the number of lines in those files, where each line
had an IP address, a canonical name and one alias.
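
For anyone who wants to build similar test files, here is a minimal sketch
of a generator. The function name write_hosts, the 10.x.y.z addresses and
the host<i>.example.com names are assumptions made up for illustration;
only the line layout (IP address, canonical name, one alias) comes from
the description above.

```c
#include <assert.h>
#include <stdio.h>

/* Write n lines of the form "IP canonical-name alias" to out.
 * Addresses and names are invented for illustration only.
 * Returns 0 on success, -1 on a write error. */
int write_hosts(FILE *out, unsigned long n) {
    assert(out != NULL);
    for (unsigned long i = 1; i <= n; i++) {
        /* Spread the counter over the last three octets of 10.0.0.0/8 */
        int rc = fprintf(out, "10.%lu.%lu.%lu host%lu.example.com host%lu\n",
                         (i >> 16) & 255, (i >> 8) & 255, i & 255, i, i);
        if (rc < 0)
            return -1;
    }
    return 0;
}
```

Calling write_hosts(stdout, 8192) from a small main and redirecting the
output to a file gives an input with the same shape as the one behind the
8192 row in the tables.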

Before applying these patches, I got the following times on my laptop
(T60). Note that I built with -O2 for the tests - optimization seems to
double the performance of Augeas in general.


    parse only             parse + save

    64   0.06s              64   0.09s
   128   0.04s             128   0.11s
   256   0.09s             256   0.18s
   512   0.07s             512   0.34s
  1024   0.11s            1024   0.61s
  2048   0.19s            2048   1.18s
  4096   0.41s            4096   2.60s
  8192   0.97s            8192   7.49s
 16384   2.65s           16384  36.25s
 32768   8.80s           32768  > 200s

After applying them, I get

    parse only             parse + save

    64   0.06s              64   0.10s
   128   0.05s             128   0.07s
   256   0.05s             256   0.09s
   512   0.08s             512   0.15s
  1024   0.09s            1024   0.25s
  2048   0.15s            2048   0.51s
  4096   0.28s            4096   1.13s
  8192   0.53s            8192   2.93s
 16384   1.03s           16384  11.72s
 32768   2.13s           32768 108.02s

That's still not as good as I would like (especially for
saving). There are still two fairly obvious ways to optimize further:

  (1) The internal 'dict' data structure needs to be turned into a hash
      table (instead of being a linked list)
  (2) The regexp matcher is called way too often - we throw away a lot of
      information and regenerate that later by calling the matcher
      again. It would be much cleaner to change the internals so that they
      first construct an explicit parse tree and then process it, rather
      than the current way of interleaving the two
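
To make point (1) concrete, here is a rough sketch of a chained hash table
keyed by strings. It is not the Augeas code: the hdict_* names, the fixed
bucket count and the single opaque value per key are all assumptions for
illustration, and the real 'dict' may well store more per key. The point is
only that a lookup walks one short bucket chain instead of the whole list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDICT_BUCKETS 1024  /* hypothetical fixed size; no resizing here */

struct hdict_entry {
    struct hdict_entry *next;   /* next entry in the same bucket */
    char               *key;    /* owned copy of the key */
    void               *value;  /* opaque payload, not owned */
};

struct hdict {
    struct hdict_entry *buckets[HDICT_BUCKETS];
};

/* 32-bit FNV-1a string hash, reduced to a bucket index */
static size_t hdict_hash(const char *key) {
    uint32_t h = 2166136261u;
    for (; *key != '\0'; key++) {
        h ^= (unsigned char) *key;
        h *= 16777619u;
    }
    return (size_t) (h % HDICT_BUCKETS);
}

/* Allocate an empty table; returns NULL on allocation failure */
struct hdict *hdict_new(void) {
    return calloc(1, sizeof(struct hdict));
}

/* Return the value stored under key, or NULL if the key is absent */
void *hdict_get(const struct hdict *d, const char *key) {
    assert(d != NULL && key != NULL);
    const struct hdict_entry *e = d->buckets[hdict_hash(key)];
    for (; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0)
            return e->value;
    }
    return NULL;
}

/* Insert key or overwrite its value; 0 on success, -1 if out of memory */
int hdict_put(struct hdict *d, const char *key, void *value) {
    assert(d != NULL && key != NULL);
    size_t b = hdict_hash(key);
    for (struct hdict_entry *e = d->buckets[b]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;
            return 0;
        }
    }
    struct hdict_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    size_t len = strlen(key) + 1;
    e->key = malloc(len);
    if (e->key == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->key, key, len);
    e->value = value;
    e->next = d->buckets[b];    /* push onto the front of the chain */
    d->buckets[b] = e;
    return 0;
}

/* Free the table and its key copies; stored values are left alone */
void hdict_free(struct hdict *d) {
    if (d == NULL)
        return;
    for (size_t b = 0; b < HDICT_BUCKETS; b++) {
        struct hdict_entry *e = d->buckets[b];
        while (e != NULL) {
            struct hdict_entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(d);
}
```

With a linked list, n lookups against n entries cost on the order of n^2
string comparisons; with a table like this the expected cost per lookup
stays roughly constant as long as the chains stay short, which a fixed
bucket count only guarantees up to a point.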

David

5 files changed, 98 insertions(+), 38 deletions(-)
src/get.c    |   53 +++++++++++++++++++++++++++++++++--------------------
src/list.h   |   30 ++++++++++++++++++++++++++++++
src/put.c    |   41 +++++++++++++++++++++++------------------
src/regexp.c |    8 ++++++++
src/syntax.h |    4 ++++



