[augeas-devel] Re: [PATCH] Add PHP module and associated (basic) test file

Sat Jul 26 17:01:18 UTC 2008

On Sat, 2008-07-26 at 12:52 -0400, Nate Foster wrote:
> > I am pretty confident that switching from taking the union of a lot of
> > lenses to a union of strings/regexps will take care of the slowness. The
> > difference between 'l(r1) | l(r2) | .. | l(rn)' vs 'l(r1|r2|..|rn)' is
> > enormous in terms of the internal processing - when parsing the file,
> > the first version requires n regexp matches, whereas the second just
> > requires one, plus the regexps for the first version are _much_ bigger
> > than for the second.
> 
> A trick we do in Boomerang, which may be useful if you really do need
> a lens union and can't push it down into a union of regexps, is to
> parse
> 
> (l1 | l2 | l3 | l4)
> 
> as
> 
> ( ( l1 | l2 ) | ( l3 | l4 ) )
> 
> instead of
> 
> ( l1 | ( l2 | ( l3 | l4 ) ) )
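
To make the two shapes concrete, here is a minimal Python sketch (not Boomerang or Augeas code). The tuple representation and the names `right_nested`, `balanced` and `depth` are made up for illustration. It only shows that the balanced grouping keeps the nesting depth logarithmic in the number of branches, while the right-nested one grows linearly.

```python
# Hypothetical representation: a lens is a string name, and a binary
# union is the tuple ("union", left, right).

def right_nested(lenses):
    """Group as ( l1 | ( l2 | ( l3 | l4 ) ) )."""
    if len(lenses) == 1:
        return lenses[0]
    return ("union", lenses[0], right_nested(lenses[1:]))

def balanced(lenses):
    """Group as ( ( l1 | l2 ) | ( l3 | l4 ) ) by splitting in the middle."""
    if len(lenses) == 1:
        return lenses[0]
    mid = len(lenses) // 2
    return ("union", balanced(lenses[:mid]), balanced(lenses[mid:]))

def depth(tree):
    """Number of nested unions on the longest path down to a leaf."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0

names = ["l1", "l2", "l3", "l4"]
print(balanced(names))
print(depth(right_nested(names)), depth(balanced(names)))
```

With eight branches the right-nested tree is seven unions deep and the balanced one is three.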

I actually represent the union lens now as an array of sublenses, and I
could find out which branch to use with one regexp match (the glibc
matcher tells me which group matched); I just haven't implemented that.
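
Roughly, the idea would look like the following sketch. It uses Python's `re` module rather than the glibc matcher, and the branch regexps and the name `pick_branch` are placeholders, not anything from the Augeas source. Each branch gets its own capturing group, and the group that participated in the match identifies the branch. Note that Python's matcher tries alternatives in order instead of using POSIX longest-match rules, so this only illustrates the dispatch, and it assumes the branches are disjoint.

```python
import re

# Hypothetical regexps standing in for the regexps of the sublenses.
# They must not contain capturing groups of their own, otherwise the
# group numbering below no longer lines up with the branch index.
BRANCHES = [r"[0-9]+", r"[a-z]+", r"#[^\n]*"]

# One top-level capturing group per branch: (r1)|(r2)|...|(rn)
COMBINED = re.compile("|".join("(" + b + ")" for b in BRANCHES))

def pick_branch(text):
    """Return the index of the branch matching all of text, or None."""
    m = COMBINED.fullmatch(text)
    if m is None:
        return None
    # Exactly one top-level group took part in the match; lastindex is
    # its 1-based group number.
    return m.lastindex - 1

for sample in ["123", "abc", "# a comment", "ABC"]:
    print(repr(sample), "->", pick_branch(sample))
```

That is a single match per input instead of trying each sublens in turn.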

Either way, though, you wind up with much bigger regexps for the union of
lenses than for the lens of a regexp union; I suspect that is the real
reason why things slow down: the regexp matcher allocates enormous data
structures for the bigger regexps.

David