[augeas-devel] httpd.conf and beyond

Wed Mar 17 21:08:11 UTC 2010

On Wed, 2010-03-17 at 15:02 -0400, Francis Giraldeau wrote:
> Le vendredi 19 février 2010 à 15:19 -0800, David Lutterkort a écrit :
> > 
> >         let directives = (directive1|directive2|...|directiveN)*
> > 
> > Because there are so many directives, the typechecker runs out of
> > memory, even if you give it a lot of memory. This is particularly
> > annoying since it's perfectly fine to _use_ the lens, i.e. even though
> > you can't typecheck the lens, you can use them for parsing without a
> > noticeable performance problem (if you're willing to assume you have no
> > type errors ;).
> 
> If I understand correctly, typechecking union involves computing the
> intersection of two automaton, which has complexity of n^4 and each lens
> have to be tested against each other, that is n^2, the total runtime
> complexity is n^6, is that right?

Possibly ;) The limitation is more in the constants than the overall
runtime behavior, since we actually run out of memory.

The issue isn't so much with intersections (what Anders calls vertical
ambiguity), but with concatenation (horizontal ambiguity in Anders'
nomenclature): when you write down a lens l*, the typechecker checks
that the repetition of l is unambiguous, which boils down to checking
that the concatenation l . l* is unambiguous.

Horizontal ambiguity is expensive since it involves three intersections
of automata that are roughly twice the size of the initial automata. The
OOM happens in fa_ambig_example at line 2835 'amb = fa_intersect(b1,
b2)'

There are other things that slow down the typechecker unnecessarily: for
example, there's no memoization, i.e., if you write 'l*' in several
places, they each get checked separately - you can work around that by
hand by assigning those constructs to a variable, and using those
instead.

> > I've been trying to address that by playing various tricks with reguar
> > expressions inside libfa (the OOM happens in fa_ambig_example) - the
> > next thing to try, when I get some time, is to convert regular
> > expressions to finite automata using a derivative-based approach[1] I've
> > got some patches to do the canonization of regular expressions their
> > construction requires, I just haven't had a chance to implement their
> > DFA construction.
> 
> As I understand, we could get smaller automaton from building them from
> a derivative-base approach. Do you think that this method will generate
> significantly smaller automaton? 

The proof of the pudding is in the eating ;) The comparisons in Turon's
paper look encouraging, and Nate has had good experience with the
derivative approach in Boomerang; ultimately, it's something we'll only
find out by trying it.

> Would it be possible to assign priority to lenses? The order in which
> lenses are written could be their priority order. 

Wouldn't that implicitly turn 'l1|l2' into 'l1|(l2-l1)' ? The real
problem is also in checking horizontal ambiguity, where priority won't
help.

The quickest way forward if probably a httpd lens that is a little too
lax, i.e. that allows nesting of any section in any other section in
httpd.conf, and refine that slowly; very similar to the lens attached to
bug #100.

David