[augeas-devel] Support for context-free languages

David Lutterkort lutter at redhat.com
Thu Feb 5 23:11:48 UTC 2009


On Thu, 2009-02-05 at 15:15 -0500, Francis Giraldeau wrote:
> I just would like to know if there are news about support for
> context-free languages within Augeas. Last summer, I assisted to the
> Linux Symposium presentation, and it was one specific limitation that
> should worked on. I'm conducting a master thesis and searching for a
> good topic ;)

It's still an open issue, and not much has happened on that front since
OLS. If you're interested in doing this for a master's thesis, I'd be
more than happy to assist you.

Just to give you a rough idea how I think this should be done: we'd add
recursive lenses to the lens language. As a contrived example, assume
you want to process the language of palindromes over 'ab', so that each
'a' becomes a new node in the tree, and each 'b' is dropped. A context
free grammar for palindromes is

        P := aPa | bPb | a | b | eps
        
As a lens, you'd write that something like

        let rec p = [ key /a/ . p . del /a/ "a" ]
                  | del /b/ /b/ . p . del /b/ "b"
                  | [ key /a/ ]
                  | del /b?/ "b"

So that p.get on "aabaa" would give you a tree { "a" { "a" } }

There's a few things that need to be done to support this

      * Expand the language to accept 'let rec' statements
      * Approximate the language that a recursive lens (both in the get
        and put direction) accepts with a regular language
      * Typechecking can be done as it is done so far using the regular
        approximations
      * Either use the existing parser with the regular approximations
        or write a CF parser, probably best an Earley-style parser, and
        use it to split input in the get and put directions. Either
        would probably involve converting the internal representation of
        a lens into something resembling a CF grammar more closely
        (though I am hoping there's a smarter way to compute the
        approximation)
      * It might be necessary to encode the trees into a string in a
        much smarter way than is done currently to make CF lenses useful
        in practice (but I'd consider that an additional step)

For background, the Boomerang paper[1] explains the theory behind lenses
and some of the lingo; Anders Moeller's paper[2] explains how CF
languages can be approximated by regular languages, in particular the
Mohri-Nederhof approximation he cites. Anders also has Java code[3] that
implements a good deal of this - of course, Augeas would need a C
implementation.

Let me know if you are interested, I'd be mroe than happy to explain
some more of the details and fill out the sketch about.

David

[1] http://www.cis.upenn.edu/~bcpierce/papers/boomerang.pdf
[2] http://www.brics.dk/~amoeller/papers/ambiguity/journal.pdf
[3] http://www.brics.dk/grammar/





More information about the augeas-devel mailing list