[augeas-devel] brics grammar

Francis Giraldeau francis.giraldeau at usherbrooke.ca
Mon Sep 27 19:39:32 UTC 2010


> > I do have two problems. The first is that regexp have few differences on
> > how they are escaped. For example, the single char "-" must be escaped,
> > and it's not the case with print_regexp. I don't know the impact of
> > changing the escaped_chars...
> 
> Do you mean escaped_chars in fa.c ? (There's also an escape in
> internal.c, but that's concerned with transforming a string from/to the
> equivalent C-like string format)
> 
> libfa uses extended POSIX regexp syntax. According to regex(7),
> unnecessarily escaping characters outside of character classes should be
> fine (since "-" and "\\-" both match only "-")

I narrowed down the problem. 

In fa.c, the dash should be escaped. Inside a character class, a literal
dash and a literal bracket must be escaped as well. Double escaping must
be avoided, and the regexp must not be enclosed in slashes.

augeas     | brics
--------------------------
/[+-]/     | [+\-]
/\\{/      | \{
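
To make the mapping in the table concrete, here is a minimal sketch of the
conversion in Python rather than in fa.c. The function name to_brics is
hypothetical, and the rules are only my reading of the two rows above plus
the /a-b/ case: strip the enclosing slashes, collapse a doubled backslash
into a single escape, escape a dash outside a class, and escape a literal
dash or a leading "]" inside a class while leaving range dashes such as
[a-z] alone. It does not handle POSIX classes like [[:alpha:]], and a
regexp that really means a literal backslash would need different handling.

```python
def to_brics(augeas_re: str) -> str:
    """Rewrite an augeas-style regexp into the escaping brics seems to expect.

    Assumption: the rules below are inferred from the table in this mail,
    not taken from the brics documentation.
    """
    body = augeas_re
    # The regexp must not be enclosed in slashes.
    if len(body) >= 2 and body[0] == "/" and body[-1] == "/":
        body = body[1:-1]

    out = []
    in_class = False        # currently between "[" and its closing "]"
    at_class_start = False  # right after "[" or "[^"
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch == "\\" and i + 1 < n:
            if body[i + 1] == "\\" and i + 2 < n:
                # Doubled escape before a character: keep a single escape.
                out.append("\\" + body[i + 2])
                i += 3
            else:
                # Already escaped once: copy as-is, never escape twice.
                out.append(body[i:i + 2])
                i += 2
            at_class_start = False
            continue
        if not in_class:
            if ch == "[":
                in_class = True
                at_class_start = True
                out.append(ch)
            elif ch == "-":
                out.append("\\-")
            else:
                out.append(ch)
            i += 1
            continue
        # Inside a character class from here on.
        if at_class_start and ch == "^":
            out.append(ch)  # negation marker, still at the class start
            i += 1
            continue
        if ch == "]" and not at_class_start:
            in_class = False
            out.append(ch)
        elif ch == "]":
            out.append("\\]")  # a leading "]" is a literal bracket
        elif ch == "-" and (at_class_start or (i + 1 < n and body[i + 1] == "]")):
            out.append("\\-")  # a first or last dash is a literal dash
        else:
            out.append(ch)     # includes range dashes such as a-z
        at_class_start = False
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for regexp in ("/[+-]/", r"/\\{/", "/a-b/", "/[a-z]+/"):
        print(regexp, "|", to_brics(regexp))
```

Keeping this as a separate output path, instead of changing what
print_regexp emits, would leave the regular error messages untouched.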

> It might be a little confusing in error messages since users will
> enter /a-b/ and error messages will tell them that something's wrong
> with /a\-b/

Yeah, I think the normal output should stay, and the brics format should
be used only for outputting the grammar for ambiguity analysis. For now,
I did a quick and dirty patch to make the brics checker happy, but it
breaks 23 unit tests.

> Nice stuff ... I think the ability to hook brics' grammar ambiguity
> checker in will be a great help for hairier cases of cf grammars.

When playing with it, I found something annoying: some ambiguities are
reported that are not real ambiguities.

Consider this small grammar:

LETTER = [a]+ (MAX)

S[s1] : S E
S[s2] :
E[e1] : <LETTER>

The error is:
checking horizontal ambiguities...
horizontal check: S[s1] at index 1
*** horizontal ambiguity: S[s1]: S <--> E
ambiguous string: "aa"
matched as "" <--> "aa" or "a" <--> "a"
the grammar is ambiguous!


The language generated by this grammar is a^n, and since the MAX
argument is used for the regexp, there is no way to split the string in
two. The analyzer does not take the MAX argument into account, which is
a known limitation [1]. How could we make the MAX keyword disambiguate
this grammar?
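
To illustrate what I mean, here is a rough sketch in Python, not anything
taken from brics. It enumerates the ways to cut a string at index 1 of
S[s1] (S on the left, E on the right), first without MAX and then with a
filter. The filter encodes my assumption about what MAX should mean: a
LETTER token is not allowed to be immediately followed by a character that
would extend it. The names splits and splits_with_max are made up for this
example.

```python
import re

LETTER = re.compile(r"a+")  # LETTER = [a]+
S_LANG = re.compile(r"a*")  # strings derivable from S: zero or more LETTER tokens


def splits(s):
    """All cuts of s into (S part, E part) for S[s1] : S E, ignoring MAX."""
    return [
        (s[:k], s[k:])
        for k in range(len(s) + 1)
        if S_LANG.fullmatch(s[:k]) and LETTER.fullmatch(s[k:])
    ]


def splits_with_max(s):
    """Same cuts, keeping only those where no LETTER token could be extended.

    Assumption: a non-empty S part ends with a LETTER token, so the cut is
    rejected when the first character of the E part would extend that token.
    """
    kept = []
    for left, right in splits(s):
        if left and LETTER.fullmatch(left[-1] + right[0]):
            continue  # the token ending the S part is not maximal
        kept.append((left, right))
    return kept


if __name__ == "__main__":
    print(splits("aa"))           # the two cuts the checker reports
    print(splits_with_max("aa"))  # the cuts left under the assumed MAX rule
```

Under that reading only the cut "" <--> "aa" survives, which is why I
would expect the grammar to be accepted once MAX is honoured.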

Francis

---- 

[1] http://www.brics.dk/grammar/notation.html 
The ambiguity analyzer currently does not support unordered productions,
equality entities, and MAX regexps. 







