[augeas-devel] brics grammar
Francis Giraldeau
francis.giraldeau at usherbrooke.ca
Mon Sep 27 19:39:32 UTC 2010
> > I do have two problems. The first is that regexp have few differences on
> > how they are escaped. For example, the single char "-" must be escaped,
> > and it's not the case with print_regexp. I don't know the impact of
> > changing the escaped_chars...
>
> Do you mean escaped_chars in fa.c ? (There's also an escape in
> internal.c, but that's concerned with transforming a string from/to the
> equivalent C-like string format)
>
> libfa uses extended POSIX regexp syntax. According to regex(7),
> unnecessarily escaping characters outside of character classes should be
> fine (since "-" and "\\-" both match only "-")
I narrowed down the problem.
In fa.c, a literal dash should be escaped. Inside a character class, a
literal dash and a literal bracket must be escaped as well. Double
escaping must be avoided, and the regexp must not be enclosed in slashes.
augeas   | brics
---------+--------
/[+-]/   | [+\-]
/\\{/    | \{
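As a rough illustration of these rules (not the actual patch to fa.c), here is a minimal Python sketch. The helper name to_brics is hypothetical and exists in neither libfa nor brics. It assumes the input is the regexp as Augeas prints it, and it only handles the cases listed above: enclosing slashes, doubled backslashes, and literal dashes. Bracket escaping inside a class is left out.

```python
def to_brics(augeas_regexp):
    """Hypothetical helper: rewrite an Augeas-printed regexp for brics."""
    s = augeas_regexp
    # The regexp must not be enclosed in slashes.
    if len(s) >= 2 and s[0] == "/" and s[-1] == "/":
        s = s[1:-1]
    # Avoid double escapes: a doubled backslash becomes a single one.
    s = s.replace("\\\\", "\\")

    out = []
    in_class = False
    class_start = -1
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s):
            # Already escaped: copy the pair unchanged.
            out.append(s[i:i + 2])
            i += 2
            continue
        if c == "[" and not in_class:
            in_class = True
            class_start = i + 1
            # Skip a leading negation so "first position" is computed correctly.
            if class_start < len(s) and s[class_start] == "^":
                class_start += 1
        elif c == "]" and in_class:
            in_class = False
        elif c == "-":
            if in_class:
                # Inside a class, a dash is literal only when first or last.
                at_end = i + 1 < len(s) and s[i + 1] == "]"
                literal = (i == class_start) or at_end
            else:
                literal = True
            if literal:
                out.append("\\-")
                i += 1
                continue
        out.append(c)
        i += 1
    return "".join(out)


print(to_brics("/[+-]/"))    # [+\-]
print(to_brics("/\\\\{/"))   # \{
```

A range such as [a-z] is left untouched, because the dash there is a range operator rather than a literal character.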
> It might be a little confusing in error messages since users will
> enter /a-b/ and error messages will tell them that something's wrong
> with /a\-b/
Yeah, I think the normal output should stay, and the brics format should
be used only for outputting the grammar for ambiguity analysis. For now, I
did a quick and dirty patch to make the brics checker happy, but it breaks
23 unit tests.
> Nice stuff ... I think the ability to hook brics' grammar ambiguity
> checker in will be a great help for hairier cases of cf grammars.
While playing with it, I found something annoying: some ambiguities are
reported that are not real ambiguities.
Consider this small grammar:
LETTER = [a]+ (MAX)
S[s1] : S E
S[s2] :
E[e1] : <LETTER>
The error is:
checking horizontal ambiguities...
horizontal check: S[s1] at index 1
*** horizontal ambiguity: S[s1]: S <--> E
ambiguous string: "aa"
matched as "" <--> "aa" or "a" <--> "a"
the grammar is ambiguous!
The language generated by this grammar is a^n. Since the MAX argument is
used for the regexp, LETTER should always match the longest possible run
of "a", so there is no way to split the string in two as "a" <--> "a".
However, the checker does not take the MAX argument into account, which is
a known limitation [1]. How could we make the MAX keyword disambiguate
this grammar?
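To make the reported ambiguity concrete, here is a small Python sketch that enumerates the ways "aa" can be cut into an S part and an E part, first ignoring MAX and then with a simplified longest-match filter. This is only my reading of what MAX is meant to express, not how the brics analyzer works internally. The names splits and splits_with_max are made up for the example.

```python
import re

LETTER = re.compile(r"a+")   # the LETTER terminal from the grammar
S_LANG = re.compile(r"a*")   # language of S: a^n with n >= 0


def splits(text):
    """All cuts of text into an S part followed by an E part, ignoring MAX."""
    return [
        (text[:i], text[i:])
        for i in range(len(text) + 1)
        if S_LANG.fullmatch(text[:i]) and LETTER.fullmatch(text[i:])
    ]


def splits_with_max(text):
    """Keep only cuts that respect a simplified longest-match rule."""
    kept = []
    for left, right in splits(text):
        # If the S part is non-empty, it ends with a LETTER token. Under
        # longest match, that token must not be extendable by the next
        # character, which is the first character of the E part.
        if left and LETTER.fullmatch(left[-1] + right[0]):
            continue
        kept.append((left, right))
    return kept


print(splits("aa"))           # [('', 'aa'), ('a', 'a')]
print(splits_with_max("aa"))  # [('', 'aa')]
```

Without the filter, the two cuts are exactly the ones the checker reports. With the filter, only "" <--> "aa" is left.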
Francis
----
[1] http://www.brics.dk/grammar/notation.html
The ambiguity analyzer currently does not support unordered productions,
equality entities, and MAX regexps.