[augeas-devel] regex syntax

David Lutterkort dlutter at redhat.com
Thu Jul 17 18:03:59 UTC 2008


On Thu, 2008-07-17 at 11:54 -0500, Greg_Swift at aotx.uscourts.gov wrote:
> augparse is helpful, although after that help I quickly got to the point
> where that was successful, but parse still failed. To be quite honest, I
> think I like looking in augtool at /augeas/files/*/error and /lens more.
> Maybe that's just me.  Anyways.  I realize I need to write a test, and I
> will go and start learning that after I write this e-mail.

The simplest test you can write is something like

        module Test_informix =
          let s = "<cut and paste your config file here>"

          test Informix.lns get s = ?

and then run that through augparse and look at the output; you'll
probably get parse failures the first time around, and often it is
easier to reduce the string you're testing with to smaller fragments of
the config file, since that makes it easier to understand parse errors
in detail.

> My question goes into why + over *.  I ask this because I tried manually
> playing with the regex that is generated by augeas at the cli from the
> lenses and taking them down to the smallest individual bits, and most of my
> matching, like [ \t]+, don't work. But if I put the * instead of the +,
> they work.  Now admittedly I haven't gotten my lens to fully parse yet
> anyways.

How exactly did you do this? Augeas uses extended POSIX regexp
syntax[1], which is also used by some command line tools. For playing
with individual regexps, it's sometimes useful to use sed, e.g.
"sed -r -e 's/MYREGEXP/FOO/'", to see exactly what a regexp matches.
For example, "sed -r -e 's/[ \t]*/<spaces>/'" replaces the first match
on each input line (here the leading whitespace, which may be empty)
with the literal string '<spaces>'.
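
To make the difference between + and * visible with sed, something like
the following sketch may help. The helper names mark_plus and mark_star
are made up for this illustration, and -E is used as the portable
spelling of GNU sed's -r. The tab is generated with printf so the
example does not depend on sed understanding "\t" inside brackets.

```shell
# A literal tab character, so the bracket expression works in any sed.
tab=$(printf '\t')

# Hypothetical helper: replace the first run of ONE or more blanks.
mark_plus() {
  printf '%s\n' "$1" | sed -E "s/[ ${tab}]+/<spaces>/"
}

# Hypothetical helper: replace the first run of ZERO or more blanks.
mark_star() {
  printf '%s\n' "$1" | sed -E "s/[ ${tab}]*/<spaces>/"
}

# With +, the first match is the run of blanks after the colon.
mark_plus "alias1:     target"

# With *, the first match is the empty string at the start of the line,
# so the marker is inserted there and the blanks are left untouched.
mark_star "alias1:     target"
```

A regexp using * can therefore "match" even where there is no
whitespace at all, which is worth keeping in mind when a pattern with +
appears not to work.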

> I know * is zero or more occurrences, whereas I think + is one or more
> occurrences, correct?

Yes, that is correct.

> The current aliases.aug takes the following 2 formats into account in
> /etc/aliases:
> 
> alias1:     target
> alias2:     target1, target2, target3

It really only takes one format: the RHS is always a list of targets,
just that for some aliases the list has only one element. That's a little
different from the discussion we had yesterday, where you were trying to
differentiate between single values and lists of values.

> Whereas myself, the mail server, and probably others might handle a list in
> this format:
> alias3:     target4,target5,target6
> 
> The line (13) looks like this:
>  let comma = del /,[ \t]+(\n[ \t]+)?/ ", "
> 
> but I think it should be:
>  let comma = del /,[ \t]*(\n[ \t]+)?/ ", "

Yeah, that seems to be wrong: there isn't really any need to have the
comma followed by a whitespace; I'll fix aliases.aug accordingly.
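
The effect of the two variants can be checked roughly with sed. This is
only a sketch: split_targets is a made-up helper, and the optional
"(\n[ \t]+)?" continuation part of the regexp is left out because sed
looks at one line at a time.

```shell
# A literal tab character for use inside the bracket expressions.
tab=$(printf '\t')

# Hypothetical helper: rewrite every separator matched by the regexp
# in $1 to " | " so that the matches become visible in the output.
split_targets() {
  printf '%s\n' "$2" | sed -E "s/$1/ | /g"
}

old=",[ ${tab}]+"   # comma must be followed by at least one blank
new=",[ ${tab}]*"   # blanks after the comma are optional

# The old pattern finds no separator when targets are packed tightly.
split_targets "$old" "target4,target5,target6"

# The new pattern matches both the tight and the spaced form.
split_targets "$new" "target4,target5,target6"
split_targets "$new" "target1, target2, target3"
```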

David

[1] http://www.regular-expressions.info/posix.html




