[augeas-devel] [PATCH] Copying whitespace to new nodes

David Lutterkort lutter at redhat.com
Fri Apr 13 00:12:47 UTC 2012


Hi Dominic,

On Sun, 2012-04-08 at 14:22 +0100, Dominic Cleal wrote:
> I've put together an experimental patch that aims to copy whitespace
> from previous lines when creating new nodes.

I like the idea behind this patch; ultimately, this is about variations
on the del lens that use different things for the default, which is
currently a fixed string. IIRC, the original lens papers by Foster and
Pierce didn't define a del primitive; instead they had a more general
'<=>'/exchange primitive.

Whatever we do though, I think we should leave del to work as before and
introduce a new lens primitive to do this.

Incidentally, Boomerang has some lenses to handle alignment and similar
fun - might be worth checking out how they did that.

> Generally this is just to
> make files prettier, but we saw in the Cobbler/YAML case it's required
> as whitespace affects parsing.

In the YAML case though, don't we need to be able to express some sort
of nesting, so that we can correctly treat horrors like

        foo:
          bar1: 1
          bar2: 2
          bar3:
              baz1: 1
              baz2: 2
          bar4: 4

i.e., correctly insert entries in the above, even though they use
different levels of indentations at each level. And since hashes can be
nested arbitrarily deep, we need a mechanism that allows storing an
arbitrarily deep stack of 'old indents' - though I have toadmit, I
haven't thought too much what a generic YAML lens would look like.

> When a del lens is processed in the put direction, it now stores the
> last string that it restored (matched by RE).  If the lens is then used
> to create, this string is restored instead of the default string given
> by STR.  The default string STR is only used when no previous put has
> occurred.
>
> Since del is used for whitespace and any other characters that don't
> make it into the tree, this has the effect that they're "copied" from
> the last occurrence that del lens, so the file looks more consistent.
> 
> This mostly seems to work in practice.  The main problem with it is that
> lenses - especially for whitespace - are reused for many different
> fields in a file.  If for example Util.del_ws was used for /etc/fstab,
> the same del lens would be used five times for different field widths
> throughout a single line.  If a new entry was created, this patch would
> start restoring the width from the last deleted section on the last row,
> rather than the first deleted section.

Yes, the current impl of storing the last value in the lens will not be
understandable by users, since it highly depends on when and how lenses
are created.

> To fix this would mean duplicating a lot of del lenses, one for each
> use.

We can't expose this level of detail to the user.

>   Perhaps a better way would be to create a set of registers (like
> vim, or counter/seq) where deleted strings are stored and optionally
> assigned a name so the right one can be restored.  This would mean a new
> lens primitive, or addition to "del", so many changes would be needed to
> existing file lenses to use the functionality.

I highly favor a new lens primitive; at a minimum it should support
named 'registers'; looking through some of the lens papers makes me
think we should really have the user mark chunks, similar to how they
mark tree nodes, and use that as the basis for alignment; something like

        let l = << erase / */ . key /[a-z]+/ . erase / *= */ . store /[a-z]+/ . del "\n" "\n" >>*
        
And looking at this some more, I wonder if we shouldn't just overload
the [ .. ] operator as the boundary for what constitutes 'last' with
this new erase lens; we would store these last values in the struct skel
for an erase (generated in the parse_* functions in get.c) and then
(handwaving) percolate them up to the struct skel for the subtree;
during put, when we enter the subtree, we set those erase defaults in
the struct state, and use it when we encounter create_erase.

Actually, the difference between del and erase is so small that
internally we might want to represent it with the same L_DEL tag in
struct lens, and set a flag on the struct lens to distinguish between
the two.

David





More information about the augeas-devel mailing list