A change to string encoding

Tue Mar 10 18:51:34 UTC 2009

On Tue, 2009-03-10 at 11:07 +0000, Matthew Booth wrote:
> The problem with the current string encoding is that it is parsable but
> not human-readable. It also complicates parsing by requiring two
> different decoding methods to be implemented.
> 
> It occurs to me that a URL encoding scheme would also meet the parsing
> requirements. Additionally:
> 
> 1. It is always human readable.
> 2. There is only 1 encoding scheme.
> 3. Substring matching on encoded strings will always succeed.
> 
> URL encoding is just one way to achieve this, and has the advantage of
> being widely implemented. However, the minimal requirements would be a
> scheme which encoded only separator characters (whitespace in this case)
> without the use of those separators.
> 
> I'm sure this has been considered before. Given that it's a road I'm
> considering heading down, what were the reasons for not doing it?

Lack of code, and history I guess.  What we have is fast and easy, and
any replacement encoding scheme must be both.  This has come up before,
with the basic agreement that what we have isn't great.  "It works" is
about the best thing you can say about it.
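
For illustration, the minimal scheme described in the quoted message
(encode only the separator characters, without using those separators)
might look roughly like the sketch below.  It is Python rather than
kernel code, and the names encode_field/decode_field, the exact set of
escaped characters, and the decision to escape '%' itself are all
assumptions made for the example, not anything audit does today.

```python
import re

# Assumption: the separators are ASCII whitespace. '%' is escaped too,
# so that a literal '%' in the input cannot be mistaken for an escape.
_UNSAFE = re.compile(r"[ \t\n\r\x0b\x0c%]")
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def encode_field(value: str) -> str:
    """Replace each separator (and '%') with %XX; leave the rest as-is."""
    return _UNSAFE.sub(lambda m: "%{:02X}".format(ord(m.group())), value)


def decode_field(encoded: str) -> str:
    """Reverse encode_field by turning each %XX back into its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), encoded)


if __name__ == "__main__":
    raw = "/tmp/my file"
    enc = encode_field(raw)
    print(enc)                # the value contains no whitespace any more
    print(decode_field(enc))  # round-trips to the original string
```

Since everything except the separators and '%' passes through
untouched, the encoded value stays readable and a search for a plain
substring such as "file" still matches it.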

Backwards compatibility is a big issue.  Any new code (in the kernel at
least) has to let us keep emitting the current format for some time.
I've said it before and I'll say it again: I'm willing to entertain a
new string encoding system in the kernel, but I don't have the time to
write it.

There was talk that someone in the IPA project was going to write an
audit plugin that would re-encode strings to something they liked, but I
haven't seen it.

As long as you have some way to maintain backwards compatibility and
have the time to write it, I think just about any other string encoding
scheme would make people happier than what we have today...

-Eric