[Freeipa-devel] [PATCH] 0018 - Simplify CSV escaping syntax

Petr Viktorin pviktori at redhat.com
Tue Feb 28 09:20:13 UTC 2012


On 02/27/2012 11:11 PM, Rob Crittenden wrote:
> Petr Viktorin wrote:
>> This depends on my patch 0015.
>> Since CSV escaping was entirely broken before that patch (however we
>> decide to fix the problem), let's also fix the escaping syntax itself,
>> without worrying about backwards compatibility.
>>
>> I tried to solve this according to Rob's comment on
>> https://fedorahosted.org/freeipa/ticket/2227:
>> > Whatever we come up with we need a more consistent way to handle
>> > escape characters so it works the same on single and multi-value
>> > attributes.
>>
>> My goal is that it will just do the right thing for 99.9% of uses, and
>> the 0.1% (or people who want to write robust scripts¹) should find a
>> short, clear paragraph in the documentation.
>>
>> Of course, values without backslashes, double quotes and newlines are
>> again completely unaffected.
>>
>>
>>
>> Commit message:
>>
>> Instead of the non-obvious semantics of Python's CSV module, just split
>> on commas that aren't escaped by a backslash.
>> Then, replace any commas that *are* escaped by backslashes by just
>> commas, and remove initial whitespace as needed.
>>
>>
>> This simpler approach fits our use case better: most importantly, it
>> doesn't require literal backslashes to be escaped, and doesn't "eat"
>> backslashes (unless they precede a comma). This means the CSV multivalue
>> format is, except for commas, the same as for single values.
>> It also does away with CSV's double-quoting syntax, making the format
>> more predictable (=scriptable).
>>
>> A drawback is that this format doesn't allow values that end with a
>> backslash, unless they're given at the end. The workaround is to either
>> give them at the end, or if more are needed,just use multiple options:
>> --someopt='val1,val2\' --someopt='val3\'
>>
>> Values that end with backslashes are so rare, this shouldn't be a
>> problem in practice.
>>
>>
>> Also, update the tests.
>>
>>
>> Fixed as a side effect:
>> https://fedorahosted.org/freeipa/ticket/2402
>> https://fedorahosted.org/freeipa/ticket/2417
>>
>
> I think I'd prefer to see this rebased into patch 15 so we have one
> cohesive thing to review.

Well, where the parsing happens and the CSV syntax are completely 
orthogonal issues, so wanted to keep the discussion separate.

> I'm not sure we can arbitrarily say that strings ending with \ are rare.
> That is very close to "who will need more than 640k of RAM?

No. You can still *use* >640k of RAM here, you just type a bit more. The 
CSV syntax is just a shortcut, so you can say `--someopt=a,b` instead of 
`--someopt=a --someopt=b`.
It's just the shortcut that doesn't work; you can still do 
`--someopt='a\' --someopt='b\'`. Or even `--someopt={a\\,b\\}` in Bash.

For this you get a syntax that works the same on single and multi-value 
attributes (unless commas are involved, of course).

> It doesn't look this handle someone purposely wanting an escaped comma
> in the string?

That's '\\,' -- the first '\' stays as it is, the '\,' turns into a comma.

> The reason we do the parsing on both sides is to save a round trip in
> case of errors. If we do it only one one side then it must be done only
> on the server side. We can't assume well-formed data coming from any
> client.

This is really for patch 15. But anyway:

Since only the client does the parsing, and both do validation, there's 
no round trip in case of parsing and validation errors.

We cannot parse CSV only on the server side, since then the client could 
not validate the data.

I'm not assuming well-formed data from the client. The CSV is split and 
unescaped on the client. (Also it's validated to prevent round-trips.) 
This operation results in a list of arbitrary strings. Just a list of 
strings, not a list of well-formed strings in some syntax.
This is then re-validated by the server, but this time without splitting 
CSV because we already have a list of raw strings.

-- 
Petr³




More information about the Freeipa-devel mailing list