[Freeipa-devel] [PATCH] 0015 Only split CSV strings once

Rob Crittenden rcritten at redhat.com
Wed Mar 21 17:14:09 UTC 2012


Petr Viktorin wrote:
> On 03/21/2012 04:12 PM, Dmitri Pal wrote:
>> On 03/21/2012 10:28 AM, Petr Viktorin wrote:
>>> On 03/20/2012 10:08 PM, Rob Crittenden wrote:
>>>> Petr Viktorin wrote:
>>>>> On 03/16/2012 12:55 PM, Petr Viktorin wrote:
>>>>>> On 03/15/2012 08:55 PM, Rob Crittenden wrote:
>>>>>>> Petr Viktorin wrote:
>>>>>>>> https://fedorahosted.org/freeipa/ticket/2227 (Unable to add certain
>>>>>>>> sudo
>>>>>>>> commands to groups). What an interesting bug to get :)
>>>>>>>>
>>>>>>>>
>>>>>>>> One problem with our CSV splitting is that it's not idempotent
>>>>>>>> (baskslashes are "eaten" when there are escaped commas), but
>>>>>>>> when we
>>>>>>>> forward a call it gets done on both the client and the server. The
>>>>>>>> attached patch fixes that in a somewhat hacky way. It's not a
>>>>>>>> complete
>>>>>>>> fix for the issue though, for that I need to ask what to do.
>>>>>>>>
>>>>>>>>
>>>>>>>> Some Params use the CSV format when we just need a list of
>>>>>>>> comma-separated values. Our flavor of CSV does escaping in two
>>>>>>>> different
>>>>>>>> ways. This is pretty bad for predictability, least-surprise,
>>>>>>>> documentation, ...
>>>>>>>>
>>>>>>>> To recap, the two ways are escaping (escaped commas are ignored,
>>>>>>>> backslashes also need to be escaped) and quoting (values are
>>>>>>>> optionally
>>>>>>>> enclosed in double quotes, to include a '"' in a quoted string, use
>>>>>>>> '""').
>>>>>>>> Escaping is good because users are used to it, but currently
>>>>>>>> literal
>>>>>>>> backslashes need to be doubled, making multivalue syntax different
>>>>>>>> from
>>>>>>>> single-value syntax, even if there are no commas in the value.
>>>>>>>> Quoting is weird, but complete by itself so it doesn't also need
>>>>>>>> escaping.
>>>>>>>>
>>>>>>>> Do we need to use both? Is this documented somewhere? Some of our
>>>>>>>> tests
>>>>>>>> check only for one, some assume the other. Users are probably even
>>>>>>>> more
>>>>>>>> confused.
>>>>>>>>
>>>>>>>>
>>>>>>>> If we only keep one of those, the fix for #2227 should be quite
>>>>>>>> easy.
>>>>>>>> If not (backwards compatibility), we need to document this
>>>>>>>> properly,
>>>>>>>> test all the corner cases, and fix the UI to handle CSV escaping.
>>>>>>>> Since it's subtly broken in lots of places, maybe it's best to
>>>>>>>> break
>>>>>>>> backwards compatibility and go for a simple solution now.
>>>>>>>>
>>>>>>>> Anyway, if I want to do a proper fix, CSV handling should be only
>>>>>>>> done
>>>>>>>> on the client. Unfortunately turning off CSV handling in the server
>>>>>>>> will
>>>>>>>> break the UI, which at places uses `join(',')` (no escaping
>>>>>>>> commas, no
>>>>>>>> single place to change it), even though it can just send a list.
>>>>>>>
>>>>>>> I'm with Honza, not too keen on the _forwarded_call part. I'd rather
>>>>>>> you
>>>>>>> do a mix of comparing self.env.in_server and whether the value is a
>>>>>>> basestring to determine if we need to split it.
>>>>>>
>>>>>> This was a hack necessary because the WebUI relied on server-side
>>>>>> splitting (in a few cases, and it did not escape correctly). Now that
>>>>>> Petr Vobornik's patch 99 is in, it is unnecessary.
>>>>>> This discussion thread has an updated patch (0015-03); what I'm
>>>>>> attaching now builds on top of that to add several new changes in
>>>>>> tests.
>>>>>> To summarize it: the splitting is only done in the client; from the
>>>>>> RPC
>>>>>> call on no CSV magic is involved at all.
>>>>>>
>>>>>> Please tell me your reasons for basestring checking. I think it's
>>>>>> entirely unnecessary and just adds complexity.
>>>>>> In the rest of the framework a string, as a parameter value, has the
>>>>>> same effect as a list of that one string. Why break this?
>>>>>> We are concerned about the API here, not the command line.
>>>>>> Shortcuts to
>>>>>> save the user's keystrokes in the common case should *not* win over
>>>>>> predictability and making sure the easier way is the correct,
>>>>>> robust way
>>>>>> to do it. Compare -- with basestring checking:
>>>>>>
>>>>>> some_command(arg=escape_commas(some_string))
>>>>>> - is correct
>>>>>>
>>>>>> some_command(arg=[some_string])
>>>>>> - is also correct
>>>>>>
>>>>>> some_command(arg=some_string)
>>>>>> - is WRONG because the string is not escaped
>>>>>> - but is the easiest solution!
>>>>>>
>>>>>> The problem is much worse because it is subtle: it only affects
>>>>>> values
>>>>>> with commas and backslashes, so *the wrong solution would work* in
>>>>>> most
>>>>>> of the cases.
>>>>>> And I can assure you plugin writers *would* pick the wrong way if it
>>>>>> would usually work. Our own web UI had examples.
>>>>>>
>>>>>> Contrast to my proposed behavior:
>>>>>> some_command(arg=some_string)
>>>>>> - is correct
>>>>>>
>>>>>> some_command(arg=[some_string])
>>>>>> - is also correct
>>>>>>
>>>>>> some_command(arg=escape_commas(some_string))
>>>>>> - is wrong, but doesn't make much sense because at this level you
>>>>>> don't
>>>>>> have to worry about CSV at all
>>>>>>
>>>>>> Basestring checking would add unnecessary complexity for both us
>>>>>> *and*
>>>>>> the plugin writer. The only case it would make it easier for the
>>>>>> plugin
>>>>>> writer is if the plugin took CSV values ­-- but why force a random
>>>>>> format, tailored for our specific frontend, into RPC clients?
>>>>>>
>>>>>> The client should translate the command line into a RPC call.
>>>>>> There's no
>>>>>> reason to arbitrarily expect the server to do part of the job.
>>>>>>
>>>>>>
>>>>>>> I'm not sure why this is broken out of normalize. Isn't this
>>>>>>> normalizing
>>>>>>> the incoming values into something more sane?
>>>>>>
>>>>>> "Normalization" implies it's an idempotent operation, which CSV
>>>>>> splitting is not (at least its current incarnation, without the
>>>>>> basestring checking):
>>>>>> r"a,b\,c" splits to ["a", "b,c"] which splits to ["a", "b", "c"]
>>>>>>
>>>>>>
>>>>>
>>>>> Just to make things clear: freeipa-pviktori-0015-04 is the latest
>>>>> version of this patch.
>>>>> It's complemented by freeipa-pviktori-0021-03 which adds the tests for
>>>>> this (and other things) (but contains one more refactor).
>>>>>
>>>>>
>>>>> If and when that's all reviewed and pushed, I'd like to discuss
>>>>> freeipa-pviktori-0018 (Simplify CSV escaping syntax). I'm not
>>>>> saying my
>>>>> solution there is perfect, but the flavor of Python's CSV parsing
>>>>> we're
>>>>> using now is quite quirky. It would be nice to get the syntax
>>>>> right, or
>>>>> at least better, while we're fixing this from a totally-broken state.
>>>>> Later there'll be backwards compatibility problems.
>>>>> But, don't let that hold back this patch.
>>>>>
>>>>
>>>> I'm still not completely convinced. This patch looks ok and I'm sold on
>>>> client side only parsing but this doesn't actually fix any of the CSV
>>>> bugs we have open.
>>>>
>>>> I'd like to evaluate this in more context.
>>>>
>>>> I think the simplest fix is to drop escapechar processing in the
>>>> csvreader and have users quote strings with commas instead, this is how
>>>> I originally intended it to work. The UI won't have to do anything
>>>> special since it will already be split and it isn't a lot to ask to put
>>>> quotes around a value on the CLI.
>>>>
>>>> This is my proposed fix on top of your patch:
>>>>
>>>> diff --git a/ipalib/parameters.py b/ipalib/parameters.py
>>>> index 80b175d..ec6020e 100644
>>>> --- a/ipalib/parameters.py
>>>> +++ b/ipalib/parameters.py
>>>> @@ -702,7 +702,7 @@ class Param(ReadOnly):
>>>> # csv.py doesn't do Unicode; encode temporarily as UTF-8:
>>>> csv_reader = csv.reader(self.__utf_8_encoder(unicode_csv_data),
>>>> dialect=dialect,
>>>> - delimiter=self.csv_separator, escapechar='\\',
>>>> + delimiter=self.csv_separator, quotechar='"',
>>>> skipinitialspace=self.csv_skipspace,
>>>> **kwargs)
>>>> for row in csv_reader:
>>>>
>>>> With this change:
>>>> - all the self-tests pass
>>>> - https://fedorahosted.org/freeipa/ticket/2417 is fixed
>>>> - https://fedorahosted.org/freeipa/ticket/2227 is fixed
>>>>
>>>> I tested with:
>>>>
>>>> $ ipa sudocmdgroup-add-member test --sudocmd='/bin/chown -R
>>>> apache\:developers /var/www/*/shared/log'
>>>>
>>>> $ ipa role-add-privilege --privileges='"u,r stuff"' test
>>>>
>>>> rob
>>>
>>> Yes, that works well. ACK on your change; here it is squashed in.
>>>
>>> https://fedorahosted.org/freeipa/ticket/2402 can wait for 3.0, then?
>>
>> Is it risky to take it in?
>>
>
>
> That issue can be solved by feeding individual lines to the CSV reader,
> and concatenating all lines into the output. I don't think it's too
> risky, but of course it is riskier than just changing a parser argument.
> Plus it's a minor issue.
>
> Rob?
>
>
> diff --git a/ipalib/parameters.py b/ipalib/parameters.py
> index 2b07c8f..f509ca5 100644
> --- a/ipalib/parameters.py
> +++ b/ipalib/parameters.py
> @@ -732,14 +732,15 @@ class Param(ReadOnly):
> if self.csv:
> if type(value) not in (tuple, list):
> value = (value,)
> - newval = ()
> + newval = []
> for v in value:
> if isinstance(v, basestring):
> - csvreader = self.__unicode_csv_reader([unicode(v)])
> - newval += tuple(csvreader.next()) #pylint: disable=E1101
> + lines = unicode(v).splitlines()
> + for row in self.__unicode_csv_reader(lines):
> + newval.extend(row)
> else:
> - newval += (v,)
> - return newval
> + newval.append(v)
> + return tuple(newval)
> else:
> return value

This approach seems ok, can you submit it as a formal patch review? I 
think this is probably acceptable as-is but the probability of someone 
actually doing the type paste to generate this problem is probably 
pretty low :-)

ACK on 0015-05. Pushed to master and ipa-2-2

rob




More information about the Freeipa-devel mailing list