[Freeipa-devel] [PATCH] 0015 Only split CSV strings once

Petr Viktorin pviktori at redhat.com
Wed Mar 21 15:21:07 UTC 2012


On 03/21/2012 04:12 PM, Dmitri Pal wrote:
> On 03/21/2012 10:28 AM, Petr Viktorin wrote:
>> On 03/20/2012 10:08 PM, Rob Crittenden wrote:
>>> Petr Viktorin wrote:
>>>> On 03/16/2012 12:55 PM, Petr Viktorin wrote:
>>>>> On 03/15/2012 08:55 PM, Rob Crittenden wrote:
>>>>>> Petr Viktorin wrote:
>>>>>>> https://fedorahosted.org/freeipa/ticket/2227 (Unable to add certain
>>>>>>> sudo
>>>>>>> commands to groups). What an interesting bug to get :)
>>>>>>>
>>>>>>>
>>>>>>> One problem with our CSV splitting is that it's not idempotent
>>>>>>> (baskslashes are "eaten" when there are escaped commas), but when we
>>>>>>> forward a call it gets done on both the client and the server. The
>>>>>>> attached patch fixes that in a somewhat hacky way. It's not a
>>>>>>> complete
>>>>>>> fix for the issue though, for that I need to ask what to do.
>>>>>>>
>>>>>>>
>>>>>>> Some Params use the CSV format when we just need a list of
>>>>>>> comma-separated values. Our flavor of CSV does escaping in two
>>>>>>> different
>>>>>>> ways. This is pretty bad for predictability, least-surprise,
>>>>>>> documentation, ...
>>>>>>>
>>>>>>> To recap, the two ways are escaping (escaped commas are ignored,
>>>>>>> backslashes also need to be escaped) and quoting (values are
>>>>>>> optionally
>>>>>>> enclosed in double quotes, to include a '"' in a quoted string, use
>>>>>>> '""').
>>>>>>> Escaping is good because users are used to it, but currently literal
>>>>>>> backslashes need to be doubled, making multivalue syntax different
>>>>>>> from
>>>>>>> single-value syntax, even if there are no commas in the value.
>>>>>>> Quoting is weird, but complete by itself so it doesn't also need
>>>>>>> escaping.
>>>>>>>
>>>>>>> Do we need to use both? Is this documented somewhere? Some of our
>>>>>>> tests
>>>>>>> check only for one, some assume the other. Users are probably even
>>>>>>> more
>>>>>>> confused.
>>>>>>>
>>>>>>>
>>>>>>> If we only keep one of those, the fix for #2227 should be quite
>>>>>>> easy.
>>>>>>> If not (backwards compatibility), we need to document this properly,
>>>>>>> test all the corner cases, and fix the UI to handle CSV escaping.
>>>>>>> Since it's subtly broken in lots of places, maybe it's best to break
>>>>>>> backwards compatibility and go for a simple solution now.
>>>>>>>
>>>>>>> Anyway, if I want to do a proper fix, CSV handling should be only
>>>>>>> done
>>>>>>> on the client. Unfortunately turning off CSV handling in the server
>>>>>>> will
>>>>>>> break the UI, which at places uses `join(',')` (no escaping
>>>>>>> commas, no
>>>>>>> single place to change it), even though it can just send a list.
>>>>>>
>>>>>> I'm with Honza, not too keen on the _forwarded_call part. I'd rather
>>>>>> you
>>>>>> do a mix of comparing self.env.in_server and whether the value is a
>>>>>> basestring to determine if we need to split it.
>>>>>
>>>>> This was a hack necessary because the WebUI relied on server-side
>>>>> splitting (in a few cases, and it did not escape correctly). Now that
>>>>> Petr Vobornik's patch 99 is in, it is unnecessary.
>>>>> This discussion thread has an updated patch (0015-03); what I'm
>>>>> attaching now builds on top of that to add several new changes in
>>>>> tests.
>>>>> To summarize it: the splitting is only done in the client; from the
>>>>> RPC
>>>>> call on no CSV magic is involved at all.
>>>>>
>>>>> Please tell me your reasons for basestring checking. I think it's
>>>>> entirely unnecessary and just adds complexity.
>>>>> In the rest of the framework a string, as a parameter value, has the
>>>>> same effect as a list of that one string. Why break this?
>>>>> We are concerned about the API here, not the command line.
>>>>> Shortcuts to
>>>>> save the user's keystrokes in the common case should *not* win over
>>>>> predictability and making sure the easier way is the correct,
>>>>> robust way
>>>>> to do it. Compare -- with basestring checking:
>>>>>
>>>>> some_command(arg=escape_commas(some_string))
>>>>> - is correct
>>>>>
>>>>> some_command(arg=[some_string])
>>>>> - is also correct
>>>>>
>>>>> some_command(arg=some_string)
>>>>> - is WRONG because the string is not escaped
>>>>> - but is the easiest solution!
>>>>>
>>>>> The problem is much worse because it is subtle: it only affects values
>>>>> with commas and backslashes, so *the wrong solution would work* in
>>>>> most
>>>>> of the cases.
>>>>> And I can assure you plugin writers *would* pick the wrong way if it
>>>>> would usually work. Our own web UI had examples.
>>>>>
>>>>> Contrast to my proposed behavior:
>>>>> some_command(arg=some_string)
>>>>> - is correct
>>>>>
>>>>> some_command(arg=[some_string])
>>>>> - is also correct
>>>>>
>>>>> some_command(arg=escape_commas(some_string))
>>>>> - is wrong, but doesn't make much sense because at this level you
>>>>> don't
>>>>> have to worry about CSV at all
>>>>>
>>>>> Basestring checking would add unnecessary complexity for both us *and*
>>>>> the plugin writer. The only case it would make it easier for the
>>>>> plugin
>>>>> writer is if the plugin took CSV values ­-- but why force a random
>>>>> format, tailored for our specific frontend, into RPC clients?
>>>>>
>>>>> The client should translate the command line into a RPC call.
>>>>> There's no
>>>>> reason to arbitrarily expect the server to do part of the job.
>>>>>
>>>>>
>>>>>> I'm not sure why this is broken out of normalize. Isn't this
>>>>>> normalizing
>>>>>> the incoming values into something more sane?
>>>>>
>>>>> "Normalization" implies it's an idempotent operation, which CSV
>>>>> splitting is not (at least its current incarnation, without the
>>>>> basestring checking):
>>>>> r"a,b\,c" splits to ["a", "b,c"] which splits to ["a", "b", "c"]
>>>>>
>>>>>
>>>>
>>>> Just to make things clear: freeipa-pviktori-0015-04 is the latest
>>>> version of this patch.
>>>> It's complemented by freeipa-pviktori-0021-03 which adds the tests for
>>>> this (and other things) (but contains one more refactor).
>>>>
>>>>
>>>> If and when that's all reviewed and pushed, I'd like to discuss
>>>> freeipa-pviktori-0018 (Simplify CSV escaping syntax). I'm not saying my
>>>> solution there is perfect, but the flavor of Python's CSV parsing we're
>>>> using now is quite quirky. It would be nice to get the syntax right, or
>>>> at least better, while we're fixing this from a totally-broken state.
>>>> Later there'll be backwards compatibility problems.
>>>> But, don't let that hold back this patch.
>>>>
>>>
>>> I'm still not completely convinced. This patch looks ok and I'm sold on
>>> client side only parsing but this doesn't actually fix any of the CSV
>>> bugs we have open.
>>>
>>> I'd like to evaluate this in more context.
>>>
>>> I think the simplest fix is to drop escapechar processing in the
>>> csvreader and have users quote strings with commas instead, this is how
>>> I originally intended it to work. The UI won't have to do anything
>>> special since it will already be split and it isn't a lot to ask to put
>>> quotes around a value on the CLI.
>>>
>>> This is my proposed fix on top of your patch:
>>>
>>> diff --git a/ipalib/parameters.py b/ipalib/parameters.py
>>> index 80b175d..ec6020e 100644
>>> --- a/ipalib/parameters.py
>>> +++ b/ipalib/parameters.py
>>> @@ -702,7 +702,7 @@ class Param(ReadOnly):
>>> # csv.py doesn't do Unicode; encode temporarily as UTF-8:
>>> csv_reader = csv.reader(self.__utf_8_encoder(unicode_csv_data),
>>> dialect=dialect,
>>> - delimiter=self.csv_separator, escapechar='\\',
>>> + delimiter=self.csv_separator, quotechar='"',
>>> skipinitialspace=self.csv_skipspace,
>>> **kwargs)
>>> for row in csv_reader:
>>>
>>> With this change:
>>> - all the self-tests pass
>>> - https://fedorahosted.org/freeipa/ticket/2417 is fixed
>>> - https://fedorahosted.org/freeipa/ticket/2227 is fixed
>>>
>>> I tested with:
>>>
>>> $ ipa sudocmdgroup-add-member test --sudocmd='/bin/chown -R
>>> apache\:developers /var/www/*/shared/log'
>>>
>>> $ ipa role-add-privilege --privileges='"u,r stuff"' test
>>>
>>> rob
>>
>> Yes, that works well. ACK on your change; here it is squashed in.
>>
>> https://fedorahosted.org/freeipa/ticket/2402 can wait for 3.0, then?
>
> Is it risky to take it in?
>


That issue can be solved by feeding individual lines to the CSV reader, 
and concatenating all lines into the output. I don't think it's too 
risky, but of course it is riskier than just changing a parser argument.
Plus it's a minor issue.

Rob?


diff --git a/ipalib/parameters.py b/ipalib/parameters.py
index 2b07c8f..f509ca5 100644
--- a/ipalib/parameters.py
+++ b/ipalib/parameters.py
@@ -732,14 +732,15 @@ class Param(ReadOnly):
          if self.csv:
              if type(value) not in (tuple, list):
                  value = (value,)
-            newval = ()
+            newval = []
              for v in value:
                  if isinstance(v, basestring):
-                    csvreader = self.__unicode_csv_reader([unicode(v)])
-                    newval += tuple(csvreader.next()) #pylint: 
disable=E1101
+                    lines = unicode(v).splitlines()
+                    for row in self.__unicode_csv_reader(lines):
+                        newval.extend(row)
                  else:
-                    newval += (v,)
-            return newval
+                    newval.append(v)
+            return tuple(newval)
          else:
              return value




-- 
Petr³




More information about the Freeipa-devel mailing list