[Freeipa-devel] [PATCH] 971 detect binary LDAP data

Tue Feb 28 17:58:58 UTC 2012

Jan Cholasta wrote:
> On 28.2.2012 18:02, Petr Viktorin wrote:
>> On 02/28/2012 04:45 PM, Rob Crittenden wrote:
>>> Petr Viktorin wrote:
>>>> On 02/28/2012 04:02 AM, Rob Crittenden wrote:
>>>>> Petr Viktorin wrote:
>>>>>> On 02/27/2012 05:10 PM, Rob Crittenden wrote:
>>>>>>> Rob Crittenden wrote:
>>>>>>>> Simo Sorce wrote:
>>>>>>>>> On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:
>>>>>>>>>> We are pretty trusting that the data coming out of LDAP matches
>>>>>>>>>> its
>>>>>>>>>> schema but it is possible to stuff non-printable characters into
>>>>>>>>>> most
>>>>>>>>>> attributes.
>>>>>>>>>>
>>>>>>>>>> I've added a sanity checker to keep a value as a python str type
>>>>>>>>>> (treated as binary internally). This will result in a base64
>>>>>>>>>> encoded
>>>>>>>>>> blob be returned to the client.
>
> I don't like the idea of having arbitrary binary data where unicode
> strings are expected. It might cause some unexpected errors (I have a
> feeling that --addattr and/or --delattr and possibly some plugins might
> not handle this very well). Wouldn't it be better to just throw away the
> value if it's invalid and warn the user?

This isn't for user input, it is for data stored in LDAP. User's are 
going to have no way to provide binary data to us unless they use the 
API themselves in which case they have to follow our rules.

>
>>>>>>>>>
>>>>>>>>> Shouldn't you try to parse it as a unicode string and catch
>>>>>>>>> TypeError to
>>>>>>>>> know when to return it as binary ?
>>>>>>>>>
>>>>>>>>> Simo.
>>>>>>>>>
>>>>>>>>
>>>>>>>> What we do now is the equivalent of unicode(chr(0)) which returns
>>>>>>>> u'\x00' and is why we are failing now.
>>>>>>>>
>>>>>>>> I believe there is a unicode category module, we might be able to
>>>>>>>> use
>>>>>>>> that if there is a category that defines non-printable characters.
>>>>>>>>
>>>>>>>> rob
>>>>>>>
>>>>>>> Like this:
>>>>>>>
>>>>>>> import unicodedata
>>>>>>>
>>>>>>> def contains_non_printable(val):
>>>>>>> for c in val:
>>>>>>> if unicodedata.category(unicode(c)) == 'Cc':
>>>>>>> return True
>>>>>>> return False
>>>>>>>
>>>>>>> This wouldn't have the exclusion of tab, CR and LF like using ord()
>>>>>>> but
>>>>>>> is probably more correct.
>>>>>>>
>>>>>>> rob
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Freeipa-devel mailing list
>>>>>>> Freeipa-devel at redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/freeipa-devel
>>>>>>
>>>>>> If you're protecting the XML-RPC calls, it'd probably be better to
>>>>>> look
>>>>>> at the XML spec directly: http://www.w3.org/TR/xml/#charsets
>>>>>>
>>>>>> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
>>>>>> [#x10000-#x10FFFF]
>>>>>>
>>>>>> I'd say this is a good set for CLI as well.
>>>>>>
>>>>>> And you can trap invalid UTF-8 sequences by catching the
>>>>>> UnicodeDecodeError from decode().
>>>>>>
>
> I don't think we should care about XML-RPC in LDAP-specific code at all.
> If you want to do some additional checks, do them in XML-RPC-specific code.

We are trusting that the data in LDAP matches its schema. This is just 
belt and suspenders verifying that it is the case.

rob