[Freeipa-devel] [PATCH] 971 detect binary LDAP data

Wed Mar 28 22:20:36 UTC 2012

Jan Cholasta wrote:
> On 29.2.2012 15:45, Rob Crittenden wrote:
>> Jan Cholasta wrote:
>>> On 28.2.2012 18:58, Rob Crittenden wrote:
>>>> Jan Cholasta wrote:
>>>>> On 28.2.2012 18:02, Petr Viktorin wrote:
>>>>>> On 02/28/2012 04:45 PM, Rob Crittenden wrote:
>>>>>>> Petr Viktorin wrote:
>>>>>>>> On 02/28/2012 04:02 AM, Rob Crittenden wrote:
>>>>>>>>> Petr Viktorin wrote:
>>>>>>>>>> On 02/27/2012 05:10 PM, Rob Crittenden wrote:
>>>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>>>> Simo Sorce wrote:
>>>>>>>>>>>>> On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:
>>>>>>>>>>>>>> We are pretty trusting that the data coming out of LDAP
>>>>>>>>>>>>>> matches
>>>>>>>>>>>>>> its
>>>>>>>>>>>>>> schema but it is possible to stuff non-printable characters
>>>>>>>>>>>>>> into
>>>>>>>>>>>>>> most
>>>>>>>>>>>>>> attributes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've added a sanity checker to keep a value as a python str
>>>>>>>>>>>>>> type
>>>>>>>>>>>>>> (treated as binary internally). This will result in a base64
>>>>>>>>>>>>>> encoded
>>>>>>>>>>>>>> blob be returned to the client.
>>>>>
>>>>> I don't like the idea of having arbitrary binary data where unicode
>>>>> strings are expected. It might cause some unexpected errors (I have a
>>>>> feeling that --addattr and/or --delattr and possibly some plugins
>>>>> might
>>>>> not handle this very well). Wouldn't it be better to just throw away
>>>>> the
>>>>> value if it's invalid and warn the user?
>>>>
>>>> This isn't for user input, it is for data stored in LDAP. User's are
>>>> going to have no way to provide binary data to us unless they use the
>>>> API themselves in which case they have to follow our rules.
>>>
>>> Well my point was that --addattr and --delattr cause an LDAP search for
>>> the given attribute and plugins might get the result of a LDAP search in
>>> their post_callback and I'm not sure if they can cope with binary data.
>>
>> It wouldn't be any different than if we had the value as a unicode.
>
> Let's see what happens if the mail attribute of a user contains invalid
> UTF-8 (ff 62 30 72 6b 65 64):
>
> $ ipa user-find jdoe
> --------------
> 1 user matched
> --------------
> User login: jdoe
> First name: John
> Last name: Doe
> Home directory: /home/jdoe
> Login shell: /bin/sh
> Email address: /2IwcmtlZA==
> UID: 526
> GID: 526
> Account disabled: False
> Password: False
> Kerberos keys available: False
> ----------------------------
> Number of entries returned 1
> ----------------------------
>
> $ ipa user-mod jdoe --addattr mail=jdoe at example.com
> ipa: ERROR: an internal error has occurred
>
> The internal error is:
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line
> 302, in wsgi_execute
> result = self.Command[name](*args, **options)
> File "/usr/lib/python2.7/site-packages/ipalib/frontend.py", line 438, in
> __call__
> ret = self.run(*args, **options)
> File "/usr/lib/python2.7/site-packages/ipalib/frontend.py", line 696, in
> run
> return self.execute(*args, **options)
> File "/usr/lib/python2.7/site-packages/ipalib/plugins/baseldap.py", line
> 1217, in execute
> ldap, dn, entry_attrs, attrs_list, *keys, **options
> File "/usr/lib/python2.7/site-packages/ipalib/plugins/user.py", line
> 532, in pre_callback
> entry_attrs['mail'] = self.obj._normalize_email(entry_attrs['mail'])
> File "/usr/lib/python2.7/site-packages/ipalib/plugins/user.py", line
> 338, in _normalize_email
> norm_email.append(m + u'@' + config['ipadefaultemaildomain'][0])
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0:
> invalid start byte
>
> $ ipa user-mod jdoe --delattr mail=/2IwcmtlZA==
> ipa: ERROR: mail does not contain '/2IwcmtlZA=='
>
> $ ipa user-mod jdoe --delattr mail=`echo 'ff 62 30 72 6b 65 64' | xxd -p
> -r`
> ipa: ERROR: UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in
> position 5: invalid start byte
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1242, in run
> sys.exit(api.Backend.cli.run(argv))
> File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1024, in run
> kw = self.parse(cmd, argv)
> File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1049, in parse
> return dict(self.parse_iter(cmd, kw))
> File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1058, in
> parse_iter
> yield (key, self.Backend.textui.decode(value))
> File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 136, in decode
> return value.decode(encoding)
> File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
> return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 5:
> invalid start byte
> ipa: ERROR: an internal error has occurred
>
> I'm sure there is a lot more places in the code where things will break
> when you feed them arbitrary data.
>
>>
>> We treat the python type str as binary data. Anything that is a str gets
>> based64 encoded before json or xml-rpc transmission.
>>
>> The type unicode is considered a "string" and goes in the clear.
>>
>> We determine what this type should be not from the data but from the
>> schema. This is a big assumption. Hopefully this answer's Petr's point
>> as well.
>>
>> We decided long ago that str means Binary and unicode means String. It
>> is a bit clumsy perhaps python handles it well. It will be more clear
>> when we switch to Python 3.0 and we'll have bytes and str instead as
>> types.
>
> Well, this is all super-obvious and I'm not really sure why do you bring
> it up here...
>
> My concern is that with your approach, you can no longer be sure that an
> attribute value is a valid unicode string. This creates a
> maintainability burden, as you *always* have to keep in mind that you
> first have to check whether the value is valid or not before touching
> it. You can easily forget to include the check in new code and there is
> a lot of places in old code that need to be changed because of this (as
> you can see in the mail example above).
>
> Let me quote John Dennis
> (<http://www.redhat.com/archives/freeipa-devel/2012-January/msg00075.html>):
>
>> I'm very interested in this work. I too have recognized that we have a
>> very real structural problem with how we handle the types we use
>> internally, especially when it corresponds to an external type and
>> requires conversion. In fact we do a remarkably bad job in this area, we
>> have conversions which are done ad hoc all over the place. There is no
>> well established place for the conversions to occur and it's difficult
>> to know at any point in the code what type to expect.
>
> Your patch adds yet another occurence of "it's difficult to know at any
> point in the code what type to expect". IMO this is very bad and we
> should not do this at all in new code.
>
> I'm not sure if just dropping bad values (as I have suggested before) is
> the best idea, but IMHO anything is better than letting these bad values
> into the bowels of IPA.
>
>>
>>>> We are trusting that the data in LDAP matches its schema. This is just
>>>> belt and suspenders verifying that it is the case.
>>>
>>> Sure, but I still think we should allow any valid unicode data to come
>>> from LDAP, not just what is valid in XML-RPC.
>>
>> This won't affect data coming out of LDAP, only the way it is
>> transmitted to the client.
>
> It will affect the data flowing through IPA, after we get it from LDAP,
> before it is transmitted through XML-RPC.
>
> In other words, I think it is more correct to do this:
>
> LDAP -> check unicode validity -> (rest of IPA) -> check XML-RPC
> validitity -> XML-RPC
>
> than this:
>
> LDAP -> check unicode validity -> check XML-RPC validity -> (rest of
> IPA) -> XML-RPC
>
> (Let's keep things isolated please.)
>
>>
>> rob
>
> Honza
>

I'm using a much narrower scope. I'm not trying to make it easy to 
manage non-printable characters, just not blow things up if they exist. 
Limiting to the XML-RPC supported set is for convenience, these are 
unprintable characters in any context. This is just a two-fer.

Petr was right, I need to encode to unicode before doing this comparison 
in order to compare invalid unicode characters so I moved that first.

I added a very basic unit test.

If you're wondering where this data might come from, two ways are via AD 
sync and migration.

Yes, the user will be left in a situation where they'll need to use 
--setattr or ldapmodify to manage the data in the field. The UI doesn't 
show the value at all but instead shows [object Object] (no errors in 
console, not sure where this is coming from). It is possible to 
highlight this and insert a new value though.

rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-rcrit-971-3-binary.patch
Type: text/x-patch
Size: 3998 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20120328/6c425323/attachment.bin>