[Freeipa-devel] [PATCH] 971 detect binary LDAP data

Petr Vobornik pvoborni at redhat.com
Thu Mar 29 15:43:32 UTC 2012


On 03/29/2012 12:20 AM, Rob Crittenden wrote:
> Jan Cholasta wrote:
>> On 29.2.2012 15:45, Rob Crittenden wrote:
>>> Jan Cholasta wrote:
>>>> On 28.2.2012 18:58, Rob Crittenden wrote:
>>>>> Jan Cholasta wrote:
>>>>>> On 28.2.2012 18:02, Petr Viktorin wrote:
>>>>>>> On 02/28/2012 04:45 PM, Rob Crittenden wrote:
>>>>>>>> Petr Viktorin wrote:
>>>>>>>>> On 02/28/2012 04:02 AM, Rob Crittenden wrote:
>>>>>>>>>> Petr Viktorin wrote:
>>>>>>>>>>> On 02/27/2012 05:10 PM, Rob Crittenden wrote:
>>>>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>>>>> Simo Sorce wrote:
>>>>>>>>>>>>>> On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:
>>>>>>>>>>>>>>> We are pretty trusting that the data coming out of LDAP
>>>>>>>>>>>>>>> matches its schema, but it is possible to stuff
>>>>>>>>>>>>>>> non-printable characters into most attributes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've added a sanity checker to keep a value as a python str
>>>>>>>>>>>>>>> type (treated as binary internally). This will result in a
>>>>>>>>>>>>>>> base64 encoded blob being returned to the client.
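
(Sketch only, not the actual patch under review: the kind of check being
described might look roughly like this in Python 2, keeping a value as
unicode when it decodes cleanly and leaving it as str otherwise, so it is
treated as binary and base64-encoded for the client.

    # Illustrative sketch, not the patch: decide whether an LDAP value is
    # "text" (unicode) or "binary" (str).  The patch is also said to treat
    # characters outside the XML-RPC supported set as binary.
    def decode_or_keep_binary(raw):
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            return raw   # stays str -> later base64-encoded for the client
)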
>>>>>>
>>>>>> I don't like the idea of having arbitrary binary data where unicode
>>>>>> strings are expected. It might cause some unexpected errors (I have a
>>>>>> feeling that --addattr and/or --delattr and possibly some plugins
>>>>>> might not handle this very well). Wouldn't it be better to just
>>>>>> throw away the value if it's invalid and warn the user?
>>>>>
>>>>> This isn't for user input, it is for data stored in LDAP. Users are
>>>>> going to have no way to provide binary data to us unless they use the
>>>>> API themselves, in which case they have to follow our rules.
>>>>
>>>> Well my point was that --addattr and --delattr cause an LDAP search
>>>> for the given attribute, and plugins might get the result of an LDAP
>>>> search in their post_callback, and I'm not sure if they can cope with
>>>> binary data.
>>>
>>> It wouldn't be any different than if we had the value as a unicode.
>>
>> Let's see what happens if the mail attribute of a user contains invalid
>> UTF-8 (ff 62 30 72 6b 65 64):
>>
>> $ ipa user-find jdoe
>> --------------
>> 1 user matched
>> --------------
>> User login: jdoe
>> First name: John
>> Last name: Doe
>> Home directory: /home/jdoe
>> Login shell: /bin/sh
>> Email address: /2IwcmtlZA==
>> UID: 526
>> GID: 526
>> Account disabled: False
>> Password: False
>> Kerberos keys available: False
>> ----------------------------
>> Number of entries returned 1
>> ----------------------------
>>
>> $ ipa user-mod jdoe --addattr mail=jdoe@example.com
>> ipa: ERROR: an internal error has occurred
>>
>> The internal error is:
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line 302, in wsgi_execute
>>     result = self.Command[name](*args, **options)
>>   File "/usr/lib/python2.7/site-packages/ipalib/frontend.py", line 438, in __call__
>>     ret = self.run(*args, **options)
>>   File "/usr/lib/python2.7/site-packages/ipalib/frontend.py", line 696, in run
>>     return self.execute(*args, **options)
>>   File "/usr/lib/python2.7/site-packages/ipalib/plugins/baseldap.py", line 1217, in execute
>>     ldap, dn, entry_attrs, attrs_list, *keys, **options
>>   File "/usr/lib/python2.7/site-packages/ipalib/plugins/user.py", line 532, in pre_callback
>>     entry_attrs['mail'] = self.obj._normalize_email(entry_attrs['mail'])
>>   File "/usr/lib/python2.7/site-packages/ipalib/plugins/user.py", line 338, in _normalize_email
>>     norm_email.append(m + u'@' + config['ipadefaultemaildomain'][0])
>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
>>
>> $ ipa user-mod jdoe --delattr mail=/2IwcmtlZA==
>> ipa: ERROR: mail does not contain '/2IwcmtlZA=='
>>
>> $ ipa user-mod jdoe --delattr mail=`echo 'ff 62 30 72 6b 65 64' | xxd -p -r`
>> ipa: ERROR: UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 5: invalid start byte
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1242, in run
>>     sys.exit(api.Backend.cli.run(argv))
>>   File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1024, in run
>>     kw = self.parse(cmd, argv)
>>   File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1049, in parse
>>     return dict(self.parse_iter(cmd, kw))
>>   File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1058, in parse_iter
>>     yield (key, self.Backend.textui.decode(value))
>>   File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 136, in decode
>>     return value.decode(encoding)
>>   File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
>>     return codecs.utf_8_decode(input, errors, True)
>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 5: invalid start byte
>> ipa: ERROR: an internal error has occurred
>>
>> I'm sure there are a lot more places in the code where things will break
>> when you feed them arbitrary data.
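
(For what it's worth, the first internal error above is the usual Python 2
str/unicode mixing problem. A minimal, purely illustrative reproduction,
using a hypothetical variable rather than the real entry handling code:

    # Python 2 sketch of the failure hit in _normalize_email above.
    raw_mail = '\xff\x62\x30\x72\x6b\x65\x64'   # ff 62 30 72 6b 65 64 from LDAP
    try:
        normalized = raw_mail.decode('utf-8') + u'@example.com'
    except UnicodeDecodeError as e:
        print e   # 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

The real code trips over the same decode implicitly, as soon as the raw str
is concatenated with a unicode string.)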
>>
>>>
>>> We treat the python type str as binary data. Anything that is a str
>>> gets base64 encoded before json or xml-rpc transmission.
>>>
>>> The type unicode is considered a "string" and goes in the clear.
>>>
>>> We determine what this type should be not from the data but from the
>>> schema. This is a big assumption. Hopefully this answers Petr's point
>>> as well.
>>>
>>> We decided long ago that str means Binary and unicode means String. It
>>> is a bit clumsy perhaps, but Python handles it well. It will be clearer
>>> when we switch to Python 3.0 and we'll have bytes and str as the types
>>> instead.
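
(Loosely, that rule could be sketched like this; an illustration only, not
the actual ipalib marshalling code:

    import base64

    # Sketch: str means binary and gets wrapped for JSON transport,
    # unicode is a string and goes in the clear.
    def json_encode_value(value):
        if isinstance(value, str):
            return {'__base64__': base64.b64encode(value)}
        return value

The "__base64__" wrapper is what later shows up in the UI, see below.)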
>>
>> Well, this is all super-obvious and I'm not really sure why you bring it
>> up here...
>>
>> My concern is that with your approach, you can no longer be sure that an
>> attribute value is a valid unicode string. This creates a
>> maintainability burden, as you *always* have to keep in mind that you
>> first have to check whether the value is valid or not before touching
>> it. You can easily forget to include the check in new code, and there
>> are a lot of places in old code that need to be changed because of this
>> (as you can see in the mail example above).
>>
>> Let me quote John Dennis
>> (<http://www.redhat.com/archives/freeipa-devel/2012-January/msg00075.html>):
>>
>>
>>> I'm very interested in this work. I too have recognized that we have a
>>> very real structural problem with how we handle the types we use
>>> internally, especially when it corresponds to an external type and
>>> requires conversion. In fact we do a remarkably bad job in this area, we
>>> have conversions which are done ad hoc all over the place. There is no
>>> well established place for the conversions to occur and it's difficult
>>> to know at any point in the code what type to expect.
>>
>> Your patch adds yet another occurrence of "it's difficult to know at any
>> point in the code what type to expect". IMO this is very bad and we
>> should not do this at all in new code.
>>
>> I'm not sure if just dropping bad values (as I have suggested before) is
>> the best idea, but IMHO anything is better than letting these bad values
>> into the bowels of IPA.
>>
>>>
>>>>> We are trusting that the data in LDAP matches its schema. This is just
>>>>> belt and suspenders verifying that it is the case.
>>>>
>>>> Sure, but I still think we should allow any valid unicode data to come
>>>> from LDAP, not just what is valid in XML-RPC.
>>>
>>> This won't affect data coming out of LDAP, only the way it is
>>> transmitted to the client.
>>
>> It will affect the data flowing through IPA, after we get it from LDAP,
>> before it is transmitted through XML-RPC.
>>
>> In other words, I think it is more correct to do this:
>>
>> LDAP -> check unicode validity -> (rest of IPA) -> check XML-RPC
>> validity -> XML-RPC
>>
>> than this:
>>
>> LDAP -> check unicode validity -> check XML-RPC validity -> (rest of
>> IPA) -> XML-RPC
>>
>> (Let's keep things isolated please.)
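
(A hypothetical sketch of that first ordering, checking validity right at
the LDAP boundary; this helper does not exist in ipalib, it only illustrates
the placement of the check:

    # Validate at the LDAP boundary so the rest of IPA only ever sees
    # valid unicode; invalid values are dropped (or could be flagged).
    def load_ldap_value(raw):
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            return None

Whether to drop, warn, or keep such values as binary is exactly the open
question in this thread.)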
>>
>>>
>>> rob
>>
>> Honza
>>
>
> I'm using a much narrower scope. I'm not trying to make it easy to
> manage non-printable characters, just to not blow things up if they
> exist. Limiting to the XML-RPC supported set is for convenience; these
> are unprintable characters in any context. This is just a two-fer.
>
> Petr was right, I need to encode to unicode before doing this comparison
> in order to compare invalid unicode characters, so I moved that first.
>
> I added a very basic unit test.
>
> If you're wondering where this data might come from, two ways are via AD
> sync and migration.
>
> Yes, the user will be left in a situation where they'll need to use
> --setattr or ldapmodify to manage the data in the field. The UI doesn't
> show the value at all but instead shows [object Object] (no errors in
> console, not sure where this is coming from). It is possible to
> highlight this and insert a new value though.

"[object Object]" is an output of standard object's toString() method.

So when the data are binary, they are converted to base64 and the JSON
looks like this:

var foo = {
    "__base64__": "FOOBAR"
}

foo.toString(); // you get "[object Object]"


In most cases the UI expects to receive a simple value (string, number,
boolean), an array of simple values, or nothing, not an object with a
base64 string in it.

So I should probably file a UI ticket to handle such cases, right?

>
> rob
>


-- 
Petr Vobornik



