[Freeipa-devel] [PATCH] 971 detect binary LDAP data
Jan Cholasta
jcholast at redhat.com
Thu Mar 1 14:48:08 UTC 2012
On 29.2.2012 15:45, Rob Crittenden wrote:
> Jan Cholasta wrote:
>> On 28.2.2012 18:58, Rob Crittenden wrote:
>>> Jan Cholasta wrote:
>>>> On 28.2.2012 18:02, Petr Viktorin wrote:
>>>>> On 02/28/2012 04:45 PM, Rob Crittenden wrote:
>>>>>> Petr Viktorin wrote:
>>>>>>> On 02/28/2012 04:02 AM, Rob Crittenden wrote:
>>>>>>>> Petr Viktorin wrote:
>>>>>>>>> On 02/27/2012 05:10 PM, Rob Crittenden wrote:
>>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>>> Simo Sorce wrote:
>>>>>>>>>>>> On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:
>>>>>>>>>>>>> We are pretty trusting that the data coming out of LDAP
>>>>>>>>>>>>> matches
>>>>>>>>>>>>> its
>>>>>>>>>>>>> schema but it is possible to stuff non-printable characters
>>>>>>>>>>>>> into
>>>>>>>>>>>>> most
>>>>>>>>>>>>> attributes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've added a sanity checker to keep a value as a python str
>>>>>>>>>>>>> type
>>>>>>>>>>>>> (treated as binary internally). This will result in a base64
>>>>>>>>>>>>> encoded
>>>>>>>>>>>>> blob be returned to the client.
>>>>
>>>> I don't like the idea of having arbitrary binary data where unicode
>>>> strings are expected. It might cause some unexpected errors (I have a
>>>> feeling that --addattr and/or --delattr and possibly some plugins might
>>>> not handle this very well). Wouldn't it be better to just throw away
>>>> the
>>>> value if it's invalid and warn the user?
>>>
>>> This isn't for user input, it is for data stored in LDAP. User's are
>>> going to have no way to provide binary data to us unless they use the
>>> API themselves in which case they have to follow our rules.
>>
>> Well my point was that --addattr and --delattr cause an LDAP search for
>> the given attribute and plugins might get the result of a LDAP search in
>> their post_callback and I'm not sure if they can cope with binary data.
>
> It wouldn't be any different than if we had the value as a unicode.
Let's see what happens if the mail attribute of a user contains invalid
UTF-8 (ff 62 30 72 6b 65 64):
$ ipa user-find jdoe
--------------
1 user matched
--------------
User login: jdoe
First name: John
Last name: Doe
Home directory: /home/jdoe
Login shell: /bin/sh
Email address: /2IwcmtlZA==
UID: 526
GID: 526
Account disabled: False
Password: False
Kerberos keys available: False
----------------------------
Number of entries returned 1
----------------------------
$ ipa user-mod jdoe --addattr mail=jdoe at example.com
ipa: ERROR: an internal error has occurred
The internal error is:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py", line
302, in wsgi_execute
result = self.Command[name](*args, **options)
File "/usr/lib/python2.7/site-packages/ipalib/frontend.py", line 438,
in __call__
ret = self.run(*args, **options)
File "/usr/lib/python2.7/site-packages/ipalib/frontend.py", line 696,
in run
return self.execute(*args, **options)
File "/usr/lib/python2.7/site-packages/ipalib/plugins/baseldap.py",
line 1217, in execute
ldap, dn, entry_attrs, attrs_list, *keys, **options
File "/usr/lib/python2.7/site-packages/ipalib/plugins/user.py", line
532, in pre_callback
entry_attrs['mail'] = self.obj._normalize_email(entry_attrs['mail'])
File "/usr/lib/python2.7/site-packages/ipalib/plugins/user.py", line
338, in _normalize_email
norm_email.append(m + u'@' + config['ipadefaultemaildomain'][0])
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0:
invalid start byte
$ ipa user-mod jdoe --delattr mail=/2IwcmtlZA==
ipa: ERROR: mail does not contain '/2IwcmtlZA=='
$ ipa user-mod jdoe --delattr mail=`echo 'ff 62 30 72 6b 65 64' | xxd -p -r`
ipa: ERROR: UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in
position 5: invalid start byte
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1242, in run
sys.exit(api.Backend.cli.run(argv))
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1024, in run
kw = self.parse(cmd, argv)
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1049, in
parse
return dict(self.parse_iter(cmd, kw))
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 1058, in
parse_iter
yield (key, self.Backend.textui.decode(value))
File "/usr/lib/python2.7/site-packages/ipalib/cli.py", line 136, in
decode
return value.decode(encoding)
File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 5:
invalid start byte
ipa: ERROR: an internal error has occurred
I'm sure there is a lot more places in the code where things will break
when you feed them arbitrary data.
>
> We treat the python type str as binary data. Anything that is a str gets
> based64 encoded before json or xml-rpc transmission.
>
> The type unicode is considered a "string" and goes in the clear.
>
> We determine what this type should be not from the data but from the
> schema. This is a big assumption. Hopefully this answer's Petr's point
> as well.
>
> We decided long ago that str means Binary and unicode means String. It
> is a bit clumsy perhaps python handles it well. It will be more clear
> when we switch to Python 3.0 and we'll have bytes and str instead as types.
Well, this is all super-obvious and I'm not really sure why do you bring
it up here...
My concern is that with your approach, you can no longer be sure that an
attribute value is a valid unicode string. This creates a
maintainability burden, as you *always* have to keep in mind that you
first have to check whether the value is valid or not before touching
it. You can easily forget to include the check in new code and there is
a lot of places in old code that need to be changed because of this (as
you can see in the mail example above).
Let me quote John Dennis
(<http://www.redhat.com/archives/freeipa-devel/2012-January/msg00075.html>):
> I'm very interested in this work. I too have recognized that we have a
> very real structural problem with how we handle the types we use
> internally, especially when it corresponds to an external type and
> requires conversion. In fact we do a remarkably bad job in this area, we
> have conversions which are done ad hoc all over the place. There is no
> well established place for the conversions to occur and it's difficult
> to know at any point in the code what type to expect.
Your patch adds yet another occurence of "it's difficult to know at any
point in the code what type to expect". IMO this is very bad and we
should not do this at all in new code.
I'm not sure if just dropping bad values (as I have suggested before) is
the best idea, but IMHO anything is better than letting these bad values
into the bowels of IPA.
>
>>> We are trusting that the data in LDAP matches its schema. This is just
>>> belt and suspenders verifying that it is the case.
>>
>> Sure, but I still think we should allow any valid unicode data to come
>> from LDAP, not just what is valid in XML-RPC.
>
> This won't affect data coming out of LDAP, only the way it is
> transmitted to the client.
It will affect the data flowing through IPA, after we get it from LDAP,
before it is transmitted through XML-RPC.
In other words, I think it is more correct to do this:
LDAP -> check unicode validity -> (rest of IPA) -> check XML-RPC
validitity -> XML-RPC
than this:
LDAP -> check unicode validity -> check XML-RPC validity -> (rest of
IPA) -> XML-RPC
(Let's keep things isolated please.)
>
> rob
Honza
--
Jan Cholasta
More information about the Freeipa-devel
mailing list