[Freeipa-devel] python-ldap has unicode issues

Fri Aug 17 22:40:19 UTC 2007

On Fri, 2007-08-17 at 15:21 -0700, Kevin McCarthy wrote:
> Apparently this is a well known (and debated) issue.  See
> http://sourceforge.net/mailarchive/message.php?msg_id=71fe4e760707200552w648b6fc8v21770939525c14b9%40mail.gmail.com
> 
> > Today passing a unicode argument to ldap functions raises an exception,
> > so no accidents are possible :-) On the other side, with unicode
> > support, things could accidentally work as expected.  But this is only
> > speculation about which inconvenience is the worst.
> 
> If I translate the strings using s = s.encode('utf-8'), it makes it into
> the ldap database just fine:
> 
> http://pbnj.usersys.redhat.com:8080/usershow?uid=haoren
> 
> The strings that come back from ldap are utf-8 encoded, and will "work"
> except that the length will be wrong.  e.g.: 好 will report a length of
> 3 instead of one, until we decode that back to unicode strings using
>   s = unicode(s, 'utf-8')

Caveat: I only *briefly* looked at the archive issue.

Perhaps I can be of help here. I recently had to dig into how Python
handles i18n strings in gory detail to fix problems in setroubleshoot.
In the process I wrote up what I learned, and I can post it.

The issue really turns on what the external libraries are expecting. If
all of them expect UTF-8 encoded strings, you're golden.

In general you never want to use Python's unicode strings; instead,
encode everything in UTF-8 and continue to use the standard 8-bit Python
'str' strings.
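
A minimal sketch of the round trip and of the length difference Kevin
describes above. It is written in Python 3 spelling, where 'str' is the
unicode type and 'bytes' is the 8-bit type (on Python 2 those are
'unicode' and 'str'). The helper names to_ldap() and from_ldap() are
made up for illustration and are not part of python-ldap.

```python
# Python 3 spelling: `str` is the unicode type, `bytes` is the 8-bit type.
# to_ldap() and from_ldap() are hypothetical helpers, not python-ldap calls.

def to_ldap(text):
    """Encode text to UTF-8 bytes before handing it to an 8-bit library."""
    return text.encode("utf-8")


def from_ldap(raw):
    """Decode UTF-8 bytes returned by the library back into text."""
    return raw.decode("utf-8")


name = "好"
raw = to_ldap(name)

print(len(name))               # 1 (one character)
print(len(raw))                # 3 (three UTF-8 bytes)
print(from_ldap(raw) == name)  # True (the round trip is lossless)
```

Whichever representation is used internally, the conversion belongs at
the boundary with the library, so it happens exactly once in each
direction.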

If you're using GNU gettext and are binding the function _() you need to
tell gettext to return UTF-8, not unicode.

I'm trying to wrap up for the day now so I'm being brief, but on Monday
I can work with you to solve these issues, dig up my documentation, etc.

HTH,
-- 
John Dennis <jdennis at redhat.com>