[Freeipa-devel] python-ldap has unicode issues
John Dennis
jdennis at redhat.com
Fri Aug 17 22:40:19 UTC 2007
On Fri, 2007-08-17 at 15:21 -0700, Kevin McCarthy wrote:
> Apparently this is a well known (and debated) issue. See
> http://sourceforge.net/mailarchive/message.php?msg_id=71fe4e760707200552w648b6fc8v21770939525c14b9%40mail.gmail.com
>
> > Today, passing a unicode argument to ldap functions raises an exception,
> > so no accidents are possible :-) On the other hand, with unicode
> > support, things could accidentally work as expected. But this is only
> > speculation about which inconvenience is worse.
>
> If I translate the strings using s = s.encode('utf-8'), it makes it into
> the ldap database just fine:
>
> http://pbnj.usersys.redhat.com:8080/usershow?uid=haoren
>
> The strings that come back from ldap are utf-8 encoded, and will "work"
> except that the length will be wrong, e.g. 好 will report a length of
> 3 instead of 1, until we decode that back to unicode strings using
> s = unicode(s, 'utf-8')
Caveat: I only *briefly* looked at the archive issue.

Perhaps I can be of help here. I recently had to dig into how Python
handles i18n strings in gory detail to fix problems in setroubleshoot.
In the process I wrote up what I learned and can post it.

The issue really turns on what the external libraries are expecting. If
all the libraries expect UTF-8, you're golden.

In general you never want to use Python's unicode strings; instead,
encode everything in UTF-8 and continue to use the standard 8-bit Python
'str' strings.
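
To make the byte-versus-character point concrete, here is a minimal sketch. It is written for Python 3, where `bytes` plays the role of the 8-bit `str` discussed in this thread and `str` plays the role of `unicode`. The helper names `to_wire` and `from_wire` are hypothetical and are not part of python-ldap.

```python
# Python 3 sketch: bytes corresponds to the 8-bit 'str' of Python 2,
# and str corresponds to Python 2's 'unicode'.

def to_wire(text):
    """Encode text to UTF-8 bytes before handing it to a byte-oriented library."""
    if isinstance(text, bytes):
        return text  # assumption: bytes passed in are already UTF-8
    return text.encode("utf-8")


def from_wire(data):
    """Decode UTF-8 bytes that came back from the library into text."""
    if isinstance(data, str):
        return data
    return data.decode("utf-8")


name = "\u597d"  # the character 好 from the example above
encoded = to_wire(name)

byte_length = len(encoded)             # counts UTF-8 bytes
char_length = len(from_wire(encoded))  # counts characters
```

The two lengths differ because `len()` counts bytes on the encoded value and characters on the decoded one, which is the length discrepancy described in the quoted message.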
If you're using GNU gettext and are binding the function _(), you need
to tell gettext to return UTF-8, not unicode.
I'm trying to wrap up for the day now so I'm being brief, but on Monday
I can work with you to solve these issues, dig up my documentation, etc.
HTH,
--
John Dennis <jdennis at redhat.com>