[Pulp-list] i18n input

Nick Coghlan ncoghlan at redhat.com
Wed Oct 3 09:59:15 UTC 2012

On 10/03/2012 06:40 AM, Jason Connor wrote:
> Hi All,
> Lately we've been struggling with a rash of bugs related to i18n input in Pulp. Python 2's unicode support is only so-so and whenever we get non-ascii or non-utf-8 encoded strings, we tend to run into trouble (the most common problematic encoding seems to be latin-1). Given that Python's str type is really just a byte array with some built-in smarts, it isn't really possible to guess what the encoding might actually be.
> To address this issue, I propose that we make string encoding as utf-8 a hard requirement on the server. To enforce this, we'll try to decode all strings from utf-8 and any failures will get a 400 server response with some sort of standardized message: utf-8 encoded strings only (dummy), or something similar.


Boundary validation is the only way to ensure Unicode sanity in Python 2
(the same goes for Python 3; it's just a lot harder to omit it
accidentally). You'll still need to figure out what to do with repos
that already contain non-ASCII entries with an unknown encoding, though.
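
As a rough illustration of both points, here is a minimal sketch, not
Pulp's actual code. The names decode_utf8_or_reject, decode_legacy and
ERROR_MESSAGE are hypothetical, and the latin-1 fallback for existing
data is only one possible policy, chosen here because latin-1 was
reported as the most common offender and can decode any byte sequence
(so it never fails, but may silently produce the wrong characters).

```python
# Hypothetical helpers for illustration only; none of these names come from Pulp.
ERROR_MESSAGE = u"utf-8 encoded strings only"


def decode_utf8_or_reject(raw):
    """Validate input at the boundary.

    Returns (200, text) if raw is already text or is valid UTF-8 bytes,
    and (400, ERROR_MESSAGE) if the bytes cannot be decoded as UTF-8.
    """
    if not isinstance(raw, bytes):
        # Already decoded text (unicode on Python 2, str on Python 3).
        return 200, raw
    try:
        return 200, raw.decode("utf-8")
    except UnicodeDecodeError:
        return 400, ERROR_MESSAGE


def decode_legacy(raw, fallback="latin-1"):
    """Best-effort decoding for data stored before validation existed.

    Tries UTF-8 first, then the assumed fallback encoding. Returns the
    decoded text together with the encoding that was used, so guessed
    entries can be flagged for review.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback
```

The first helper would be called wherever request data enters the
server, so that everything past that point can assume decoded text. The
second is only meant for a one-off cleanup of existing repos, since a
guessed encoding cannot be verified.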


Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane
