[Freeipa-devel] not ascii, not utf-8, what's a parser supposed to do?

John Dennis jdennis at redhat.com
Tue Jan 26 22:28:31 UTC 2010


I've run into a small problem with xgettext. By default xgettext expects 
all strings in an input file to be encoded in ascii. It also lets you 
override that by declaring that the strings in the input file are utf-8.

Line 296 of ipapython/ipautil.py contains the following string:

SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

In its default ascii mode xgettext throws an error claiming the string 
is not ascii, and xgettext is correct: once the escape sequences are 
decoded, the string is not ascii. (You may be wondering why xgettext even 
cares, since the string is not marked as translatable. The reason is that 
xgettext fully parses the input before deciding what is marked as 
translatable; bottom line, all strings get parsed and decoded.)
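The distinction between the source text and the decoded value can be checked directly. This is a minimal Python 3 sketch, not xgettext's actual logic: it assumes that Python's own literal decoding (via `ast.literal_eval`) is a fair stand-in for what xgettext does when it decodes the string.

```python
import ast

# The line from ipautil.py; a raw string keeps the backslash escapes as
# plain text, the way they sit in the source file.
SOURCE_LINE = r"SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'"

# The file text itself is pure ASCII, so this encode succeeds.
SOURCE_LINE.encode('ascii')

# Decoding the literal (as any parser must) turns \200 and \377 into
# code points 0x80 and 0xFF, which are outside ASCII.
literal = SOURCE_LINE.split('=', 1)[1].strip()
value = ast.literal_eval(literal)
non_ascii = sorted({ord(ch) for ch in value if ord(ch) > 0x7F})
print(non_ascii)  # [128, 255]
```

So the complaint is about the decoded value, not about any non-ascii bytes in the file.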

If I override the default ascii input by telling xgettext that the input 
strings are encoded in utf-8, xgettext stops complaining and the string 
is properly skipped.

But ... the string isn't really utf-8 either, and I'm not sure how 
comfortable I am telling xgettext that every string in IPA is encoded 
in utf-8 (when it isn't) just to get past a failure caused by a string 
that isn't utf-8 in the first place. (However, maybe we should accept 
utf-8 as the input encoding anyway: ascii is a subset of utf-8, we 
might want to use utf-8 in the future, and we can just hold our noses 
with respect to the above regular expression.)
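Both encoding claims can be checked at the byte level. This Python 3 sketch assumes Python 2 semantics for the original code, where a plain '...' literal is a byte string and `\200`/`\377` become the single bytes 0x80 and 0xFF; it does not model how xgettext itself validates its input. The helper `is_valid_utf8` is hypothetical, introduced only for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True

# Pure ASCII bytes are also valid UTF-8 and decode to the same text,
# because ASCII is a subset of UTF-8.
ascii_text = b'SAFE_STRING_PATTERN'
print(is_valid_utf8(ascii_text))                                 # True
print(ascii_text.decode('ascii') == ascii_text.decode('utf-8'))  # True

# The lone bytes produced by \200 and \377 are not valid UTF-8:
# 0x80 is a continuation byte with no lead byte, and 0xFF never
# appears anywhere in well-formed UTF-8.
print(is_valid_utf8(b'\x80'), is_valid_utf8(b'\xff'))            # False False
```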

Do we have a stake in the ground as to what our input strings are 
encoded in?

Can you think of another way to express the offending string so that 
it doesn't trigger the non-ascii error? The only thing I could come up 
with that works is this:

SAFE_STRING_PATTERN = '%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c' % (
    40, 94, 40, 0, 124, 10, 124, 13, 124, 32, 124, 58, 124, 60, 41, 124,
    91, 0, 10, 13, 128, 45, 255, 93, 43, 124, 91, 32, 93, 43, 36, 41)

This is pretty unreadable, but with sufficient comments it could be 
acceptable.
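A short check that the %c form really yields the same pattern as the escaped literal, alongside one hypothetical, more readable variant that builds only the two high characters with `chr()`. The names `ESCAPED`, `PERCENT_C` and `CHR_FORM` are illustrative. The variant rests on an assumption to verify: that xgettext only decodes string literals and never evaluates expressions such as `chr(0o200)`, so none of the literals it sees would decode to non-ascii.

```python
import re

# Original form: escape sequences inside one literal (xgettext decodes these).
ESCAPED = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

# The %c form from the message: every character given by its code point.
PERCENT_C = ('%c' * 32) % (
    40, 94, 40, 0, 124, 10, 124, 13, 124, 32, 124, 58, 124, 60, 41, 124,
    91, 0, 10, 13, 128, 45, 255, 93, 43, 124, 91, 32, 93, 43, 36, 41)

# Hypothetical alternative: keep the readable ascii parts as literals and
# build only the two high characters at runtime with chr().
CHR_FORM = ('(^(\000|\n|\r| |:|<)|[\000\n\r'
            + chr(0o200) + '-' + chr(0o377)
            + ']+|[ ]+$)')

assert ESCAPED == PERCENT_C == CHR_FORM

pattern = re.compile(CHR_FORM)
print(bool(pattern.search(':leading colon')))   # True
print(bool(pattern.search('plain value')))      # False
print(bool(pattern.search('trailing space ')))  # True
```

The `chr()` variant keeps the regular expression recognisable at a glance, at the cost of splitting one character class across three pieces.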


-- 
John Dennis <jdennis at redhat.com>
