Pseudo-locales for i18n testing by English speakers

Sean Flanigan sflaniga at redhat.com
Tue Oct 7 00:04:35 UTC 2008


Martin Langhoff wrote:
> 2008/10/2 Sean Flanigan <sflaniga at redhat.com>:
>> I have a simple Ant task which can generate pseudo-translations like the
>> one above from a gettext POT file,
> 
> I am after a few sets of "latin-lookalike" character tables I can use.
> Have you (or anyone) got pointers to good tables?

Well, I've made up a couple of simple ones (also attached as UTF-8):
ASCII:
"abcdefghijklmnopqrstuvwxyz"
BMP only:
"åЬçđéϝցⱨîﺩⱪŀოňøÞᕴяšŧմⱱשẋŷż"
BMP+SMP:
"åЬçđ𝖾ϝցⱨî𝚓ⱪŀოňøÞᕴяšŧմⱱשẋŷż"
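
As a rough illustration of how tables like these might be applied (this is
not the Ant task mentioned above, and the function name pseudolocalize is
made up for the example), the ASCII row can be mapped onto one of the
lookalike rows with Python's str.translate.  The sketch only touches
lowercase ASCII letters and makes no attempt to protect format placeholders
such as %s, which a real tool would need to do.

```python
import unicodedata

ASCII = "abcdefghijklmnopqrstuvwxyz"
# Tables copied from the message above; normalised to NFC so that each
# lookalike is a single code point even if the text was pasted decomposed.
BMP_ONLY = unicodedata.normalize("NFC", "åЬçđéϝցⱨîﺩⱪŀოňøÞᕴяšŧմⱱשẋŷż")
BMP_SMP = unicodedata.normalize("NFC", "åЬçđ𝖾ϝցⱨî𝚓ⱪŀოňøÞᕴяšŧմⱱשẋŷż")


def pseudolocalize(text, table=BMP_SMP):
    """Replace each lowercase ASCII letter with its lookalike from `table`.

    Hypothetical helper for illustration only: uppercase letters, digits
    and punctuation pass through unchanged.
    """
    if len(table) != len(ASCII):
        raise ValueError("lookalike table must contain exactly 26 characters")
    return text.translate(str.maketrans(ASCII, table))


if __name__ == "__main__":
    print(pseudolocalize("Save changes before closing?"))
    print(pseudolocalize("Save changes before closing?", BMP_ONLY))
```

Because the output stays readable to an English speaker, any string that
shows up in plain ASCII in the running application is one that never went
through the translation machinery.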

You could also try googling for "LATIN SMALL LETTER {A,B,C,...} WITH",
which should turn up all sorts of modified Latin characters, such as
LATIN SMALL LETTER V WITH RIGHT HOOK.
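
The same search can be done locally against the Unicode character database
that ships with Python.  This is only a sketch: the helper name
latin_variants is invented for the example, and the results depend on the
Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def latin_variants(letter):
    """Return (character, name) pairs for modified forms of `letter`.

    Scans every code point and keeps those whose Unicode name starts with
    "LATIN SMALL LETTER <X> WITH ".  Hypothetical helper for illustration.
    """
    prefix = f"LATIN SMALL LETTER {letter.upper()} WITH "
    found = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (surrogates, unassigned) yield the default "".
        name = unicodedata.name(chr(cp), "")
        if name.startswith(prefix):
            found.append((chr(cp), name))
    return found


if __name__ == "__main__":
    for ch, name in latin_variants("v"):
        print(f"U+{ord(ch):04X} {ch} {name}")
```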

Another option is Wikipedia's list of Unicode characters:
http://en.wikipedia.org/wiki/List_of_Unicode_characters
It has several sections for extended Latin scripts, and the Unicode mapping
tables at the bottom are handy if you want to go directly to a certain
Unicode range (e.g. to get away from the BMP).

> The simple example phrase you provided hit a bug in moodle (php
> webapp) straight away - I think a few webapps have trouble with that
> funny 'e' (U+1D5BE). Interestingly, it's also present in Jira
> (Java-based webapp). Might be an iconv issue.

I chose that 'e' specifically because it wasn't part of the BMP, but
apparently the mathematical alphanumeric symbols are a bit of a special
case - I'm not sure if systems are expected to provide font substitution
for them.

Zimbra (written in Java) had trouble with the 'e' too - it just removed
it entirely.  I think a lot of programs have trouble with characters
outside the BMP, which don't fit into a single 16-bit code unit.  My text
editors and Thunderbird can show the 'e' character, but the cursor handling
is all wrong on those lines.
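
To make the 16-bit point concrete, here is a small sketch showing how
U+1D5BE is encoded.  It does not diagnose the Moodle, Jira or Zimbra
behaviour; it only shows why code that assumes one 16-bit unit per
character (or at most three UTF-8 bytes) could plausibly trip over it.
The helper name utf16_units is made up for the example.

```python
SANS_E = "\U0001D5BE"  # the 'e' from the BMP+SMP table, outside the BMP


def utf16_units(text):
    """Return the UTF-16 code units of `text` as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    # A plain ASCII 'e' is one UTF-16 unit and one UTF-8 byte.
    print([hex(u) for u in utf16_units("e")], "e".encode("utf-8"))
    # The SMP 'e' needs a surrogate pair in UTF-16 and four bytes in UTF-8.
    print([hex(u) for u in utf16_units(SANS_E)], SANS_E.encode("utf-8"))
```

Anything that indexes, truncates or moves a cursor by UTF-16 code unit
can end up in the middle of the surrogate pair, which would be consistent
with the cursor problems described above.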


-- 
Sean Flanigan

Senior Software Engineer
Engineering - Internationalisation
Red Hat
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: latinesque_table_utf8.txt
URL: <http://listman.redhat.com/archives/fedora-devel-list/attachments/20081007/dcfac116/attachment.txt>