Why is "LANG=en_US.UTF-8" the default in Fedora

Shahms King shahms at shahms.com
Fri May 21 18:28:51 UTC 2004


> > Why is the default US.UTF-8?

So characters that are not in ASCII can be displayed and you get
*correct* sorting, date and numeric formatting.

> So do many open source man pages. Frankly, US.UTF-8 bites goat rocks. It
> consistently messes up sorting, for example, since there is no way in that
> locale to get sorting to be case sensitive.

If by "messes up sorting" you mean "sorts correctly" then yes,
en_US.UTF-8 messes up sorting.  By no stretch of the imagination is:
A
B
C
a
b
c

a "correct" sort, except numerically (by raw byte value).  Additionally, stop saying that
the sorting is case insensitive.  It is *not*.  It most certainly is
case sensitive according to the defined locale.  If it were not case
sensitive, the list b,c,B,A,a,C would not have the determinate sort order
A,a,B,b,C,c, and sorting the list a,A,b,B,c,C would leave it unchanged.  I'm
sorry, but in English 'Z' does not come before 'a'.  The problems with
American English in the 'C' locale are minor.  The problems with
non-American English are slightly worse.  The problems with non-English
characters are significant. 
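The argument can be checked with a small Python sketch.  The three key
functions are illustrations written for this note, not part of any
library: c_locale_key compares raw bytes as the "C" locale does,
case_insensitive_key treats 'a' and 'A' as equal, and two_level_key is a
hypothetical, much-simplified stand-in for locale collation (compare
letters first, then use case only to break ties, uppercase first as in the
order quoted in this message).  It is not how glibc implements en_US.UTF-8.

```python
# Three ways to order the same strings. two_level_key is a simplified,
# hypothetical model of locale collation, not glibc's real tables.

def c_locale_key(s: str) -> bytes:
    # "C"/"POSIX": compare raw byte values, so 'Z' (0x5A) < 'a' (0x61).
    return s.encode("utf-8")

def case_insensitive_key(s: str) -> str:
    # Truly case-insensitive: 'a' and 'A' compare equal.
    return s.casefold()

def two_level_key(s: str) -> tuple:
    # Level 1: compare letters ignoring case.
    # Level 2: break ties by case, uppercase (0) before lowercase (1).
    return (s.casefold(), tuple(0 if ch.isupper() else 1 for ch in s))

for items in (["b", "c", "B", "A", "a", "C"], ["a", "A", "b", "B", "c", "C"]):
    for key in (c_locale_key, case_insensitive_key, two_level_key):
        # sorted() is stable: items with equal keys keep their input order.
        print(key.__name__, sorted(items, key=key))
```

For both lists the byte-order key gives A,B,C,a,b,c and the two-level key
gives A,a,B,b,C,c.  Only the case-insensitive key depends on input order:
it turns b,c,B,A,a,C into A,a,b,B,c,C and leaves a,A,b,B,c,C unchanged.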

This is at least the second time I've seen someone claim that sorting in
these locales is not "case sensitive".  Stop making this assertion; it
is flat-out wrong.  There is nothing at all correct about it.  Nothing.
So stop saying it.  You can argue about whether or not the sorting is
correct (it is), but you cannot claim it is case insensitive (it is
not).

> Frankly, the default locale should be "C" or "POSIX", both of which do
> sorting correctly and both of which are far more robust than a great deal of
> the unfortunately mis-handled Unicode currently in use, especially for
> documentation.

Wrong: both "C" and "POSIX" sort incorrectly (by numeric byte value
rather than in dictionary order).  Seriously, go read a dictionary; if you find
'Zapatista' before 'aardvark', I'll eat my hat, words, shoes and anything
else you can think of.  The correct solution is to *fix the programs
that are broken*.  Hell, even a wrapper script that breaks^Wsets the
locale will work as a temporary workaround.  Given that ASCII is a
strict subset of UTF-8 (every ASCII byte means the same in both), the
applications that are breaking on UTF-8 are simply buggy.  The solution
to these bugs is to fix the
apps, not break every single application that handles UTF-8 correctly
(which is most of them these days).
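Until the broken programs are fixed, the wrapper mentioned here can be as
small as the following Python sketch (a shell script running
`LC_ALL=C exec program "$@"` would serve equally well).  It assumes the
offending program honours the standard LC_ALL variable, which overrides
LANG and the individual LC_* settings; the function names and the script
name in the usage comment are hypothetical.

```python
import os
import subprocess
import sys

def c_locale_env(base=None):
    """Return a copy of the environment with LC_ALL forced to "C"."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and every LC_* variable
    return env

def run_with_c_locale(argv):
    """Run argv under the C locale and return its exit status."""
    return subprocess.run(argv, env=c_locale_env()).returncode

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): c-locale-wrap.py broken-program --its-args
    sys.exit(run_with_c_locale(sys.argv[1:]))
```

Only the wrapped program sees the "C" locale; everything else on the
system keeps en_US.UTF-8.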

-- 
Shahms King <shahms at shahms.com>

More information about the fedora-test-list mailing list