How to set the default system (locale?) encoding

Tim ignored_mailbox at yahoo.com.au
Thu May 3 04:10:42 UTC 2007


On Wed, 2007-05-02 at 20:58 -0500, Steven P. Ulrick wrote:
> I have used the Sword Project's "diatheke" command to output the Bible
> into plain text files, divided by chapter and book.

That may be the problem.  Is it creating real UTF-8 text, as Fedora
usually defaults to?  What do you get if you run "file" on those text
files?

e.g. [tim@serge ~]$ file example.text
example.text: UTF-8 Unicode text
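
If you'd rather check this from a script, here is a minimal sketch in
Python.  It is not what "file" does internally (that uses its own
heuristics); it only tests whether some bytes decode cleanly as UTF-8.
The function name and the sample strings are made up for illustration.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same text with an en dash (U+2013), encoded two different ways.
utf8_sample = "Tubal\u2013cain".encode("utf-8")
cp1252_sample = "Tubal\u2013cain".encode("cp1252")

print(looks_like_utf8(utf8_sample))    # True
print(looks_like_utf8(cp1252_sample))  # False
```

For a real file you would read it in binary mode, e.g.
open(path, "rb").read(), and pass the result in.  Note that pure ASCII
text also passes this test, since ASCII is a subset of UTF-8.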

Also, what's your locale set up to?

e.g. [tim@serge ~]$ locale
LANG=en_AU.UTF-8
LC_CTYPE="en_AU.UTF-8"
LC_NUMERIC="en_AU.UTF-8"
LC_TIME="en_AU.UTF-8"
LC_COLLATE="en_AU.UTF-8"
LC_MONETARY="en_AU.UTF-8"
LC_MESSAGES="en_AU.UTF-8"
LC_PAPER="en_AU.UTF-8"
LC_NAME="en_AU.UTF-8"
LC_ADDRESS="en_AU.UTF-8"
LC_TELEPHONE="en_AU.UTF-8"
LC_MEASUREMENT="en_AU.UTF-8"
LC_IDENTIFICATION="en_AU.UTF-8"
LC_ALL=

If the files and your locale are using different encodings, you're in
for some troubles, as you've discovered.
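
A rough way to compare the two from a script, assuming Python is
available: ask for the locale's preferred encoding and compare it with
the encoding of the files.  The "file_encoding" value below is an
assumption standing in for whatever "file" reported, and
"same_encoding" is just a helper name invented for this sketch.

```python
import codecs
import locale


def same_encoding(a: str, b: str) -> bool:
    """Compare encoding names, ignoring spellings like utf8 vs UTF-8."""
    return codecs.lookup(a).name == codecs.lookup(b).name


# Encoding implied by the current locale settings (LANG, LC_CTYPE, ...).
locale_encoding = locale.getpreferredencoding(False)

# Assumption: the encoding that "file" reported for the text files.
file_encoding = "UTF-8"

print("locale encoding:", locale_encoding)
if same_encoding(locale_encoding, file_encoding):
    print("locale and files agree")
else:
    print("mismatch: expect garbled non-ASCII characters")
```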

It's generally best if everything uses UTF-8, so that you have one
encoding for everything, instead of this being in ISO-8859-1 and that
in ISO-8859-9, each of which has characters that can't be represented
in the other.  UTF-8 covers almost everything, in one scheme.
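
To make that concrete, here is a small Python sketch.  The byte value
0xFD is my own choice of example: it is one of the positions where
ISO-8859-1 and ISO-8859-9 differ.

```python
import unicodedata

raw = b"\xfd"  # one byte, two meanings depending on the encoding

as_latin1 = raw.decode("iso8859_1")
as_latin5 = raw.decode("iso8859_9")
print(unicodedata.name(as_latin1))  # LATIN SMALL LETTER Y WITH ACUTE
print(unicodedata.name(as_latin5))  # LATIN SMALL LETTER DOTLESS I

# The Turkish dotless i has no slot in ISO-8859-1 at all ...
dotless_i = "\u0131"
try:
    dotless_i.encode("iso8859_1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print(fits_latin1)  # False

# ... but UTF-8 can encode both characters.
print(as_latin1.encode("utf-8"), dotless_i.encode("utf-8"))
```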

For Fedora, I've found the easiest way to set this was when logging into
an X session.  The logon screen has a "language" option that sets just
about all the parameters in one go.

> If (as an example) I open up "01-Genesis.txt" in KDE's KWrite, Genesis
> 4:22 looks like this (in the screenshots that follow, keep
> an eye on the name "Tubal–cain"):
> http://www.afolkey2.net/Projects/Genesis422-ss-001.jpg

That looks fine.  You've got an em or en dash between those words, not a
hyphen.  Being unfamiliar with the terms in the quote, I don't know if
that is a hyphenated word, or two words that should be joined by a
*proper* dash.

> If I do "Insert | File" from within OpenOffice.org, I get the following:
> http://www.afolkey2.net/Projects/Genesis422-ss-002.jpg

That looks like a character encoding issue (e.g. you see that sort of
thing when importing a UTF-8 file, when the application thinks that the
encoding is something like ISO-8859-1).
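
The effect can be reproduced in a few lines of Python.  This is only a
sketch of the general mechanism, assuming the dash in the file is an en
dash (U+2013) stored as UTF-8; I can't tell from a screenshot exactly
which wrong encoding OpenOffice.org picked.

```python
original = "Tubal\u2013cain"

# What is stored on disk if the file really is UTF-8:
# the single en dash becomes three bytes.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)  # b'Tubal\xe2\x80\x93cain'

# An application that assumes ISO-8859-1 turns each of those three
# bytes into a separate character, so one dash shows up as three.
misread = utf8_bytes.decode("iso8859_1")
print(ascii(misread))
print(len(original), len(misread))  # 10 12

# No data is lost: re-reading the same bytes as UTF-8 restores the text.
repaired = misread.encode("iso8859_1").decode("utf-8")
print(repaired == original)  # True
```

The stored bytes are fine in both cases.  Only the interpretation
differs, which is why picking the right encoding at import time fixes
the display.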

How are you importing the file?  There's a selection list of different
file types you can import files as, in the import requester.  Importing
some UTF-8 text worked fine, for me, without picking anything (the
default worked fine).

> If I open OpenOffice.org and just open the same file referred to in all
> of these examples, it looks like this:
> http://www.afolkey2.net/Projects/Genesis422-ss-003.jpg

Same issue (character encoding), different font.

> Then, if I copy and paste the sample verse, book, entire Bible,
> whatever from KWrite (which displays all occurrences of this
> "hyphen-like" character correctly) into a new OpenOffice.org text
> document, it looks like this:
> http://www.afolkey2.net/Projects/Genesis422-ss-004.jpg

As it should...  KWrite managed to determine the encoding, and cut and
paste between programs uses the default/locale encoding scheme (probably
UTF-8), so things "just work".

> The verse also correctly displays in gedit.  BUT, if I display the same
> file using abiword, vi, emacs or less, it does not display correctly.

It sounds like your CLI encoding is not UTF-8, which would be why
text-based tools aren't managing it while many GUI tools are (they often
get their settings in another way).  OpenOffice.org, by default, reads
the default locale, and tries to work using the same scheme.  It can be
custom configured, like you've done with KWrite, but I think that's
working around the problem, rather than fixing it.

> In "Tools | Encoding" in KWrite, I have it set to "utf8"

That'd be why it could handle the file: it presumed UTF-8 because you
specifically told it to.  Others presume a different default.

> So I guess the question is, "What SHOULD my default encoding be set to,
> and how and where do I set it so that it is respected by all
> applications?"

Usually UTF-8.  And, in general, you want it to be the same as
everything else (whether that's UTF-8 or something else).

-- 
(This box runs FC6, my others run FC4 & FC5, in case that's
 important to the thread.)

Don't send private replies to my address, the mailbox is ignored.
I read messages from the public lists.




