Character encoding

Björn Persson bjorn at xn--rombobjrn-67a.se
Sun Sep 7 17:23:02 UTC 2008


Adil Drissi wrote:
> the result of echo $LANG is the following: en_CA.UTF-8

Then you don't need to change the locale.

> Before, when I was using Windows, I was using an editor that can save in
> UTF-8. Now, after modifying some files using vi, vim or kate, I am finding
> that some files are encoded in us-ascii and some others don't show the type
> of encoding, so I'm really lost.

Do they look wrong if you read them as UTF-8? If all the characters are right 
then there is no problem.

You should know that the program "file" can't really know the character 
encoding of a file. It examines the contents and guesses, so a file that 
happens to contain only ASCII characters is reported as us-ascii even if you 
saved it as UTF-8.
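
To illustrate what such guessing can look like, here is a rough sketch in Python. It is not how "file" actually works; the function name guess_encoding and the label "unknown-8bit" are made up for this example.

```python
def guess_encoding(data: bytes) -> str:
    """Very rough guess at the encoding of some bytes (hypothetical helper)."""
    # Only bytes below 0x80: indistinguishable from plain 7-bit ASCII.
    if all(b < 0x80 for b in data):
        return "us-ascii"
    # Non-ASCII bytes that form valid UTF-8 sequences: probably UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Some 8-bit encoding; the bytes alone don't say which one.
        return "unknown-8bit"
    return "utf-8"


print(guess_encoding(b"Bjorn"))
print(guess_encoding("Björn".encode("utf-8")))
print(guess_encoding("Björn".encode("iso-8859-1")))
```

A file saved as UTF-8 that contains no accented characters falls into the first case, which is why a guesser reports it as us-ascii.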

> I can code a bash script that can convert from us-ascii to utf-8 for all
> the files of my website

Converting from ASCII to UTF-8 is very simple: just declare that it is UTF-8. 
UTF-8 is designed so that every ASCII character is encoded the same way in 
ASCII and UTF-8. You can therefore take any ASCII text and treat it as UTF-8, 
and a UTF-8 text that doesn't use any non-ASCII characters is in practice 
ASCII.
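
A small Python check of that property, using a sample string chosen only for illustration:

```python
sample = "plain text, no accents"

# Encoding the same ASCII-only text both ways gives identical bytes.
ascii_bytes = sample.encode("ascii")
utf8_bytes = sample.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Bytes written as ASCII can be read back as UTF-8 unchanged.
decoded = ascii_bytes.decode("utf-8")

print(same_bytes, decoded == sample)
```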

Now, if a text is actually not 7-bit ASCII but one of the 8-bit encodings that 
are sometimes called "ASCII", then it needs to be transcoded to become UTF-8.
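
A minimal sketch of such a transcoding step in Python, assuming you already know the source encoding. The helper name transcode_to_utf8 is hypothetical, and ISO 8859-1 is used only as an example source.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# "Björn" in ISO 8859-1: the letter ö is the single byte 0xF6.
latin1_bytes = b"Bj\xf6rn"
utf8_bytes = transcode_to_utf8(latin1_bytes, "iso-8859-1")

# In UTF-8 the same letter takes two bytes, 0xC3 0xB6.
print(utf8_bytes)
```

The same idea applies to a whole file: read it in binary mode, transcode, and write the result back in binary mode.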

> but for the files that don't show the current 
> encoding i don't know what to do.

Open them and try different encodings. Try UTF-8 first, ISO 8859-1 second and 
ISO 8859-15 third. Then continue with other encodings. When you find one that 
makes the text look right, convert the file from that encoding to UTF-8.
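
The trial-and-error procedure can be sketched in Python as follows. The function name preview_decodings and the sample bytes are assumptions for illustration. Note that ISO 8859-1 and ISO 8859-15 accept any byte sequence, so a successful decode proves nothing by itself; you still have to look at the text and judge whether it is right.

```python
CANDIDATES = ("utf-8", "iso-8859-1", "iso-8859-15")


def preview_decodings(data: bytes, encodings=CANDIDATES):
    """Return (encoding, text) pairs; text is None when decoding fails."""
    results = []
    for enc in encodings:
        try:
            results.append((enc, data.decode(enc)))
        except UnicodeDecodeError:
            results.append((enc, None))
    return results


# Sample bytes that are not valid UTF-8. Byte 0xA4 differs between
# ISO 8859-1 (currency sign) and ISO 8859-15 (euro sign).
sample = b"caf\xe9 5\xa4"
for enc, text in preview_decodings(sample):
    print(enc, "->", text if text is not None else "(not valid)")
```

Once one of the previews looks right, that encoding is the one to convert from.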

Björn Persson