Character Encoding in general

Alexandre Strube surak at casa.surak.eti.br
Thu Feb 19 01:31:20 UTC 2004


On Wed, 2004-02-18 at 17:08, Thomas wrote:

> stream. Still, I have not the slightest clue what "UTF-8", 
> "ISO-8859-1", "CP12??" etc. mean. The only thing I notice is that 

1 byte = 8 bits. This means that you have only 256 different combinations
for characters (a little less, because some values are reserved for
control characters). Of course you cannot fit every language into such a
small space.

In fact, some years ago most operating systems used only 7 bits for
characters (ASCII, 128 values), which was enough only for the unaccented
letters, the digits and some symbols (!@#$%" and so on).
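A small Python 3 sketch of the arithmetic above (Python is just my choice for illustration; the helper name is_ascii is hypothetical): 8 bits give 2**8 values, 7-bit ASCII uses only the lower 128, and anything accented falls outside it.

```python
# 8 bits per byte give 2**8 distinct values; 7-bit ASCII uses only the lower half.
BYTE_VALUES = 2 ** 8    # 256
ASCII_VALUES = 2 ** 7   # 128


def is_ascii(text: str) -> bool:
    """Return True if every character fits in 7-bit ASCII (codes 0-127)."""
    return all(ord(ch) < ASCII_VALUES for ch in text)


print(BYTE_VALUES, ASCII_VALUES)   # 256 128
print(is_ascii("Hello !@#$%"))     # True
print(is_ascii("coração"))         # False: ç and ã are not ASCII

# Python's own ASCII codec refuses the same text.
try:
    "coração".encode("ascii")
except UnicodeEncodeError as err:
    print("not encodable in ASCII:", err.reason)
```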

The remaining 128 values of the byte were used to draw window corners,
some symbols and other stuff, including some accented letters. That's the
8-bit ASCII character table, usually called "extended ASCII" (on the IBM
PC it was code page 437).
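To see what such a table looked like, Python 3 ships a codec for IBM code page 437, the character set of the original IBM PC. The byte values below are picked from that table for illustration: the upper half holds both box-drawing pieces and a few accented letters.

```python
# Bytes from the upper half (0x80-0xFF) of IBM code page 437.
box = bytes([0xC9, 0xCD, 0xCD, 0xBB, 0x0A,   # top edge and a newline
             0xBA, 0x20, 0x20, 0xBA, 0x0A,   # two sides and a newline
             0xC8, 0xCD, 0xCD, 0xBC])        # bottom edge

print(box.decode("cp437"))       # draws a small window frame
print(b"\x82".decode("cp437"))   # 0x82 is an accented letter: é
```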

Then someone had the nice idea of replacing the characters in the upper
half of the ASCII table (values above 127) to support national
characters. Each of these different tables was called a CODEPAGE.
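The catch with code pages is that a byte means nothing until you know which table it belongs to. A minimal Python 3 sketch, decoding the same single byte with three of the standard codecs:

```python
raw = b"\xe9"  # one byte, value 233

# The same byte read through three different code pages.
decoded = {codepage: raw.decode(codepage)
           for codepage in ("cp437", "cp850", "latin-1")}

for codepage, char in decoded.items():
    print(f"{codepage:8} -> {char}")
```

Here cp437 gives a Greek Θ, cp850 gives Ú and ISO-8859-1 gives é, so a file written under one code page and read under another comes out garbled.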

When it became obvious that this wouldn't work well, Unicode was
released. Its purpose is to give each character a single code. The
original design used 16 bits (65,536 possible chars) to represent each
one; since Unicode 2.0 the code space goes up to U+10FFFF (1,114,112 code
points).
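In Python 3 a str is a sequence of Unicode code points, so ord() shows the single code each character gets, independent of any code page. This sketch also shows that some characters no longer fit in 16 bits:

```python
import sys

# Each character has exactly one Unicode code point.
for ch in ("A", "é", "€", "😀"):
    print(f"{ch!r}: U+{ord(ch):04X}")

# The highest code point Python supports is U+10FFFF, well beyond 16 bits.
print(hex(sys.maxunicode))   # 0x10ffff
print(ord("😀") > 0xFFFF)    # True: this one needs more than 16 bits
```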

However, this content still has to be represented on computers with 8-bit
bytes, so UTF-8 exists: it stores each character as a sequence of 1 to 4
bytes, and plain ASCII text is already valid UTF-8.
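To show how UTF-8 packs a code point into bytes, here is a hand-written encoder in Python 3. It is a teaching sketch (utf8_encode is a hypothetical name, not a library function); real programs should just call str.encode("utf-8"), which the loop uses as a cross-check.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (teaching sketch)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:        # up to 7 bits: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in ("A", "é", "€", "😀"):
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")   # agrees with Python's own codec
    print(ch, len(encoded), encoded.hex(" "))
```

The leading bits of the first byte say how many bytes follow, and every continuation byte starts with 10, which is why ASCII bytes never appear inside a multi-byte character.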

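ISO-8859-1 (Latin-1) is the ISO standard 8-bit character set for Western
European languages. The Windows "Latin" code page is CP1252, one of the
Windows CP125x code pages; it is a superset of ISO-8859-1 that puts
printable characters in the 0x80-0x9F range. ISO-8859-1 was released
before Unicode, which seems to be the standard now.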
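The relationship between ISO-8859-1 and Windows-1252 is easy to check with Python 3's built-in codecs. ISO-8859-1 maps byte N straight to code point U+00NN, while Windows-1252 reuses the 0x80-0x9F range (control characters in ISO-8859-1) for things like the euro sign and curly quotes:

```python
# ISO-8859-1 maps every byte 0x00-0xFF directly to the same code point.
assert all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))

# Windows-1252 agrees with it outside 0x80-0x9F...
assert all(bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
           for b in list(range(0x80)) + list(range(0xA0, 0x100)))

# ...but inside that range it has printable characters.
print(b"\x80".decode("cp1252"))           # €
print(repr(b"\x80".decode("latin-1")))    # '\x80', a control character
print(b"\x93hi\x94".decode("cp1252"))     # “hi”
```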

CP850 is the Western European (Latin-1) code page of DOS and OS/2. It's
hardly used anymore.
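Old DOS/OS/2 text files in CP850 still turn up, and reading them with the wrong code page garbles the accents. A minimal Python 3 sketch; cp850_to_utf8 is a hypothetical helper name, and the conversion is just "decode with the old code page, encode as UTF-8":

```python
def cp850_to_utf8(data: bytes) -> bytes:
    """Convert bytes written in DOS code page 850 to UTF-8 (hypothetical helper)."""
    return data.decode("cp850").encode("utf-8")


text = "ação"
dos_bytes = text.encode("cp850")    # how a CP850 system would store it

print(dos_bytes.decode("latin-1"))  # wrong code page: garbled
print(dos_bytes.decode("cp850"))    # right code page: ação
print(cp850_to_utf8(dos_bytes))     # the same text, now as UTF-8 bytes
```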


It's not that well explained, but it can give you an idea.

-- 
Cheers,

Alexandre Ganso 
Red 500 FOUR - Director, Steel Goose Moto Group





More information about the fedora-list mailing list