problems changing character encoding of files

Tim ignored_mailbox at yahoo.com.au
Thu Aug 20 15:44:50 UTC 2009


On Thu, 2009-08-20 at 19:16 +0400, Hiisi wrote:
> I have a folder full of html files. Each file has this tag in head 
> section:
> <meta http-equiv="Content-Type" content="text/html; charset=koi8-r">

The browser will then try that encoding, rather than the default.  But
only browsers will do that.  Other programs either expect the file to be
in the local encoding, or try other tricks to determine an encoding to
use.  Browsers play all sorts of peculiar guessing games.

The HTML "tidy" program can convert HTML files and fix up some HTML
messiness at the same time.  I haven't tested whether it supports that
encoding scheme, though, and I don't see it listed.

NB:  The charset in the meta statement could be wrong.  People
frequently declare the wrong one.  And browsers may save files in
another encoding (transcoding them) without changing the statement.

> Here's output of file command:
> file -bi one_of_files
> text/html; charset=iso-8859-1

The file command isn't always right at guessing the encoding, either.  In
some cases it can't be determined from the bytes alone: various clues are
needed, and they aren't always there.
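
To illustrate why guessing from the bytes alone is unreliable, here is a
minimal sketch using the six bytes from the report quoted below (EF C4 C5
D6 C4 C1).  It assumes a POSIX shell and an iconv that knows the names
KOI8-R and ISO-8859-1; sample.txt is a made-up file name.

```shell
# Write the bytes EF C4 C5 D6 C4 C1 (as octal escapes, for portability)
# to a scratch file.  sample.txt is a hypothetical name for this example.
printf '\357\304\305\326\304\301' > sample.txt

# Both decodings succeed, because every one of these bytes is a valid
# character in either single-byte charset.
as_koi8=$(iconv -f KOI8-R -t UTF-8 sample.txt)
as_latin1=$(iconv -f ISO-8859-1 -t UTF-8 sample.txt)

echo "read as KOI8-R:     $as_koi8"
echo "read as ISO-8859-1: $as_latin1"
```

On a UTF-8 terminal the first line shows a Russian word and the second a
run of accented Latin letters.  Neither conversion reports an error, so
nothing in the bytes themselves says which reading was intended.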

> There's cyrillic symbols in files and I'm having troubles when trying to 
> view or edit files. Instead of cyrillic symbols I can only see series of 
> <EF><C4><C5><D6><C4><C1> or something like that..

What's your local encoding scheme, and what program are you editing
with?  Some editors allow you to switch between schemes.

> I tried
> $ iconv -f utf8 -t koi8 one_of_files

Convert from utf8 to koi8?  Isn't that the opposite of what you want?
The default encoding on Fedora is usually UTF-8.  Normally you'd convert
foreign files to the local encoding, so you can edit them without
hassles.
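
A minimal sketch of the other direction, assuming the file really is
KOI8-R and the local encoding is UTF-8.  The file names are made up, and
the first line only creates a stand-in file so the example is
self-contained.

```shell
# Stand-in for one of the KOI8-R pages (hypothetical file names).
printf '\357\304\305\326\304\301\n' > page.koi8.html

# -f names the encoding the file is in now, -t the one you want.
# iconv writes to stdout, so redirect to a NEW file; redirecting onto
# the input file would truncate it before iconv reads it.
iconv -f KOI8-R -t UTF-8 page.koi8.html > page.utf8.html

cat page.utf8.html
```

If the result looks right in a UTF-8 editor, the source encoding was
guessed correctly.  If it comes out as nonsense, the file was probably
not KOI8-R to begin with.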

> Is there a way to change encoding for every file in the folder?

Using the iconv command in a script would appear to be the way.
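
Something along these lines, as a sketch rather than a finished tool.
The assumptions are mine, not from the original question: every file
ends in .html, every file really is KOI8-R, and a backup copy of each
original is wanted.  convert_dir, demo_pages and the .koi8.bak suffix
are made-up names.

```shell
#!/bin/sh
# Sketch: convert every *.html file in a directory from KOI8-R to UTF-8.
# Assumes the files really are KOI8-R; run it only once per directory,
# since a second run would re-convert the already converted files.
convert_dir() {
    dir=$1
    for f in "$dir"/*.html; do
        [ -f "$f" ] || continue          # the glob matched nothing
        # Convert into a temporary file; touch the original only on success.
        if iconv -f KOI8-R -t UTF-8 "$f" > "$f.tmp"; then
            cp -p "$f" "$f.koi8.bak"     # keep the original bytes
            # Update the declared charset to match the new encoding
            # (matches the lowercase spelling only).
            sed 's/charset=koi8-r/charset=utf-8/' "$f.tmp" > "$f"
            rm -f "$f.tmp"
        else
            echo "conversion failed, left unchanged: $f" >&2
            rm -f "$f.tmp"
        fi
    done
}

# Demo on a scratch directory holding one made-up KOI8-R page.
mkdir -p demo_pages
printf '<meta http-equiv="Content-Type" content="text/html; charset=koi8-r">\n\357\304\305\326\304\301\n' > demo_pages/a.html
convert_dir demo_pages
cat demo_pages/a.html
```

The meta statement is rewritten as well, because a browser would
otherwise keep trying KOI8-R on what is now a UTF-8 file.  Try it on a
copy of the folder first.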




