UTF8 settings (was: Can scp be used to update a directory?)

James Wilkinson fedora at westexe.demon.co.uk
Fri Mar 24 13:34:04 UTC 2006


Anne Wilson wrote:
> My i18n file contains
> 
> LANG="en_US.UTF-8"
> SYSFONT="latarcyrheb-sun16"
> SUPPORTED="en_US.UTF-8:en_US:en"
> 
> I presume the font line stays unchanged, and the top line seems 
> straightforward, but what does your SUPPORTED: line look like?

Odd. I don't have one on this system, and I haven't noticed any
problems...

A backup from an FC3 machine listed
SUPPORTED="en_GB.UTF-8:en_GB:en:en_US.UTF-8:en_US:en"
although I doubt both en references are strictly necessary.

I can't find any references as to where it is used...

I asked (about files that were apparently incorrectly named)
> So from where did you get those files? Were they generated on another
> computer?

Anne replied:
> Some on this computer, some on what is now the server box.  The email I 
> mentioned arrived in kmail, showing the same symptoms, a couple of days ago.
> 
> Here's a sample -
> 
> ../Mp3/marisa_monte/rose_and_charcoal/06_dan�_da_solid�.mp3
> 
> The title should read 
> 
> 06_dança_da_solidäo.mp3

That's actually a different symptom of the same problem. UTF-8 takes two
bytes to store most common non-ASCII characters, whereas the ISO-8859
family always uses one byte.

What you first described was seeing the two UTF-8 bytes in an ISO-8859
program, so each accented character shows as two ISO-8859 characters
(some of which will probably be "illegal", so you'll see spaces or
something similar there).

What you've just illustrated is an ISO-8859 name viewed in a UTF-8
environment, where the single ISO-8859 byte for each accented character
is not a valid UTF-8 sequence on its own, so it is shown as an illegal
character.
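
Both symptoms are easy to reproduce. This is only a rough sketch, assuming
ISO-8859-1 as the single-byte encoding (I don't know which member of the
family is actually involved) and using a made-up sample name:

```python
# Sample name is illustrative only; \u00e7 is c-cedilla.
name = "dan\u00e7a"

utf8_bytes = name.encode("utf-8")          # c-cedilla becomes two bytes
latin1_bytes = name.encode("iso-8859-1")   # c-cedilla stays one byte

# Symptom 1: UTF-8 bytes displayed by an ISO-8859-1 program.
# Each of the two bytes is shown as its own character.
seen_as_latin1 = utf8_bytes.decode("iso-8859-1")

# Symptom 2: ISO-8859-1 bytes displayed by a UTF-8 program.
# The lone byte is not valid UTF-8, so it is replaced by U+FFFD.
seen_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(len(utf8_bytes), len(latin1_bytes))
print(repr(seen_as_latin1))
print(repr(seen_as_utf8))
```

The second case is what the sample filename above shows: a replacement
character where each accented letter should be.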

My first reaction is to blame the generating program (what was it?). In
my experience, many MP3 programs, following Winamp's example, have gone
flat-out for skins and custom text-handling. Too many of them don't
support UTF-8 in $LANG properly.

Alternatively, what did the server box use to run? How did you transfer
the files? Red Hat went to UTF-8 early, and many other distros took a
lot longer to upgrade. And transferring files might not get the
conversion right.
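
If it does turn out that the names were written in a single-byte encoding,
something along these lines could work out the intended name. It is a
sketch only: the function name is made up, and ISO-8859-1 as the legacy
encoding is a guess that would need checking against the real files.

```python
def to_utf8_name(raw: bytes, legacy: str = "iso-8859-1") -> str:
    """Return the text of a filename given its raw bytes.

    Hypothetical helper: names that are already valid UTF-8 are kept,
    anything else is assumed to be in the `legacy` encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: not valid UTF-8 means the name is in `legacy`.
        return raw.decode(legacy)


print(to_utf8_name(b"06_dan\xe7a.mp3"))      # single-byte name
print(to_utf8_name(b"06_dan\xc3\xa7a.mp3"))  # already UTF-8
```

On Linux, os.listdir(b".") returns names as raw bytes, which is the sort
of input this expects. Try it on copies before renaming anything.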

(You used to use Mandriva, didn't you? I'm not sure when they adopted
UTF-8...)

I wrote:
> As for the single e-mail -- I'd blame the other end, personally.

Anne said:
> Maybe.  Maybe he has the same problem as I do.

Um. Mail clients have no business not knowing which encoding they're
using. And if they know that, they've no business not putting it into
the headers of outgoing e-mail properly.
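
To show what "putting it into the headers" means, here is a small sketch
using Python's standard email package. It is not what kmail or any other
client actually does internally, and the message body is made up:

```python
from email.message import EmailMessage

msg = EmailMessage()
# Declare the encoding explicitly when setting the body text.
msg.set_content("06_dan\u00e7a.mp3\n", charset="utf-8")

# The charset ends up in the Content-Type header, so the receiving
# client does not have to guess how to decode the body.
content_type = msg["Content-Type"]
declared_charset = msg.get_content_charset()

print(content_type)
print(declared_charset)
```

A client that sends non-ASCII text without such a declaration leaves the
recipient guessing.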

We've proved that your e-mail client can receive UTF-8. I suppose
there's still the chance that your correspondent used a weird encoding
that your client didn't understand. But you're not going to get the
"right" message anyway in those situations, except by blind luck.

James.

-- 
E-mail address: james | In the Royal Air Force a landing's OK,
@westexe.demon.co.uk  | If the pilot gets out and can still walk away.
                      | But in the Fleet Air Arm the outlook is grim,
                      | If your landings are duff and you've not learnt to
                      | swim.



