Where is the $LANG variable defined?
Björn Persson
listor1.rombobeorn at comhem.se
Tue Nov 9 01:43:31 UTC 2004
Matthew Miller wrote:
> On Sun, Nov 07, 2004 at 07:50:00PM +0100, Björn Persson wrote:
>
>>>And something does -- set the LANG variable right, and it'll work, right?
>>
>>If it's that easy, why doesn't SSH set LANG? Why should I have a lot of
>>trouble doing this manually every time? And how does that help with
>>filenames?
>
> Answers to your questions in order. :)
>
> 1. Because it has no idea what to set it to.
It doesn't know because it doesn't bother to look, but that's the wrong
answer. The answer is that it's *not* that easy. The encoding has to be
set for the terminal program where I run SSH, and to do it through LANG,
LANG must be set before the terminal program is started. So instead of
just typing "ssh otherbox" I have to set LANG, launch a new terminal
window and run SSH there.
> 2. You shouldn't have to go to a lot of trouble. And you shouldn't need to
> do it manually each time -- at the very least, you can script it.
So everyone and their dog should write a specialized script for every
combination of local and remote box? Fat chance! We need a solution that
will work out of the box so it can be packaged and distributed with
operating systems.
I *could* write a program that connects with SSH, looks up the remote
system's encoding, disconnects, opens a new terminal window with the
right encoding set, and runs SSH in that window, but it wouldn't work in
text mode, it wouldn't work with chained SSH sessions, and it still
wouldn't help with file transfers.
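The workaround described above can be sketched roughly as follows. This is a minimal sketch with several assumptions. The remote side is assumed to be a glibc system, where `locale charmap` prints the active character set. `ssh` is assumed to log in without prompting, for example with a key. The local terminal is assumed to be `xterm` and to pick its encoding from `LANG` at startup. The helper names and the `lang_prefix` parameter (for example `sv_SE`) are hypothetical. The sketch inherits every limitation listed above: no text mode, no chained sessions, no file transfers.

```python
import subprocess

def remote_charmap(host: str) -> str:
    """Ask the remote system which character set its locale uses.

    Assumes `locale charmap` exists on the remote side (it does on glibc
    systems) and that ssh can log in without prompting.
    """
    result = subprocess.run(
        ["ssh", host, "locale", "charmap"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()  # e.g. "UTF-8" or "ISO-8859-1"

def terminal_command(host: str, charmap: str, lang_prefix: str) -> list[str]:
    """Build a command that starts a new xterm with LANG set, running ssh.

    lang_prefix (e.g. "sv_SE") is a hypothetical parameter; the resulting
    locale must also be installed locally for the terminal to use it.
    """
    return ["env", f"LANG={lang_prefix}.{charmap}", "xterm", "-e", "ssh", host]

def open_matching_terminal(host: str, lang_prefix: str) -> None:
    # Two connections: one to look up the encoding, one for the real session.
    charmap = remote_charmap(host)
    subprocess.Popen(terminal_command(host, charmap, lang_prefix))
```

Usage would be something like `open_matching_terminal("otherbox", "sv_SE")`. Note that if `LC_ALL` or `LC_CTYPE` is set in the local environment, it overrides `LANG` and defeats the trick.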
> 3. The filename encoding problem is kinda sticky. Devising a workable
> on-the-fly transcoding solution seems like a lot of work on the
> _symptoms_. Instead, let's work on getting everything to work well with
> UTF-8.
You don't seem to fully understand the extent of the problem. That's not
surprising, as you're apparently a USian and seldom see the character
encoding problems I see daily. If you had been regularly forced to spell
your name "Mutthew" because "a" wasn't a valid character, you might have
a different view.
What you're suggesting is that everyone should use UTF-8 everywhere so
that there would only be one character encoding. That just isn't going
to happen. I'd love to go UTF-8 myself and get access to all the world's
written languages, but it's not feasible. That's not because of the
filenames. I could transcode all my filenames easily enough. The real
problem is the files' contents. Since there's only one big global locale
setting, I have to convert everything or nothing. I've got heaps of text
files full of non-English letters and they're all encoded in Latin 1.
(Actually, many of them are probably in Windows-1252, but the extra
characters in that encoding don't seem to have been used very often,
so they can pass as Latin 1.) Some of them are plain text. They could,
and would have to, be transcoded. Others are XML or HTML. They could be
left as they are, but should be transcoded so that they can be opened in
text editors. If they are transcoded, the embedded encoding
declarations would have to be updated. Still others are source code.
Transcoding those would constitute changes to the programs and could
require several other changes. Then there are the files that aren't text
and mustn't be transcoded by mistake. There's no reliable way of
recognizing the different kinds of files automatically, so I'd have to
go through them all manually and decide what to do with each one. No thanks!
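The mechanical part of the easy cases can be sketched in Python. This is a minimal sketch that assumes every input really is Latin 1, and the helper names are hypothetical. It shows why filenames and plain text are simple: every byte is a valid Latin-1 character, so decoding cannot fail. It also shows why XML needs its embedded declaration rewritten, and how to spot files that actually use the Windows-1252 extras. It cannot do what is described above as the real problem, which is deciding whether a given file is text, markup, source code or binary.

```python
import os
import re

def latin1_to_utf8(data: bytes) -> bytes:
    # Every byte 0x00-0xFF is a valid Latin-1 character, so this never fails.
    return data.decode("latin-1").encode("utf-8")

def uses_cp1252_extras(data: bytes) -> bool:
    # Windows-1252 differs from Latin-1 only in bytes 0x80-0x9F, which hold
    # characters such as curly quotes, dashes and the euro sign. If none of
    # those bytes occur, both encodings decode the data identically.
    return any(0x80 <= b <= 0x9F for b in data)

# Matches the encoding attribute of an XML declaration, e.g. encoding="ISO-8859-1".
_XML_ENCODING = re.compile(rb'(<\?xml[^>]*?encoding=["\'])([^"\']+)(["\'])')

def transcode_xml(data: bytes) -> bytes:
    # Re-encode the bytes, then update the declaration so it matches them.
    # HTML <meta> charset declarations are not handled here.
    return _XML_ENCODING.sub(rb"\1UTF-8\3", latin1_to_utf8(data), count=1)

def transcode_filenames(directory: bytes) -> None:
    # Renames entries in one directory only (not recursively). Assumes all
    # names are Latin-1: a name that is already UTF-8 would be double-encoded.
    for name in os.listdir(directory):
        new_name = latin1_to_utf8(name)
        if new_name != name:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
```

Source code is deliberately left out: per the paragraph above, transcoding it changes the program, and no byte-level function can tell whether that is safe.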
Then there are the various people and computers I need to cooperate with
and share files with: coworkers, SourceForge projects and the like.
These people share files with other people who in turn cooperate with
still other people. What's the chance of getting all these people to
switch character encodings at the same time?
While I'm typing this, BitTorrent is downloading Fedora Core 3 for me. I'm
going to do a fresh install on an unused partition. The first thing I'll
do after installing is edit /etc/sysconfig/i18n to change from UTF-8
to Latin 1. Sure, it would be nice if everyone used the same
character encoding, but Unicode was created some 50 years too late and
now we have to live with the consequences. As for me, I'm stuck with Latin 1.
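The change to /etc/sysconfig/i18n mentioned above would look roughly like this. It is a sketch that assumes a Swedish locale. The exact locale names are assumptions and must be ones actually installed on the system (`locale -a` lists them). Any other variables in the file are left as they are.

```shell
# /etc/sysconfig/i18n (sketch): switch the system locale from UTF-8 to Latin 1.
# Assumed previous value: LANG="sv_SE.UTF-8"
LANG="sv_SE.ISO-8859-1"
```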
Björn Persson
More information about the fedora-list mailing list