UTF-8 and filenames

Callum Lerwick seg at haxxed.com
Wed Mar 14 22:03:13 UTC 2007


On Wed, 2007-03-14 at 00:01 -0700, Toshio Kuratomi wrote:
> The thing is we control the filenames to some extent.  If we decide that
> every filename in one of our packages has to be utf-8 then we'll never
> have a filename enter the database that isn't utf-8.  If we decide that
> it's okay for fedora packages to contain files whose names are not
> encoded in utf-8 then the tools will have to cope with it.

I'm seeing two issues here.

Unix systems have supported arbitrary byte strings for filenames (well,
except for '/' and NUL...) since the beginning of time. Any low-level
tool that falls over because the filename contains whitespace or
high-ASCII or UTF-8 or whatever is broken. Period.
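
A minimal Python sketch of what "not falling over" can look like, assuming
a Linux system with a native filesystem such as ext4 or tmpfs (some other
platforms reject names that are not valid UTF-8). The helper names and the
sample filenames are made up for illustration. The point is only that
passing bytes paths to the os module keeps filenames as raw bytes, so no
decoding step exists that could fail.

```python
import os
import tempfile


def list_names_as_bytes(directory):
    """Return directory entries as raw bytes, with no decoding step."""
    # os.listdir() returns bytes names when given a bytes path.
    return sorted(os.listdir(directory))


def demo():
    # Illustrative names: whitespace, an embedded newline, a Latin-1 byte
    # that is not valid UTF-8, and a valid UTF-8 sequence.
    names = [b"with space", b"new\nline", b"latin1-\xe9", b"utf8-\xc3\xa9"]
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        for name in names:
            # Create an empty file under each raw byte name.
            with open(os.path.join(tmp_bytes, name), "wb"):
                pass
        return list_names_as_bytes(tmp_bytes)


if __name__ == "__main__":
    for raw in demo():
        print(repr(raw))
```

Every name comes back byte for byte as it was written, including the one
that is not valid UTF-8.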

Now, interpreting the meaning of these byte strings is a higher-level
display problem. The great thing about having a "case sensitive"
filesystem is the kernel doesn't have to care about encodings: names are
compared byte for byte, with no case folding. That bloat is pushed to
userspace. It's just a bunch of bits as far as the kernel and low-level
libc are concerned. (Except the kernel DOES have to know about encodings
in order to implement vfat, SMB, NTFS and whatnot, because Microsoft
sux...)
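
To illustrate the split between raw names and their display, here is a
hedged Python sketch. The function names are hypothetical, and the choice
of UTF-8 as the default display encoding is an assumption, not something
the system dictates. One helper produces a lossy string that is only good
for showing to a user; the other two use Python's "surrogateescape" error
handler so that a name can pass through text-handling code and still be
turned back into the exact original bytes.

```python
def for_display(raw, encoding="utf-8"):
    """Lossy decode: undecodable bytes become U+FFFD. Display only."""
    return raw.decode(encoding, errors="replace")


def to_text_lossless(raw):
    """Decode as UTF-8, smuggling invalid bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def back_to_bytes(text):
    """Inverse of to_text_lossless(): recover the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    raw = b"caf\xe9"  # "café" in Latin-1, not valid UTF-8
    print(for_display(raw))             # replacement character at the end
    print(for_display(raw, "latin-1"))  # same bytes, different guess
    assert back_to_bytes(to_text_lossless(raw)) == raw
```

The same bytes read differently depending on which encoding the display
layer guesses, which is why the guess belongs in userspace and the name
handed back to the kernel should be the untouched bytes.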


More information about the Fedora-maintainers mailing list