[GuidelinesChange] UTF8 filenames

Toshio Kuratomi a.badger at gmail.com
Wed Apr 11 00:07:55 UTC 2007


On Wed, 2007-04-11 at 00:37 +0200, Nicolas Mailhot wrote:
> On Tuesday, 10 April 2007 at 15:16 -0700, Toshio Kuratomi wrote:
> > On Tue, 2007-04-10 at 19:17 +0200, Nicolas Mailhot wrote:
> > > On Tuesday, 10 April 2007 at 10:08 -0700, Toshio Kuratomi wrote:
> > > > A new Guideline has been added to the Encoding section:
> > > > 
> > > > '''
> > > > Non-ASCII Filenames
> > > > Filenames that contain non-ASCII characters must be encoded as UTF-8.
> > > > Since there's no way to note which encoding the filename is in, using
> > > > the same encoding for all filenames is the best way to ensure users can
> > > > read the filenames properly. If upstream ships filenames that are not
> > > > encoded in UTF-8 you can use a utility like convmv (from the convmv
> > > > package) to convert the filename in your %install section.
> > > > '''
> > > > 
> > > > This change was approved by the Fedora Packaging Committee and ratified 
> > > > by FESCO.
> > > 
> > > Shouldn't this be clarified as 7-bit ASCII ? Many people think ASCII ~
> > > 8-bit ISO-8859-1
> > > 
> > I think of ASCII != ISO-8859-1 but if that's not a common way of
> > thinking then I am more than willing to clarify.
> > 
> > I notice that we use the term US-ASCII in the outer section:
> >   http://fedoraproject.org/wiki/Packaging/Guidelines#PackageEncoding
> > 
> > Would changing non-ASCII to non-US-ASCII characters be sufficient?
> 
> Why don't you just say:
> 
> Every filename must be encoded as UTF-8. Filenames using characters
> outside the range 0000–007F as defined in page 2 of
> http://www.unicode.org/charts/PDF/U0000.pdf may need conversion.

Because I think more people will understand what ASCII means than will
understand U0000.pdf.  I don't like this either, but I would rather
leave it as is and link to the Wikipedia article for ASCII.

I like the simplicity of "Every filename must be encoded as UTF-8."  But
if we just leave it at that some people in "ASCII speaking countries"
are sure to ask, "So how do I convert ASCII to UTF-8?"  Maybe something
like:

'''
Encoding
Unless you need to use characters outside the
[:http://en.wikipedia.org/wiki/ASCII: ASCII repertoire] (the 128
characters that consist of letters, numbers and punctuation used in
English), you will not need to be concerned about the encoding of the
spec file. If you do need non-ASCII characters, save your spec files as
UTF-8.

Non-ASCII Filenames
Similarly, filenames that contain non-ASCII characters must be encoded
as UTF-8. Since there's no way to note which encoding the filename is
in, using the same encoding for all filenames is the best way to ensure
users can read the filenames properly. If upstream ships filenames that
are not encoded in UTF-8 you can use a utility like convmv (from the
convmv package) to convert the filename in your %install section. 
'''
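
To make the conversion step concrete, here is a minimal Python sketch of
the idea behind it. This is not how convmv is implemented, and the
function names (needs_conversion, to_utf8, convert_tree) are made up for
illustration. It assumes you already know the encoding upstream used
(ISO-8859-1 in the example), since the filename itself cannot tell you.
It also shows why pure ASCII names need no conversion: their bytes are
already valid UTF-8.

```python
import os


def needs_conversion(name: bytes) -> bool:
    """Return True if the raw filename bytes are not valid UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def to_utf8(name: bytes, source_encoding: str) -> bytes:
    """Re-encode raw filename bytes from source_encoding to UTF-8."""
    return name.decode(source_encoding).encode("utf-8")


def convert_tree(root: bytes, source_encoding: str) -> list:
    """Rename every non-UTF-8 entry under root; return the new names.

    Hypothetical helper: root is passed as bytes so that os.walk
    yields raw byte names instead of decoded strings.
    """
    renamed = []
    # Walk bottom-up so a directory is renamed only after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if needs_conversion(name):
                new_name = to_utf8(name, source_encoding)
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, new_name))
                renamed.append(new_name)
    return renamed


if __name__ == "__main__":
    # ASCII bytes are already valid UTF-8, so nothing to do.
    print(needs_conversion(b"README"))
    # An ISO-8859-1 name with a non-ASCII character is not valid UTF-8.
    print(needs_conversion(b"caf\xe9.txt"))
    print(to_utf8(b"caf\xe9.txt", "iso-8859-1"))
```

One caveat with this kind of check: a byte sequence in another encoding
can occasionally happen to be valid UTF-8 as well, so passing the test
does not prove the name was meant as UTF-8. It only catches names that
are definitely not UTF-8.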

-Toshio