[Fedora-packaging] file-not-utf8 complaints

Toshio Kuratomi a.badger at gmail.com
Sun Jun 1 20:18:02 UTC 2008

Patrice Dumas wrote:
> On Sun, Jun 01, 2008 at 10:17:32AM -0700, Toshio Kuratomi wrote:
>> Patrice Dumas wrote:
>>> On Sat, May 31, 2008 at 04:09:25PM -0700, Toshio Kuratomi wrote:
>>>> However, the flipside of this is if a program has an xml config file  
>>>> that the user is expected to edit manually in a text editor and the   
>>>> program will adapt to multiple encodings (for instance, by using 
>>>> libxml2  to parse the file[1]_) having it exist in utf-8 is much 
>>>> better than  having it exist in SOME_EXOTIC_ENCODING.  In this case 
>>>> it's the program  
>>> I disagree. It is not an obvious choice and should be left to the
>>> maintainer. It depends on the user target of the software, for instance.
>> Please state your counter example.  I'm laying out the parameters by  
>> which we could relax the current rule.  If we don't lay out the  
>> boundaries correctly the replacement rule will end up still being too  
>> restrictive.
> I may be wrong, but it seems to me that there is no current rule? Except
> that rpmlint warnings/errors should be handled if possible, but there is
> nothing about that in the guidelines (spec file and filename should be
> UTF-8, though).
My bad, I must have been recalling the debates over the
"filenames must be UTF-8" guideline.  If there's no current guideline
then I'm not sure we need a new one.

> Here is a wording that would seem right to me:
> Files that don't carry information about their encoding should be
> converted to UTF-8. It is typically useful for NEWS files with author
> names with accented characters. There may be exceptions, for example a
> README.cn file written in Chinese may be encoded in a popular Chinese
> encoding like Big5.
I could go either way on this, but I lean towards saying these should be
UTF-8.  Shift_JIS, Big5, etc. have benefits over UTF-8, and the people who
use those encodings are the consumers of this file.  OTOH, for Fedora to
truly support the UTF-8 locale out of the box, these kinds of files (which
don't specify an encoding and aren't used by the program) have to be
UTF-8.  How can we ship with a UTF-8 locale by default knowing that the
README.cn isn't readable by people who stick with our default?
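
For illustration, here is a minimal Python sketch of the check and the
conversion being discussed.  The helper names is_utf8 and to_utf8 are
hypothetical, and the source encoding has to be supplied by whoever knows
the file (Big5 is only the example from above); this is not how rpmlint
implements its check.  In a spec file the same conversion would more
commonly be done with iconv.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode data from source_encoding (an assumption made by the
    caller, e.g. "big5") to UTF-8.

    Bytes that are already valid UTF-8 are returned unchanged.
    """
    if is_utf8(data):
        return data
    return data.decode(source_encoding).encode("utf-8")


# Example: a README.cn-style snippet stored as Big5.
legacy = "中文".encode("big5")
converted = to_utf8(legacy, "big5")
```

Note that passing the UTF-8 check does not prove a file was written as
UTF-8; some legacy-encoded byte sequences happen to be valid UTF-8 too, so
the maintainer still has to know the real source encoding.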

> Files that carry their own encoding information (xml, tex, info...) may
> also be converted to UTF-8, but the decision is left to the package
> maintainer. It may be especially relevant for files that are to be
> edited by the user, since it may be difficult to edit a file not in
> UTF-8, while UTF-8 should be handled by most editors automatically, as
> the default for Fedora is a UTF-8 locale.

This part seems quite reasonable as a recommendation.
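
As a rough sketch of what "carrying its encoding" means for XML, the
following Python snippet reads the encoding named in the XML declaration.
The function name declared_xml_encoding is hypothetical, and the regular
expression is a simplification that assumes an ASCII-compatible encoding
(it will not recognise UTF-16 files, for example); a real XML parser such
as libxml2 does considerably more.

```python
import re

# Matches an XML declaration at the very start of the file and captures
# the value of its encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb"^<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def declared_xml_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if the
    file has no declaration or the declaration names no encoding."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


sample = b'<?xml version="1.0" encoding="ISO-8859-1"?><config/>'
encoding = declared_xml_encoding(sample)
```

A file for which this returns a name other than UTF-8 is still
self-describing, which is why converting it can reasonably be left to the
maintainer.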

