[Fedora-packaging] file-not-utf8 complaints
Hans de Goede
j.w.r.degoede at hhs.nl
Sat May 31 06:06:53 UTC 2008
Toshio Kuratomi wrote:
> Jason L Tibbitts III wrote:
>> Normally we fix up non-utf8 documentation and such with a quick call
>> to iconv. It seems that this is problematic for some; see
>> https://bugzilla.redhat.com/show_bug.cgi?id=226079
>>
>> Any comments on how much we actually care about this, especially in
>> the case that it might not actually be as easy as a call to iconv
>> (such as a changelog file with a pile of random encodings in it).
>>
> Well... The reason that all files must be UTF-8 is exactly the problem
> that the ChangeLog exhibits so I don't have a lot of sympathy there.
+1.

Although I fully agree with Daniel that blindly converting text-ish
files which specify an encoding in their headers is both wrong and
dangerous, since that actually breaks things, normal text files,
especially ones in %doc, should be in UTF-8 so that they display
correctly when opened.

Indeed, the changelog is a perfect example of why all plain text files
must be UTF-8: had it always been UTF-8, the problem of some parts being
in a West European encoding and other parts in an East European encoding
would not exist.
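
To make the distinction concrete, here is a minimal Python sketch of that
policy: leave files that declare their own encoding alone, leave files
that are already valid UTF-8 alone, and re-encode only the remaining plain
text. The function names (declared_encoding, is_utf8, to_utf8) and the
default source encoding of ISO-8859-1 are assumptions made for
illustration; they are not taken from any Fedora tooling, and the real
source encoding has to be known per file.

```python
import re
from typing import Optional

# Matches an XML declaration that names an encoding, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
XML_DECL = re.compile(
    rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, or None."""
    match = XML_DECL.match(data[:200])
    return match.group(1).decode("ascii") if match else None


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, assumed_source: str = "iso-8859-1") -> bytes:
    """Re-encode plain text to UTF-8; assumed_source is a guess, not detected."""
    if declared_encoding(data) is not None:
        return data  # the file states its own encoding; do not touch it
    if is_utf8(data):
        return data  # already fine
    return data.decode(assumed_source).encode("utf-8")
```

This only works when the whole file uses one encoding. A changelog with
mixed encodings cannot be repaired this way, because the same byte means
different characters depending on the assumption: 0xB3 is a superscript
three in ISO-8859-1 but a Polish l-with-stroke in ISO-8859-2.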

Also, I think it's worth noting that Fedora is not the only distro doing
this; Debian, for example, also tries to have all text files in the
distro in UTF-8.

I'll also put a comment to this effect in the review.

Regards,
Hans
> The
> names and special characters in that file are already corrupted since
> there's no common encoding and none is recorded with the names.
> Dropping it from the package, as Daniel suggested, is certainly an option
> as there's no requirement that ChangeLogs need to be in a package and it
> is not something that must be changed.
>
> Reencoding the xml files that specify an encoding isn't strictly
> necessary. We should probably ask upstream whether they are amenable to
> changing to utf-8. Since libxml2 deals with utf-8 internally and the
> upstream author made a nice writeup about why he made that choice,
> upstream might be amenable to that. If upstream is not amenable, we
> should consider changing the Packaging Guidelines to reflect that xml
> files which specify their encoding do not have to be re-encoded to utf-8.
> (Although we then have to ask ourselves if we should be checking that
> the xml files actually use the encoding that they specify :-(
>
> NEWS and other files that are neither specifying an encoding nor mixed
> up in such a way that they are hopelessly corrupted WRT the original
> characters should definitely be converted to utf-8. If Daniel wants to
> hold open the Merge Review until that has gone in upstream, that is his
> prerogative.
>
> The most chilling aspect of that review is that the maintainer does not
> seem to think that it's his responsibility to take issues with the
> upstream source to upstream. Since Daniel is upstream, I'm not certain
> I can see why he feels that someone else should be reporting it upstream
> before he deals with it.
>
> -Toshio