[Bug 226079] Merge Review: libxml2

Sat May 31 11:40:12 UTC 2008

Summary: Merge Review: libxml2

https://bugzilla.redhat.com/show_bug.cgi?id=226079

------- Additional Comments From j.w.r.degoede at hhs.nl  2008-05-31 07:40 EST -------
(In reply to comment #14)
> Okay, I will drop ChangeLog and News from the package, after reading this
> and the thread it looks apparently more important to the Fedora Packaging
> guys to follow blindly a "This has to be UTF-8" rule than understanding
> that people may have preferences set in their viewing and editing tools.

I'm sorry, but my preferred viewing and editing tools (less and joe) don't have
any encoding settings, and even if I were to use tools which do, they default to
the system locale, and system locales are UTF-8 on Fedora (and many other modern
distros). ISO-8859-X is dead; not as dead as it should be, but it is dead. This
is not a dogma: we want files to show up correctly using the default locale,
thus they should be encoded using the default locale settings.

I'm especially disappointed in you sticking to your stance that UTF-8 is wrong,
given that I have offered to do the work for you.

> I would think it is more important to be able to read the ChangeLog missing
> some characters than not seeing anything.

I agree, but it would be even better to see the full changelog including those
chars!

> W.r.t. comment #13 , the ChangeLog is 19000+ lines how would you make sure
> you actually guessed the encoding of characters in a name correctly ?

I'm European and as such know many European names / dialects, at least enough to
have a high chance of guessing correctly. Then again, in some cases I might guess
wrong, but to quote you: "I would think it is more important to be
able to read the ChangeLog missing some characters than not seeing anything."
Well, my intent is to make the number of missing / wrong characters smaller,
preferably 0 but at least much, much smaller. I might get one or two names wrong;
bad luck, at least the other names will be correct.

> IMHO
> you just can't unless chasing each people name and making sure it matches.
> That's history, I prefer to keep it that way. If the history annoys Fedora
> I will just drop it from the package. 

This is not history, this is just plain wrong: as it is, the file will not look
correct in any encoding. This file is the perfect example of why UTF-8 has
become the default and why UTF-8 is a blessing (allowing both special Western
and Eastern European chars in one file, and even more exotic chars in the same
file).

> The argument for it having to be UTF-8 just doesn't hold, you're picking
> one encoding while,

I'm not picking one encoding, Fedora has picked one encoding, and I believe they
have made a good choice.

> the problem is that 1/ you can't garantee there is only
> one encoding in a text file

If there is more than one encoding in a plain text file, then the file is broken.
We have a name for things like this, we call it a bug, and usually we fix those
where we can (and have the resources).

> 2/ even if it's the case you have no metadata
> to indicate what encoding was used for authoring.

Actually, if it's in English, but there are some names of people in there which
are French and those names are the only ones containing non-ASCII codes, then I
can make a pretty good bet. After that I can check if the resulting name is a
valid French name.
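
To illustrate the kind of bet described above, here is a minimal sketch of
a line-by-line conversion. It assumes that any line which is not valid UTF-8
is ISO-8859-1; that fallback is my guess for this illustration, not a fact
about the libxml2 ChangeLog, and the function names are hypothetical. The
resulting names would still need checking by hand.

```python
def line_to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode one line, preferring UTF-8 and falling back to a guessed encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: lines that are not valid UTF-8 use the fallback encoding.
        return raw.decode(fallback)


def convert_mixed(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Re-encode a file with mixed per-line encodings as UTF-8."""
    return b"".join(
        line_to_utf8(line, fallback).encode("utf-8")
        for line in data.splitlines(keepends=True)
    )
```

Pure ASCII lines pass through unchanged, so only the handful of lines with
non-ASCII bytes are affected.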

You know, I maintain close to 200 packages in Fedora, and as such I've even
written a script to locate any non-ASCII chars in text files for me (used when
rpmlint complains about them not being UTF-8). Did you know that in the
libxml2 changelog there are only 48 lines which contain non-ASCII codes? After
replacing "St\?phane Bidoul" with "Stephane Bidoul" (I will fix this to the
proper UTF-8 name later) I have only 16 lines containing non-ASCII left.
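
For reference, a script of this kind can be very small. The following is
only a sketch of the idea, not the actual script mentioned above, and the
function name is hypothetical. It works on raw bytes so that it makes no
assumption about the file's encoding.

```python
def non_ascii_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines with bytes above 0x7F."""
    hits = []
    for number, line in enumerate(data.splitlines(), start=1):
        # Any byte above 0x7F is outside ASCII, whatever the encoding is.
        if any(byte > 0x7F for byte in line):
            hits.append((number, line))
    return hits
```

It could be used as `non_ascii_lines(open("ChangeLog", "rb").read())` and
the resulting lines then reviewed one by one.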

Really this is fixable just _fine_, the problem is you being unwilling to accept
a fix for it.

> if I were to write a README.cn
> I would use a BIG5 encoding in it, not UTF-8, because that's not what the
> concerned people would expect

Of course there will always be exceptions to the rule, but we're not talking
about Chinese files here. ChangeLog is an English file, with some non-English
person names in there which use chars beyond ASCII.

> Get yourself a filesystem with metadata if you
> really want to solve the problem

Well, that's not going to help with multiple encodings in one file, like you have
with ChangeLog now, is it? Or are we going to put an encoding per line in the
metadata, or maybe an encoding per char?

> Still w.r.t. #13 it's not so much that I think I have higher priorities to do,
> it's that I see things imposed like a dogma, that they look broken to me, and
> that I feel that trying to change the minds of people around about this is 
> a lost battle. It's getting a religious thing, and my belief is that changing
> files coming from upstream is fundamentally wrong.

So fixing bugs, which last time I checked requires changing files, (and then
sending the fix back upstream) is fundamentally wrong?

That's an interesting stance, weird but interesting.

The only fundamentally broken thing I see is the current libxml2 ChangeLog file,
which currently will render wrong no matter which encoding you choose in your
viewer.
