RFC: Description text in packages

Tue Dec 16 20:21:42 UTC 2008

On Tuesday 16 December 2008 at 22:02 +0200, Nikolay Vladimirov wrote:
> 2008/12/16 Nicolas Mailhot <nicolas.mailhot at laposte.net>:
> > On Tuesday 16 December 2008 at 20:38 +0200, Nikolay Vladimirov wrote:
> >
> >> >> >  Currently, I'm opposed to having a Guideline that mandates UTF-8 over ASCII.
> >> +1
> >>
> >> It's not piles of quirks; it's a simple parser.
> >
> > ROTFL. Sorry.
> 
> uh?
> It's a development discussion forum. If you think I said something
> stupid or plain brain-damaged, explain why it's stupid. Don't ROTFL and
> stuff, please.

There are not enough hours in my free time to expand on all the problems
caused by trying to work around 7-bit encoding limitations without going
Unicode.

There is abundant literature about it (both in paper and digital form).

It's all “simple” problems with “easy” solutions, and when you pile up
all the resulting quirks you put the Egyptian pyramids to shame.

As a bonus this stuff is highly non-standard and incompatible by
construction.

So when you claim this is a “simple” problem-space I'll ROTFL. Better
people than you and me hit a wall trying to solve it properly. In the
end after many failures they wrote the Unicode standard.

> >> Take wiki syntax for example. All wikis (I know) rely on simple syntax
> >> that uses ASCII characters to display somewhat structured and
> >> formatted content.
> >
> > Wikis support UTF-8 just fine
> 
> Yes. I meant the actual syntax: it uses "*" to mark an unordered list
> but it displays a "unicode bullet" (I can't find it on my layout).
> It detects depth from the indent and replaces '*' to look nice.

You are free to write specs using wiki, tex, or sgml entities markup.
You are free to use a text editor, a word processor, a hex editor, or
whatever.

We expect you to convert this syntax to correct plain text in UTF-8
encoding before publication by Fedora. This is the common ground all our
package manipulating tools understand and there is no way we're going to
teach them some other markup just because you can't find some UTF-8
symbol in your layout.

If you can't write UTF-8 bullets, and can't be bothered to launch
something like gucharmap, run sed, awk, or whatever converter on your
spec files before pushing them Fedora-side. Don't ask others to add code
to many tools to work around your problems.
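
As an illustration of the kind of one-off converter meant here, a minimal
sketch in Python (the function name and the choice of U+2022 as the bullet
are assumptions for the example, not a Fedora convention). It should only
be fed the human-readable description text, never a whole spec file, since
%changelog entries also start with "*".

```python
import re

# Hypothetical helper: a "*" list marker at the start of a line,
# optionally indented, followed by at least one space.
_ASCII_BULLET = re.compile(r"^([ \t]*)\* +", re.MULTILINE)


def to_utf8_bullets(description: str) -> str:
    """Replace leading "* " list markers with a real bullet (U+2022).

    The indentation is kept as-is, and a "*" in the middle of a line
    is left alone.
    """
    return _ASCII_BULLET.sub(lambda m: m.group(1) + "\u2022 ", description)


if __name__ == "__main__":
    sample = "Features:\n* fast\n  * small\nplain * text"
    print(to_utf8_bullets(sample))
```

Run it on your side and write the result back as UTF-8; nothing
downstream has to know the text ever contained ASCII markers.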

> >> And it's common to use "*" to mark unordered list.
> >
> > And it's common to use xml tags and all kinds of other stuff but that's
> > irrelevant because spec syntax is plain text not wiki markup.
> >
> 
> I'm not talking about syntax in specs, 

The discussion is about the human text found in spec files.

> So if I'm using some really outdated client to connect to my Fedora
> host (like telnet or serial or something)
> and I use command-line tools to browse packages and read summaries, I
> will not see these UTF symbols.

So what? Your choice. The distro English encoding is UTF-8. If you want
things to work perfectly in the default locale use UTF-8 aware tools. We
can't be responsible for your choice to use buggy, legacy, outdated
tools. Any ASCII-oriented tool will mostly sort-of work, and anything
outside the mostly sort-of perimeter is the price you pay for using
limited tools.

If you want an ASCII-only mode (a real one, not the “I know it's UTF-8
but please use only ASCII so I don't hit bugs in my tools”) open an
English ASCII localization group.

> As I understand the problem is that PackageKit can't display stuff.

The problem is people wrote garbage in some spec files using
ASCII-limited tools, and the PK maintainer would like this garbage
corrected so normal users are not exposed to ASCII quirks.

> So
> let's make stuff more standard and leave PackageKit to do all the
> friendly displaying.

PK is not the only tool involved, the standard already exists, and is
feature-complete without needing workaround code.

> >> And also different languages have different types of quotes for
> >> example in Bulgarian the quoted text looks like this : ,, quote " .
> >
> > Converting text to the appropriate typography rules and symbols is part
> > of the job of the translator (just like applying the correct grammar and
> > syntax ordering rules). Translating has never been limited to
> > word-by-word conversion.
> >
> 
> Yes. But why do I have to do all the stuff when a machine can do it.

If a machine can do it, get a machine to output proper UTF-8 encoded
specfile text on your side.

> It is representation since the encoding will be changed

The encoding won't be changed.

> mainly because
> the current summaries don't look so great

Because they have mistakes. Their authors just have to correct their
mistakes. This is no different from grammar errors, and you don't expect
PK to fix the text's grammar errors, do you?

> and are pretty chaotic in
> style. Using UTF isn't going to make them more structured and
> standard. Maintainers will.

Which is the whole point.
This is not a technical problem.
The current UTF-8 technical infrastructure is sufficient and there is
no need to write magic text-correcting code.
People just have to do some editorial work and proofread and correct
their text themselves, making sure it displays fine using simple standard
UTF-8 text rendering engines.
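
To make the proofreading step concrete, here is a small sketch of a
checker a maintainer could run locally. It corrects nothing; it only
reports. The function name and the two patterns are illustrative
assumptions, not an agreed list of mistakes.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
QUIRKS = {
    "ASCII list marker": re.compile(r"^[ \t]*\* ", re.MULTILINE),
    "TeX-style quotes": re.compile(r"``|''"),
}


def review_description(raw: bytes) -> list[str]:
    """Return a list of findings for a package description.

    First checks that the bytes decode as UTF-8, then looks for a few
    ASCII stand-ins that the author may want to replace by hand.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8 at byte {err.start}"]
    return [name for name, pattern in QUIRKS.items() if pattern.search(text)]


if __name__ == "__main__":
    for finding in review_description(b"* fast\n``quoted'' text"):
        print(finding)
```

An empty list means the text decodes cleanly and none of the listed
stand-ins were found; whether it reads well is still for the author to
judge.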

-- 
Nicolas Mailhot