From fangqq at gmail.com Sat Dec 1 00:23:45 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Fri, 30 Nov 2007 19:23:45 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <1196452990.11285.18.camel@behdad.behdad.org>
References: <1196452990.11285.18.camel@behdad.behdad.org>
Message-ID:

I understand that some of the information might be missing, but even if
that is true, I think fontconfig should come up with a somewhat better
default rendering than it does now. The solution might be as simple as
putting the Uming/Ukai or wqy fonts right before the Japanese "Mincho"
or "Gothic" font series, simply because the Japanese fonts do not have
larger Unicode coverage than Uming/wqy (while Uming/wqy do cover the
Japanese code points). For Japanese locales, we can match lang=ja and
put the Mincho/Gothic fonts in front of the Chinese fonts.

To remind you of my motivation for continuing to ask for a better
default Chinese rendering, attached is a screenshot of browsing a
Chinese web page under a fresh F8 installation (en-us, of course). I
doubt anyone would like to read this on a regular basis.

On Nov 30, 2007 3:03 PM, Behdad Esfahbod wrote:
>
> This is because by default fontconfig doesn't come with a mind-reader.
> You have to tell it which CJK language you want it to prefer. You can
> do that by any of:
>
> - Setting $LANG to zh_CN for example.
>
> - Making sure your HTML pages have the lang="zh-cn" tag. No,
>   lang="zh" is not enough.
>
> - With recent Pango and a Pango-enabled firefox, you can set
>   $LANGUAGE=en_US,zh_CN, or set $PANGO_LANGUAGE=en_US,zh_CN. It does the
>   right thing then.
>
> --
> behdad
> http://behdad.org/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bad_rendering.png
Type: image/png
Size: 219438 bytes
Desc: not available
URL:

From behdad at behdad.org Sat Dec 1 09:32:35 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Sat, 01 Dec 2007 04:32:35 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
Message-ID: <1196501555.24797.8.camel@behdad.behdad.org>

[Resending...]

--
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759
-------------- next part --------------
An embedded message was scrubbed...
From: Behdad Esfahbod
Subject: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings
Date: Fri, 30 Nov 2007 12:40:04 -0500
Size: 2579
URL:

From fangqq at gmail.com Sat Dec 1 20:22:05 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Sat, 01 Dec 2007 15:22:05 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To:
References: <1196452990.11285.18.camel@behdad.behdad.org>
Message-ID: <4751C26D.5020602@gmail.com>

Hi,

Some new progress has been made on the fontconfig file. The new version
is attached. There are two changes:

1. To replace wqy-bitmap-song with Chinese vector fonts when displaying
   sizes >16px or <10px, I changed target="font" to target="pattern" on
   the match blocks, and now it works. I used 4 match blocks to handle
   the serif>16px, serif<10px, sans>16px and sans<10px cases.

2. I added a match block to deal with the high priority of the wqy fonts
   under zh locales and the monospace alias. An additional test was
   inserted into the test list to minimize the impact on non-zh users.

Using this config file, the rendering under en-us and zh-cn/zh-tw all
looks fine (almost).
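The four size-dependent match blocks described in change 1 can be sketched roughly as follows. This is only an illustration, not the attached 61-wqy-bitmapsong.conf: the replacement family names in VECTOR_FONT are placeholders, and the helper names size_rule and build_config are made up for this example. It also assumes the application puts a pixelsize into its request pattern, which not every application does. The element and attribute names (match, test, edit, target, compare, mode, binding) are the ones fontconfig documents.

```python
import xml.etree.ElementTree as ET

# Placeholder family names (assumption); a real file would name installed vector fonts.
VECTOR_FONT = {"serif": "Example Vector Serif", "sans-serif": "Example Vector Sans"}


def size_rule(generic, compare, pixels):
    """Build one match block that edits the request pattern before font matching."""
    match = ET.Element("match", target="pattern")
    family = ET.SubElement(match, "test", name="family")
    ET.SubElement(family, "string").text = generic
    size = ET.SubElement(match, "test", name="pixelsize", compare=compare)
    ET.SubElement(size, "double").text = str(pixels)
    # Prepend the vector font so it is tried before the bitmap font at this size.
    edit = ET.SubElement(match, "edit", name="family", mode="prepend", binding="strong")
    ET.SubElement(edit, "string").text = VECTOR_FONT[generic]
    return match


def build_config():
    """Four blocks: serif and sans-serif, each for 'more than 16' and 'less than 10'."""
    root = ET.Element("fontconfig")
    for generic in ("serif", "sans-serif"):
        root.append(size_rule(generic, "more", 16))
        root.append(size_rule(generic, "less", 10))
    return root


if __name__ == "__main__":
    print(ET.tostring(build_config(), encoding="unicode"))
```

Generating the blocks from one helper keeps the four cases consistent; a hand-written config file would simply spell out the same four blocks.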
The screenshot of the en-us desktop is at
http://wenq.org/gallery/albums/userpics/10002/F8-wqy-newconfig_enUS.png
and that of the zh-cn desktop is at
http://wenq.org/gallery/albums/userpics/10002/F8-wqy-newconfig_zhCN.png

The "almost" comes from the fact that when users specify
"WenQuanYi Bitmap Song" as the family rather than the generic alias,
larger/smaller fonts are still replaced by uming/ukai (see the
bottom-right corner of the two test pages). However, this seems OK to
me.

I just want to mention that two non-wqy bugs can be found in the
screenshots:

1. On the zhCN screenshot, in the sans-serif test block, in the text
   "... lasy dog 0123456789", you can see that the numbers were rendered
   by the wqy bitmap fonts rather than the smooth DejaVu. This also
   happens on all Chinese web pages when a number follows a Chinese
   character. I was told that this is a bug in Pango. Behdad, do you
   have some insight on this?

2. The date-time applet used reversed language on both screenshots. I
   believe this is GNOME's bug.

Please let me know what you think about this file.

Thank you,

Qianqian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 61-wqy-bitmapsong.conf
Type: text/xml
Size: 4237 bytes
Desc: not available
URL:

From behdad at behdad.org Tue Dec 4 02:01:31 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Mon, 03 Dec 2007 21:01:31 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <4751C26D.5020602@gmail.com>
References: <1196452990.11285.18.camel@behdad.behdad.org> <4751C26D.5020602@gmail.com>
Message-ID: <1196733691.29088.39.camel@behdad.behdad.org>

On Sat, 2007-12-01 at 15:22 -0500, Qianqian Fang wrote:
>
> 1. on the zhCN screenshot, in the sans-serif test block, the text of
> "... lasy dog 0123456789"
> you can see the numbers were rendered by wqy bitmap fonts, rather
> than the smooth
> Dejavu. This also happened for all Chinese webpages, when a number
> follows a Chinese
> character.
> I was told that this is a bug in Pango, Behdad, do you
> have some insight on this?

My insight is, well, you are getting what you asked for. This is where
some people track this issue, but I've got used to ignoring it:

  http://bugzilla.gnome.org/show_bug.cgi?id=481210

> 2. the date-time applet used reversed language on both screenshots, I
> believe this is Gnome's bug.

--
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning. -- Matt Welsh

From fangqq at gmail.com Tue Dec 4 02:58:16 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Mon, 03 Dec 2007 21:58:16 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <1196733691.29088.39.camel@behdad.behdad.org>
References: <1196452990.11285.18.camel@behdad.behdad.org> <4751C26D.5020602@gmail.com> <1196733691.29088.39.camel@behdad.behdad.org>
Message-ID: <4754C248.9080908@gmail.com>

Hi Behdad,

You may well be right, and the behavior of Pango is not logically
flawed. Perhaps this problem should be filed as a feature request rather
than a bug.

From a Chinese user's perspective, the Latin script and the Common
script are both non-Hanzi, non-CJK characters; therefore, users expect a
similar look and feel when these characters are rendered. For other
languages, I guess users more or less share the same view: numbers and
basic Latin characters (or basic ASCII, or keyboard characters) are the
most frequently used symbols that do not depend on the local language.
As long as the local language does not redefine these symbols, they are
expected to be rendered in similar styles.

I don't know the exact definition of PANGO_SCRIPT_COMMON and
PANGO_SCRIPT_LATIN, but I think it is more natural to render numbers
using a Latin font rather than a Chinese font, as numbers and Latin
letters are much closer.
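For reference, the Unicode Script property assigns ASCII digits to Common rather than Latin, and script itemizers generally let a Common character take on the script of the text around it. The Python sketch below illustrates that resolution step only. It is not Pango code: base_script and resolve_scripts are illustrative names, the three-way lookup is a hand-written stand-in for the real Unicode script data, and Common characters at the very start of the text are simply left as Common here.

```python
def base_script(ch):
    """Very rough script lookup; real data comes from the Unicode script tables."""
    if 0x4E00 <= ord(ch) <= 0x9FFF:      # CJK Unified Ideographs block
        return "Han"
    if ch.isascii() and ch.isalpha():    # A-Z and a-z
        return "Latin"
    return "Common"                      # digits, spaces, punctuation in this sketch


def resolve_scripts(text):
    """Give each Common character the script of the nearest preceding non-Common one."""
    resolved = []
    current = "Common"
    for ch in text:
        script = base_script(ch)
        if script != "Common":
            current = script             # a new script run starts here
        resolved.append((ch, current))
    return resolved


if __name__ == "__main__":
    for sample in ("\u4e2d\u6587123", "dog 0123"):
        print(resolve_scripts(sample))
```

Under this model the digits after Chinese text land in a Han run and the digits after "dog" land in a Latin run, which is consistent with the symptom reported above: a number picks up the Chinese font only when it follows a Chinese character.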
Huang Peng provided a patch to get the commonly expected behavior for
this situation. If it could be applied, even if only under Chinese
locales, that would be a great help. I've seen this problem reported
many times on the Mandriva, Debian and Red Hat bugzillas and on almost
all Chinese Linux forums.

Back to the original topic of this thread: what do you think of the
fontconfig file in my last email? I have heard complaints on some
Chinese forums about font changes due to the removal of the original
fontconfig file. I hope I can get something to commit to put an end to
these complaints.

Qianqian

From behdad at behdad.org Tue Dec 4 07:35:12 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Tue, 04 Dec 2007 02:35:12 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <4754C248.9080908@gmail.com>
References: <1196452990.11285.18.camel@behdad.behdad.org> <4751C26D.5020602@gmail.com> <1196733691.29088.39.camel@behdad.behdad.org> <4754C248.9080908@gmail.com>
Message-ID: <1196753712.10445.36.camel@behdad.behdad.org>

On Mon, 2007-12-03 at 21:58 -0500, Qianqian Fang wrote:
> hi Behdad

Hi,

> you may well be right and the behavior of pango is not logically
> flawed. Perhaps this problem should be filed as a feature-request
> rather than a bug.

I'm not stuck on semantic issues like feature-request vs bug. When I
say it's technically infeasible, I mean it.

> From Chinese user perspective, Latin scripts and the Common
> scripts are both non-Hanzi or non-CJK characters, therefore,
> they are expecting a similar look-n-feel when rendering these characters.
> For other languages, I guess they more or less share the same
> view: numbers and basic Latin characters (or Basic ASCII, or
> keyboard characters) are the most frequently used, non-local-language
> dependent symbols. As long as their local language does not
> re-define these symbols, they are expected to be rendered with
> similar styles.

Let me repeat what's happening again: You are setting a Chinese locale,
so when Pango sees digits, it assumes that you want to use those digits
with Chinese text, and you have provided a Chinese font that has glyphs
for those digits, so it believes it's found the perfect font for them
(your preferred font indeed) and uses it. If those digits are not
desired, remove them from the font.

> I don't know the exact definition of PANGO_SCRIPT_COMMON
> and PANGO_SCRIPT_LATIN, but I think it is more natural to
> render the numbers using a Latin font rather than a Chinese
> font, as numbers and Latins are much closer.

Then fix your font.

> Huang Peng provided a patch to get the commonly expected
> behavior for this situation, if it can be implemented, or
> under the condition of Chinese locales, that would be a great
> help. I've seen this report many times on Mandriva, Debian,
> Redhat's bugzilla and almost all Chinese Linux forums.

That's not going to happen. Pango's core has nothing language or script
specific hardcoded in it except for the data that is computer-generated
from the Unicode Character Database. In Unicode, ASCII digits are
marked script Common. There is a very small part of the issue you are
seeing that can be improved in Pango:

  http://bugzilla.gnome.org/show_bug.cgi?id=345386

but other than that, the behavior looks very reasonable to me. If you
can think of an explanation of the behavior you want, without using
"change character class of digits" and "special-case Chinese", I'm
interested to hear that.

There are a few ways to fix your problem:

  - Remove Latin and ASCII digits from your font.
    Why is it there if it's not desired? Nicolas suggested that
    fontconfig add support for conditional blacklisting of individual
    blocks/glyphs in a font. That would help too, but it's not in
    fontconfig yet.

  - If you were doing your font in an OpenType container, you could
    split the Latin and Chinese parts into two different fonts stuffed
    into a single container and having the same name. Then Pango will
    not see your Chinese font as having ASCII digits and will not use
    them.

But in the end, it all comes down to real or hacky ways of removing
those glyphs from the font.

> Back to the original topic of this thread, how do you think the
> fontconfig file in my last email? I have heard complains at
> some Chinese forums about font changes due to removing
> the original fontconfig file. Hope I can get something to
> commit to cease their complains.

No idea.

> Qianqian

--
behdad
http://behdad.org/

From nicolas.mailhot at laposte.net Tue Dec 4 09:41:28 2007
From: nicolas.mailhot at laposte.net (Nicolas Mailhot)
Date: Tue, 4 Dec 2007 10:41:28 +0100 (CET)
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
Message-ID: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>

On Tue, 4 December 2007 08:35, Behdad Esfahbod wrote:
> On Mon, 2007-12-03 at 21:58 -0500, Qianqian Fang wrote:

Hi,

I've let Behdad answer so far because he's the most qualified on the
Pango front, but I've wanted to reaffirm some points for a few days, so
I'll do it now:

Your core problem, as I wrote in one of my first mails, is that your
font is providing bad glyphs for Unicode blocks you don't really want to
touch, and you're changing locales you shouldn't change, so the easiest
and fastest solution for you has always been to

> - Remove Latin and ASCII digits from your font.
> Why is it there if it's not desired?

You have the chance to package a free/open/libre font. This is something
that couldn't be done for most fonts, but you can do it, so don't
hesitate to do it.

> Nicolas suggested that fontconfig adds support for
> conditional blacklisting of individual blocks/glyphs in a font. That
> would help too, but it's not in fontconfig yet.

Unfortunately many fonts are not so open and users still depend on
them. So some sort of fontconfig blacklisting support is needed to
support those fonts and users. From these exchanges, it seems Chinese
users are most affected by this problem.

Since you have contacts in the Chinese fonts community, do consider
reviving the patches posted on the fontconfig list in the past or
writing others. Have Chinese users indicate on the fontconfig list
their support for them. It's not a short-term fix, but it's the right
long-term fix, and if you don't push it this year you'll hit the same
problem again and again till someone does this work.

Last time the problem was discussed on the fontconfig lists, almost no
one stepped in to write that they needed this change. So the fontconfig
developers decided it was a lot of work with no real need, and passed.

The moral of this story is: your problems won't be fixed if you only
focus on workarounds (as you're doing now) and let others with no core
interest at stake drive changes. I know that culturally Chinese people
tend to avoid open disagreement, but if you need fontconfig to change,
silently hoping for the fontconfig maintainers to realise this won't
work.

Similarly, if you need good Chinese rendering in non-Chinese locales,
chinifying en_US is not the solution. We've not heard from Japanese
users yet, but I'm sure they would strongly object to Chinese-oriented
defaults. That means you need to push for apps that do not do it yet to
pass language info for properly tagged text to Pango (like Firefox does)
and push for some sort of input language notification system.
You can of course pass and hope others will do it, but in the meantime
you'll have to accept that any workaround that affects users in other
locales won't be accepted in the distro. And since getting proper
localised input working is the only way to get your stuff working
without side effects for those other users, that means Chinese users
won't have optimal defaults in the meantime.

>> Back to the original topic of this thread, how do you think the
>> fontconfig file in my last email?

The version posted on
http://www.redhat.com/archives/fedora-fonts-list/2007-November/msg00088.html
looks mostly fine, except I'm not sure the DejaVu LGC Sans Mono in
monospace is needed and you rely on a high priority (61) to stomp on
other CJK fonts (and probably others). IMHO this needs to be approved
by Jens and the language teams affected.

For the version on
http://www.redhat.com/archives/fedora-fonts-list/2007-December/msg00002.html
I'm not sure what the selectfont is there for. And likewise you have
all sorts of stuff in monospace that assumes specific Latin defaults
out of your control. It will probably work most of the time, but
removing the Latin glyphs in your fonts would solve this in a more
robust way.

Regards,

--
Nicolas Mailhot

From fangqq at gmail.com Tue Dec 4 15:17:13 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Tue, 04 Dec 2007 10:17:13 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>
Message-ID: <47556F79.9020503@gmail.com>

The selectfont block originated from Debian-based distributions, where
bitmap fonts are disabled by default. This block enables only this font
without turning on the global switch for all bitmap fonts. However, in
Fedora, bitmap fonts are allowed by default, so this block can be safely
removed.
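A selectfont block of the kind described here can be sketched as below. This is an illustration rather than the block from the attached file: the helper name accept_family is made up, and the family string is taken from the name used elsewhere in this thread. The element names (selectfont, acceptfont, pattern, patelt) are the ones fontconfig documents, and the fontconfig documentation describes fonts matched by acceptfont as protected from being excluded by a rejectfont rule.

```python
import xml.etree.ElementTree as ET


def accept_family(family):
    """Build a config that whitelists one family without enabling all bitmap fonts."""
    root = ET.Element("fontconfig")
    select = ET.SubElement(root, "selectfont")
    accept = ET.SubElement(select, "acceptfont")
    pattern = ET.SubElement(accept, "pattern")
    # One pattern element: family equals the given name.
    patelt = ET.SubElement(pattern, "patelt", name="family")
    ET.SubElement(patelt, "string").text = family
    return root


if __name__ == "__main__":
    config = accept_family("WenQuanYi Bitmap Song")
    print(ET.tostring(config, encoding="unicode"))
```

On a system that does not reject bitmap fonts in the first place, such a block has nothing to override, which matches the conclusion above that it can be dropped for Fedora.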
Removing the Latin part might be a solution, but it works by sacrificing
the integrity of the font to accommodate the limitations of fontconfig
(I've never seen any Chinese font without Latin glyphs). In the long
run, I don't think this will help either. To my understanding, the
purpose of fontconfig is to provide a mechanism for font selection that
is non-invasive to the font pool; therefore, substituting and combining
fonts based on the preferences of a particular language SHOULD and COULD
be done at this level. Another reason is that about 1/4 of the users
like to use the bitmap Latin glyphs in wqy-bitmap-fonts on their default
desktop. I can show you dozens of links to prove this if you can find
someone who can read Chinese.

I've tested the latter file
(http://www.redhat.com/archives/fedora-fonts-list/2007-December/msg00002.html)
under various locales; it works almost perfectly and I did not see any
side effects. I am wondering whether Jens would like to test it and let
me know what you think about this file.

Qianqian
From fangqq at gmail.com Tue Dec 4 15:56:46 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Tue, 04 Dec 2007 10:56:46 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <1196753712.10445.36.camel@behdad.behdad.org>
References: <1196452990.11285.18.camel@behdad.behdad.org> <4751C26D.5020602@gmail.com> <1196733691.29088.39.camel@behdad.behdad.org> <4754C248.9080908@gmail.com> <1196753712.10445.36.camel@behdad.behdad.org>
Message-ID: <475578BE.5030008@gmail.com>

Hi,

I respect your philosophy of structuring style propagation based on
context and the nature of the scripts. I think it is indeed an elegant
solution to use a COMMON charset to represent the language-independent
symbols and render them based on the context.

IMHO, the confusion comes from the fact that "language-neutral" and
"local-language-dependent" characters are not distinguished within the
COMMON script. In other words, the COMMON charset is a mixture of
characters that are essentially not tied to any specific language (such
as digits) and those that are redefined by local languages (such as some
punctuation and the geometric shapes U+2500-U+25FF). For the former, I
think they should not be influenced by local language preferences;
rather, using the system fall-back setup (likely Latin-preferred) would
be the best solution. For the latter, using the local font preference is
best, as in your current COMMON charset handling.

In short, I think the current COMMON set should be further refined into
a NEUTRAL and a LOCAL_DEPENDENT charset, using the system fall-back
configuration for the NEUTRAL set and the local-language preferences for
the LOCAL_DEPENDENT set. Specifically, digits are language-neutral and
should be rendered by the system fall-back settings rather than the
local language settings.
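The proposed split can be sketched as follows. This only illustrates the proposal in this message and is not anything Pango implements: the NEUTRAL and LOCAL_DEPENDENT labels come from the text above, while common_subclass, pick_font and the two font arguments are hypothetical names chosen for the example. Only the two cases named in the message (ASCII digits and U+2500-U+25FF) are classified; everything else keeps the current behavior.

```python
def common_subclass(ch):
    """Hypothetical refinement of Common characters, following the proposal above."""
    if ch.isascii() and ch.isdigit():
        return "NEUTRAL"                 # digits: not tied to any language
    if 0x2500 <= ord(ch) <= 0x25FF:
        return "LOCAL_DEPENDENT"         # shapes a local font may redefine
    return None                          # not covered by this sketch


def pick_font(ch, local_font, fallback_font):
    """Choose the system fall-back font for NEUTRAL characters, the local font otherwise."""
    if common_subclass(ch) == "NEUTRAL":
        return fallback_font
    return local_font                    # current behavior for everything else


if __name__ == "__main__":
    for ch in ("7", "\u2500", "\u3002"):
        print(repr(ch), pick_font(ch, "WenQuanYi Bitmap Song", "DejaVu Sans"))
```

The open question raised in the replies is where such a table would come from, since the Unicode data that script itemization relies on has no NEUTRAL category.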
Qianqian

From fangqq at gmail.com Tue Dec 4 16:40:20 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Tue, 04 Dec 2007 11:40:20 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>
Message-ID: <475582F4.9070608@gmail.com>

Nicolas Mailhot wrote:
> Unfortunately many fonts are not so open and users still depend on
> them. So some sort of fontconfig blacklisting support is needed to
> support those fonts and users. From these exchanges, it seems chinese
> users are most affected by this problem.
>
> Since you have contacts in the chinese fonts community do consider
> reviving the patches posted on the fontconfig list in the past or
> writing others. Have chinese users indicate on the fontconfig list
> their support for them. It's not a short-term fix, but it's the right
> long-term fix, and if you don't push it this year you'll hit the same
> problem again and again till someone does this work.
>
> Last time the problem was discussed on fontconfig lists almost no one
> stepped in to write he needed this change. So fontconfig developpers
> decided it was a lot of work with no real need, and passed.

Hi Nicolas,

I agree with you on the long-term solution to the problem. Here I just
want to describe my observations of the Chinese users and my opinions on
workarounds.

Unfortunately the Chinese user community is quite weak at communicating
with upstream, mainly for language reasons. More than half of the users
do not like to use English to discuss their problems; the vast majority
of the feedback and problem-solving happens on various Chinese Linux
forums, BBSes (bulletin board systems) and instant messaging.
Even among those who are able to describe their problems clearly in
English, only a small fraction go through all the cultural training and
practice and become contributors.

The poor track record of propagating patches back upstream is another
reason that discourages Chinese users from getting involved. Chinese is
one of the most complicated scripts, and it is always challenging to get
what people expect without altering the default Latin handling;
therefore, the upstream developers are very cautious about any change
related to Chinese (or CJK). This also negatively impacts the situation.

As a result, Chinese users HAD to find workarounds to meet their
day-to-day needs. You may be surprised that almost all Chinese Linux
forums have a board called "Font Beautification". It sounds ridiculous,
but it is true. People used to spend days or weeks trying to fix their
Chinese font settings for all applications. That is also my motivation
for creating the Wen Quan Yi project: just trying to save people's time
and make Linux easier for Chinese users.

I can do my best to help push the fontconfig scheme that you mentioned,
but I would not be surprised if it is still not implemented after years.
But there is an immediate need to use Linux in a Chinese-friendly way,
and a good workaround can really build up the growing user community
and, likely, a developer group, and that could make life easier in the
future. That's my rationale for pushing a reasonable fontconfig file for
my font.

Qianqian

From nicolas.mailhot at laposte.net Tue Dec 4 19:49:50 2007
From: nicolas.mailhot at laposte.net (Nicolas Mailhot)
Date: Tue, 04 Dec 2007 20:49:50 +0100
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <475582F4.9070608@gmail.com>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com>
Message-ID: <1196797790.13817.41.camel@rousalka.dyndns.org>

On Tuesday, 4 December 2007 at 11:40 -0500, Qianqian Fang wrote:

> Unfortunately the Chinese user community is quite weak in communicating with
> the upstreams, majorly due to language reasons. More than half
> of the users do not like to use English to discuss their problems, the
> vast majority of the feedback and problem-solving were done at various
> Chinese
> Linux forums, BBS (bulletin board system) and instant messaging.
> Even those who are able to describe their problem clearly in English,
> only a small fraction went through all the culture training
> and practicing and become a contributor.

This small fraction needs to organise itself, identify problems in
Chinese font/text support and possible fixes, and relay them to upstream
projects. For example, every year freedesktop.org organises a text
summit where pretty much every project that counts in FLOSS text
rendering is represented:

  http://unifont.org/TextLayout2007/

It would be extremely helpful if the Chinese FLOSS user community sent
someone to next year's summit to list the main problems affecting
Chinese users, and what the Chinese community feels needs to be done in
projects like fontconfig or Pango to fix them.

For the rest of the year, having clearly identified Chinese contacts
people can ask questions of (like "if I do this will it break Chinese
apps") would probably help too.

> The bad lucks of propagating patches back to upstream is also another
> reason that discourages Chinese to get involved. Chinese is one
> of the most complicated scripts and is always challenging to
> get what people expect without altering the default Latin handling,
> therefore, the upstream developers are very cautious about any change
> related to Chinese (or CJK).

CJK is difficult, sure, but do not underestimate the part lack of
communication plays. Any maintainer will be ultra-cautious about making
CJK changes when he knows that if he makes a mistake users are likely
not to report back but suffer silently for years while cursing his name.
> This also negatively impacts the situation.

As a counter-example, you may have noticed there's been a lot of
Greek-related activity on the list lately. It's not because Greek is
easier or more interesting than other languages, but because the Greek
community managed to organise itself. As a result they're getting good
support from every distribution, Fedora included.

> As a result, Chinese users HAD to find out work-arounds to meet their
> day-to-day needs. You may be supprised that almost all Chinese linux
> forums have a board called "Font Beautification", it sounds ridiculous but
> this is true. People used to spend days or weeks trying to fix their
> Chinese font settings for all applications.

I'm not surprised at all; this is typical workaround culture.

> That is also my motivation
> to create the Wen Quan Yi project, just trying to save people's time
> and make Linux easier to use by Chinese.
>
> I can do my best to help pushing the fontconfig scheme that you mentioned,
> but I am not supprized if that still not implemented after years.

I'll be honest: even if someone actively pushes fontconfig changes, it
may take a year for them to be integrated and yet more time for the
changes to percolate into distributions. Getting fontconfig to change is
not easy. However, if you don't try, I'm pretty sure nothing will have
changed in 5 years.

And you probably need fixes at other levels too. Just like the Wen Quan
Yi project is part of the solution, but not the whole solution, fixing
fontconfig will probably not be sufficient.
For example, even if fontconfig selects the perfect Chinese font when
told to render Chinese, apps still need to detect that they are
rendering Chinese, which is not possible based only on Unicode code
points or the session locale (though for this particular problem you are
better off than Latin languages since you only share code points with
Japanese).

> But there are immediate needs to use Linux in a Chinese-friendly
> way and a good work-around can really build up the expanding
> user community and likely developer group, and that could
> make the life easier in the future. That's my rationale to
> push a reasonable fontconfig file for my font.

The limit of workarounds, as you've found out, is that they get removed
as soon as someone else complains about them. No one really wants to
choose between Latin, Chinese or Japanese users at Fedora, so if two
user communities conflict, the one stepping on the other loses.

By selecting a fontconfig priority of 61 you pretty much removed
everyone relying on fonts with a less-than-61 priority from the picture.
That leaves people relying on fonts with a more-than-61 priority to
complain. I suspect some of them, most likely Japanese users, can still
be negatively affected by your changes, but I'm no Japanese speaker so
that's up to Jens to confirm (or refute). And even if Jens gives the
green light, there is still the possibility of later complaints causing
your changes to be removed.

Lastly, for Fedora ≥ 9 we'll probably use DejaVu full, not DejaVu LGC,
as the default, so you may want to adapt your fontconfig file
accordingly in Fedora-devel (you'll note DejaVu full as default got
blocked for several releases due to the same kinds of conflicts you're
encountering, and is only being pushed now that we're more confident in
its non-LGC parts. We try to be fair to everyone).

Regards,

--
Nicolas Mailhot
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL:

From fangqq at gmail.com Tue Dec 4 21:42:57 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Tue, 04 Dec 2007 16:42:57 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <1196797790.13817.41.camel@rousalka.dyndns.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1196797790.13817.41.camel@rousalka.dyndns.org>
Message-ID: <4755C9E1.5000608@gmail.com>

Nicolas Mailhot wrote:
> This small fraction needs to organise itself, identify problems in
> Chinese font/text support, possible fixes, and relay them to upstream
> projects. For example every year freedesktop.org organises a text summit
> where pretty much every project that counts in FLOSS text rendering is
> represented:
>
> http://unifont.org/TextLayout2007/
>
> It would be extremely helpful if the Chinese FLOSS user community sent
> someone to next year's summit to list the main problems affecting
> Chinese users, and what the Chinese community feels needs to be done in
> projects like fontconfig or pango to fix them.
>
> The rest of the year having clearly identified Chinese relays people can
> ask questions to (like "if I do this will it break Chinese apps") may
> probably help too.
>

I fully agree with you that a representative group is needed to facilitate communication and speak for Chinese users on all text layout issues. I will make contact with the related people that I know, including Arne Gojge, the maintainer of the Uni-fonts project, some Red Hat developers in Beijing and the Debian/Ubuntu Chinese group. I am not sure if my energy allows me to change anything other than taking care of my project, but I will make sure that your suggestions are passed around among those who are interested.
> By selecting a fontconfig priority of 61 you pretty much removed
> everyone relying on fonts with a less-than-61 priority from the picture.
> That leaves people relying on fonts with a more-than-61 priority to
> complain. I suspect some of them, most likely Japanese users, can still
> be negatively affected by your changes, but I'm no Japanese speaker, so
> that's up to Jens to confirm (or refute). And even if Jens greenlights,
> there is still the possibility of later complaints causing your changes
> to be removed.
>

I am not sure whether you noticed, but the file has a language-matching block before the strong binding to DejaVu Mono:

zh

It also matches WenQuanYi Bitmap Song in family. IMO, this strong binding will only happen when the user is under a zh locale and has the WQY font installed. So I do not think it will mess up Japanese fonts, as it will not match the lang tag. I tested it under the ja locale, and everything seems to be normal (the mono font was not influenced); the screenshot is attached.

hi Jens,

I am not sure if my previous reply CCed you or not, but I would like to know your opinion and test results on this new fontconfig file (attached).

thank you!

> Lastly for Fedora ≥ 9 we'll probably use DejaVu full not DejaVu LGC as
> default, so you may want to adapt your fontconfig file accordingly in
> Fedora-devel (you'll note DejaVu full as default got blocked for several
> releases due to the same kinds of conflicts you're encountering, and is
> only pushed now we're more confident in its non LGC parts. We try to be
> fair to everyone)
>

Thank you for reminding me of this; I will make the corresponding adjustment if the config file gets approved.

Again, I really appreciate your careful thoughts on these issues. All the requests are quite reasonable. I will tailor my config file as best as I can to avoid future complications.

> Regards,
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: wqy-newconfig-ja-locale.jpg Type: image/jpeg Size: 190190 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 61-wqy-bitmapsong.conf Type: text/xml Size: 4085 bytes Desc: not available URL: From behdad at behdad.org Fri Dec 7 07:41:01 2007 From: behdad at behdad.org (Behdad Esfahbod) Date: Fri, 07 Dec 2007 02:41:01 -0500 Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings] In-Reply-To: <475582F4.9070608@gmail.com> References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> Message-ID: <1197013261.27642.63.camel@behdad.behdad.org> On Tue, 2007-12-04 at 11:40 -0500, Qianqian Fang wrote: > hi Nicolas > > I agree with you for the long-term solution of the problem. Here I just > want to describe my observation to the Chinese users and my opinions > on work-around. > > Unfortunately the Chinese user community is quite weak in communicating with > the upstreams, majorly due to language reasons. More than half > of the users do not like to use English to discuss their problems, the > vast majority of the feedback and problem-solving were done at various > Chinese > Linux forums, BBS (bulletin board system) and instant messaging. > Even those who are able to describe their problem clearly in English, > only a small fraction went through all the culture training > and practicing and become a contributor. > > The bad lucks of propagating patches back to upstream is also another > reason that discourages Chinese to get involved. Chinese is one > of the most complicated scripts and is always challenging to > get what people expect without altering the default Latin handling, > therefore, the upstream developers are very cautious about any change > related to Chinese (or CJK). This also negatively impacts the situation. 
Hi Qianqian, [/me tries to write a motivational mail] It's easy to assume that one's problems are harder than others'. In this case, Chinese for example is a far easier script to support than Middle-Eastern scripts and definitely far easier than Indic scripts. Or in Iran, my native country, less than half of Iranians know enough English to be able to communicate at all, let alone preferring it... When I started working on Persian support in software back in 1999, it was a disaster. IE5 had just came out and had support for Unicode, but had a serious bug with the letter Persian Yeh that made it almost unusable for Persian. The community started using Arabic Yeh instead, and many individuals and companies produced fonts that had the shape of Persian Yeh in their Arabic Yeh glyph position. That's not the only problem that needed to be worked around. In the mean time, some of us started the FarsiWeb Project to systematically work on properly fixing Persian support in software. We soon got attracted to Free Software as there was not much we could do about proprietary ones other than reporting the bug (that particular IE bug took more than 4 years to fix...). Persian support in Free Software was even worse. Both KDE and GNOME had just added support for Arabic, but no Persian-specific feature was working. And there were no suitable fonts. No keyboard layout either. No translations whatsoever. Lots and lots of bugs in right-to-left UIs. The list goes on and on... While trying to learn the culture of upstream in FarsiWeb, we learned about similar projects in other countries that shared a bunch of those problems with us, namely, Arabeyes from all over the Arab countries and Ivrix from Israel. We worked on a lot of projects and patches together, with the main goal of *fixing upstream*. 
To make this mail short, fast forward a few years and I now maintain Pango and HarfBuzz, comaintain cairo, hack on Gtk+, Fontconfig, and Mozilla/Firefox, and the Linux desktop has the best Persian support among all modern operating systems. We've come a long way, and there's still a lot left to go...

Sorry if it was too personal and historical; I thought it might resonate with your feelings.

Regards,

-- 
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. -- Matt Welsh

From fangqq at gmail.com Mon Dec 10 23:01:09 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Mon, 10 Dec 2007 18:01:09 -0500
Subject: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]
In-Reply-To: <1197013261.27642.63.camel@behdad.behdad.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org>
Message-ID:

hi Behdad

thank you for sharing your experience and your path to better local language support on FLOSS OSs. I do appreciate the diversity and complexity of text layout problems for most existing languages. And also, nothing would have happened if there had been no one who endured the pain and devoted their efforts to improving the situation.

As I said in the first email, I am new to package maintaining and to communicating with upstream developers. I might have underestimated the problem and fired in the wrong direction. If that is the case, then please forgive me.

Going back to the digit font change issue we discussed earlier, I spent some time in the past few days trying to get myself a clearer picture of this. I dug out some bug reports from various bugzillas (Mozilla, Red Hat, Gnome) and gathered a list of similar reports (see the bottom of the email).
These reports were filed by simplified and traditional Chinese users and Japanese users (I believe Korean users experience the same problem). So, one thing that can be said from this list is that the "contextual font selection" does seem to be bothering CJK users in text formatting.

I understand that "contextual shaping" is one of the techniques for rendering complex scripts. I am not sure how tight the connection is between "contextual shaping" and "contextual format propagation", but one thing that I think may shed some light on the complaints of CJK users is that Chinese (maybe Japanese as well) scripts are not context-sensitive. Chinese characters are relatively independent and self-consistent in shape (although this statement is not true for Chinese calligraphy, where strokes may connect between characters depending on layout direction, but the current OSs and font technologies are not ready to handle this IMO). The only complexity may come from the fact that Hanzi for printing are mostly equal-width, and the punctuation among the Hanzi is expected to match the width of the surrounding Hanzi. As full-width punctuation is encoded separately in Unicode, together with the contextual punctuation support of the input methods, this seems to be handled very well. So, in short, for Chinese text layout, users generally do not expect to see context-based changes, either in encoding/glyph or in font faces (this may not include some extreme cases).

Now going back to pango: from what I read in the bug reports, pango uses PANGO_SCRIPT_COMMON to represent language-independent symbols. I have no complaint about that. It is a good classification based on the semantics of the symbols.

What I, and most CJK users, are not satisfied with is the context-sensitivity of those common scripts when formatting text under CJK locales. I know that you have advocated sticking with the "face" meaning of SCRIPT_COMMON, which is supposed to be rendered by local languages.
But IMO, the face meaning is misleading here. From a Chinese user's perspective, the difference between SCRIPT_COMMON and Latin is negligible compared with its difference from CJK characters. Therefore, using CJK fonts to render SCRIPT_COMMON is quite odd. Using Latin fonts for COMMON is most preferred; even specifying no face (i.e. using the system fall-back) is better than assigning Chinese fonts to these scripts, because most Chinese fonts have low-quality Latin/common glyphs, even the commercial ones.

As you see from the bug lists, this problem has existed for many years, and I am pretty sure that it will come back again and again, as long as the expected rendering is not achieved. If the current pango formatting logic is not sufficient to handle the CJK preferences described above, I think refining the logic to take them into consideration is better than sticking with a fixed but incomplete logic.

please let me know your thoughts and reasoning on whether this is feasible or not, and if yes, where to start.

thank you for paying attention to this issue.
Qianqian =============================================================== Bug 321113 - Wrong glyph subsituation algorithm for digital characters and punctuations http://bugzilla.gnome.org/show_bug.cgi?id=321113 Bug 345072 - changes font when typing different scripts on the same line http://bugzilla.gnome.org/show_bug.cgi?id=345072 Bug 345386 - Language and direction propagation in and between PangoLayouts http://bugzilla.gnome.org/show_bug.cgi?id=345386 (opened by yourself) https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679 Bug 481210 - [All lang] [firefox] - Face of the number is changing when enter number + Char, in any Locale http://bugzilla.gnome.org/show_bug.cgi?id=481210 Bug 481188 - ascii text space too narrow for Chinese encodings http://bugzilla.gnome.org/show_bug.cgi?id=481188 Bugzilla Bug 129541: changes font when typing different scripts on the same line https://bugzilla.redhat.com/show_bug.cgi?id=129541 Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango https://bugzilla.redhat.com/show_bug.cgi?id=131218 Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox give bad eol rendering and cursor placement https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens Petersen) https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link) Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is changing when enter number + Char, in any Locale https://bugzilla.redhat.com/show_bug.cgi?id=228804 Bugzilla Bug 221361: [pango] ascii text space and punctuation is narrow for CJK https://bugzilla.redhat.com/show_bug.cgi?id=221361 Bug 379125 - chinese punctuations after english letters are wrongly displayed https://bugzilla.mozilla.org/show_bug.cgi?id=379125 https://bugzilla.mozilla.org/attachment.cgi?id=263185 =============================================================== On Dec 7, 2007 2:41 AM, Behdad Esfahbod wrote: > Hi Qianqian, > > [/me tries to write a motivational mail] > > It's easy to assume that one's 
problems are harder than others'. In > this case, Chinese for example is a far easier script to support than > Middle-Eastern scripts and definitely far easier than Indic scripts. Or > in Iran, my native country, less than half of Iranians know enough > English to be able to communicate at all, let alone preferring it... > > When I started working on Persian support in software back in 1999, it > was a disaster. IE5 had just came out and had support for Unicode, but > had a serious bug with the letter Persian Yeh that made it almost > unusable for Persian. The community started using Arabic Yeh instead, > and many individuals and companies produced fonts that had the shape of > Persian Yeh in their Arabic Yeh glyph position. That's not the only > problem that needed to be worked around. > > In the mean time, some of us started the FarsiWeb Project to > systematically work on properly fixing Persian support in software. We > soon got attracted to Free Software as there was not much we could do > about proprietary ones other than reporting the bug (that particular IE > bug took more than 4 years to fix...). Persian support in Free > Software was even worse. Both KDE and GNOME had just added support for > Arabic, but no Persian-specific feature was working. And there were no > suitable fonts. No keyboard layout either. No translations whatsoever. > Lots and lots of bugs in right-to-left UIs. The list goes on and on... > > While trying to learn the culture of upstream in FarsiWeb, we learned > about similar projects in other countries that shared a bunch of those > problems with us, namely, Arabeyes from all over the Arab countries and > Ivrix from Israel. We worked on a lot of projects and patches together, > with the main goal of *fixing upstream*. 
To make this mail short, fast > forward a few years later and I now maintain Pango and HarfBuzz, > comaintain cairo, hack on Gtk+, Fontconfig, and Mozilla/Firefox, and the > Linux desktop has the best Persian support among all modern operating > systems. We've come a long way, and there's still a lot left to go... > > > Sorry if it was too personal and history, thought that may resonance > with your feelings. > > Regards, > > -- > behdad > http://behdad.org/ > > ...very few phenomena can pull someone out of Deep Hack Mode, with two > noted exceptions: being struck by lightning, or worse, your *computer* > being struck by lightning. -- Matt Welsh > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fangqq at gmail.com Wed Dec 12 19:44:28 2007 From: fangqq at gmail.com (Qianqian Fang) Date: Wed, 12 Dec 2007 14:44:28 -0500 Subject: Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]]] Message-ID: hi Nicolas I got more feedbacks on the config file testing from Jens, it seems the new config file works fine (the difference between the ja/zh monospace fonts is because ja users use Gothic as default monospace, not the Dejavu Mono). Do you think if it is ok for me to commit it to F8 as well? thanks Qianqian ---------- Forwarded message ---------- From: Jens Petersen Date: Dec 12, 2007 2:33 AM Subject: [Fwd: Re: [Fwd: Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]]] To: Qianqian Fang Hi, Here is a mail from Caius who tried to test your config a bit. I didn't have time yet to review his comments. Does it help at all? Jens Hi Jens, I tested zh_TW.UTF-8 and zh_CN.UTF-8, the monospace fonts are displayed properly. For ja_JP.UTF-8, the monospace fonts are also displayed properly. Between ja and zh, the monospace fonts used seems are different. ja is using a narrower width monospace than zh ones (they all monospaced). 
Please kindly let me know if this is not the information you were asking for.

Best Regards,
Caius.

Jens Petersen wrote:
> Caius,
>
> Could you take a look at this please, test the fontconfig file
> and follow up on the list with your findings.
>
> Thanks,
>
> Jens
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nicolas.mailhot at laposte.net Wed Dec 12 19:58:25 2007
From: nicolas.mailhot at laposte.net (Nicolas Mailhot)
Date: Wed, 12 Dec 2007 20:58:25 +0100
Subject: Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]]]
In-Reply-To:
References:
Message-ID: <1197489506.23262.8.camel@rousalka.dyndns.org>

On Wednesday, 12 December 2007 at 14:44 -0500, Qianqian Fang wrote:
> hi Nicolas

Hi,

> I got more feedbacks on the config file testing from Jens, it seems
> the new config file
> works fine

> Do you think if it is ok for me to commit it to F8 as well?

If you pushed the original problem file in F8, you certainly owe a fixed one to F8 users. Me, as long as no one complains, I'm happy.

Regards,

-- 
Nicolas Mailhot

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL:

From behdad at behdad.org Wed Dec 12 21:25:18 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Wed, 12 Dec 2007 16:25:18 -0500
Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To:
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org>
Message-ID: <1197494718.1444.174.camel@behdad.behdad.org>

Hi Qianqian,

[CC'ing to gtk-i18n-list, so hopefully this is the last time I have to repeat this.]
On Mon, 2007-12-10 at 18:01 -0500, Qianqian Fang wrote: > > Go back to the digit font change issue as we discussed earlier, I > spent some time in the past few days, trying to get myself a more > clear > picture on this. I dug out some bug reports from various bugzillas > (Mozilla, Redbat, Gnome) and gathered a list of similar reports (see > the bottom of the email). These reports were filed from simplified and > traditional Chinese users and Japanese users (I believed Korean > experienced the same problem). So, one thing that can be said from > this list is that the "contextual font selection" does seem to be > bothering CJK users in text formatting. Yes, you have identified the problem very accurately. > I understand that "contextual shaping" is one of the techniques for > rendering complex scripts. I am not sure how tight is the connection > between "contextual shaping" and the "contextual format propagation", > but one thing that I think may put some light to the complains of the > CJK users is that Chinese (maybe Japanese as well) scripts are not > contextual sensitive. Chinese characters are relatively independent > and self-consistent in shapes (while, this statement is not true for > Chinese calligraphy, where strokes may connect between characters > depending on layout direction, but the current OSs and font > technologies are not ready to handle this IMO). The only complexities > may come from the fact that Hanzi for printing are mostly equal-width, > and the punctuations among the Hanzi are expected to match the width > of the surrounding Hanzi. As the full-width punctuations being encoded > separately by Unicode, together with the contextual punctuation > support of the input-methods, this seems to be handled very well. So, > in short, for Chinese text layout, users are generally not expected to > see contextual-based changes, either encoding/glyph or font faces > (this may not include some extreme cases). And Pango supports those all perfectly fine. 
Even vertical writing using the correct substituted punctuation glyphs. See:

http://www.pango.org/ScriptGallery

The main font issue, though, is that Chinese (Simplified, Traditional), Korean, and Japanese share some Unicode code points, but they require slightly different renderings. Now if you don't tell Pango which version is preferred, how can it know which font to choose? It explicitly doesn't prefer any one over the others to avoid cultural problems.

The symptoms of this problem are "multiple fonts used in the same line". The solution is: either run under a CJK locale, or give hints to Pango about your preferred CJK locale using the env var PANGO_LANGUAGE.

Note that theoretically Pango can do text analysis to come up with a best guess, but doing that would then introduce another bug with symptoms "changes font when typing a few characters on the same line".

> Now go back to pango, from what I read from the bug reports, pango
> uses PANGO_SCRIPT_COMMON to represent language-independent symbols. I
> have no complaint about that. It is a good classification based on the
> semantics of the symbols.

Good. Let me also note that there's no way to change that. It's hardcoded in the Unicode standard.

> What I, and most CJK users, are not satisfied with is the
> contextual-sensitivity of those common scripts when formatting text
> under cjk locales. I know that you have advocated to stick with the
> "face" meaning of SCRIPT_COMMON, which is supposedly to be rendered by
> local languages. But IMO, the face meaning is misleading here. From a
> Chinese user perspective, the difference between the SCRIPT_COMMON to
> Latin is negligible,

Lemme correct you here: "From a Chinese user perspective, the ASCII digits are considered Latin". There's sure a lot more than ASCII digits to SCRIPT_COMMON. Helps to be precise.

> compared with its difference to CJK characters. Therefore, using CJK
> fonts to render SCRIPT_COMMON is quite odd.
> Using Latin fonts for
> COMMON is most preferred; even specifying no face (i.e. using system
> fall-back) is better than assigning Chinese fonts for these scripts
> for that most Chinese fonts have low-quality Latin/common glyphs, even
> the commercial ones.

And this problem has a name: "crappy glyphs and multiple scripts in a font". Tell me about it... I already pointed out a few solutions to it previously:

- Rip the crap out and everyone will feel better.

- Use TrueType containers (even for bitmap-only fonts) and put each script's glyphs into its own face, with all faces having the same name and put into the same TrueType Collection file.

- Finish the patch for fontconfig to allow configuration to disable certain Unicode codepoints per font. Then write such configuration for the crappy glyphs.

Pick whichever you prefer and just do it.

Another symptom, "digits change font after typing character", is in fact a very cool Pango feature, just badmouthed by the above problem. Fix the problem.

> As you see from the bug lists, this problem has existed for many
> years, and I am pretty sure that it will come back again and again, as
> long as the expected rendering is not achieved. If the current pango
> formatting logic is not sufficient to handle the CJK preferences as
> said above, I think to refine the logic to take it into consideration
> is better than stick with a fixed but incomplete logic.

I consider patches improving Pango's font selection algorithm, but none that I've seen so far has been an improvement (from my point of view). If it has words like CJK or "special case", I'm most probably not interested.

Of the bugs you listed, only the one I opened myself is valid IMO. The rest are just left open because no matter how many times I close them, they will be reopened... Oh well.

> please let me know your thoughts and reasoning on whether this is
> feasible or not, if yes, where to get start.

Does the above make sense?
I understand that it's easier to apply a two line patch to Pango instead of doing what of the things I listed above, but that just doesn't fit in the design, and it introduces other problems you don't see right now. > thank you for paying attention to this issue. > > Qianqian Regards, behdad > =============================================================== > Bug 321113 - Wrong glyph subsituation algorithm for digital characters > and punctuations > http://bugzilla.gnome.org/show_bug.cgi?id=321113 > > > Bug 345072 - changes font when typing different scripts on the same > line > http://bugzilla.gnome.org/show_bug.cgi?id=345072 > > > Bug 345386 - Language and direction propagation in and between > PangoLayouts > http://bugzilla.gnome.org/show_bug.cgi?id=345386 (opened by yourself) > https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679 > > > Bug 481210 - [All lang] [firefox] - Face of the number is changing > when enter number + Char, in any Locale > http://bugzilla.gnome.org/show_bug.cgi?id=481210 > > > Bug 481188 - ascii text space too narrow for Chinese encodings > http://bugzilla.gnome.org/show_bug.cgi?id=481188 > > > Bugzilla Bug 129541: changes font when typing different scripts on the > same line > https://bugzilla.redhat.com/show_bug.cgi?id=129541 > > > Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango > https://bugzilla.redhat.com/show_bug.cgi?id=131218 > > > Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox > give bad eol rendering and cursor placement > https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens > Petersen) > > > https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link) > > > Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is > changing when enter number + Char, in any Locale > https://bugzilla.redhat.com/show_bug.cgi?id=228804 > > > Bugzilla Bug 221361: [pango] ascii text space and punctuation is > narrow for CJK > https://bugzilla.redhat.com/show_bug.cgi?id=221361 > > 
> Bug 379125 - chinese punctuations after english letters are wrongly > displayed > https://bugzilla.mozilla.org/show_bug.cgi?id=379125 > https://bugzilla.mozilla.org/attachment.cgi?id=263185 > =============================================================== -- behdad http://behdad.org/ ...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. -- Matt Welsh From fangqq at gmail.com Wed Dec 12 21:47:55 2007 From: fangqq at gmail.com (Qianqian Fang) Date: Wed, 12 Dec 2007 16:47:55 -0500 Subject: Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]]] In-Reply-To: <1197489506.23262.8.camel@rousalka.dyndns.org> References: <1197489506.23262.8.camel@rousalka.dyndns.org> Message-ID: <4760570B.3040509@gmail.com> no, I was talking about the new one, 61-wqy-bitmapsong.conf, as I did for devel. Nicolas Mailhot wrote: > If you pushed the original problem file in F8, you certainly owe a fixed > one to F8 users. Me, as long as no one complains, I'm happy. > > Regards, > > From fangqq at gmail.com Thu Dec 13 03:56:53 2007 From: fangqq at gmail.com (Qianqian Fang) Date: Wed, 12 Dec 2007 22:56:53 -0500 Subject: [gtk-i18n-list] On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]) In-Reply-To: <20071213122347.38bebb52.mpsuzuki@hiroshima-u.ac.jp> References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <20071213122347.38bebb52.mpsuzuki@hiroshima-u.ac.jp> Message-ID: <4760AD85.5060507@gmail.com> hi For traditional Chinese used in Taiwan, people generally put commas and periods (full periods) near the center lines of the Chinese characters. 
But for simplified Chinese in mainland China, these marks are placed within the lower quadrant, near the bottom of the glyph, similar to Latin. Funnily, in traditional Chinese literature punctuation was not systematically used until the last 100 years, when it was introduced from the Western world :)

As to the screenshots, I guess those are related to the settings of the font (arphic/uming in this case), and the comma does look to be a little bit too low and overlaps with the next character in vertical mode.

mpsuzuki at hiroshima-u.ac.jp wrote:
>
> http://www.pango.org/ScriptGallery?action=AttachFile&do=get&target=Vertical.png
> http://www.pango.org/ScriptGallery?action=AttachFile&do=get&target=VerticalSimple.png
>
> In the vertical texts, the 3rd character (punctuation
> after "?") seems (for me) to be located at too-low
> position, as if they were vertically-centerlined glyph
> based on horizontal-writing mode. Qianqian, for Chinese
> users' eyes, they seem to be correctly positioned?
>
> Regards,
> mpsuzuki
>
>

From behdad at behdad.org Thu Dec 13 07:49:26 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Thu, 13 Dec 2007 02:49:26 -0500
Subject: [gtk-i18n-list] On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <20071213122347.38bebb52.mpsuzuki@hiroshima-u.ac.jp>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <20071213122347.38bebb52.mpsuzuki@hiroshima-u.ac.jp>
Message-ID: <1197532166.1444.266.camel@behdad.behdad.org>

On Thu, 2007-12-13 at 12:23 +0900, mpsuzuki at hiroshima-u.ac.jp wrote:
>
>
> http://www.pango.org/ScriptGallery?action=AttachFile&do=get&target=Vertical.png
> http://www.pango.org/ScriptGallery?action=AttachFile&do=get&target=VerticalSimple.png
>
> In the vertical texts, the 3rd character (punctuation
> after "?") seems (for me) to
> be located at too-low
> position, as if they were vertically-centerlined glyph
> based on horizontal-writing mode. Qianqian, for Chinese
> users' eyes, they seem to be correctly positioned?

Most probably the font doesn't have the vertical variants. If it has them, Pango will use them, as you can see in the last line, where the brackets, unlike other characters, are actually rotated:

http://www.pango.org/ScriptGallery?action=AttachFile&do=get&target=Vertical.png

> Regards,
> mpsuzuki

-- 
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. -- Matt Welsh

From fangqq at gmail.com Thu Dec 13 17:13:45 2007
From: fangqq at gmail.com (Qianqian Fang)
Date: Thu, 13 Dec 2007 12:13:45 -0500
Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <1197494718.1444.174.camel@behdad.behdad.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org>
Message-ID: <47616849.7090905@gmail.com>

hi Behdad

I would agree with you if you clearly told me why this change SHOULD be done in the fonts, or in the font selection, and not in the layout engine. Your previous replies, either to the bug reports or to my email, simply refused to make this change by saying it is "technically impossible", but you did not tell me what model that statement is based on. If you can give me a diagram or document to illustrate that this is not the business of the layout engine, I would not insist on continuing this discussion.

Secondly, you said that "contextual font selection" is a "cool" feature; I am wondering which languages benefit from this feature? (I believe there are, but just want to know).
As I said in the previous email, this creates more trouble than benefit for CJK languages. In particular, it ruins text alignment in a monospace environment (see attachment). I doubt anyone who sees it would say "cool"; rather, they would feel annoyed.

In addition, you seem to underestimate the difficulty of ripping out part of a CJK font. This is not possible for commercial fonts. Even if it is doable for open fonts (very few choices though), the incompatibility of the resulting fonts will make them totally unusable on most platforms.

I want to add that on Windows, CJK users have never had such a problem. All known CJK fonts have their own Latin glyphs (some are crappy), but the text rendering is "normal" (nothing like in the attachment). How does Windows structure the style propagation for COMMON characters?

Qianqian

Behdad Esfahbod wrote:
> Hi Qianqian,
>
> [CC'ing to gtk-i18n-list, so hopefully this is the last time I have to
> repeat this.]
>
> On Mon, 2007-12-10 at 18:01 -0500, Qianqian Fang wrote:
>
>> Go back to the digit font change issue as we discussed earlier, I
>> spent some time in the past few days, trying to get myself a more
>> clear picture on this. I dug out some bug reports from various bugzillas
>> (Mozilla, Redhat, Gnome) and gathered a list of similar reports (see
>> the bottom of the email). These reports were filed from simplified and
>> traditional Chinese users and Japanese users (I believed Korean
>> experienced the same problem). So, one thing that can be said from
>> this list is that the "contextual font selection" does seem to be
>> bothering CJK users in text formatting.
>>
>
> Yes, you have identified the problem very accurately.
>
>> I understand that "contextual shaping" is one of the techniques for
>> rendering complex scripts.
>> I am not sure how tight is the connection
>> between "contextual shaping" and the "contextual format propagation",
>> but one thing that I think may put some light to the complains of the
>> CJK users is that Chinese (maybe Japanese as well) scripts are not
>> contextual sensitive. Chinese characters are relatively independent
>> and self-consistent in shapes (while, this statement is not true for
>> Chinese calligraphy, where strokes may connect between characters
>> depending on layout direction, but the current OSs and font
>> technologies are not ready to handle this IMO). The only complexities
>> may come from the fact that Hanzi for printing are mostly equal-width,
>> and the punctuations among the Hanzi are expected to match the width
>> of the surrounding Hanzi. As the full-width punctuations being encoded
>> separately by Unicode, together with the contextual punctuation
>> support of the input-methods, this seems to be handled very well. So,
>> in short, for Chinese text layout, users are generally not expected to
>> see contextual-based changes, either encoding/glyph or font faces
>> (this may not include some extreme cases).
>>
>
> And Pango supports those all perfectly fine. Even vertical writing
> using the correct substituted punctuation glyphs. See:
>
>   http://www.pango.org/ScriptGallery
>
> The main font issue though, is that Chinese (Simplified, Traditional),
> Korean, and Japanese share some Unicode code points, but they require
> slightly different renderings. Now if you don't tell Pango which
> version is preferred, how can it know which font to choose? It
> explicitly doesn't prefer any one over the others to avoid cultural
> problems.
>
> The symptoms of this problem are "multiple fonts used in the same line".
> Solution is: Either run under a CJK locale, or give hints to Pango about
> your preferred CJK locale using the env var PANGO_LANGUAGE.
>
> Note that theoretically Pango can do text analysis to come up with a
> best guess, but doing that would then introduce another bug with
> symptoms "changes font when typing a few characters on the same line".
>
>> Now go back to pango, from what I read from the bug reports, pango
>> uses PANGO_SCRIPT_COMMON to represent language-independent symbols. I
>> have no complain about that. It is a good classification based on the
>> semantics of the symbols.
>>
>
> Good. Let me also note that there's no way to change that. It's
> hardcoded in the Unicode standard.
>
>> What I, and most CJK users, are not satisfied with is the
>> contextual-sensitivity of those common scripts when formatting text
>> under cjk locales. I know that you have advocated to stick with the
>> "face" meaning of SCRIPT_COMMON, which is supposedly to be rendered by
>> local languages. But IMO, the face meaning is misleading here. From a
>> Chinese user perspective, the difference between the SCRIPT_COMMON to
>> Latin is negligible,
>>
>
> Lemme correct you here, "From a Chinese user perspective, the ASCII
> digits are considered Latin". There's sure a lot more than ASCII digits
> to SCRIPT_COMMON. Helps to be precise.
>
>> compared with its difference to CJK characters. Therefore, using CJK
>> fonts to render SCRIPT_COMMON is quite odd. Using Latin fonts for
>> COMMON is most preferred; even specifying no face (i.e. using system
>> fall-back) is better than assigning Chinese fonts for these scripts
>> for that most Chinese fonts have low-quality Latin/common glyphs, even
>> the commercial ones.
>>
>
> And this problem has a name: "crappy glyphs and multiple scripts in a
> font". Tell me about it...
>
> I already pointed out a few solutions to it previously:
>
>   - Rip the crap out and everyone will feel better.
>
>   - Use TrueType containers (even for bitmap-only fonts) and put each
>     script's glyphs into its own face, with all faces having the same name
>     and put into the same TrueType Collection file.
>
>   - Finish patch for fontconfig to allow configuration to disable
>     certain Unicode codepoints per font. Then write such configuration for
>     the crappy glyphs.
>
> Pick whichever you prefer and just do it.
>
> Another symptom, "digits change font after typing character" is in fact
> a very cool Pango feature, just badmouthed by the above problem. Fix
> the problem.
>
>> As you see from the bug lists, this problem has existed for many
>> years, and I am pretty sure that it will come back again and again, as
>> long as the expected rendering is not achieved. If the current pango
>> formatting logic is not sufficient to handle the CJK preferences as
>> said above, I think to refine the logic to take it into consideration
>> is better than stick with a fixed but incomplete logic.
>>
>
> I consider patches improving Pango's font selection algorithm, but none
> that I've seen so far had been an improvement (from my point of view).
> If it has words like CJK or "special case", I'm most probably not
> interested. Of the bugs you listed, only the one I opened myself is
> valid IMO. The rest is just left open because no matter how many times
> I close them, they will be reopened... Oh well.
>
>> please let me know your thoughts and reasoning on whether this is
>> feasible or not, if yes, where to get start.
>>
>
> Does the above make sense? I understand that it's easier to apply a two
> line patch to Pango instead of doing what of the things I listed above,
> but that just doesn't fit in the design, and it introduces other
> problems you don't see right now.
>
>> thank you for paying attention to this issue.
>>
>> Qianqian
>>
>
> Regards,
>
> behdad
>
>> ===============================================================
>> Bug 321113 - Wrong glyph subsituation algorithm for digital characters
>> and punctuations
>> http://bugzilla.gnome.org/show_bug.cgi?id=321113
>>
>> Bug 345072 - changes font when typing different scripts on the same
>> line
>> http://bugzilla.gnome.org/show_bug.cgi?id=345072
>>
>> Bug 345386 - Language and direction propagation in and between
>> PangoLayouts
>> http://bugzilla.gnome.org/show_bug.cgi?id=345386 (opened by yourself)
>> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679
>>
>> Bug 481210 - [All lang] [firefox] - Face of the number is changing
>> when enter number + Char, in any Locale
>> http://bugzilla.gnome.org/show_bug.cgi?id=481210
>>
>> Bug 481188 - ascii text space too narrow for Chinese encodings
>> http://bugzilla.gnome.org/show_bug.cgi?id=481188
>>
>> Bugzilla Bug 129541: changes font when typing different scripts on the
>> same line
>> https://bugzilla.redhat.com/show_bug.cgi?id=129541
>>
>> Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango
>> https://bugzilla.redhat.com/show_bug.cgi?id=131218
>>
>> Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox
>> give bad eol rendering and cursor placement
>> https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens
>> Petersen)
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link)
>>
>> Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is
>> changing when enter number + Char, in any Locale
>> https://bugzilla.redhat.com/show_bug.cgi?id=228804
>>
>> Bugzilla Bug 221361: [pango] ascii text space and punctuation is
>> narrow for CJK
>> https://bugzilla.redhat.com/show_bug.cgi?id=221361
>>
>> Bug 379125 - chinese punctuations after english letters are wrongly
>> displayed
>> https://bugzilla.mozilla.org/show_bug.cgi?id=379125
>> https://bugzilla.mozilla.org/attachment.cgi?id=263185
>> ===============================================================
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot_gedit.png
Type: image/png
Size: 19301 bytes
Desc: not available
URL:

From behdad at behdad.org Sun Dec 16 23:22:01 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Sun, 16 Dec 2007 18:22:01 -0500
Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <47616849.7090905@gmail.com>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>
	<475582F4.9070608@gmail.com>
	<1197013261.27642.63.camel@behdad.behdad.org>
	<1197494718.1444.174.camel@behdad.behdad.org>
	<47616849.7090905@gmail.com>
Message-ID: <1197847321.797.68.camel@behdad.behdad.org>

On Thu, 2007-12-13 at 12:13 -0500, Qianqian Fang wrote:
> hi Behdad

Hi,

> I would have agreed with you if you clearly tell me why this change SHOULD
> be done in the fonts, or in the font selection, not in the layout
> engine. Your previous replies, either to the bug reports or to my email,
> simply refused to make this change by saying this is "technically
> impossible", but you do not tell me based on what model that you made
> the statement. If you can give me a diagram or document to illustrate
> that this is not the business of layout engine, I would not insist to
> continue this discussion.

You've kept saying it should be different for CJK, and I've always asked you to describe how exactly it should behave, to no avail.

Here is the set of assumptions that best describes the problem:

  A1. The layout engine is not provided any hints whatsoever on which of the CJK languages to prefer.

  A2. Any font available on the system is suitable for (aka "supports") at most one CJK language, not more.

  A3. For every CJK language, there exists a positive number of characters solely used in that CJK language and not any other one.

  A4. There exists a positive number of Unicode characters that are used in more than one CJK language.

That's enough to prove that you can't fix both of these bugs at the same time:

  B1. "multiple CJK fonts on the same line"

  B2. "font face changes when more text is typed"

This is what we will prove: "for any layout engine with font fallback support [1], there exists some CJK text that, when typed on a line by the user, either results in more than one CJK font being used, or causes a font change for the already typed text", where font fallback support means that a character is assigned a font that is known to *support* that character, if any such font is available on the system.

We prove it by constructing such a piece of text. Here's a sketch:

  - Pick a Unicode character that is used in more than one CJK language. This is possible because of A4. Call it c[0].

  - Let the layout engine choose a font to render this character. Let f[0] be the font used to render it.

  - Find the CJK language l[0] that font f[0] supports. By A2 we know that there can't be more than one such language. If no such language exists, the layout engine suffers from the bug "no CJK font is chosen". Abort.

  - Let l[1] be any CJK language other than l[0].

  - Choose c[1] to be any CJK character used in language l[1] and l[1] only. That's possible because of A3.

  - Pass text c[0]c[1] to the layout engine, and let f'[0] and f[1] be the two fonts chosen to render characters c[0] and c[1] respectively.

  - Observe that:

    * if f'[0] == f[0]: We know f[0] supports l[0], and that l[0] != l[1]. By A2, it follows that f[0] does not support l[1], so f[0] cannot be chosen for c[1] and as a result, f'[0] != f[1], that is, multiple fonts are chosen to render the text.

    * if f'[0] != f[0]: Typing character c[1] on the line containing text c[0] caused the chosen font for c[0] to change.

End of proof.
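
As an aside, the construction in the proof can be played out mechanically. What follows is only a minimal Python sketch under assumptions A1-A4, not anything Pango actually does: the three font names, the one-letter placeholder "characters", and both toy engines are hypothetical and made up for illustration. find_bug() walks through the steps of the sketch above and reports which of B1 or B2 a given engine runs into.

```python
# Toy model of assumptions A1-A4. Every name and table here is hypothetical.

# A2: each font supports exactly one CJK language.
FONT_LANG = {"FontZH": "zh", "FontJA": "ja", "FontKO": "ko"}

# Placeholder characters mapped to the languages that use them.
CHAR_LANGS = {
    "x": {"zh", "ja", "ko"},  # A4: a character shared by several languages
    "c": {"zh"},              # A3: characters exclusive to one language
    "j": {"ja"},
    "k": {"ko"},
}


def supports(font, ch):
    """A font supports a character if its language uses that character."""
    return FONT_LANG[font] in CHAR_LANGS[ch]


def first_supporting(ch):
    """Font fallback: the first font (in table order) that supports ch."""
    return next(f for f in FONT_LANG if supports(f, ch))


def per_char_engine(text):
    """Choose a font for each character independently of its neighbours."""
    return [first_supporting(ch) for ch in text]


def line_wide_engine(text):
    """Prefer the font covering most of the line, fall back per character."""
    best = max(FONT_LANG, key=lambda f: sum(supports(f, ch) for ch in text))
    return [best if supports(best, ch) else first_supporting(ch) for ch in text]


def find_bug(engine):
    """Follow the proof sketch and return (bug, text) for the given engine."""
    # Pick c[0] used in more than one CJK language (possible by A4).
    c0 = next(ch for ch, langs in CHAR_LANGS.items() if len(langs) > 1)
    f0 = engine(c0)[0]
    l0 = FONT_LANG[f0]                      # unique by A2
    l1 = next(l for l in FONT_LANG.values() if l != l0)
    # Pick c[1] used in l[1] only (possible by A3).
    c1 = next(ch for ch, langs in CHAR_LANGS.items() if langs == {l1})
    f0_new, f1 = engine(c0 + c1)
    assert supports(f1, c1)                 # font fallback holds
    if f0_new == f0:
        assert f0_new != f1                 # two fonts on one line
        return "B1", c0 + c1
    return "B2", c0 + c1                    # font of c[0] changed


if __name__ == "__main__":
    print(find_bug(per_char_engine))   # ('B1', 'xj')
    print(find_bug(line_wide_engine))  # ('B2', 'xj')
```

With this toy table, the per-character engine keeps the font of the first character but ends up with two fonts on the line (B1), while the line-wide engine keeps a single font but changes the font of the already typed character (B2), for the same two-character text.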

[1] I'm tempted to say deterministic Turing machine here, but I pass :)

Similar proofs can be constructed for other CJK "bugs" (those involving Latin text, ASCII digits, etc), but I've already exceeded my time limit for this message.

> Secondly, you said that "contextual font selection" is a "cool"
> feature, I am wondering what languages are beneficial from this feature?
> (I believe there are, but just want to know).

Pretty much every non-Latin script. In some situations even the Latin script.

Take the Unicode character U+002E FULL STOP, aka ASCII period. It is used in more than just Latin: in Arabic for example, in Hebrew, possibly in Indic and many other scripts. If it were not grouped with neighboring characters for font selection purposes, all those people would get their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while the periods at the end of sentences would be assigned a different (default Latin, for example) font.

The same happens for Latin under a document tagged as non-Latin. It's not a luxury thing. It's just how things are supposed to work.

> As I said in the previous email, this creates more
> troubles for CJK languages than benefits. Particularly this ruins the text
> alignment in monospace environment (see attachment). I doubt anyone
> see it would say "cool", rather, they would feel annoyed.

That's not true. If you have Chinese text and Latin text in the same line, and your Latin and Chinese monospace fonts have different widths, you are screwed no matter what. There are situations in which the particular bug you are referencing here can be improved, and that's why I filed bug 345386, but you already knew that.

> In addition, you seem to underestimate the difficulties of ripping out
> part of a CJK font. This is not possible for commercial fonts. Even it
> is doable for open fonts (very few choices though), the incompatibility
> of the resulting fonts will make it totally unusable on most platforms.

I've put three different ways in front of you. The fontconfig one is not hard at all for anyone willing to put their fingers where their mouth is. You, on the other hand, seem to ignore the impossibility (not difficulty) of what you are asking for.

> I want to add that on Windows, CJK users had never had such a problem,
> all known CJK fonts have their Latin glyphs (some are crappy), but the text
> rendering are "normal" (nothing like in the attachment). How window
> structures the style propagation for COMMON characters?

Windows does no font fallback. You choose which font to use. But you want your Latin characters in a different font than your Chinese characters AND you want to keep the crappy glyphs. They don't mix.

> Qianqian

behdad

-- 
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning.
   -- Matt Welsh

From nicolas.mailhot at laposte.net Mon Dec 17 10:46:38 2007
From: nicolas.mailhot at laposte.net (Nicolas Mailhot)
Date: Mon, 17 Dec 2007 11:46:38 +0100
Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <1197847321.797.68.camel@behdad.behdad.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>
	<475582F4.9070608@gmail.com>
	<1197013261.27642.63.camel@behdad.behdad.org>
	<1197494718.1444.174.camel@behdad.behdad.org>
	<47616849.7090905@gmail.com>
	<1197847321.797.68.camel@behdad.behdad.org>
Message-ID: <1197888398.29492.19.camel@rousalka.dyndns.org>

On Sunday, 16 December 2007 at 18:22 -0500, Behdad Esfahbod wrote:
> On Thu, 2007-12-13 at 12:13 -0500, Qianqian Fang wrote:

> > Secondly, you said that "contextual font selection" is a "cool"
> > feature, I am wondering what languages are beneficial from this feature?
> > (I believe there are, but just want to know).
>
> Pretty much every non-Latin script. In some situations even the Latin
> script.
>
> Take the Unicode character U+002E FULL STOP, aka ASCII period. It is
> used in more than just Latin, in Arabic for example, in Hebrew, possibly
> in Indic and many other scripts. If it was not grouped with neighboring
> characters for font selection purposes all those people would have got
> their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while
> the periods in at the end of sentences assigned a different (default
> Latin for example) font.
>
> The same happens for Latin under a document tagged as non-Latin. It's
> not a luxury thing. It's just how things are supposed to work.

To be honest, this was mostly solved on the Latin side by creating pan-European+ LGC fonts to completely avoid triggering substitutions. Creating coherent pan-Unicode fonts would solve it for other locales, but that's a huge piece of work and some bits like OpenType BASE are not there yet on the FLOSS side.

> > As I said in the previous email, this creates more
> > troubles for CJK languages than benefits. Particularly this ruins the text
> > alignment in monospace environment (see attachment). I doubt anyone
> > see it would say "cool", rather, they would feel annoyed.
>
> That's not true. If you have Chinese text and Latin text in the same
> line, and your Latin and Chinese monospace fonts have different widths,
> you are screwed no matter what.

That means that for monospace, separate fonts with different metrics are a dead end, right? :p

I wonder if something semi-monospaced, like using twice the base size for complex scripts, would be worth it or would just break apps horribly.

> > In addition, you seem to underestimate the difficulties of ripping out
> > part of a CJK font. This is not possible for commercial fonts. Even it
> > is doable for open fonts (very few choices though), the incompatibility
> > of the resulting fonts will make it totally unusable on most platforms.
>
> I've put three different ways in front of you.

Easy one: removing Latin from the FLOSS font. But that wouldn't solve the proprietary fonts people use in the wild.
Complete one: enhancing fontconfig to blacklist parts of fonts.

I don't see much point in the TTC solution, except as a workaround for the lack of OpenType BASE support.

> The fontconfig one is
> not hard at all for anyone willing to put their fingers where their
> mouth is. You on the other hand, seem to ignore the impossibility (not
> difficulty) of what you are asking for.
>
> > I want to add that on Windows, CJK users had never had such a problem,
> > all known CJK fonts have their Latin glyphs (some are crappy), but the text
> > rendering are "normal" (nothing like in the attachment). How window
> > structures the style propagation for COMMON characters?
>
> Windows does no font fallback.

Windows, however, has an input chooser that explicitly specifies the language in use instead of just a keyboard layout switcher, and I suspect some Windows apps do use it to select the right font.

Unfortunately it seems Sergey Udaltsov was discouraged by the lack of positive feedback and stopped pushing something like
http://fedoraproject.org/wiki/SIGs/Fonts/Dev/LanguageAwarenessProblem

Qianqian: you need to realise the low-hanging fruit was harvested long ago. There is no easy solution left that was not rejected for one reason or another. That's why you're hitting a wall (and exasperating Behdad). The bits needed to support CJK and complex scripts well are well-known, but they're non-trivial, so they do need some concerted effort by the affected communities.

Regards,

-- 
Nicolas Mailhot
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL:

From behdad at behdad.org Mon Dec 17 17:32:09 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Mon, 17 Dec 2007 12:32:09 -0500
Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <1197888398.29492.19.camel@rousalka.dyndns.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>
	<475582F4.9070608@gmail.com>
	<1197013261.27642.63.camel@behdad.behdad.org>
	<1197494718.1444.174.camel@behdad.behdad.org>
	<47616849.7090905@gmail.com>
	<1197847321.797.68.camel@behdad.behdad.org>
	<1197888398.29492.19.camel@rousalka.dyndns.org>
Message-ID: <1197912729.9234.35.camel@behdad.behdad.org>

On Mon, 2007-12-17 at 11:46 +0100, Nicolas Mailhot wrote:
>
> I don't see much the point of the TTC solution, except as a workaround
> to lack of opentype BASE support.

You are probably right. Families with the same name have almost all the problems of a pan-Unicode font when installed.

-- 
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning.
   -- Matt Welsh

From behdad at behdad.org Thu Dec 20 12:48:50 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Thu, 20 Dec 2007 07:48:50 -0500
Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <476A38AE.6020209@gmx.net>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org>
	<475582F4.9070608@gmail.com>
	<1197013261.27642.63.camel@behdad.behdad.org>
	<1197494718.1444.174.camel@behdad.behdad.org>
	<47616849.7090905@gmail.com>
	<1197847321.797.68.camel@behdad.behdad.org>
	<476A38AE.6020209@gmx.net>
Message-ID: <1198154930.9724.8.camel@behdad.behdad.org>

On Thu, 2007-12-20 at 15:41 +0600, Christopher Fynn wrote:
>
> Language specific rendering *can* be achieved using OpenType lookups -
> but, even if the font contains the necessary language specific lookups
> (and most don't), for this feature to function correctly the system
> somehow needs to know which language is being used. This cannot always
> be determined by current locale, the keyboard/IME used to type the text,
> or from the range of Unicode characters involved so especially with
> multilingual documents you need users to reliably mark up text. Language
> then needs to be indicated by some high-level form of mark-up or tagging
> within the documents - which right away excludes plain text.

Setting the locale is actually enough. If that's not desired, $PANGO_LANGUAGE can be set as a fallback. So far it seems like most of the issues happen because the users are either not setting the locale correctly or are using crappy fonts. That I don't care enough about those cases should come as no surprise.

> - Chris

-- 
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning.
-- Matt Welsh From behdad at behdad.org Thu Dec 20 15:08:05 2007 From: behdad at behdad.org (Behdad Esfahbod) Date: Thu, 20 Dec 2007 10:08:05 -0500 Subject: [gtk-i18n-list] Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]) In-Reply-To: <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <476A38AE.6020209@gmx.net> <1198154930.9724.8.camel@behdad.behdad.org> <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> Message-ID: <1198163285.9724.17.camel@behdad.behdad.org> On Thu, 2007-12-20 at 23:04 +0900, mpsuzuki at hiroshima-u.ac.jp wrote: > On Thu, 20 Dec 2007 07:48:50 -0500 > Behdad Esfahbod wrote: > >Setting locale is actually enough. If that's not desired, > >$PANGO_LANGUAGE can be set as a fallback. So far seems like most of the > >issues happen because either the users are not setting locale correctly > >or are using crappy fonts. How do I don't care enough about those cases > >I'm not surprised. > > Excuse me, PANGO_LANGUAGE is the solution to modify the > Pango's behaviour that Qianqian & Abel ask for fix? It's a way to tell Pango which of the CJK languages to prefer. Its main use is when running under a non-CJK locale (en_US for example) and the text doesn't have language tags. It solves most of the "multiple fonts used in the same line" issues with CJK characters. > Regards, > mpsuzuki -- behdad http://behdad.org/ ...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. 
-- Matt Welsh From behdad at behdad.org Thu Dec 20 18:55:10 2007 From: behdad at behdad.org (Behdad Esfahbod) Date: Thu, 20 Dec 2007 13:55:10 -0500 Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]) In-Reply-To: References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> Message-ID: <1198176910.9724.46.camel@behdad.behdad.org> On Thu, 2007-12-20 at 04:22 +0800, Abel Cheung wrote: > Hi, Hi, > My reply is followed below, inline... So is mine. > On Dec 17, 2007 7:22 AM, Behdad Esfahbod wrote: > [..........tons of quasi-maths ...........] > > > > > Secondly, you said that "contextual font selection" is a "cool" > > > feature, I am wondering what languages are beneficial from this feature? > > > (I believe there are, but just want to know). > > > > Pretty much every non-Latin script. In some situations even the Latin > > script. > > > > Take the Unicode character U+002E FULL STOP, aka ASCII period. It is > > used in more than just Latin, in Arabic for example, in Hebrew, possibly > > in Indic and many other scripts. If it was not grouped with neighboring > > characters for font selection purposes all those people would have got > > their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while > > the periods at the end of sentences assigned a different (default > > Latin for example) font. > > > > The same happens for Latin under a document tagged as non-Latin. It's > > not a luxury thing. It's just how things are supposed to work. > > That means, font change depending on context is actually preferred in > some fonts or some languages, is it? If that's true, then this would be > a per-language preference, some want it, some don't. > > So does pango support toggling this behavior yet? 
(I guess not?) What do you exactly mean by "this behavior"? Which behavior? Show me the source code line. I'm getting tired of all the hand waving. > > > > The main font issue though, is that Chinese (Simplified, Traditional), > > > > Korean, and Japanese share some Unicode code points, but they require > > > > slightly different renderings. Now if you don't tell Pango which > > > > version is preferred, how can it know which font to choose? It > > > > explicitly doesn't prefer any one over the others to avoid cultural > > > > problems. > > > > > > > > The symptoms of this problem are "multiple fonts used in the same line". > > > > Solution is: Either run under a CJK locale, or give hints to Pango about > > > > your preferred CJK locale using the env var PANGO_LANGUAGE. > > > > > > > > Note that theoretically Pango can do text analysis to come up with a > > > > best guess, but doing that would then introduce another bug with > > > > symptoms "changes font when typing a few characters on the same line". > > Let me set the record straight here. Most people seeing this problem is not > exactly complaining about the font changing, but about the font changing TO > SOME BAD LATIN GLYPH THEY DON'T LIKE. It is understood that font changing is > almost not avoidable, since typing just a few characters may not provide enough > information on what kind of font should be picked, and typing more > gives more info. > So far it is determined per sentence, or per what? Believe me, I know that. And I understand it if you don't WRITE IN CAPS too. Does it help if I say THEN GO REMOVE THE CRAPPY FONT? [...] > Sadly this way absolutely won't satisfy everybody -- one party only. And in > particular, the font picked is determined per glyph, causing a sentence to be > intermixed by multiple CJK fonts as described. This is totally wrong. 
Pango first tags each piece of text with a language, then asks fontconfig to sort fonts for that language, then uses the sorted list to assign a font to each character. That is, if you mark your text zh_CN (by running under that locale, setting PANGO_LANGUAGE to it, or otherwise marking it), have a suitable font for that language, and, if you also have crappy fonts for it, have fontconfig configured to prefer the good one, then Pango chooses the right font. All the "bugs" you show me are in the steps mentioned above, not in what Pango is doing. > What if the font determination is not chopped glyph by glyph, but also > determined heuristically with context? Pango already does that. That's exactly what you call "contextual" something above and condemn. > If my guess is correct this would work most of the > cases, even among language variants (think zh_CN and zh_TW). No. You need to go back and read and understand my "tons of quasi-maths". > > > > Another symptom, "digits change font after typing character" is in fact > > > > a very cool Pango feature, just badmouthed by the above problem. Fix > > > > the problem. > > When a solution is not universal enough to be accepted by everybody, > and caused more trouble than it's worth for specific people, it would be > badmouthed no matter what. Or not? I don't know the rule here. You officially don't know what you are talking about. behdad > Abel > > > > > > > > > > > > > > > > > >> As you see from the bug lists, this problem has existed for many > > > >> years, and I am pretty sure that it will come back again and again, as > > > >> long as the expected rendering is not achieved. If the current pango > > > >> formatting logic is not sufficient to handle the CJK preferences as > > > >> said above, I think to refine the logic to take it into consideration > > > >> is better than stick with a fixed but incomplete logic. 
> > > >> > > > > > > > > I consider patches improving Pango's font selection algorithm, but none > > > > that I've seen so far had been an improvement (from my point of view). > > > > If it has words like CJK or "special case", I'm most probably not > > > > interested. Of the bugs you listed, only the one I opened myself is > > > > valid IMO. The rest is just left open because no matter how many times > > > > I close them, they will be reopened... Oh well. > > > > > > > > > > > > > > > >> please let me know your thoughts and reasoning on whether this is > > > >> feasible or not, if yes, where to get start. > > > >> > > > > > > > > Does the above make sense? I understand that it's easier to apply a two > > > > line patch to Pango instead of doing what of the things I listed above, > > > > but that just doesn't fit in the design, and it introduces other > > > > problems you don't see right now. > > > > > > > > > > > > > > > >> thank you for paying attention to this issue. > > > >> > > > >> Qianqian > > > >> > > > > > > > > Regards, > > > > > > > > behdad > > > > > > > > > > > > > > > >> =============================================================== > > > >> Bug 321113 - Wrong glyph subsituation algorithm for digital characters > > > >> and punctuations > > > >> http://bugzilla.gnome.org/show_bug.cgi?id=321113 > > > >> > > > >> > > > >> Bug 345072 - changes font when typing different scripts on the same > > > >> line > > > >> http://bugzilla.gnome.org/show_bug.cgi?id=345072 > > > >> > > > >> > > > >> Bug 345386 - Language and direction propagation in and between > > > >> PangoLayouts > > > >> http://bugzilla.gnome.org/show_bug.cgi?id=345386 (opened by yourself) > > > >> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679 > > > >> > > > >> > > > >> Bug 481210 - [All lang] [firefox] - Face of the number is changing > > > >> when enter number + Char, in any Locale > > > >> http://bugzilla.gnome.org/show_bug.cgi?id=481210 > > > >> > > > >> > > > >> Bug 481188 - 
ascii text space too narrow for Chinese encodings > > > >> http://bugzilla.gnome.org/show_bug.cgi?id=481188 > > > >> > > > >> > > > >> Bugzilla Bug 129541: changes font when typing different scripts on the > > > >> same line > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=129541 > > > >> > > > >> > > > >> Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=131218 > > > >> > > > >> > > > >> Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox > > > >> give bad eol rendering and cursor placement > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens > > > >> Petersen) > > > >> > > > >> > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link) > > > >> > > > >> > > > >> Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is > > > >> changing when enter number + Char, in any Locale > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=228804 > > > >> > > > >> > > > >> Bugzilla Bug 221361: [pango] ascii text space and punctuation is > > > >> narrow for CJK > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=221361 > > > >> > > > >> > > > >> Bug 379125 - chinese punctuations after english letters are wrongly > > > >> displayed > > > >> https://bugzilla.mozilla.org/show_bug.cgi?id=379125 > > > >> https://bugzilla.mozilla.org/attachment.cgi?id=263185 > > > >> =============================================================== > > > >> > > > > > > > > > > > > > -- > > behdad > > http://behdad.org/ > > > > ...very few phenomena can pull someone out of Deep Hack Mode, with two > > noted exceptions: being struck by lightning, or worse, your *computer* > > being struck by lightning. 
-- Matt Welsh > > > > _______________________________________________ > > gtk-i18n-list mailing list > > gtk-i18n-list at gnome.org > > http://mail.gnome.org/mailman/listinfo/gtk-i18n-list > > > > > -- behdad http://behdad.org/ ...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. -- Matt Welsh From fangqq at gmail.com Thu Dec 20 20:30:24 2007 From: fangqq at gmail.com (Qianqian Fang) Date: Thu, 20 Dec 2007 15:30:24 -0500 Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]) In-Reply-To: <1198176910.9724.46.camel@behdad.behdad.org> References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <1198176910.9724.46.camel@behdad.behdad.org> Message-ID: <476AD0E0.5020106@gmail.com> hi Behdad I don't think the tone in your reply is going to be helpful in any respect toward a solution of this problem. I hope you understand that people raise these issues for the good of pango. They want to make it more powerful and logical under all possible circumstances. Second, there must be a reason for this issue being raised again and again in the past many years. I think insufficient explanations and poor guidance for users toward a good solution play a role here (I am sorry to say that your "proof" in the last email still did not help because it was not what I was asking for). Reading the replies of the past few days, I came to realize that the key to the problem is to set up a "correct" fall-back path for the untagged (or COMMON) text. Obviously, you are reluctant to explicitly tag them as LATIN in pango. 
You may be right if differentiating COMMON with LATIN is practically necessary (I mean "practically", not semantically as in Unicode standard). You have your rationales here. Unfortunately, the current fall-back mechanism eventually assign the current locale info to these untagged text. And it turns out that for some users (if not all), particularly for CJK users (where the practical differences between Latin/Common are not significant), it created unpleasant formating results due to the mixing of fonts. So, it seems obvious that additional info is needed to assist the fall-back of these untagged text to the preferred settings. This info can be introduced by the patched fontconfig, using block preference font list; or using the current keyboard layout as suggested by Sergey and Chris. Maybe a third way is to create a LC variable, say LC_COMMON, independent of LC_ALL/LANG, taking care of the untagged text formating. I actually felt that this is probably more suitable than the other two approaches. Because this is a locale-based preference, not font or keyboard preferences (here is just my first thought on this, I may be wrong). In any case, I "think" I understand your argument, although there are still details needs to be verified. But I think it will be useful if we focus on clarifying a solution rather than arguing who is right and who is wrong. Qianqian 
From behdad at behdad.org Thu Dec 20 22:25:42 2007 From: behdad at behdad.org (Behdad Esfahbod) Date: Thu, 20 Dec 2007 17:25:42 -0500 Subject: [gtk-i18n-list] Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]) In-Reply-To: <20071221012443.314dcf85.mpsuzuki@hiroshima-u.ac.jp> References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <476A38AE.6020209@gmx.net> <1198154930.9724.8.camel@behdad.behdad.org> <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> <1198163285.9724.17.camel@behdad.behdad.org> <20071221011708.653f440a.mpsuzuki@hiroshima-u.ac.jp> <20071221012443.314dcf85.mpsuzuki@hiroshima-u.ac.jp> Message-ID: <1198189542.24702.55.camel@behdad.behdad.org> On Fri, 2007-12-21 at 01:24 +0900, mpsuzuki at hiroshima-u.ac.jp wrote: > Sorry, I slipped to attach the picture, here it is. 
> > On Fri, 21 Dec 2007 01:17:08 +0900 > mpsuzuki at hiroshima-u.ac.jp wrote: > > >On Thu, 20 Dec 2007 10:08:05 -0500 > >Behdad Esfahbod wrote: > > > >>On Thu, 2007-12-20 at 23:04 +0900, mpsuzuki at hiroshima-u.ac.jp wrote: > >>> On Thu, 20 Dec 2007 07:48:50 -0500 > >>> Behdad Esfahbod wrote: > >>> >Setting locale is actually enough. If that's not desired, > >>> >$PANGO_LANGUAGE can be set as a fallback. So far seems like most of the > >>> >issues happen because either the users are not setting locale correctly > >>> >or are using crappy fonts. How do I don't care enough about those cases > >>> >I'm not surprised. > >>> > >>> Excuse me, PANGO_LANGUAGE is the solution to modify the > >>> Pango's behaviour that Qianqian & Abel ask for fix? > >> > >>It's a way to tell Pango which of the CJK languages to prefer. It's > >>main use is when running under non-CJK locale (en_US for example) and > >>the text doesn't have language tags. It solves most of the "multiple > >>fonts used in the same line" issues with CJK characters. > > > >Excuse me again, please let me know more detail. > >I attached a picture to describe the behaviour I want to fix. Thanks for raising a concrete issue. > >The picture (1), (2), (3) are screenshots under English. > > > >If I execute gedit as > > $ env LANG=C PANGO_LANGUAGE=en gedit > >font is not changed during I type "[" then "a". > > > >The picture (1'), (2'), (3'), (4') are screenshots under Japanese. > > > >If I execute gedit as > > $ env LANG=ja_JP.euc-jp PANGO_LANGUAGE=ja gedit As long as your LANG and PANGO_LANGUAGE are the same, you don't need both. PANGO_LANGUAGE is mostly useful when you set LANG to en. That's not relevant to your issue here though. > >and I type "[" then "a" then "?". The font to display > >"[" is dynamically changed as (2'), (3'), (4') during > >typing keys. The dynamically font switching shifts the > >baseline up and down, it looks as strange zig-zag behaviour. 
> >I could not stop this switching by setting PANGO_LANGUAGE=en > >nor PANGO_LANGUAGE=ja. How can I stop this switching? I tell you what's happening, you tell me what Pango is doing wrong and how you think it can be fixed: - In image 2', you are running under a Japanese locale, you type a COMMON character ('[') only, Pango assumes you are going to type Japanese text, your preferred Japanese font has a glyph for '[', so Pango uses it, hoping that it will use the same font when you enter Japanese text. - In image 3', you entered a Latin letter, not Japanese (an unexpected event given that you run under a Japanese locale), so Pango now associates the bracket to the Latin text, because, well, that's the only non-COMMON script there. You sure have a bracket and Latin text in it. So it renders the bracket using the same font that it uses for the Latin text. - In image 4', you add a Japanese character. No surprises here: you have two fonts, the line takes the height of the taller font. So the Latin text is shifted down a bit. So, the issue comes down to the fact that: - It's unexpected to enter Latin under a Japanese locale. - You have a COMMON character at the beginning of the line. - Your Japanese and Latin fonts have different heights. And this case is rare enough that I normally don't consider it an issue at all. But apparently multiplying that by 1 billion makes it quite visible! One way one may suggest is that Pango should reserve a minimum line height that is enough to fit the default Japanese font, because it's running under a Japanese locale after all. That would fix the jump from 3' to 4', but makes English-only paragraphs look very ugly and badly spaced vertically, so that's not an option either. The jump from 2' to 3' can't be fixed. I already proved that. If one fixes it, it would introduce the bug that '[' followed by a Japanese character will choose separate fonts for those chars, OR that the font used for '[' will change when you type a Japanese char. 
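The behaviour described for images 2', 3' and 4' can be sketched as a toy model. This is only an illustration of the rule as stated in this thread (a COMMON character takes the script of the neighbouring text, and falls back to the locale when there is none). It is not Pango's source code: the names `toy_script` and `resolve_scripts` are made up for the example, and guessing the script from the Unicode character name is a crude assumption.

```python
import unicodedata

COMMON = "Common"


def toy_script(ch):
    """Rough script guess from the Unicode character name (illustration only)."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith(("CJK", "HIRAGANA", "KATAKANA")):
        return "CJK"
    return COMMON  # brackets, digits, punctuation, spaces, ...


def resolve_scripts(text, locale_script):
    """Assign a script to every character, resolving COMMON ones from context.

    Preference order for a COMMON character: the nearest real script before
    it, then the nearest real script after it, then the locale's script.
    """
    scripts = [toy_script(ch) for ch in text]
    resolved = []
    previous = None
    for i, script in enumerate(scripts):
        if script != COMMON:
            previous = script
            resolved.append(script)
            continue
        following = next((s for s in scripts[i + 1:] if s != COMMON), None)
        resolved.append(previous or following or locale_script)
    return resolved


# Typing "[", then "a", then a Japanese character under a Japanese locale:
print(resolve_scripts("[", "CJK"))         # ['CJK']
print(resolve_scripts("[a", "CJK"))        # ['Latin', 'Latin']
print(resolve_scripts("[a\u3042", "CJK"))  # ['Latin', 'Latin', 'CJK']
```

In this model the bracket is resolved as CJK while it stands alone and as Latin once "a" follows it, which is the jump from 2' to 3'. Pinning the bracket to the locale instead would make `resolve_scripts("[a", "CJK")` split one Latin run across two fonts, so the model only moves the surprise from one case to the other.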
It's as simple as this: Pango can't know what you are going to type next. It can just guess, and it's guessing pretty well. It's just not reading your mind yet :). I have two suggestions for what you can do that may achieve better results for you. - Run under LANG=en_US LC_MESSAGES=ja_JP - Choose a non-generic font family in gedit. That is, something other than Sans, Sans-serif, and Monospace. > >Regards, > >mpsuzuki Regards, -- behdad http://behdad.org/ ...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning. -- Matt Welsh From nicolas.mailhot at laposte.net Thu Dec 20 22:37:27 2007 From: nicolas.mailhot at laposte.net (Nicolas Mailhot) Date: Thu, 20 Dec 2007 23:37:27 +0100 Subject: [gtk-i18n-list] Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]) In-Reply-To: <1198189542.24702.55.camel@behdad.behdad.org> References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <476A38AE.6020209@gmx.net> <1198154930.9724.8.camel@behdad.behdad.org> <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> <1198163285.9724.17.camel@behdad.behdad.org> <20071221011708.653f440a.mpsuzuki@hiroshima-u.ac.jp> <20071221012443.314dcf85.mpsuzuki@hiroshima-u.ac.jp> <1198189542.24702.55.camel@behdad.behdad.org> Message-ID: <1198190247.17795.1.camel@rousalka.dyndns.org> On Thursday 20 December 2007 at 17:25 -0500, Behdad Esfahbod wrote: > I have two suggestions for what you can do that may achieve better > results for you. > > - Run under LANG=en_US LC_MESSAGES=ja_JP > > - Choose a non-generic font family in gedit. That is, something other > than Sans, Sans-serif, and Monospace. 3. 
Have an IM/layout switcher that explicitly declares to apps and pango the language which is going to be typed. -- Nicolas Mailhot -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From behdad at behdad.org Thu Dec 20 22:42:07 2007 From: behdad at behdad.org (Behdad Esfahbod) Date: Thu, 20 Dec 2007 17:42:07 -0500 Subject: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings]) In-Reply-To: <476AD0E0.5020106@gmail.com> References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <1198176910.9724.46.camel@behdad.behdad.org> <476AD0E0.5020106@gmail.com> Message-ID: <1198190527.24702.71.camel@behdad.behdad.org> On Thu, 2007-12-20 at 15:30 -0500, Qianqian Fang wrote: > hi Behdad > > I don't think the tone in your reply is going to be helpful in any respect > toward a solution of this problem. I try to respond decently. However, I can't help it when someone does not spend the same effort that I put into my replies, has not read my previous mails in the thread, and uses caps... Writing these replies takes time. No abundance of it here. And I get frustrated from saying the same thing again and again. > I hope you understand that people raise these issues for the good of > pango. I don't agree. I'm very clearly saying: Pango doesn't want to fix this issue. Fix it somewhere else if you want the issue fixed. > They want to make it more powerful and logical under all possible > circumstances. I want to keep Pango as clean in design as it is. That means, no "if (CJK)". Pango is a true international text layout system. 
It's quite different from a MS Windows "Chinese Edition" or Adobe
Photoshop "Asian Edition", etc. It is supposed to be able to render all
languages and scripts, in the same process, in the same document.

> Second, there must be a reason for this issue being raised again and again
> in the past many years.

Compared to other scripts:

  - No one has "fixed" it properly so far, so it keeps coming up.

  - Chinese people are a great majority. The only comparable majorities
are: Latin/Cyrillic script users, Arabic script users, and Indic script
users.

Latin and Arabic pretty much work. Indic has lots of issues, and it
comes up again and again and more than CJK, believe me.

There was a time when Arabic was a disaster too. And people complained
about it, a lot. But it's fixed now. Because there were people that
fixed it all. Not by attacking the maintainer BTW. Not by taking it
personally.

> I think insufficient explanations and poor
> guidance for
> users toward a good solution play roles here (I am sorry to say that
> your "proof"
> in the last email still did not help because it was not what I was
> asking for).

I knew it wouldn't help. Because it was an obvious fact for everyone
thinking about it without prejudice. You asked that I say exactly why
it's impossible, and I did. Now either read and try understanding that,
or take my word when I say it's impossible.

> As reading the replies in the past few days, I came to realize the key of
> problem is to set up a "correct" fall-back path of the untagged
> (or COMMON) text. Obviously, you are reluctant to explicitly
> tag them as LATIN in pango. You may be right if differentiating
> COMMON with LATIN is practically necessary (I mean "practically", not
> semantically as in Unicode standard). You have your rationales
> here.

If I hardcode them to LATIN, I'm sure *you* come back and complain about it too.
When you see in a monospace piece of text that you've got bitmap crisp
glyphs for Chinese glyphs, but a wider, fuzzy glyph for your '['.

> Unfortunately, the current fall-back mechanism eventually assign
> the current locale info to these untagged text.

No. Not current locale. Adjacent scripts. If there's none, then
current locale.

> And it turns out
> that for some users (if not all), particularly for CJK users (where the
> practical differences
> between Latin/Common are not significant), it created unpleasant
> formating results due to the mixing of fonts.

Read above. It's going to create an unpleasant result in one case or
the other. There's no magic bullet here.

> So, it seems obvious that additional info is needed to assist the fall-back
> of these untagged text to the preferred settings. This info can be
> introduced
> by the patched fontconfig, using block preference font list; or using
> the current
> keyboard layout as suggested by Sergey and Chris. Maybe a third
> way is to create a LC variable, say LC_COMMON, independent
> of LC_ALL/LANG, taking care of the untagged text formating. I actually
> felt that this is probably more suitable than the other two approaches.
> Because this is a locale-based preference, not font or keyboard preferences
> (here is just my first thought on this, I may be wrong).

I don't agree completely, but do note that none of the above involves
Pango (at least initially).

> In any case, I "think" I understand your argument, although there are still
> details needs to be verified. But I think it will be useful if we focus on
> clarifying a solution rather than arguing who is right and who is wrong.

Except that I'm not interested in fixing it if it doesn't involve Pango.

Regards,
--
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning.
-- Matt Welsh

From behdad at behdad.org Thu Dec 20 22:54:49 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Thu, 20 Dec 2007 17:54:49 -0500
Subject: [gtk-i18n-list] Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <1198190247.17795.1.camel@rousalka.dyndns.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <476A38AE.6020209@gmx.net> <1198154930.9724.8.camel@behdad.behdad.org> <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> <1198163285.9724.17.camel@behdad.behdad.org> <20071221011708.653f440a.mpsuzuki@hiroshima-u.ac.jp> <20071221012443.314dcf85.mpsuzuki@hiroshima-u.ac.jp> <1198189542.24702.55.camel@behdad.behdad.org> <1198190247.17795.1.camel@rousalka.dyndns.org>
Message-ID: <1198191289.24702.81.camel@behdad.behdad.org>

On Thu, 2007-12-20 at 23:37 +0100, Nicolas Mailhot wrote:
> On Thursday, 20 December 2007 at 17:25 -0500, Behdad Esfahbod wrote:
>
> > I have two suggestions for what you can do that may achieve better
> > results for you.
> >
> >   - Run under LANG=en_US LC_MESSAGES=ja_JP
> >
> >   - Choose a non-generic font family in gedit. That is, something other
> > than Sans, Sans-serif, and Monospace.
>
> 3. Have an IM/layout switcher that explicitly declares to apps and
> pango the language which is going to be typed.

That may help when typing, but has the following problems:

  - Fonts change when you switch language.

  - To make it meaningful, your editor should store the language at the
time of typing as a tag. Or it will lose it and void the advantage.

  - Doesn't help when copy/pasting or opening a document.
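
[Editor's illustration, not part of the original message.] The second
point in the list above (storing the language as a tag at typing time)
can be sketched with Pango markup, whose <span> element accepts a lang
attribute. This is only a minimal sketch under stated assumptions: the
editor model (a list of text runs, each with the language reported by
the IM/layout switcher) and the names to_pango_markup and runs are
hypothetical, and nothing here is taken from gedit or from Pango's own
code.

```python
from xml.sax.saxutils import escape, quoteattr


def to_pango_markup(runs):
    """Serialize (text, lang) runs as Pango markup.

    `runs` is a hypothetical editor-side structure: each entry pairs a
    piece of text with the language code that was active when it was
    typed, or None when no language was recorded.
    """
    parts = []
    for text, lang in runs:
        body = escape(text)  # escape &, < and > so the markup stays valid
        if lang:
            # Tagged run: the language survives saving and copy/paste.
            parts.append(f"<span lang={quoteattr(lang)}>{body}</span>")
        else:
            # Untagged run: the renderer has to guess, as discussed above.
            parts.append(body)
    return "".join(parts)


# Hypothetical input: a bracket typed while the Japanese IM was active,
# followed by Japanese text, followed by English typed under an en layout.
runs = [("[", "ja"), ("日本語", "ja"), ("] see README", "en")]
markup = to_pango_markup(runs)
print(markup)
```

With tags like these, the '[' no longer depends on guessing from the
locale or from adjacent text, but only for documents and clipboards that
actually carry the tags, which is the limitation the third point raises.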
What will be helpful is, if pango could query your session and see that
you have American English and Chinese Chinese IM/layouts set, so
automatically set PANGO_LANGUAGE to en_US:zh_CN. That is, respect your
set languages, but not necessarily follow the currently-selected one.

--
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning. -- Matt Welsh

From behdad at behdad.org Thu Dec 20 23:40:29 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Thu, 20 Dec 2007 18:40:29 -0500
Subject: [gtk-i18n-list] Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <20071221082107.2f66f1c2.mpsuzuki@hiroshima-u.ac.jp>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <476A38AE.6020209@gmx.net> <1198154930.9724.8.camel@behdad.behdad.org> <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> <1198163285.9724.17.camel@behdad.behdad.org> <20071221011708.653f440a.mpsuzuki@hiroshima-u.ac.jp> <20071221012443.314dcf85.mpsuzuki@hiroshima-u.ac.jp> <1198189542.24702.55.camel@behdad.behdad.org> <20071221082107.2f66f1c2.mpsuzuki@hiroshima-u.ac.jp>
Message-ID: <1198194029.24702.91.camel@behdad.behdad.org>

On Fri, 2007-12-21 at 08:21 +0900, mpsuzuki at hiroshima-u.ac.jp wrote:
> Thank you very much!

You are very welcome.

[...]

> >  - In image 2', you are running under Japanese locale, you type a
> >COMMON character ('[') only, Pango assumes you are going to type
> >Japanese text, your preferred Japanese font has a glyph for '[', so
> >Pango uses it, hoping that it will use the same font when you enter
> >Japanese text.
>
> Oh! It corrects my misunderstanding.
I was misunderstanding as
> only '[' was given, the text was recognized as a Latin because
> '[' is included in ASCII. Now I understand that ASCII numerical
> digits are also COMMON character.

Yes.

> >So, the issue comes down to the fact that:
> >
> >  - It's unexpected to enter Latin under Japanese locale.
> >
> >  - You have a COMMON character at the beginning of the line.
> >
> >  - Your Japanese and Latin fonts have different heights.
>
> I see. The first clause is quite important. I guess, inputting
> Latin text under Arabic locale might be possible but irregular
> (right guessing? please let me know), but an insertion of
> Latin text (to be more correctly, I mean a string of ASCII
> alphabets) under Japanese locale is popular, especially text
> around information technology. For example, please check the
> website of Japanese Standards Association http://www.jsa.or.jp/

Arabic is like Japanese in that regard, no difference. I actually saw
that coming, and should have clarified. By unexpected, I mean it's not
the most likely event. Japanese text coming is more expected.

That said, we don't have that issue as much in Arabic because it's
considered bad writing to start an Arabic/Persian paragraph with an
English word written in Latin. It also screws up the bidirectional code
in Pango and you end up with a left-to-right paragraph (because that's
what it looks like from your text), so people just avoid it.

> >One way one may suggest is that Pango should reserve a minimum line
> >height that is enough to fit the default Japanese font, because it's
> >running under Japanese locale after all. That would fix the jump from
> >3' to 4', but makes English-only paragraphs look very ugly and badly
> >spaced vertically, so that's not an option either.
> >
> >The jump from 2' to 3' can't be fixed. I already proved that.
If one
> >fixes it, it would introduce the bug that '[' followed by a Japanese
> >character will choose a separate fonts for those chars, OR, that font
> >used for '[' will change when you type a Japanese char.
>
> Umm. Is it possible for Pango to bind COMMON characters to single
> font? I understand the font switching in my example is caused by
> the fact that the appropriate font to show COMMON character is
> determined by its context. If the font to show COMMON character is
> fixed to single font, my problem will be slightly better although
> the line height shifting still occurs.

But then when rendering Japanese-only text, all the punctuation marks
will be rendered using a different font! Now imagine that in a
monospace text, with a bitmap Japanese font and a non-bitmap punctuation
font.

> >It's as simple as this: Pango can't know what you are going to type next.
> >It can just guess, and it's guessing pretty good. It's just not reading
> >your mind yet :).
>
> Indeed. I wish anybody can implement it in Pango2 :-)

That's already on my wishlist. I may as well open a bug for it.

> >I have two suggestions for what you can do that may achieve better
> >results for you.
> >
> >  - Run under LANG=en_US LC_MESSAGES=ja_JP
> >
> >  - Choose a non-generic font family in gedit. That is, something other
> >than Sans, Sans-serif, and Monospace.
>
> Oops, it's too application specific...

No. Give it a try. It should have the effect you asked for. All
punctuation should be chosen from the non-generic font you choose. I
said do it in gedit just to test, otherwise it's nothing specific to
gedit, that's how fontconfig works.
Let's see:

[behdad@behdad berlin-fest]$ fc-match 'sans:lang=en' --sort | head -4
DejaVuSans.ttf: "DejaVu Sans" "Book"
DejaVuSans-ExtraLight.ttf: "DejaVu Sans" "ExtraLight"
DejaVuSans-BoldOblique.ttf: "DejaVu Sans" "Bold Oblique"
luxisr.ttf: "Luxi Sans" "Regular"
[behdad@behdad berlin-fest]$ fc-match 'sans:lang=ja' --sort | head -4
sazanami-gothic.ttf: "Sazanami Gothic" "Regular"
DejaVuSans.ttf: "DejaVu Sans" "Book"
DejaVuSans-ExtraLight.ttf: "DejaVu Sans" "ExtraLight"
DejaVuSans-BoldOblique.ttf: "DejaVu Sans" "Bold Oblique"
[behdad@behdad berlin-fest]$ fc-match 'DejaVu Sans:lang=en' --sort | head -4
DejaVuSans.ttf: "DejaVu Sans" "Book"
DejaVuSans-ExtraLight.ttf: "DejaVu Sans" "ExtraLight"
DejaVuSans-BoldOblique.ttf: "DejaVu Sans" "Bold Oblique"
luxisr.ttf: "Luxi Sans" "Regular"
[behdad@behdad berlin-fest]$ fc-match 'DejaVu Sans:lang=ja' --sort | head -4
DejaVuSans.ttf: "DejaVu Sans" "Book"
DejaVuSans-ExtraLight.ttf: "DejaVu Sans" "ExtraLight"
DejaVuSans-BoldOblique.ttf: "DejaVu Sans" "Bold Oblique"
sazanami-gothic.ttf: "Sazanami Gothic" "Regular"

That is, if you ask for a non-generic font (DejaVu Sans here) for
language Japanese, it first gives you DejaVu Sans (even if it doesn't
cover Japanese), then the best Japanese font available.

> Regards,
> mpsuzuki

--
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning.
-- Matt Welsh

From nicolas.mailhot at laposte.net Fri Dec 21 08:50:12 2007
From: nicolas.mailhot at laposte.net (Nicolas Mailhot)
Date: Fri, 21 Dec 2007 09:50:12 +0100
Subject: [gtk-i18n-list] Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <1198191289.24702.81.camel@behdad.behdad.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <476A38AE.6020209@gmx.net> <1198154930.9724.8.camel@behdad.behdad.org> <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> <1198163285.9724.17.camel@behdad.behdad.org> <20071221011708.653f440a.mpsuzuki@hiroshima-u.ac.jp> <20071221012443.314dcf85.mpsuzuki@hiroshima-u.ac.jp> <1198189542.24702.55.camel@behdad.behdad.org> <1198190247.17795.1.camel@rousalka.dyndns.org> <1198191289.24702.81.camel@behdad.behdad.org>
Message-ID: <1198227012.3126.25.camel@rousalka.dyndns.org>

On Thursday, 20 December 2007 at 17:54 -0500, Behdad Esfahbod wrote:

> That may help when typing, but has the following problems:
>
>   - Fonts change when you switch language.

Fonts will change anyway (indeed the alpha and omega of current
complaints is that they change but people disagree with the heuristics),
and it's better to leave users in control when we cannot guess properly
in a large number of cases.

>   - To make it meaningful, your editor should store the language at the
> time of typing as a tag. Or it will lose it and void the advantage.

Understood. It won't happen overnight. Nevertheless it can still happen
faster than finding the perfect crystal ball.

>   - Doesn't help when copy/pasting or opening a document.

Cut & paste can probably be solved with a "tagged text" media type.
Opening a document will never work for document types that do not store language info.
But if the problem can be reduced to this perimeter we'll
have made a huge leap forward.

> What will be helpful is, if pango could query your session and see that
> you have American English and Chinese Chinese IM/layouts set, so
> automatically set PANGO_LANGUAGE to en_US:zh_CN. That is, respect your
> set languages, but not necessarily follow the currently-selected one.

To me that is an if(CJK) solution. That is to say it sort of solves the
problem of one group of users without being generalisable to other
groups of users. It assumes you can deduce language from configured IMs,
when those can overlap, when many languages can be and commonly are typed
through IMs primarily designed for another language, etc.

The breakage when the wrong language is detected is far more widespread
than just Chinese, even if the effects are often more subtle. You need
good language detection to autoselect the right spellchecker, to tell
the office suite what language should tag a run of text, to select the
right locl font alternative, etc., including when users type several
languages sharing the same Unicode blocks. You'll never autodetect those
through locales, IMs, or codepoints used. German people write English.
Balkan people write Russian. They still use their primary IM for this
since it gives them access to the codepoints needed without needing to
learn another layout.

--
Nicolas Mailhot
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL:

From behdad at behdad.org Fri Dec 21 16:02:01 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Fri, 21 Dec 2007 11:02:01 -0500
Subject: [gtk-i18n-list] Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <20071221091241.35b2d604.mpsuzuki@hiroshima-u.ac.jp>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <476A38AE.6020209@gmx.net> <1198154930.9724.8.camel@behdad.behdad.org> <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> <1198163285.9724.17.camel@behdad.behdad.org> <20071221011708.653f440a.mpsuzuki@hiroshima-u.ac.jp> <20071221012443.314dcf85.mpsuzuki@hiroshima-u.ac.jp> <1198189542.24702.55.camel@behdad.behdad.org> <20071221082107.2f66f1c2.mpsuzuki@hiroshima-u.ac.jp> <1198194029.24702.91.camel@behdad.behdad.org> <20071221091241.35b2d604.mpsuzuki@hiroshima-u.ac.jp>
Message-ID: <1198252921.24702.118.camel@behdad.behdad.org>

On Fri, 2007-12-21 at 09:12 +0900, mpsuzuki at hiroshima-u.ac.jp wrote:
> >Arabic is like Japanese in that regard, no difference. I actually saw
> >that coming, and should have clarified. By unexpected, I mean it's not
> >the most likely event. Japanese text coming is more expected.
> >
> >That said, we don't have that issue as much in Arabic because it's
> >considered bad writing to start an Arabic/Persian paragraph with an
> >English word written in Latin. It also screws up the bidirectional code
> >in Pango and you end up with a left-to-right paragraph (because that's
> >what it looks like from your text), so people just avoid it.
>
> I see. Hearing "people just avoid it" is quite interesting.
As I said, it's not just technical. It's bad style to start a
Persian/Arabic paragraph with a Latin word.

> >But then when rendering a Japanese only text, all the punctuation marks
> >will be rendered using a different font! Now imagine that in a
> >monospace text, with bitmap Japanese font and non-bitmap punctuation
> >font.
>
> Yes. Do you think it's worse than contextual font switching?

I think rendering single-script text correctly is more important, yes.
If you have plain text in one script, it should only use the preferred
font of that script. Can't compromise here.

> I don't think so. But it's because my fonts have varied/inconsistent
> baselines and heights (and their inconsistency makes the contextual
> font switching quite ugly), so my disagree is not so strong at present.
>
> Anyway, your mention on bidi reminded me that binding a fixed font
> to COMMON characters may confuse bidi glyph shaping of punctuation.
> If so, it would be problematic and binding should be disabled even
> if it's possible. Oops.

No, bidi reordering is done independently of font selection. Those are
completely separate processes.

> >> >I have two suggestions for what you can do that may achieve better
> >> >results for you.
> >> >
> >> >  - Run under LANG=en_US LC_MESSAGES=ja_JP
> >> >
> >> >  - Choose a non-generic font family in gedit. That is, something other
> >> >than Sans, Sans-serif, and Monospace.
> >>
> >> Oops, it's too application specific...
> >
> >No. Give it a try. It should have the effect you asked for. All
> >punctuation should be chosen from the non-generic font you choose. I
> >said do it in gedit just to test, otherwise it's nothing specific to
> >gedit, that's how fontconfig works.
>
> OK, I will try to setup ~/.fonts.conf.

I don't think that would do it. Just set it in gnome-font-properties.

> It seems that my
> request (binding a same font to COMMON character, at
> least in Latin & CJK context) can be realized by it

Not exactly.
Hardcoding a font in your fontconfig config to always return a
certain font as the first font is not a good idea, and is actually what
started this thread at the beginning.

> - so it's off-topic to this list? Should I move to fontconfig?

I don't think forcing to use the same font for COMMON characters is
really a solution. The simplest solution for the case you showed is to
use a font that has both Japanese and Latin glyphs (plus all the
punctuation). Again, what started this thread was that the CJK font had
Latin glyphs, but crappy ones.

> Anyway, thank you for enlightening me.
>
> Regards,
> mpsuzuki

--
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning. -- Matt Welsh

From behdad at behdad.org Wed Dec 26 19:01:10 2007
From: behdad at behdad.org (Behdad Esfahbod)
Date: Wed, 26 Dec 2007 14:01:10 -0500
Subject: [gtk-i18n-list] Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])
In-Reply-To: <1198227012.3126.25.camel@rousalka.dyndns.org>
References: <26937.192.54.193.53.1196761288.squirrel@rousalka.dyndns.org> <475582F4.9070608@gmail.com> <1197013261.27642.63.camel@behdad.behdad.org> <1197494718.1444.174.camel@behdad.behdad.org> <47616849.7090905@gmail.com> <1197847321.797.68.camel@behdad.behdad.org> <476A38AE.6020209@gmx.net> <1198154930.9724.8.camel@behdad.behdad.org> <20071220230429.35dc5f26.mpsuzuki@hiroshima-u.ac.jp> <1198163285.9724.17.camel@behdad.behdad.org> <20071221011708.653f440a.mpsuzuki@hiroshima-u.ac.jp> <20071221012443.314dcf85.mpsuzuki@hiroshima-u.ac.jp> <1198189542.24702.55.camel@behdad.behdad.org> <1198190247.17795.1.camel@rousalka.dyndns.org> <1198191289.24702.81.camel@behdad.behdad.org> <1198227012.3126.25.camel@rousalka.dyndns.org>
Message-ID: <1198695670.31755.27.camel@behdad.behdad.org>

On Fri, 2007-12-21 at 09:50 +0100, Nicolas Mailhot
wrote:
> On Thursday, 20 December 2007 at 17:54 -0500, Behdad Esfahbod wrote:
>
> > That may help when typing, but has the following problems:
> >
> >   - Fonts change when you switch language.
>
> Fonts will change anyway (indeed the alpha and omega of current
> complaints is that they change but people disagree with the heuristics),
> and it's better to leave users in control when we cannot guess properly
> in a large number of cases.

Guess I wasn't clear. Fonts change on the current line as you switch
keyboard/IM. So, you rotate through Chinese and English locales and
the fonts keep changing as you do. Unless you add markup to keep
state...

> >   - To make it meaningful, your editor should store the language at the
> > time of typing as a tag. Or it will lose it and void the advantage.
>
> Understood. It won't happen overnight. Nevertheless it can still happen
> faster than finding the perfect crystal ball.
>
> >   - Doesn't help when copy/pasting or opening a document.
>
> Cut & paste can probably be solved with a "tagged text" media type.
> Opening a document will never work for document types that do not store
> language info. But if the problem can be reduced to this perimeter we'll
> have made a huge leap forward.
>
> > What will be helpful is, if pango could query your session and see that
> > you have American English and Chinese Chinese IM/layouts set, so
> > automatically set PANGO_LANGUAGE to en_US:zh_CN. That is, respect your
> > set languages, but not necessarily follow the currently-selected one.
>
> To me that is an if(CJK) solution. That is to say it sort of solves the
> problem of one group of users without being generalisable to other
> groups of users. It assumes you can deduce language from configured IMs,
> when those can overlap, when many languages can be and commonly are typed
> through IMs primarily designed for another language, etc.

Again, guess I wasn't clear.
I'm saying that if and when the desktop has the language
information (as opposed to just keyboard layout information), Pango
should use the list of all languages in that way. This helps, for
example, with preferring Persian fonts over Arabic fonts.

> The breakage when the wrong language is detected is far more widespread
> than just Chinese, even if the effects are often more subtle. You need
> good language detection to autoselect the right spellchecker, to tell
> office suite what language should tag a run of text, to select the right
> locl font alternative, etc including when users type several languages
> sharing the same unicode blocks. You'll never autodetect those through
> locales, IMs, or codepoints used. German people write English. Balkan
> people write Russian. They still use their primary IM for this since it
> gives them access to the codepoints needed without needing to learn
> another layout.
>

--
behdad
http://behdad.org/

...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning. -- Matt Welsh
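
[Editor's illustration, not part of the original message.] The idea
discussed in this message, turning the full list of languages known to
the desktop into a PANGO_LANGUAGE value such as en_US:zh_CN, can be
sketched as below. This is a minimal sketch under stated assumptions:
how a desktop session would expose its language list is not specified
in the thread, so session_languages is a hypothetical hard-coded input
and pango_language_value is an illustrative helper, not a Pango or
GNOME API.

```python
import os


def pango_language_value(languages):
    """Join language codes into a colon-separated preference list.

    Order is kept (most preferred first), and duplicates and empty
    entries are dropped so each language appears once.
    """
    seen = set()
    ordered = []
    for lang in languages:
        if lang and lang not in seen:
            seen.add(lang)
            ordered.append(lang)
    return ":".join(ordered)


# Hypothetical list, standing in for whatever the desktop session knows
# about the user's configured languages (not just keyboard layouts).
session_languages = ["en_US", "zh_CN", "en_US"]
value = pango_language_value(session_languages)

# Build an environment for launching a Pango-based application; the
# current process environment itself is left untouched.
env = dict(os.environ, PANGO_LANGUAGE=value)
print(value)
```

The list reflects every language the user has configured rather than
the currently selected layout, so fonts would not flip each time the
keyboard or IM is switched.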