[publican-list] sortable lists, esp. glossaries

Peter Moulder peter.moulder at monash.edu
Tue Jan 31 07:45:12 UTC 2012


In two messages around Jan 30, 2012, Jeff Fearn wrote:

> Collating the three kana scripts of Japanese properly is the Mt
> Everest of this challenge.
>
> [...]
> 
> Collating each of them separately is easy, but it's perfectly valid
> in Japanese to mix them so you have to be able to collate all of
> them together. AFAIK no one has done that in any open source
> project.

Apparently, the mapping from a string of Kanji to its pronunciation
(and hence its sort position) isn't even deterministic, at least for
proper names: the same written form can have several valid readings.

(The example I came across is that the woman's name 角田 純子 has at
least four possible readings of the family name times two possible
readings of the given name.)

Thus, the solution would have to involve supplying pronunciations somehow
for at least some glossary entries.

Once pronunciations (in Katakana or Hiragana) are available for all the
glossary entries, the Lingua::JA::Sort::JIS Perl module can be used to
collate them according to JIS X 4061:1996.
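
To illustrate the idea of sorting on supplied readings rather than on the
written form, here is a rough sketch in Python.  It is NOT an
implementation of JIS X 4061 and does not use Lingua::JA::Sort::JIS; it
only folds Katakana to Hiragana and treats voicing marks as a secondary
difference.  Long-vowel marks, small kana and iteration marks are not
handled.  The glossary data and the function names (to_hiragana,
reading_key) are made up for the example.

```python
import unicodedata


def to_hiragana(text):
    """Fold Katakana U+30A1..U+30F6 onto Hiragana U+3041..U+3096."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def reading_key(reading):
    """Approximate two-level sort key for a kana reading (an assumption,
    not the JIS X 4061 algorithm)."""
    folded = to_hiragana(reading)
    # NFD splits e.g. U+304C into U+304B plus the combining mark U+3099.
    decomposed = unicodedata.normalize("NFD", folded)
    # Level 1: the kana with voicing marks removed.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Level 2: the decomposed string, used only to break ties.
    return (base, decomposed)


# Hypothetical glossary: (entry as written, reading supplied by the author)
glossary = [
    ("漢字", "かんじ"),
    ("外字", "がいじ"),
    ("片仮名", "カタカナ"),
    ("仮名", "かな"),
]

ordered = sorted(glossary, key=lambda entry: reading_key(entry[1]))
for written, reading in ordered:
    print(written, reading)
```

Sorting the raw reading strings by code point would instead put every
Katakana reading after every Hiragana one, and every が... entry after
every か... entry, which is why some folding step is needed before
comparison.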

Really, we need input from Japanese speakers on how this is usually
solved.  The Japanese translators might be able to help there, at least
as to how they supply pronunciations to other software that needs to
know the sort order.


(Btw, if anyone was going to try looking up JIS X 4061:1996, then
 unfortunately it looks like it's only available for a fee and in
 Japanese:

   http://www.webstore.jsa.or.jp/webstore/Com/FlowControl.jsp?lang=en&bunsyoId=JIS+X+4061%3A1996&dantaiCd=JIS&status=1&pageNo=6

 However, I'm told that the Japanese Wikipedia article

   http://ja.wikipedia.org/wiki/日本語文字列照合順番

 has an overview.  The Google translation of that page is challenging to
 read, though:

   http://translate.google.com/translate?sl=ja&tl=en&u=http%3A%2F%2Fja.wikipedia.org%2Fwiki%2F%E6%97%A5%E6%9C%AC%E8%AA%9E%E6%96%87%E5%AD%97%E5%88%97%E7%85%A7%E5%90%88%E9%A0%86%E7%95%AA

 .)


pjrm.



