[libvirt] [PATCH 2/3] Improve tokenizing of linkable terms

Philipp Hahn hahn at univention.de
Fri Aug 12 06:03:37 UTC 2011



On Thursday, 11 August 2011 21:45:18, Eric Blake wrote:
> On 08/11/2011 06:44 AM, Philipp Hahn wrote:
> > Currently only tabs and blanks are used for tokenizing the description,
> > which breaks when a term is at the end of a line or has () appended to
> > it.
> > 1. Use also other white space characters such as new-lines and carriage
> >     return for splitting.
> > 2. Remove some common non-word characters from the token before lookup.
> >
> > Signed-off-by: Philipp Hahn<hahn at univention.de>
> > ---
> >   docs/newapi.xsl |    9 ++++++---
> >   1 files changed, 6 insertions(+), 3 deletions(-)
>
> I am not fluent in reading xsl files.  How would I go about testing if
> the output is more visually appealing?  I can ack on the basis of
> comparison on before vs. after appearance, but only if I figure out
> which files are affected to compare a view in my browser of the
> generated html.

Go to <http://libvirt.org/html/libvirt-libvirt.html> and search for "()" or
strings starting with "VIR_": you'll notice lots of them that aren't links,
although many others are. This happens because only blanks and tabs were used
to separate words, so the lookup went for "virDomainGetVcpus()" instead of
just "virDomainGetVcpus". The other case was a keyword at the end of a
line: the "\n" wasn't used for word breaking, so the lookup went for
"VIR_DOMAIN_MEMORY_HARD_LIMIT\nsomething" instead of
just "VIR_DOMAIN_MEMORY_HARD_LIMIT".
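
To make the difference concrete, here is a rough sketch in Python (not the
XSLT from the patch). The symbol set, the helper names and the exact list of
stripped characters are assumptions made up for this illustration; the real
stylesheet looks terms up in the generated API index.

```python
import re

# Hypothetical stand-in for the API index used for the lookup.
KNOWN_SYMBOLS = {"virDomainGetVcpus", "VIR_DOMAIN_MEMORY_HARD_LIMIT"}

# Assumed set of "common non-word characters" trimmed from each token.
TRIM_CHARS = "()[]{}.,;:"


def old_tokens(text):
    """Old behaviour: split on blanks and tabs only."""
    return [t for t in re.split(r"[ \t]+", text) if t]


def new_tokens(text):
    """New behaviour: split on any white space, then trim non-word characters."""
    trimmed = (t.strip(TRIM_CHARS) for t in text.split())
    return [t for t in trimmed if t]


def linkable(tokens):
    """Return the tokens that would be turned into links."""
    return [t for t in tokens if t in KNOWN_SYMBOLS]


description = "See virDomainGetVcpus() and VIR_DOMAIN_MEMORY_HARD_LIMIT\nsomething"

print(linkable(old_tokens(description)))
print(linkable(new_tokens(description)))
```

With the old splitting neither term is found, because the tokens are
"virDomainGetVcpus()" and "VIR_DOMAIN_MEMORY_HARD_LIMIT\nsomething"; with the
new splitting both terms match.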

It's still not perfect, because only the "function" part of "function()"
becomes clickable and not the whole "function()", but at least it is a link
now. Stemming would have been easier with the EXSLT regular expression
functions <http://www.exslt.org/regexp/>, but libxslt doesn't support them.

Sincerely
Philipp Hahn
-- 
Philipp Hahn           Open Source Software Engineer      hahn at univention.de
Univention GmbH        Linux for Your Business        fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen                 fax: +49 421 22 232-99
                                                   http://www.univention.de/