[zanata-bugs] [Bug 730189] Overhaul translation memory similarity algorithm

Tue Aug 16 01:23:11 UTC 2011

Please do not reply directly to this email. All additional
comments should be made in the comments box of this bug.

https://bugzilla.redhat.com/show_bug.cgi?id=730189

Hedda Peters <hpeters at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hpeters at redhat.com

--- Comment #5 from Hedda Peters <hpeters at redhat.com> 2011-08-15 21:23:10 EDT ---
(In reply to comment #0)

> b. If trying to match a short string, will a much larger string which contains
> the target string receive a suitably high score?  And if so, should we
> artificially reduce it from 100%? [1]
> c. If two strings both contain exact substring matches for a target string, how
> can we ensure that the shorter string receives a higher similarity score?

Screenshot attached to demonstrate the current behaviour: two rather short
strings, of which only one should be a 100% match.
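To illustrate questions b and c, here is a minimal sketch contrasting two trigram scores. It does not reproduce whatever Zanata currently computes. All function names are hypothetical, and the space padding is loosely modelled on PostgreSQL's pg_trgm. A "containment" score (the share of the query's trigrams found in the candidate) gives 100% to any longer string that contains the query. A Dice coefficient over both trigram sets reaches 100% only when the sets are equal. It also scores a shorter containing string above a longer one.

```python
def trigrams(text: str) -> set[str]:
    """Character trigrams of text, padded with two leading and one trailing space
    (hypothetical helper; padding loosely modelled on pg_trgm)."""
    padded = f"  {text} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def containment(query: str, candidate: str) -> float:
    """Fraction of the query's trigrams that also occur in the candidate.
    Any candidate containing the whole query scores 1.0, however long it is."""
    tq = trigrams(query)
    return len(tq & trigrams(candidate)) / len(tq)


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient 2|A n B| / (|A| + |B|) over the two trigram sets.
    Extra trigrams in either string lower the score, so a longer string that
    merely contains the query stays below 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


if __name__ == "__main__":
    query = "Save file"
    for candidate in ["Save file", "Save file as...", "Save file as a new document"]:
        print(f"{candidate!r}: containment={containment(query, candidate):.2f} "
              f"dice={dice_similarity(query, candidate):.2f}")
```

With this padding, "Save file" against "Save file as..." shares all 10 query trigrams out of the candidate's 16. Containment is therefore 1.00, while Dice is 20/26, about 0.77. When the candidate contains the whole query, Dice reduces to 2|A| / (|A| + |B|). That value falls as the candidate grows, which is the ordering asked for in c.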

> d. Is it feasible to highlight the matching trigrams?

I'd like to make this a feature request, albeit not urgent. It is extremely
helpful to have the matching parts highlighted. Example use case: a long entry
has been changed slightly by the writer since the last version. If the matching
parts of the message are highlighted, it is easy to spot the *one* word that has
changed, rather than having to compare the whole message carefully. The
translation memory in Lokalize implements this exact feature and more. Happy to
demonstrate.
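A rough sketch of trigram highlighting for this use case follows. It is not how Lokalize or Zanata does it. The function names and the bracket markers are hypothetical. Every character in the candidate that lies inside a trigram shared with the query is marked as matching. The changed word then shows up as the unmarked gap.

```python
def plain_trigrams(text: str) -> set[str]:
    """Unpadded character trigrams, so indices map directly onto the text
    (hypothetical helper)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def highlight(query: str, candidate: str, open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap each run of candidate characters covered by a trigram shared with query."""
    shared = plain_trigrams(query)
    covered = [False] * len(candidate)
    for i in range(len(candidate) - 2):
        if candidate[i:i + 3] in shared:
            # Mark all three characters of the matching trigram.
            covered[i] = covered[i + 1] = covered[i + 2] = True

    out = []
    inside = False
    for ch, is_covered in zip(candidate, covered):
        if is_covered and not inside:
            out.append(open_mark)
            inside = True
        elif not is_covered and inside:
            out.append(close_mark)
            inside = False
        out.append(ch)
    if inside:
        out.append(close_mark)
    return "".join(out)


if __name__ == "__main__":
    print(highlight("Click Save to store the file", "Click Save to keep the file"))
    # prints: [Click Save to ]keep[ the file]
```

A real UI would emit markup instead of brackets. It might also merge short gaps or work on words rather than characters, so that single-letter changes are not hidden by overlapping trigrams.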
