[zanata/zanata-server] 882e30: Revert "Change token regex for similarity comparis...

GitHub noreply at github.com
Thu Jun 11 00:35:03 UTC 2015


  Branch: refs/heads/new-tm-endpoints-rhbz1209669
  Home:   https://github.com/zanata/zanata-server
  Commit: 882e3013b79551c7fcef2cc09b8e65f5e76aa51f
      https://github.com/zanata/zanata-server/commit/882e3013b79551c7fcef2cc09b8e65f5e76aa51f
  Author: David Mason <drdmason at gmail.com>
  Date:   2015-06-11 (Thu, 11 Jun 2015)

  Changed paths:
    M zanata-war/src/main/java/org/zanata/search/LevenshteinTokenUtil.java

  Log Message:
  -----------
  Revert "Change token regex for similarity comparison to pick up more cases."

This reverts commit db71115b4b8aba64bcc859f010f4cc8675eff3e2.

More test-cases are required to ensure this regex is correct. Rolling back for
now so that this change does not go into the next release without thorough
testing.

The main problem with the new regex is that it would break on '.' in constructions
such as URLs, which does not appear to be desired behaviour.
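
As a rough illustration of the problem described above, the sketch below
compares two separator patterns. Both patterns, the class name and the sample
sentence are assumptions made up for this example; they are not the regexes
used in LevenshteinTokenUtil, which this message does not quote.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TokenSplitSketch {
    // Hypothetical separator: split on runs of whitespace only.
    static final Pattern WHITESPACE_ONLY = Pattern.compile("\\s+");

    // Hypothetical separator: split on whitespace and ASCII punctuation,
    // which includes '.', ':' and '/'.
    static final Pattern WHITESPACE_OR_PUNCT = Pattern.compile("[\\s\\p{Punct}]+");

    // Split the text into tokens using the given separator pattern.
    static List<String> tokens(Pattern separator, String text) {
        return Arrays.asList(separator.split(text.trim()));
    }

    public static void main(String[] args) {
        String text = "See http://example.com/docs for details.";
        // The URL survives as a single token here.
        System.out.println(tokens(WHITESPACE_ONLY, text));
        // The URL is broken into several tokens here.
        System.out.println(tokens(WHITESPACE_OR_PUNCT, text));
    }
}
```

With a punctuation-aware separator, one URL becomes several tokens, so a
token-based similarity comparison would treat it as several words rather than
one unit.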


  Commit: 81f4476e1c4fd321d0c73f6b79851d20b520f649
      https://github.com/zanata/zanata-server/commit/81f4476e1c4fd321d0c73f6b79851d20b520f649
  Author: David Mason <drdmason at gmail.com>
  Date:   2015-06-11 (Thu, 11 Jun 2015)

  Changed paths:
    M zanata-war/src/main/java/org/zanata/search/LevenshteinTokenUtil.java
    M zanata-war/src/test/java/org/zanata/search/LevenshteinTokenUtilTest.java

  Log Message:
  -----------
  Remove fallback to stop-words comparison for similarity score calculation.

This change is contentious, so I am removing it until an amicable resolution
to the discussion is reached. Similarity scores need more user testing to
determine which algorithm will be most useful to translators.


Compare: https://github.com/zanata/zanata-server/compare/70cbb12a6cb5...81f4476e1c4f

