Bugzilla dupes attack
Paul A Houle
ph18 at cornell.edu
Wed Feb 15 15:09:15 UTC 2006
At 04:30 PM 2/14/2006, Jeff Spaleta wrote:
>I'm suggesting that there is very little to be learned from the
>specific comparison to google. I'm saying that since we are incapable
>of examining the details of how google search works, there is very
>little to be gleaned from looking at example output from google at
>all. The magic in the google search is the search algorithm which
>produces the results. And it's exactly that piece of magic which we
>don't have access to to examine and reuse. Are you really suggesting
>that we blindly reverse-engineer the google search algorithm and apply
>it to bugzilla?
The magic of Google isn't in the ranking algorithm, it's in the
kind and quantity of data that it searches over and the expectations people
have of it.
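The problems of information retrieval depend on the scale of your
database. Historically, people have evaluated IR systems based on two
things: precision (what fraction of the results returned are relevant) and
recall (what fraction of the relevant items are returned).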
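To make the two measures concrete, here is a minimal sketch in Python. The function name and the bug IDs are made up for illustration; nothing here comes from Bugzilla itself.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: IDs the search returned
    relevant:  IDs that actually answer the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of returned results that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant items that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical bug IDs: 4 results returned, 2 of the 5 relevant bugs found.
p, r = precision_recall([101, 102, 103, 104], [102, 104, 200, 201, 202])
print(p, r)  # 0.5 0.4
```

A search that returns everything has perfect recall and terrible precision; a search that returns one correct item has perfect precision and, usually, poor recall.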
If you've got a database with 10,000 items, and there is 1 item
that matches, there's a lot of risk that that 1 item will be lost if
someone doesn't type in the perfect search term. Recall is the issue, so
it's important to stem words (working -> work), have a system that's smart
about synonyms, etc.
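As a rough illustration of why stemming and synonyms help recall, here is a toy sketch in Python. The suffix list and the synonym table are assumptions invented for this example; a real system would use a proper stemmer (Porter, for instance) and a curated thesaurus.

```python
SUFFIXES = ("ing", "ed", "es", "s")          # toy suffix list, not a real stemmer
SYNONYMS = {"crash": {"segfault", "abort"}}  # made-up synonym table


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand(term):
    """Return the stemmed term plus the stems of its known synonyms."""
    root = stem(term)
    return {root} | {stem(s) for s in SYNONYMS.get(root, ())}


def matches(query, title):
    """True if every query term, or one of its synonyms, appears in the title."""
    title_stems = {stem(w) for w in title.split()}
    return all(expand(term) & title_stems for term in query.split())


print(matches("working", "it works"))                     # True
print(matches("crashes", "installer segfaults on boot"))  # True
```

With exact word matching, both of those queries would miss, and in a small database a miss can mean the one relevant item is never seen.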
Now, if you're searching a database with 10 billion items, a query
that matches 1 item in 10,000 will return 1 million hits. The issue is
picking the best items out of that million, so precision matters more:
there's more stress on precise phrase matching and things like PageRank.
Antispam measures are essential, and so is the removal of duplicate documents.
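The arithmetic behind the two cases, plus the simplest possible form of duplicate removal, can be sketched in Python. Both helper names are hypothetical, and hashing normalized text only catches exact duplicates; this is an assumption made for brevity, and says nothing about how any real search engine handles near-duplicates.

```python
import hashlib


def expected_hits(db_size, one_in):
    """Expected number of matches when 1 item in `one_in` matches the query."""
    return db_size // one_in


def dedupe(docs):
    """Keep the first copy of each document, ignoring case and whitespace."""
    seen, unique = set(), []
    for doc in docs:
        normalized = " ".join(doc.lower().split())
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


print(expected_hits(10_000, 10_000))          # 1
print(expected_hits(10_000_000_000, 10_000))  # 1000000
print(dedupe(["Foo  bar", "foo bar", "baz"]))  # ['Foo  bar', 'baz']
```

The same 1-in-10,000 query goes from a needle-in-a-haystack problem to a ranking problem purely because of the size of the database.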
Google's trying to do something entirely different from what
Bugzilla search is trying to do or what, say, Beagle should do on your desktop.
More information about the fedora-devel-list
mailing list