How Scoring Works in Quepid

Eric Pugh edited this page Feb 17, 2020 · 7 revisions

To measure how good your search quality is, you need an Evaluation Measure, which in Quepid parlance is a Scorer. Today Quepid supports only a single scorer (in contrast to tools like RRE).

NDCG

The default scorer is NDCG@10 with a rating scale of 1 to 4. When you assign ratings, think about them this way:

  1. A One is Poor, and it's a document that makes you actively upset with your search engine! It's a BAD result.
  2. A Two is Fair, and it represents an irrelevant document. You understand why it matched, but it's not relevant.
  3. A Three is Good, and it is a relevant document. It makes sense that the search engine returned it.
  4. A Four is Perfect, and this is a perfect match. There isn't any ambiguity about why that document matched.

It's okay for there to NOT be any fours, especially for a query that is relatively exploratory in nature. For example, if you searched for "best movie", it would be hard for the engine to return documents that you would rate a Four unless it knows your favorite movie. However, if you type in "Star Wars" and get back "Star Wars a New Hope", that looks like a Four. Returning "Star Wars: The Phantom Menace" is probably a Three: it's relevant, but it wasn't exactly what you wanted!
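To make the arithmetic behind NDCG@10 concrete, here is a minimal Python sketch. It is not Quepid's own scorer code. It assumes a linear gain (the raw 1 to 4 rating), a log2 rank discount, and an ideal ordering built only from the ratings passed in; a real scorer may make different choices on all three. The function names `dcg` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain with linear gain (an assumption).

    The rating at rank r (1-based) is divided by log2(r + 1), so
    documents lower in the list contribute less.
    """
    return sum(r / math.log2(rank + 1) for rank, r in enumerate(ratings, start=1))


def ndcg_at_k(ratings, k=10):
    """NDCG@k for a list of ratings given in the order results were returned.

    The ideal ordering is the same ratings sorted best-first (an
    assumption: only the ratings passed in are considered).
    """
    actual = ratings[:k]
    ideal = sorted(ratings, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # An empty list has nothing to score, so return 0.0 rather than divide by zero.
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


# "Star Wars" query: the Three ("The Phantom Menace") is returned
# ahead of the Four ("A New Hope"), followed by a One and a Two.
star_wars_ratings = [3, 4, 1, 2]
score = ndcg_at_k(star_wars_ratings)
print(f"NDCG@10 = {score:.3f}")
```

Under these assumptions, a list that is already sorted best-first scores 1.0, and swapping the Three ahead of the Four pulls the score below 1.0. A query with no Fours can still score 1.0, because the score only compares the returned order against the best possible order of the same ratings.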

Some other notes about NDCG.

Some other