How Scoring Works in Quepid
To measure how good your search quality is, we need an Evaluation Measure, which in Quepid parlance is a Scorer. Today Quepid only supports a single scorer (in contrast to tools like RRE).
The default scorer is an NDCG@10 with a rating scale of 1 to 4. The way to think about these ratings when you are using them is:
- A One is Poor, and it's a document that makes you actively upset with your search engine! It's a BAD result.
- A Two is Fair, and it represents an irrelevant document. You understand why it matched, but it's not relevant.
- A Three is Good, and it is a relevant document. It makes sense that the search engine returned it.
- A Four is Perfect, and this is a perfect match. There isn't any ambiguity about why that document matched.
It's okay for there to NOT be any Fours, especially in a query that is relatively exploratory in nature. For example, if you searched for "best movie", it would be hard for the engine to return documents that rate a Four unless it knows your favorite movie. However, if you type in "Star Wars" and get back "Star Wars: A New Hope", that looks like a Four. "Star Wars: The Phantom Menace" is probably a Three. It's relevant, but it wasn't exactly what I wanted!
When rating your documents, you may see that a query scores 100 even though you've given its documents very low ratings.
One weird thing about NDCG is that it only rewards you for returning the documents in perfect descending order of rating, i.e. 4,3,2,1 and 3,3,1 and 1,1,1 all score the same. This means that a result list made up entirely of low-rated documents still scores 100, as long as they are in order. So, if you rate ten documents as 1,4,1,1,1,1,1,1,1,1, you get a 72. Tweak your algorithm to move that 4 from position 2 to position 1, so the list is sorted as 4,1,1,1,1,1,1,1,1,1, and boom, you are back to 100!
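The numbers above can be reproduced with a small sketch. This is not Quepid's scorer code: it assumes the common NDCG formulation with an exponential gain of 2^rating - 1 and a log2 position discount, which happens to give the 72 quoted above. The function names (dcg, ndcg_local, as_score) are hypothetical and exist only for this illustration.

```python
import math

def dcg(ratings, k=10):
    """Discounted cumulative gain over the first k ratings.

    Assumed formulation: gain = 2^rating - 1, discount = log2(position + 1).
    """
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ratings[:k]))

def ndcg_local(ratings, k=10):
    """NDCG where the ideal ordering is built only from the returned ratings."""
    ideal = dcg(sorted(ratings[:k], reverse=True), k)
    return dcg(ratings, k) / ideal if ideal > 0 else 0.0

def as_score(value):
    """Scale a 0..1 NDCG value to the 0..100 range used in this page."""
    return round(100 * value)

# Any list that is already in descending order scores 100,
# no matter how low the ratings are.
print(as_score(ndcg_local([4, 3, 2, 1])))  # 100
print(as_score(ndcg_local([3, 3, 1])))     # 100
print(as_score(ndcg_local([1, 1, 1])))     # 100

# A Four stuck in position 2 behind a One costs you points...
print(as_score(ndcg_local([1, 4, 1, 1, 1, 1, 1, 1, 1, 1])))  # 72
# ...and moving it to position 1 restores the perfect score.
print(as_score(ndcg_local([4, 1, 1, 1, 1, 1, 1, 1, 1, 1])))  # 100
```

The key point is that the denominator is the best possible ordering of the same ratings, so the score measures ordering, not the absolute quality of what was returned.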
Another issue is that the default NDCG@10 scorer only looks at the documents that are returned by the search engine. This is sometimes called the "NDCG Local" variant. If you know that there are other highly relevant documents, and you've used the Explain Other feature to find and rate them, you would expect them to contribute to the score. They don't. Imagine your search engine returns a single document rated a 1: you currently get 100 as the score.
If you then went and used Explain Other to find some other docs that you rate as 1,4,4, you would expect NDCG to compare your single returned 1 against an ideal ordering of 4,4,1,1 and give you a very low score for not returning the 4's! That would be the "NDCG Global" variant, which is currently not supported. This is tracked in https://github.com/o19s/quepid/issues/78.
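To make the difference concrete, here is a minimal sketch comparing the two variants on the example above. It is an illustration only, not Quepid code, and it assumes an exponential gain of 2^rating - 1 with a log2 discount. The ndcg function and its judged_pool parameter are hypothetical names: the pool is the set of ratings used to build the ideal ordering.

```python
import math

def dcg(ratings, k=10):
    """DCG with assumed gain 2^rating - 1 and log2(position + 1) discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ratings[:k]))

def ndcg(returned, judged_pool, k=10):
    """NDCG of the returned ratings against the ideal ordering of judged_pool.

    Local variant:  judged_pool is just the returned ratings.
    Global variant: judged_pool also includes rated documents that the
                    search engine did not return.
    """
    ideal = dcg(sorted(judged_pool, reverse=True), k)
    return dcg(returned, k) / ideal if ideal > 0 else 0.0

returned = [1]            # the engine returned one document, rated 1
other_rated = [1, 4, 4]   # found and rated via Explain Other, but not returned

local_score = round(100 * ndcg(returned, returned))
global_score = round(100 * ndcg(returned, returned + other_rated))

print(local_score)   # 100
print(global_score)  # 4
```

Under these assumptions the local variant reports a perfect 100, while a global variant would score the same result list at about 4, because the ideal list starts with the two Fours that were never returned.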