On a side note: the authors say that next week they'll update the website (code snippets and leaderboard) to reflect the definition of fairness described below. Hopefully we'll get ready-to-use code for computing the metrics that also takes this bucketing into account.
How to evaluate fairness? (from the challenge authors)
How do we define ‘fairness’ in this challenge? We don’t want to penalize users for being less popular on the platform: the quality of the recommendations should be independent of the popularity of the authors. How do we calculate fairness, then? We divide authors into popularity quantiles (defined by number of followers) and ask people to balance the metrics across the bins.
Note: we are not introducing a fairness metric; we are just dividing the existing metrics into smaller buckets.
Authors in the test set are divided into quantiles based on their popularity (here measured by their number of followers).
For each quantile, RCE and average precision are computed (as shown on the website) and then averaged.
Why is that different from computing a single global metric? While the number of authors in each quantile is the same (by definition), the number of engagements by those authors in the dataset is very different. Put another way: we want predictions to be equally "good" for everyone, not only for popular authors, who weigh more in the dataset.
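To make this concrete, here is a minimal sketch of the bucketed evaluation in Python/pandas. The column names (`author_id`, `author_followers`, `y_true`, `y_pred`) are hypothetical placeholders for whatever the released data actually uses, and the RCE helper assumes the relative cross-entropy definition from the challenge website (percentage improvement in log loss over a constant-CTR predictor), which is not spelled out in this thread.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import average_precision_score, log_loss


def relative_cross_entropy(y_true, y_pred):
    # RCE as assumed from the challenge website (not stated in this thread):
    # percentage improvement in log loss over a naive predictor that always
    # outputs the average positive rate (CTR) of the data.
    ctr = float(np.mean(y_true))
    naive_ce = log_loss(y_true, np.full(len(y_true), ctr), labels=[0, 1])
    model_ce = log_loss(y_true, y_pred, labels=[0, 1])
    return (1.0 - model_ce / naive_ce) * 100.0


def bucketed_metrics(df, n_buckets=5):
    # df is assumed to have one row per engagement candidate, with hypothetical
    # columns: author_id, author_followers, y_true (0/1 label),
    # y_pred (predicted engagement probability).

    # 1. Bucket *authors* (not engagements) into equal-sized popularity
    #    quantiles by follower count, as described above.
    authors = df[["author_id", "author_followers"]].drop_duplicates("author_id").copy()
    authors["bucket"] = pd.qcut(
        authors["author_followers"].rank(method="first"),
        q=n_buckets, labels=False,
    )
    df = df.merge(authors[["author_id", "bucket"]], on="author_id")

    # 2. Compute average precision and RCE separately inside each bucket.
    rows = []
    for bucket, group in df.groupby("bucket"):
        rows.append({
            "bucket": bucket,
            "ap": average_precision_score(group["y_true"], group["y_pred"]),
            "rce": relative_cross_entropy(group["y_true"], group["y_pred"]),
        })
    per_bucket = pd.DataFrame(rows)

    # 3. The reported score is the unweighted mean over buckets, so every
    #    popularity bin counts equally, no matter how many engagements it holds.
    return per_bucket, per_bucket[["ap", "rce"]].mean()
```

Because step 3 averages the per-bucket scores with equal weight, a strong result on the most-followed authors cannot compensate for poor predictions on the long tail of less popular ones.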