FairDiverse is a toolkit for reproducing and developing fairness- and diversity-aware Information Retrieval (IR) tasks.
We welcome contributors to join the development of our toolkit! For any questions, please contact:
python>=3.7.0
numpy>=1.20.3
torch>=1.11.0
PyYAML>=6.0
pandas>=1.2.0
scipy>=1.15.1
cvxpy>=1.6.0
tqdm>=4.65.0
scikit_learn>=1.3.0
pot
mip>=1.15.0
gurobipy>=12.0.1
Requires a Linux system
backoff==2.2.1
json_repair==0.35.0
networkx==3.1
openai==1.61.1
Requests==2.32.3
transformers==4.32.1
urllib3==1.26.16
vllm>=0.6.0
To install vllm, please refer to the vLLM official documentation.
Then, download your LLMs (from Hugging Face) to any path and write the paths into ~/recommendation/properties/LLMs.yaml:
llm_path_dict:
{
'Llama3-8B': "",
'Qwen2-7B': "",
'Mistral-7B': "",
'ChatGLM-9B': "",
'bert': "",
'gpt2': "",
}
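Presumably the toolkit parses this file with PyYAML (which is listed in the requirements). Below is a minimal sketch of a filled-in file and how it can be loaded — the model paths are illustrative placeholders, not real locations:

```python
import yaml  # PyYAML, already in the requirements

# A filled-in example of the LLMs.yaml structure (paths are placeholders).
yaml_text = """
llm_path_dict:
  Llama3-8B: "/data/models/llama3-8b"
  Qwen2-7B: "/data/models/qwen2-7b"
"""

config = yaml.safe_load(yaml_text)
llm_paths = config["llm_path_dict"]
print(llm_paths["Llama3-8B"])  # -> /data/models/llama3-8b
```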
Requires a Gurobi license
mip>=1.15.0
gurobipy>=12.0.1
Install R and the following Python packages:
rpy2==3.5.17
With the source code, you can get started in three steps:
- Download the datasets and check the default parameters of the four pipeline stages (we already provide a toy dataset, steam).
- Set up your custom configuration file to execute the pipeline (we already provide a template file).
- Run the shell command, specifying the task, stage, dataset, and your custom configuration file (you can run the command directly).
For in-processing methods, please run
python main.py --task recommendation --stage in-processing --dataset steam --train_config_file In-processing.yaml
Or you can create a new test.py to test:
from recommendation.trainer import RecTrainer
config = {'model': 'BPR', 'data_type': 'pair', 'fair-rank': True, 'rank_model': 'APR', 'use_llm': False, 'log_name': "test", 'dataset': 'steam'}
trainer = RecTrainer(train_config=config)
trainer.train()
For post-processing methods, please run
python main.py --task recommendation --stage post-processing --dataset steam --train_config_file Post-processing.yaml
Or you can create a new test.py to test:
from recommendation.reranker import RecReRanker
config = {'ranking_store_path': 'steam-base-mf', 'model': 'CPFair', 'fair-rank': True, 'log_name': 'test',
'fairness_metrics': ["MMF", "GINI"], 'dataset': 'steam'}
reranker = RecReRanker(train_config=config)
reranker.rerank()
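The `fairness_metrics` entry above includes `GINI`, which measures inequality of item exposure. A common formulation of the Gini coefficient is sketched below (an illustration of the metric, not necessarily the toolkit's exact implementation):

```python
import numpy as np

def gini(exposures):
    # Gini coefficient of an exposure distribution:
    # 0 = perfectly equal exposure; values near 1 = highly concentrated.
    x = np.sort(np.asarray(exposures, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

uniform = gini([1, 1, 1, 1])  # 0.0: every item gets equal exposure
skewed = gini([0, 0, 0, 1])   # 0.75: one item takes all the exposure
```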
For the pre-processing methods, you can begin with:
python main.py --task search --stage pre-processing --dataset compas --train_config_file train_preprocessing.yaml
You can set "preprocessing_model" to one of the supported methods [LFR, iFair, gFair, CIFRank].
Or you can create a new test.py to test:
from search.trainer_preprocessing_ranker import RankerTrainer
config = { "train_ranker_config": {"preprocessing_model": "iFair", "name": "Ranklib", "ranker": "RankNet", "lr": 0.0001, "epochs": 10}}
reranker = RankerTrainer(train_config=config)
reranker.train()
For the post-processing methods, you can begin with:
python main.py --task search --stage post-processing --dataset clueweb09 --train_config_file Post-processing.yaml
You can set "postprocessing_model" to one of the supported methods [DESA, DALETOR, LLM, xQuAD, PM2].
Or you can create a new test.py to test:
from search.trainer import SRDTrainer
config = {'model': 'xquad', 'dataset': 'clueweb09', 'task': 'search', 'mode': 'train', "log_name": "test", "model_save_dir": "model/", "tmp_dir": "tmp/", "best_model_list": [], "device": "cpu"}
trainer = SRDTrainer(train_config=config)
trainer.train()
For the recommendation datasets, we use the dataset format of RecBole Datasets.
For the search datasets, we use the ClueWeb09 dataset and the COMPAS dataset.
| Types | Models | Descriptions |
|---|---|---|
| Non-LLMs | DMF | optimizes the matrix factorization with the deep neural networks. |
| Non-LLMs | BPR | optimizes pairwise ranking via implicit feedback. |
| Non-LLMs | GRU4Rec | employs gated recurrent units (GRUs) for session-based recommendations. |
| Non-LLMs | SASRec | leverages self-attention mechanisms to model sequential user behavior. |
| LLMs | Llama3 | uses rank-specific prompts to conduct ranking tasks with LLMs. |
| LLMs | Qwen2 | uses rank-specific prompts to conduct ranking tasks with LLMs. |
| LLMs | Mistral | uses rank-specific prompts to conduct ranking tasks with LLMs. |
| Types | Models | Descriptions |
|---|---|---|
| Re-weight | APR | an adaptive reweighing method that dynamically prioritizes samples near the decision boundary to mitigate distribution shifts. |
| Re-weight | FairDual | applies dual-mirror gradient descent to dynamically compute the weight for each sample to support the worst-off groups. |
| Re-weight | IPS | employs the reciprocal of the sum popularity of items within the group as the weight assigned to that group. |
| Re-weight | Minmax-SGD | applies optimizing techniques to dynamically sample groups. |
| Re-weight | SDRO | improves DRO with distributional shift to optimize group MMF. |
| Re-sample | FairNeg | adjusts the group-level negative sampling distribution in the training process. |
| Regularizer | FOCF | applies a fair-aware regularization loss of different groups. |
| Regularizer | DPR | applies a fair-aware adversarial loss based on statistical parity and equal opportunity. |
| Regularizer | Reg | imposes a penalty on the squared difference between the average scores of two groups across all positive user-item pairs. |
| Prompt-based | FairPrompts | manually designed fairness-aware prompts. |
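To make the Reg-style regularizer above concrete, here is a minimal sketch of such a penalty in PyTorch (our illustration, not the toolkit's implementation; `group_ids` marks which group each positive pair belongs to):

```python
import torch

def reg_penalty(scores, group_ids):
    # Squared difference between the average scores of the two groups,
    # computed over positive user-item pairs.
    g0 = scores[group_ids == 0].mean()
    g1 = scores[group_ids == 1].mean()
    return (g0 - g1) ** 2

# Toy batch: group 0 is scored higher on average than group 1.
scores = torch.tensor([0.9, 0.8, 0.4, 0.3])
group_ids = torch.tensor([0, 0, 1, 1])
penalty = reg_penalty(scores, group_ids)  # (0.85 - 0.35)^2 = 0.25
```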
| Types | Models | Descriptions |
|---|---|---|
| Heuristic | CP-Fair | applies a greedy solution to optimize the knapsack problem of fair ranking. |
| Heuristic | min-regularizer | adds an additional fairness score to the ranking scores, capturing the gap between the current utility and the worst-off utility. |
| Heuristic | RAIF | a model-agnostic repeat-bias-aware item fairness optimization algorithm based on mixed-integer linear programming. |
| Learning-based | P-MMF | applies a dual-mirror gradient descent method to optimize the accuracy-fairness trade-off problem. |
| Learning-based | FairRec | proposes leveraging Nash equilibrium to guarantee Max-Min Share of item exposure. |
| Learning-based | FairRec+ | proposes leveraging Nash equilibrium to guarantee Max-Min Share of item exposure. |
| Learning-based | FairSync | proposes to guarantee the minimum group utility under distributed retrieval stages. |
| Learning-based | Tax-Rank | applies the optimal transportation (OT) algorithm to trade-off fairness-accuracy. |
| Learning-based | Welf | uses the Frank-Wolfe algorithm to maximize the welfare functions of worst-off items. |
| Learning-based | ElasticRank | uses elastic theory to optimize fair re-ranking. |
| Learning-based | ManifoldRank | interprets re-ranking as manifold optimization to find the best equilibrium. |
| Types | Models | Descriptions |
|---|---|---|
| Pointwise | MART | a gradient boosting decision tree model that optimizes ranking by iteratively refining regression trees to minimize loss. |
| Pairwise | RankNet | a neural network-based model that minimizes the number of incorrectly ranked pairs. |
| Pairwise | RankBoost | an ensemble boosting algorithm that optimizes pairwise ranking orders. |
| Pairwise | AdaRank | a functional gradient boosting approach for ranking. |
| Listwise | ListNet | uses a probabilistic model to directly optimize listwise ranking performance. |
| Listwise | Random Forests | a tree-based ensemble model for learning-to-rank. |
| Listwise | Coordinate Ascent | optimizes ranking functions by iteratively adjusting parameters to maximize a ranking-based objective. |
| Listwise | LambdaMART | a gradient boosting tree model that uses LambdaRank to optimize ranking metrics such as NDCG. |
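As an illustration of the pairwise idea behind RankNet, its per-pair loss can be sketched in PyTorch (a sketch of the standard formulation, not Ranklib's implementation):

```python
import math
import torch
import torch.nn.functional as F

def ranknet_pair_loss(s_i, s_j):
    # Cross-entropy between P(i > j) = sigmoid(s_i - s_j) and the target
    # "i is preferred over j"; equivalently -log sigmoid(s_i - s_j).
    return F.softplus(-(s_i - s_j))

# Tied scores give log(2); a correctly ordered pair gives a smaller loss.
loss_tied = ranknet_pair_loss(torch.tensor(1.0), torch.tensor(1.0))
loss_correct = ranknet_pair_loss(torch.tensor(2.0), torch.tensor(0.0))
expected_tied = math.log(2.0)  # ~0.6931
```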
| Types | Models | Descriptions |
|---|---|---|
| Causal | CIFRank | estimates the causal effect of the sensitive attributes on the data and makes use of them to correct for the bias encoded. |
| Probabilistic Mapping | LFR | optimizes for group fairness by ensuring that the probability of one group being mapped to a cluster equals that of the other group. |
| Probabilistic Mapping | iFair | optimizes for individual fairness by ensuring that the distance between similar individuals is maintained in the new space. |
| Probabilistic Mapping | gFair | optimizes for group fairness by ensuring that similar individuals from one group are close, in the new space, to similar individuals from the other group; it also constrains the optimization to maintain the relative distances between individuals of the same group. |
| Types | Models | Descriptions |
|---|---|---|
| Unsupervised | PM2 | optimizes proportionality by iteratively selecting the topic that best maintains the overall proportionality. |
| Unsupervised | xQuAD | utilizes sub-queries representing pseudo user intents and diversifies document rankings by directly estimating the relevance of the retrieved documents to each sub-query. |
| Supervised | DESA | employs the attention mechanism to model the novelty of documents and the explicit subtopics. |
| Supervised | DALETOR | proposes diversification-aware losses to approach the optimal ranking. |
| DiversePrompts | GPT-4o | a diversity ranking model based on large language models. |
| DiversePrompts | Claude 3.5 | a diversity ranking model based on large language models. |
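To make the xQuAD criterion above concrete, here is a simplified sketch of its greedy scoring rule (all names and parameters are ours, not the toolkit's):

```python
def xquad_score(doc, selected, rel_q, rel_sub, p_sub, lam=0.5):
    # xQuAD greedy score: (1 - lam) * relevance to the query, plus
    # lam * coverage of sub-query intents not yet covered by `selected`.
    diversity = 0.0
    for i, p_i in enumerate(p_sub):
        not_covered = 1.0
        for s in selected:
            not_covered *= 1.0 - rel_sub[s][i]
        diversity += p_i * rel_sub[doc][i] * not_covered
    return (1.0 - lam) * rel_q[doc] + lam * diversity

# Two documents, two intents; each document covers a different intent.
rel_q = [0.9, 0.8]                  # relevance to the original query
rel_sub = [[0.9, 0.0], [0.0, 0.9]]  # relevance to each sub-query intent
p_sub = [0.5, 0.5]                  # intent probabilities
first = max(range(2), key=lambda d: xquad_score(d, [], rel_q, rel_sub, p_sub))
second_score = xquad_score(1 - first, [first], rel_q, rel_sub, p_sub)
```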
You need just a few steps and a few lines of code to develop and evaluate your own models based on our toolkit!
- Set your custom model parameters.
- Based on your model type, inherit the corresponding abstract class and implement your model.
- Register your model in the training pipelines.
Then you can run the shell command to evaluate your own models.
- Recommendation
Here, we provide an example code demonstrating how to design a custom in-processing model.
#/recommendation/rank_model/YourModel.py
import torch

class YourModel(Abstract_Regularizer):
    def __init__(self, config, group_weight):
        super().__init__(config)

    def fairness_loss(self, input_dict):
        # Example fairness loss: penalize the variance of the ranking scores.
        scores = input_dict['scores']
        return torch.var(scores)
#/recommendation/rank_model/__init__.py
from .YourModel import YourModel
#/recommendation/trainer.py
if config["model"] == "YourModel":
    self.Model = YourModel(config)
#test.py
from recommendation.trainer import RecTrainer
config = {'model': 'BPR', 'data_type': 'pair', 'fair-rank': True, 'rank_model': 'YourModel', 'use_llm': False, 'log_name': "test", 'dataset': 'steam'}
trainer = RecTrainer(train_config=config)
trainer.train()
Here, we provide an example code demonstrating how to design a custom post-processing model.
#/recommendation/rerank_model/YourModel.py
import numpy as np
from tqdm import trange

class YourModel(Abstract_Reranker):
    def __init__(self, config, weights=None):
        super().__init__(config, weights)

    def rerank(self, ranking_score, k):
        # For each user, select the top-k items by score.
        user_size = ranking_score.shape[0]
        rerank_list = []
        for u in trange(user_size):
            result_item = np.argsort(ranking_score[u, :])[::-1][:k]
            rerank_list.append(result_item)
        return rerank_list
#/recommendation/rerank_model/__init__.py
from .YourModel import YourModel
#/recommendation/reranker.py
elif config['model'] == 'YourModel':
    Reranker = YourModel(config)
#test.py
from recommendation.reranker import RecReRanker
config = {'ranking_store_path': 'steam-base-mf', 'model': 'YourModel', 'fair-rank': True, 'log_name': 'test', 'fairness_metrics': ["GINI"], 'dataset': 'steam'}
reranker = RecReRanker(train_config=config)
reranker.rerank()
- Search
Here, we provide an example code demonstrating how to design a custom pre-processing model.
#/search/preprocessing_model/YourModel.py
class YourModel(PreprocessingFairnessIntervention):
    def __init__(self, configs, dataset):
        super().__init__(configs, dataset)

    def fit(self, X_train, run):
        self.opt_params = ...  # the learned parameters of your model

    def transform(self, X_train, run, file_name=None):
        X_train_fair = ...  # use self.opt_params to transform the data
        return X_train_fair
#/search/properties/models/YourModel.yaml
# Define your config file for "YourModel".
#/search/preprocessing_model/__init__.py
from .YourModel import YourModel
fairness_method_mapping['YourModel'] = YourModel
#test.py
from search.trainer_preprocessing_ranker import RankerTrainer
config = { "train_ranker_config": {"preprocessing_model": "YourModel", "name": "Ranklib", "ranker": "RankNet", "lr": 0.0001, "epochs": 10}}
reranker = RankerTrainer(train_config=config)
reranker.train()
Here, we provide an example code demonstrating how to design a custom post-processing model.
#/search/postprocessing_model/YourModel.py
class YourModel(BasePostProcessModel):
    def __init__(self, dropout):
        super().__init__(dropout)

    def fit(self):
        # design your own model
        ...
#/search/properties/models/YourModel.yaml
# Define your config file for "YourModel".
#/search/postprocessing_model/__init__.py
from .YourModel import YourModel
diversity_method_mapping['YourModel'] = YourModel
#test.py
from search.trainer import SRDTrainer
config = {'model': 'YourModel', 'dataset': 'clueweb09', 'task': 'search', 'mode': 'train', "log_name": "test", "model_save_dir": "model/", "tmp_dir": "tmp/", "best_model_list": [], "device": "cpu"}
trainer = SRDTrainer(train_config=config)
trainer.train()
FairDiverse is released under the MIT License. All data and code in this project may only be used for academic purposes.
If you use our toolkit in your paper, please cite the following BibTeX entry:
@inproceedings{xu2025fairdiverse,
author = {Chen Xu and Zhirui Deng and Clara Rus and Xiaopeng Ye and Yuanna Liu and Jun Xu and Zhicheng Dou and Ji-Rong Wen and Maarten de Rijke},
title = {FairDiverse: A Comprehensive Toolkit for Fair and Diverse Information Retrieval Algorithms},
booktitle = {Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25)},
year = {2025},
isbn = {979-8-4007-1592-1},
publisher = {Association for Computing Machinery},
address = {Padua, Italy},
month = {July},
doi = {10.1145/3726302.3730280},
url = {https://doi.org/10.1145/3726302.3730280}
}


