
ORBIT - Open Recommendation Benchmark for Reproducible Research with Hidden Tests

Table of Contents

  • Public Benchmark
  • ClueWeb-Reco Benchmark

Public Benchmark

RecBole Setup

Install RecBole as described in the RecBole README:

cd RecSys-Benchmark/RecBole
pip install -e . --verbose
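
A quick sanity check that the editable install is importable (a minimal sketch; it assumes the install above completed inside your active Python environment):

# Minimal sanity check that the editable RecBole install is importable.
import recbole

print(recbole.__version__)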

Public Datasets Preparation

Raw Data Processing

We format the public datasets into interactions, user data, and item data. We use the RecSysDatasets - Conversion Tools for this pre-processing step, and we directly use the preprocessed versions provided in their Google Drive.

AmazonReview

Because the item-data processing part of the conversion tool did not work for the AmazonReview 2023 datasets at the time we performed our experiments, we include a script that aligns the downloaded raw item data with the interactions exported by the conversion tool: data_preprocessing/gen_item_column.sh.
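
Conceptually, the alignment keeps only the raw item-metadata rows whose item ids appear in the exported interactions. A minimal Python sketch of that idea (the file names, tab-separated layout, and item_id column are assumptions for illustration; data_preprocessing/gen_item_column.sh is the actual script):

import csv

# Collect the item ids that appear in the exported interactions
# (assumed to be a tab-separated file with an item_id column).
with open("dataset.inter", newline="") as f:
    inter_items = {row["item_id"] for row in csv.DictReader(f, delimiter="\t")}

# Keep only the raw item-metadata rows whose item_id occurs in the interactions.
with open("raw_items.tsv", newline="") as f_in, open("dataset.item", "w", newline="") as f_out:
    reader = csv.DictReader(f_in, delimiter="\t")
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames, delimiter="\t")
    writer.writeheader()
    for row in reader:
        if row["item_id"] in inter_items:
            writer.writerow(row)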

MIND-small

We use all the positively labeled interactions from the train set of MIND-small. As with the other datasets, we feed these into the RecSysDatasets - Conversion Tools. Note that the mind_small_train alias is deprecated, so we use mind_large_train.

We then run data_preprocessing/remove_0_labels.py to keep only positively labeled interactions, which enables full-sort evaluation in RecBole.
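
Conceptually, the filter just drops every interaction whose label is 0. A minimal sketch under assumed file names and a "label" column (the real logic lives in data_preprocessing/remove_0_labels.py):

import csv

# Drop interactions whose label is 0, keeping only positive feedback
# (file names and the "label" column name are assumptions for illustration).
with open("mind.inter", newline="") as f_in, open("mind_positive.inter", "w", newline="") as f_out:
    reader = csv.DictReader(f_in, delimiter="\t")
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames, delimiter="\t")
    writer.writeheader()
    for row in reader:
        if row["label"] != "0":
            writer.writerow(row)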

Export Benchmark Data Splits

You can run the following script to export ORBIT's train, validation, and test splits.

cd RecSys-Benchmark/RecBole
sbatch scripts/run_export_data_splits.sh

Note: Remember to update the dataset you want to process, and its associated paths, in the script. The model config there is a placeholder we use to run the benchmark data-split export through the RecBole pipeline.

RecBole-Supported Experiments

We use the RecBole implementations of the following models. The configuration files we use for the different models and datasets are under RecBole/configs. For more information on custom configurations, please refer to the official RecBole repository.

The scripts we use to launch the experiments for each model are under RecBole/scripts. You can modify the model and dataset variables within these scripts to reproduce our experiments on the desired model and dataset.

For example:

cd RecSys-Benchmark/RecBole
sbatch scripts/run_SASRec.sh

Note:

  • For some models, we modify the attributes/features used for different datasets; for instance, the dropout rate of SASRec differs between ml-1m and the Amazon datasets. Please check the model config for details.
  • Remember to update the dataset you want to process, and its associated paths, in the dataset config yaml under RecSys-Benchmark/RecBole/configs/datasets.
  • Remember to update the checkpoint directory and the wandb project name in the general config yaml under RecSys-Benchmark/RecBole/configs/eval.yaml.
  • To run experiments on specific GPU devices, set gpu_id="0,1" in the config yaml.
  • To resume experiments from a checkpoint, set resume_path="checkpoint_file_path" in the config yaml (a minimal config sketch with both options follows this list).
  • The model architecture we use can vary between the MovieLens and AmazonReview datasets. Check the model config yaml under RecSys-Benchmark/RecBole/configs/models for more details.
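
A minimal sketch of the corresponding config yaml overrides (the checkpoint path is a placeholder; replace it with your own run's checkpoint):

gpu_id: "0,1"
resume_path: "checkpoint_file_path"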

HSTU

We follow the HSTU official repository to perform HSTU experiments.

Please follow the instructions in HSTU/reproduce for experiment reproduction.

TASTE

We follow the official TASTE open-source repository to perform the TASTE experiments. Please follow that repository's instructions for environment setup.

TASTE Data Processing

We prepare the TASTE data with the following scripts, which are adapted from the official TASTE data processing pipeline.

First, format the dataset and its splits from the raw data described in Public Datasets Preparation.

sbatch RecBole/scripts/process_TASTE.sh

Second, generate item features:

sbatch TASTE/reproduce/dataprocess/gen_all_items.sh

Third, generate training and evaluation features:

sbatch TASTE/reproduce/dataprocess/build_train.sh

Note: Remember to update the dataset you want to process, and its associated paths, in these scripts.

TASTE Experiments

The training scripts for the TASTE experiments are in TASTE/reproduce/train, and the testing scripts are in TASTE/reproduce/test. To keep testing fast, we test only the checkpoint with the lowest evaluation loss.

For example, to train and test TASTE on ml-1m dataset:

cd TASTE
sbatch reproduce/train/ml/train_ml.sh
sbatch reproduce/test/ml/test_ml.sh

Note: Before running inference via test_ml.sh, we manually remove the 0th [PAD] item from the item.txt file created by process_TASTE.sh, in line with the original TASTE implementation.
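
If the [PAD] entry sits on the first line of item.txt (an assumption; verify against your generated file), removing it can be as simple as:

# Drop the first line of item.txt, assumed here to hold the 0th [PAD] item.
with open("item.txt") as f:
    lines = f.readlines()
with open("item.txt", "w") as f:
    f.writelines(lines[1:])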

Note: Remember to update associated paths in these scripts.

HLLM

We follow the official HLLM repository to perform the HLLM experiments; please see that repository for more details.

Environment Setup

To avoid package conflicts, create a virtual environment and install dependencies from our adapted requirements.txt:

python -m venv hllm_env
source hllm_env/bin/activate
pip install -r requirements.txt

HLLM Data Preparation

We provide the reproduce/transform_datda_format.sh script to prepare data for the HLLM experiments. Note that we assume you have already downloaded the raw data and formatted it into the official splits as described in Public Datasets Preparation.

Run the following script to prepare ml-1m data for HLLM experiments:

cd RecSys-Benchmark/HLLM
bash reproduce/transform_datda_format.sh

Experiments

We provide scripts to reproduce our experiments across all datasets in the reproduce/ folder. Our scripts are adapted from the official HLLM repository, with additional support for checkpoint saving, checkpoint loading, and multi-node training, and are tailored to our datasets.

Run experiments on different datasets using:

# Amazon Toys dataset
sbatch reproduce/amzn_toys_HLLM.sh

# Amazon Books dataset
sbatch reproduce/amzn_books_HLLM.sh

# Amazon Sports dataset
sbatch reproduce/amzn_sports_HLLM.sh

# ML-1M dataset
sbatch reproduce/ml1m_HLLM.sh

Note: Remember to update dataset paths and model directories in these scripts and in relevant YAML config files if needed before running.

ClueWeb-Reco Benchmark

ClueWeb-Reco Dataset

You can find ClueWeb-Reco on Huggingface.

We provide two formats for the dataset:

  • pure interaction format: input in columns [session_id, cw_internal_id, timestamp] that describes every interaction in the dataset.
  • ordered cw id list format: input in columns [session_id, ordered_history_cw_internal_id] that groups all historically interacted items for each session.

The ClueWeb-Reco dataset is structured as follows:

## Source Files
-- cwid_to_id.tsv: mapping between official ClueWeb22 docids and our internal docids

## Splits in pure interaction format:
-- interaction_splits:
    -- valid_inter_input.tsv: input for validation dataset
    -- valid_inter_target.tsv: validation dataset ground truth
    -- test_inter_input.tsv: input for testing dataset (ground truth hidden)


## Splits in ordered cw id list format:
-- ordered_id_splits:
    -- valid_input.tsv: input for validation dataset
    -- valid_target.tsv: validation dataset ground truth
    -- test_input.tsv: input for testing dataset (ground truth hidden)

## Utility files for ClueWebApi usage and example processing on the ordered cw id list format
-- cw_data_processing:
    - ClueWeb22Api.py: API to retrieve ClueWeb document information from official ClueWeb22 docids
    - example_dataset.py: example to load input data sequences with ClueWeb22Api
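
As a rough illustration of the ordered cw id list format, the input splits can be loaded into per-session item sequences along the lines below (the header row and the comma separator inside ordered_history_cw_internal_id are assumptions; example_dataset.py is the authoritative loader, and ClueWeb22Api.py shows how to fetch document content for each id):

import csv

# Load each session's ordered history of ClueWeb internal ids
# (tab-separated file with a header and comma-separated id lists are assumptions).
sessions = {}
with open("ordered_id_splits/valid_input.tsv", newline="") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        ids = [int(x) for x in row["ordered_history_cw_internal_id"].split(",")]
        sessions[row["session_id"]] = ids

print(len(sessions), "sessions loaded")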

ClueWeb-Reco Benchmark Submission and Evaluation

Your submitted prediction should be a binary file in the following format. Please make sure the submitted prediction binary files contain ClueWeb internal IDs (0-indexed integers) rather than the official ClueWeb ids.

<4 bytes int representing num_sessions><4 bytes int representing K><num_sessions * K * sizeof(int) representing predicted clueweb internal ids>
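
A minimal sketch of writing a prediction file in this layout (endianness is not specified above; little-endian 4-byte integers are assumed here, so double-check against the evaluation scripts):

import struct

# predictions: one list of K ClueWeb internal ids (0-indexed ints) per session.
predictions = [[101, 7, 42], [9, 300, 5]]  # toy example: 2 sessions, K = 3
num_sessions, k = len(predictions), len(predictions[0])

with open("predictions.bin", "wb") as f:
    f.write(struct.pack("<i", num_sessions))   # 4-byte int: number of sessions
    f.write(struct.pack("<i", k))              # 4-byte int: K
    for ids in predictions:
        f.write(struct.pack(f"<{k}i", *ids))   # K 4-byte ints per session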

We follow RecBole's evaluation protocol to evaluate the ClueWeb-Reco results, as in ClueWeb-Reco/get_metrics.sh and ClueWeb-Reco/get_metrics.py.
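
For intuition, ranking metrics of the kind RecBole reports (e.g. Recall@K and NDCG@K) reduce to comparing each session's predicted id list against its ground-truth item. A toy sketch, not the actual get_metrics.py logic:

import math

def recall_and_ndcg_at_k(predictions, targets, k):
    """Toy Recall@K / NDCG@K assuming one ground-truth item per session."""
    recall, ndcg = 0.0, 0.0
    for pred, target in zip(predictions, targets):
        top_k = pred[:k]
        if target in top_k:
            recall += 1.0
            ndcg += 1.0 / math.log2(top_k.index(target) + 2)  # index is 0-based
    n = len(targets)
    return recall / n, ndcg / n

# Toy usage: 2 sessions whose ground-truth items are 7 and 5.
print(recall_and_ndcg_at_k([[101, 7, 42], [9, 300, 5]], [7, 5], k=3))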
