EnsRec: Ensembling ID + Text Sequence Encoders for Sequential Recommendation

Built with PyTorch Lightning and Hydra configuration.

⚡ Overview

This repository reproduces the experiments in the paper: "Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation". This includes an implementation of EnsRec, our simple ID + Text ensembling strategy for sequential recommendation.

📦 Installation

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (recommended)

Setup Environment

# clone project
git clone ...
cd ensrec

# create conda environment
conda create -n ensrec python=3.11
conda activate ensrec

# install requirements
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu124

🗂️ Data preparation

Please download the preprocessed data from this Google Drive link. We follow the same preprocessing as LIGER. Unzip the download and place it in the data/ folder, structured as follows:

data/
├── beauty/             # Beauty from Amazon Reviews 2018
│   ├── training/       # training sequences of user history
│   ├── evaluation/     # validation sequences of user history
│   ├── testing/        # testing sequences of user history
│   └── items/          # text of all items in the dataset
├── sports/             # Sports from Amazon Reviews 2018
├── toys/               # Toys from Amazon Reviews 2018
└── steam/              # Steam

All folders beauty/, sports/, toys/ and steam/ should have training/, evaluation/, testing/ and items/ subfolders.
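To sanity-check the layout before running experiments, a minimal script such as the following (a hypothetical helper, not part of the repo) flags any missing folders:

from pathlib import Path

# Hypothetical layout check; adjust the data root if you placed the data elsewhere.
DATASETS = ["beauty", "sports", "toys", "steam"]
SUBDIRS = ["training", "evaluation", "testing", "items"]

for dataset in DATASETS:
    for subdir in SUBDIRS:
        path = Path("data") / dataset / subdir
        if not path.is_dir():
            print(f"missing: {path}")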

Text embedding generation

Please run notebooks/embedding_gen.ipynb to generate item text embeddings for each dataset. We use SentenceT5-XXL by default.
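Conceptually, the notebook encodes each item's text into a dense vector. A minimal sketch, assuming a standard sentence-transformers setup (the text loading and output path here are our assumptions; see the notebook for the exact steps):

import numpy as np
from sentence_transformers import SentenceTransformer

# SentenceT5-XXL checkpoint from the Hugging Face hub
model = SentenceTransformer("sentence-transformers/sentence-t5-xxl")

# In practice, item texts are loaded from data/<dataset>/items/
item_texts = ["example item title and description", "another item's text"]
embeddings = model.encode(item_texts, batch_size=8, show_progress_bar=True)
np.save("beauty_item_embeddings.npy", embeddings)  # hypothetical output path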

After completing this, the data is ready for experimentation.

🚀 How to run

To run EnsRec

bash scripts/run_ensrec.sh --method_1=id_only --method_2=text_only --dataset=beauty --seed=42

This first trains the ID-Only and Text-Only models separately and saves their test user and item embeddings. It then runs the notebook notebooks/test_ensrec.ipynb to evaluate test recommendation performance, including complementarity statistics.
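For intuition, ensembling the saved embeddings amounts to combining the two models' user-item scores. A minimal sketch, assuming hypothetical .npy paths and a simple sum of per-user z-normalized dot-product scores (see notebooks/test_ensrec.ipynb for the exact procedure):

import numpy as np

# Hypothetical file names; the actual paths are produced by run_ensrec.sh.
u_id, i_id = np.load("id_user_emb.npy"), np.load("id_item_emb.npy")
u_txt, i_txt = np.load("text_user_emb.npy"), np.load("text_item_emb.npy")

def normalized_scores(u, i):
    s = u @ i.T  # (num_users, num_items) dot-product scores
    # z-normalize per user so the two models' score scales are comparable
    return (s - s.mean(axis=1, keepdims=True)) / (s.std(axis=1, keepdims=True) + 1e-8)

ens = normalized_scores(u_id, i_id) + normalized_scores(u_txt, i_txt)
top10 = np.argsort(-ens, axis=1)[:, :10]  # top-10 recommended items per user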

You can also run a slightly modified version of EnsRec that trains the ID-Only and Text-Only models in the same experiment run. Note that this differs from the EnsRec result we report: this variant early-stops on the ensemble's validation performance, whereas the reported EnsRec early-stops the ID-Only and Text-Only models independently on their own validation performances and ensembles them only at test time. To run this version:

bash scripts/run_ensrec.sh --method_1=id_only --method_2=text_only --dataset=beauty --seed=42 --one_job

To reproduce the complementarity results in Table 2, method_1 and method_2 should be set to variants of the ID-Only and/or Text-Only methods (a sweep sketch follows the lists below), e.g.

bash scripts/run_ensrec.sh --method_1=id_only --method_2=id_only/ablate_encoder --dataset=beauty 

Specifically, the possible values of method_1 and method_2 are

  • id_only
  • id_only/ablate_encoder
  • id_only/ablate_negatives
  • id_only/ablate_init
  • text_only
  • text_only/ablate_encoder
  • text_only/ablate_negatives
  • text_only/ablate_lm

dataset can be one of

  • beauty
  • toys
  • sports
  • steam
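To sweep many method pairs, a hypothetical Python driver over the flags above could look like the following (which pairs actually appear in Table 2 is specified in the paper, so trim the list accordingly):

import itertools
import subprocess

METHODS = [
    "id_only", "id_only/ablate_encoder", "id_only/ablate_negatives", "id_only/ablate_init",
    "text_only", "text_only/ablate_encoder", "text_only/ablate_negatives", "text_only/ablate_lm",
]

for m1, m2 in itertools.combinations(METHODS, 2):
    subprocess.run(
        ["bash", "scripts/run_ensrec.sh",
         f"--method_1={m1}", f"--method_2={m2}", "--dataset=beauty", "--seed=42"],
        check=True,  # stop if any run fails
    )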

To run baselines

bash scripts/run_baseline.sh --method=fdsa --dataset=beauty --seed=42

This will train and test the baseline method, which can be one of

  • id_only
  • text_only
  • llm_init
  • whitenrec
  • unisrec
  • rlmrec_con
  • rlmrec_gen
  • llm_esr
  • alphafuse
  • fdsa

To run ensemble ablation

First, run EnsRec on the desired dataset using the command above. Then, run the notebook notebooks/ablate_ensrec.ipynb, setting dataset and seeds in the first code cell (the parameters cell) as appropriate.
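If you prefer to run the ablation headlessly, papermill can inject those values into the parameters cell. This is optional and assumes (our assumption, not a repo requirement) that the first code cell is tagged parameters:

import papermill as pm

# Execute the ablation notebook with injected parameters; the output path is hypothetical.
pm.execute_notebook(
    "notebooks/ablate_ensrec.ipynb",
    "notebooks/ablate_ensrec_beauty.ipynb",
    parameters={"dataset": "beauty", "seeds": [42]},
)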

To run training and testing, in general

You can train and test any model with chosen experiment configuration from configs/experiment/:

python src/train.py experiment=id_only/train_beauty

You can override any parameter from the command line like this:

python src/train.py experiment=id_only/train_beauty trainer.max_epochs=20 optim.optimizer.lr=0.001 model.d_model=64

Logging

Training and evaluation logs are written to logs/. By default, metrics are logged in CSV format. This can be changed to TensorBoard records by passing logger=tensorboard.
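For example:

python src/train.py experiment=id_only/train_beauty logger=tensorboard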

🤝 Acknowledgements

This repo is based on the repo GRID: Generative Recommendation with Semantic IDs. It also adapts code from AlphaFuse: Learn ID Embeddings for Sequential Recommendation in Null Space of Language Embeddings for the AlphaFuse, UniSRec, WhitenRec, and RLMRec implementations. It adopts the data preprocessing and default hyperparameter settings from Unifying Generative and Dense Retrieval for Sequential Recommendation.

📞 Contact

For questions and support, please create a GitHub issue or contact Liam Collins ([email protected]).

📚 Citation

If you find our paper and/or code useful, please use the following citation:

@article{collins2025exploiting,
  title={Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation},
  author={Collins, Liam and Kumar, Bhuvesh and Ju, Clark Mingxuan and Zhao, Tong and Loveland, Donald and Neves, Leonardo and Shah, Neil},
  journal={arXiv preprint arXiv:2512.17820},
  year={2025}
}
