Skip to content

Official code for “CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models”, accepted by SIGIR 2025

Notifications You must be signed in to change notification settings

BUPT-GAMMA/CORONA

 
 

Repository files navigation

CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models

News

  • Our paper has been accepted to SIGIR 2025 🎉

Introduction

CORONA is a coarse-to-fine recommendation framework that retrieves user-neighborhoods on user–item bipartite graphs and leverages LLM-augmented user profiles as side information. The coarse stage retrieves candidate users via graph-aware similarity with distance priors; the fine stage constructs compact subgraphs for downstream training/evaluation.

Repository Structure

  • main.py: training/validation/testing for user retriever
  • model.py: retriever model with distance-aware embedding transformation
  • construct_graph.py: build per-user subgraphs from retrieved neighbors
  • chat_api_query.py: LLM-based user profiling and embedding generation
  • load.py: utilities for dataset loading and diagnostics
  • netflix_data/: example dataset placeholder (train/val/test splits and sparse matrices)

Quick Start

1) Environment

  • Python 3.9+
  • Install minimal deps:
pip install -r requirements-min.txt

For CUDA/Torch Geometric GPU wheels, follow the official guides.

2) Configure

Create .env (or copy .env.example) to specify paths/devices:

cp .env.example .env

Key variables:

  • DATA_DIR: project root for data and outputs (default .)
  • DATASET_DIR: dataset subdir (default /netflix_data)
  • CUDA_VISIBLE_DEVICES: GPU id (default 0)
  • TOP_K: retrieved users per query (default 500)
  • OPENAI_*: LLM credentials for profiling

3) Prepare Dataset

We experiment on Netflix, MovieLens, and Amazon-Book. Provide only textual side information for all methods.

  • Place processed files under ${DATA_DIR}${DATASET_DIR}:
    • train.json, val.json, test.json (uid -> item list)
    • train_mat (scipy sparse user–item CSR, pickled)
    • augmented_user_init_embedding_final (numpy array pickled, dim = user embedding)
    • Optional: netflix_image_text/item_attribute.csv for profiling
  • For Netflix node features, we recommend following LLMRec instructions.

4) LLM-based User Profiling (Optional)

If you need to generate augmented_user_init_embedding_final:

make augment

This reads train_mat/test.json and writes ${AUGMENT_FILE_PATH}/augmented_user_init_embedding_final.

5) Train and Evaluate

make train

After training, the best model and retrieved nodes are saved to ${DATA_DIR}/Graph_RA_Rec/model_states/. For testing independently:

make test

6) Construct Subgraphs

make graphs

Produces user/item subgraphs under ${DATA_DIR}/Graph_RA_Rec/${basename(DATASET_DIR)}/.

Datasets Details

  • Netflix (KDD Cup 2007)
  • MovieLens-10M (ACM TiiS 2015)
  • Amazon-Book (EMNLP 2019) We follow LLMRec for Netflix/MovieLens splits and RLMRec for Amazon-Book. Textual info is encoded by Sentence-BERT.

Reproducibility Notes

  • Determinism: set_seed(3) in main.py
  • GPU selection via CUDA_VISIBLE_DEVICES
  • Cached tensors: *_for_RA.pkl are stored in ${DATA_DIR}${DATASET_DIR}

Citation

If you find this repository helpful, please cite:

@inproceedings{corona2025,
  title={CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models},
  booktitle={Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2025}
}

References

  • He et al., LightGCN, SIGIR 2020
  • Wei et al., LLMRec, arXiv 2024
  • Ren et al., RLMRec, WWW 2024
  • Bennett and Lanning, The Netflix Prize, KDD Cup 2007
  • Harper and Konstan, MovieLens, ACM TiiS 2015
  • Reimers and Gurevych, Sentence-BERT, EMNLP/IJCNLP 2019

License

This code is released for research purposes. See repository license if provided.

About

Official code for “CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models”, accepted by SIGIR 2025

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.6%
  • Makefile 1.4%