🎉 We are pleased to announce that our paper has been accepted at the ACL 2025 Main Conference.
Welcome to the official repository for the CER paper. This README provides step-by-step instructions to set up the environment, download necessary datasets, and reproduce the results presented in the paper.
- **Clone the Repository:**
  Start by cloning this repository to your local machine.
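  A minimal sketch of the clone step; the URL below is a placeholder, so substitute this repository's actual address:

  ```bash
  # placeholder URL; use this repository's actual address
  git clone https://github.com/<org-or-user>/CER.git
  cd CER
  ```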
- **Create a Python Virtual Environment:**
  Set up a virtual environment of your choice to isolate package dependencies.
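  For example, using Python's built-in `venv` module (any environment manager, e.g. `conda`, works just as well):

  ```bash
  python -m venv .venv       # create the environment
  source .venv/bin/activate  # activate it (Windows: .venv\Scripts\activate)
  ```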
- **Install Dependencies:**
  Run the following command to install the required packages:

  ```bash
  pip install -r requirement.txt
  ```
- **Download the spaCy Model:**
  Download the `en_core_web_trf` model with:

  ```bash
  python -m spacy download en_core_web_trf
  ```
- **Create Data Directory:**
  In the root of the project, create a directory named `data`:

  ```bash
  mkdir data
  cd data
  ```
- **Download Datasets:**
  Download the following files into the `data` directory:
  - `MATH.tar` (the MATH dataset)
  - `hotpot_dev_fullwiki_v1.json` (the HotpotQA dev set)
- **Extract and Prepare the Data:**
  Use the commands below to download and extract the files:

  ```bash
  wget https://people.eecs.berkeley.edu/~hendrycks/MATH.tar
  wget http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_fullwiki_v1.json
  tar xvf MATH.tar  # extract the MATH dataset
  ```
- **Create the Datasets:**
  Generate the datasets by running:

  ```bash
  python src/custom_datasets/data_loader.py      # mathematical datasets
  python src/custom_datasets/multihop_loader.py  # open-domain question generation datasets
  ```
- **Experiment Settings:**
  All necessary experiment configurations are defined in the `src/config.py` file. Modify this file to suit your requirements.
- **Environment Variables:**
  The code supports a `.env` file. Set your desired environment variables there; they are then used in `src/config.py`.
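  A minimal `.env` sketch, assuming `src/config.py` picks up the variable names described in the configuration section below (all values shown are placeholders):

  ```bash
  MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
  DATA_DIR=/home/user/CER/data
  RUN_NAME=all
  ```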
Once the environment is set up and the data is prepared, run the main program with:

```bash
python main.py
```

The main configuration options defined in `src/config.py` are:
- **MODEL_NAME:** The name or path of the desired Hugging Face model.
- **DATA_DIR:** The absolute path to the data directory (e.g., `/home/user/CER/data`).
- **RUN_NAME:** Specifies the running mode. Use `"all"` to execute all configurations defined in the `multi_run_configs` dictionary.
- **K:** The number of generated paths.
- **aggregate:** A boolean flag indicating whether to aggregate paths (`True`) or select the best path (`False`).
- **MULTIHOP:** Determines whether to run the TriviaQA or HotpotQA datasets.
- **N_SAMPLE:** The number of samples to process.
- **SEED:** The seed value used for shuffling the dataset.
- **BATCH_SIZE:** The batch size used during inference.
- **STEP_DECOMPOSITION:** A flag that enables the incremental reasoning step prompt.
- **DATASETS:** A dictionary mapping dataset names to their corresponding files. For example:

  ```python
  {
      "allenai": "allenai_math_qa_test_processed.parquet",
      "math": "src_datasets_math_dataset_test_processed.parquet",
      "gsm8k": "openai_gsm8k_test_processed.parquet",
      "hotpot": "hotpotqa_processed.parquet",
      "trivia": "triviaqa_processed.parquet",
  }
  ```
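As a rough illustration, the options above might be collected in `src/config.py` along the following lines. This is a minimal sketch, assuming environment variables are loaded with `python-dotenv`; the actual layout of `config.py` (and of the `multi_run_configs` dictionary) may differ, and every default value below is a placeholder:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # pull overrides from the optional .env file

MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
DATA_DIR = os.getenv("DATA_DIR", "/home/user/CER/data")
RUN_NAME = os.getenv("RUN_NAME", "all")  # "all" runs every multi_run_configs entry

K = 10                      # number of generated paths
aggregate = True            # True: aggregate paths, False: select the best path
MULTIHOP = False            # selects between the TriviaQA and HotpotQA datasets
N_SAMPLE = 500              # number of samples to process
SEED = 42                   # seed used for shuffling the dataset
BATCH_SIZE = 8              # batch size used during inference
STEP_DECOMPOSITION = True   # use the incremental reasoning step prompt

DATASETS = {
    "allenai": "allenai_math_qa_test_processed.parquet",
    "math": "src_datasets_math_dataset_test_processed.parquet",
    "gsm8k": "openai_gsm8k_test_processed.parquet",
    "hotpot": "hotpotqa_processed.parquet",
    "trivia": "triviaqa_processed.parquet",
}
```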