MATH Training and Evaluation

This repository contains scripts for training and evaluating models on the MATH dataset.

Environment Setup

Our experiments were run on 8× A100 80GB GPUs with CUDA 12.4.
To reproduce the environment, run the following commands:

conda env create -f environment.yml
conda activate MATH
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2/flashinfer_python-0.2.2+cu124torch2.6-cp38-abi3-linux_x86_64.whl#sha256=5e1cdb2fb7c0e9e9a2a2241becc52b771dc0093dd5f54e10f8bf612e46ef93a9
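Before launching a long run, it can help to confirm that the pinned wheels actually installed. The snippet below is a hypothetical sanity check, not part of this repository. It uses only Python's standard `importlib.metadata` and compares installed versions against those implied by the wheel URLs above. The distribution names `flash_attn` and `flashinfer_python` and the torch 2.6 pin are read off the wheel filenames and should be treated as assumptions. The CUDA version is not checked.

```python
import importlib.metadata as md

# Public versions implied by the wheel URLs above (assumed; local "+cu..." tags ignored).
EXPECTED = {
    "torch": "2.6",               # both wheels are built against torch 2.6
    "flash_attn": "2.7.4.post1",
    "flashinfer_python": "0.2.2",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, want in expected.items():
        try:
            installed = md.version(name)
        except md.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # Drop the local version tag, e.g. "2.6.0+cu124" -> "2.6.0"
        public = installed.split("+", 1)[0]
        # Exact match, or a prefix match for pins given as major.minor (torch)
        ok = public == want or public.startswith(want + ".")
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:20s} {installed or 'not installed':40s} {status}")
```

A prefix match is used for torch because the wheel tags only pin `torch2.6`, so any 2.6.x release should be compatible.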

Running the Code

After setting up the environment, you can start training with:

bash examples/Qwen2_5_MATH_1_5_b_CCGSPG.sh

The model is evaluated periodically during training, and all experiment logs are tracked automatically with Weights & Biases (wandb).

Using Pretrained Models

We provide pretrained checkpoints for reproducibility:

Download the model from this link.

Place the checkpoints folder at the following path:

MATH_Code/checkpoints/MATH/NEW_qwen2_5_MATH_1_5b_ccpo_bce_beta_0.5

Run the script:

bash examples/Qwen2_5_MATH_1_5_b_CCGSPG.sh

You can then view the results in wandb or in the log file at:

MATH_Code/checkpoints/MATH/NEW_qwen2_5_MATH_1_5b_ccpo_bce_beta_0.5/training_process.log

Evaluation Metrics

To compute Accuracy, Expected Calibration Error (ECE), and Brier Score (BS), specify the path to your generated outputs and run:

python cal_metric.py
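The internals of cal_metric.py are not described here. As a hypothetical reference for what the three metrics measure, the sketch below computes them from per-question confidences and 0/1 correctness labels. Accuracy is the fraction of correct answers. The Brier score is the mean of (confidence − label)². ECE groups predictions into confidence bins and takes the sample-weighted average of |bin accuracy − bin mean confidence|. The function name `calibration_metrics` and the common equal-width 10-bin scheme are assumptions. The repository's script may bin differently or read confidences from a different field of the generated outputs.

```python
from typing import Sequence


def calibration_metrics(confidences: Sequence[float],
                        correct: Sequence[int],
                        n_bins: int = 10) -> dict:
    """Hypothetical helper: accuracy, ECE and Brier score for 0/1 correctness labels.

    confidences: model confidence per question, assumed to lie in [0, 1].
    correct:     1 if the answer was judged correct, else 0.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty inputs")

    accuracy = sum(correct) / n

    # Brier score: mean squared gap between stated confidence and the 0/1 outcome
    brier = sum((c - y) ** 2 for c, y in zip(confidences, correct)) / n

    # ECE: assign each prediction to an equal-width confidence bin
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))

    # Weighted average of |bin accuracy - bin mean confidence| over non-empty bins
    ece = 0.0
    for b in bins:
        if b:
            bin_acc = sum(y for _, y in b) / len(b)
            bin_conf = sum(c for c, _ in b) / len(b)
            ece += len(b) / n * abs(bin_acc - bin_conf)

    return {"accuracy": accuracy, "ece": ece, "brier": brier}


if __name__ == "__main__":
    print(calibration_metrics([0.9, 0.8, 0.3, 0.6], [1, 1, 0, 0]))
```

In the usage example each of the four predictions lands in its own bin, so ECE reduces to the mean absolute gap between confidence and outcome. With more samples per bin, over- and under-confident predictions within a bin can partially cancel, which is why ECE and the Brier score are usually reported together.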

Acknowledgements

This repository is built upon verl. We thank the verl team for open-sourcing such a powerful RL framework for LLMs.
We also gratefully acknowledge the datasets and reward functions provided by DeepScaleR and AdaRFT.
