Skip to content

thu-coai/BARREL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs

method

This is the codebase for our paper BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs.

We identify two pathological reasoning patterns characterized by overthinking that contribute to the overconfident and incorrect answers: last-minute guessing and second-thought spiraling. To address these issues, we propose BARREL—a novel framework that promotes concise and boundary-aware factual reasoning. Our experiments show that BARREL-training increases the reliability of DeepSeek-R1-Distill-Llama-8B from 39.33% to 61.48%, while still achieving accuracy comparable to models finetuned on reasoning data generated by R1. These results demonstrate that our pilot study is inspiring to build more reliable and factual System 2 LRMs. Please refer to our paper for more details.

More detailed code and data for data construction and ablation analysis will be available soon. Please feel free to contact us if you have any questions regarding our code and other implementation details.

Quick Start

Setup

git clone [email protected]:thu-coai/BARREL.git
cd BARREL
pip install -r requirements.txt

Evaluation

We provide a script to support batch generation of model responses using the vLLM toolkit. Before running the script, make sure to update the model/tokenizer path as well as the input/output paths accordingly.

To run the generation:

cd gen_code
bash generate.sh

We also provide quick evaluation scripts to compute factual scores on relevant datasets and accuracy scores on math datasets. Before running the evaluations, remember to update the name, model, data_path, and output_path fields in both factual_eval.py and math_eval.py to reflect your own generation results.

To run the evaluation:

cd evaluation
python factual_eval.py
python math_eval.py

Training

We provide training scripts for both the SFT stage and the GRPO stage. Before running the scripts, make sure to adjust the configurations as needed (e.g., model path, data path, output directories, hyperparameters, etc.).

SFT Stage

To run SFT training:

cd ./training_scripts/sft_stage
bash run_sft.sh

GRPO Stage

To run GRPO training:

cd ./training_scripts/grpo_stage
bash run_grpo.sh

Remember to change the default paths to the SFT-trained model and tokenizer.

result

Please feel free to contact us if you have any questions regarding our code and other implementation details.

Citation

@misc{yang2025barrelboundaryawarereasoningfactual,
      title={BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs}, 
      author={Junxiao Yang and Jinzhe Tu and Haoran Liu and Xiaoce Wang and Chujie Zheng and Zhexin Zhang and Shiyao Cui and Caishun Chen and Tiantian He and Hongning Wang and Yew-Soon Ong and Minlie Huang},
      year={2025},
      eprint={2505.13529},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.13529}, 
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published