
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling


News

  • [2026.01.26] Our paper has been accepted by ICLR 2026 ✨.
  • [2025.01.26] Our project introduction has been featured on DeepWiki.
  • [2025.06.11] Our announcement post on X (formerly Twitter) received many likes.
  • [2025.06.11] Our paper was featured as #3 on Hugging Face Daily Papers.

Overview

Reinforced Rule-based Reasoning (RuleReasoner) is a simple yet effective method that enables small reasoning models (SRMs) to learn rule-based reasoning. Rather than relying on the complex training pipelines of larger models, RuleReasoner trains on a curated collection of rule-based tasks and applies domain-aware dynamic sampling, reweighting each domain's sampling probability according to historical performance. This allows SRMs to outperform frontier Large Reasoning Models (LRMs) by +4.1% on in-distribution tasks and +10.4% on out-of-distribution tasks, while also being more computationally efficient.

  • Domain-aware dynamic sampling improves training sample efficiency and balances performance across domains.
  • Comprehensive data curation produces data curricula for rule-centric applications.
  • RuleReasoner (8B and 4B) achieves performance comparable to a wide range of baselines.
  • RuleReasoner (8B and 4B) also achieves strong OOD performance across three benchmarks (subsets of rule-based reasoning): BBH, ProverQA, and BBEH.
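The sampling idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: it assumes each domain's sampling weight grows as its historical reward falls, so harder domains are trained on more often. All names (`DomainAwareSampler`, `temperature`, etc.) are hypothetical.

```python
import random
from collections import deque


class DomainAwareSampler:
    """Sketch of domain-aware dynamic sampling: domains with lower
    historical reward receive proportionally more training samples."""

    def __init__(self, domains, history_size=100, temperature=1.0):
        self.domains = list(domains)
        # Keep a sliding window of recent rewards per domain.
        self.history = {d: deque(maxlen=history_size) for d in self.domains}
        self.temperature = temperature

    def record(self, domain, reward):
        # Log the reward observed for a rollout from this domain.
        self.history[domain].append(reward)

    def weights(self):
        # Weight each domain by (1 - mean reward): lower-performing domains
        # get a higher weight. Domains with no history default to 0.5.
        raw = []
        for d in self.domains:
            h = self.history[d]
            mean = sum(h) / len(h) if h else 0.5
            raw.append((1.0 - mean) ** (1.0 / self.temperature))
        total = sum(raw)
        return [w / total for w in raw]

    def sample(self, k):
        # Draw k domains (with replacement) according to current weights.
        return random.choices(self.domains, weights=self.weights(), k=k)
```

Lowering `temperature` sharpens the distribution toward the hardest domains; raising it flattens sampling back toward uniform.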


Quick Start

Prerequisites

Running RuleReasoner requires the dependencies listed in requirements.txt.

Installation

Build RuleReasoner from the source and install dependencies:

  1. Clone the repository:

    git clone https://github.com/bigai-nlco/RuleReasoner.git
  2. Navigate to the project directory:

    cd RuleReasoner
  3. Install the dependencies:

    pip install -r requirements.txt
    pip install -e ./verl
    pip install -e .

Training

Run the training with:

./scripts/train/train_mix.sh

Evaluation

Run the evaluation with:

./scripts/eval/eval_model.sh \
    --model $MODEL_PATH \
    --datasets $DATASET_PATH \
    --output-dir $OUTPUT_DIR

Project Structure

└── RuleReasoner
    ├── LICENSE
    ├── README.md
    ├── requirements.txt
    ├── scripts
    │   ├── build_dataset.py
    │   ├── data
    │   ├── eval
    │   └── train
    ├── setup.py
    ├── src
    │   ├── __init__.py
    │   ├── data
    │   ├── globals.py
    │   ├── system_prompts.py
    │   └── utils.py
    └── verl
        └── ...

Contributing

  • Discussions: Share your insights, provide feedback, or ask questions in the discussions section.
  • Issues: Submit bugs found or log feature requests for the RuleReasoner project in the issues section.
  • Pull Requests: Review open PRs, and submit your own PRs.
Contributing Guidelines
  1. Fork the Repository: Start by forking the project repository to your own account.
  2. Clone Locally: Clone the forked repository to your local machine.
    git clone https://github.com/<YOUR-USERNAME>/RuleReasoner.git
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to Your Fork: Push the changes to your forked repository.
    git push origin new-feature-x
  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
  8. Review: Once your PR is reviewed and approved, it will be merged into the main branch.

License

RuleReasoner is distributed under the terms of the MIT License.

Citation

@inproceedings{
    liu2026rulereasoner,
    title={RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling},
    author={Yang Liu and Jiaqi Li and Zilong Zheng},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=MQV4TJyqnb}
}
