ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

This paper was accepted to ICLR 2025 as an oral presentation.

Abstract

Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities. However, acquiring such high-quality trajectory data typically demands meticulous supervision from humans or superior models, which can be either expensive or license-constrained. In this paper, we explore how far an LLM can improve its reasoning by self-synthesizing reasoning paths as training data without any additional supervision. Existing self-synthesizing methods, such as STaR, suffer from poor generalization to out-of-domain (OOD) reasoning tasks. We hypothesize that this is because their self-synthesized reasoning paths are too task-specific, lacking general task-agnostic reasoning guidance. To address this, we propose Reasoning Generalist via Self-Improvement (ReGenesis), a method to self-synthesize reasoning paths as post-training data by progressing from abstract to concrete. More specifically, ReGenesis self-synthesizes reasoning paths by converting general reasoning guidelines into task-specific ones, generating reasoning structures, and subsequently transforming these structures into reasoning paths, without the need for human-designed task-specific examples used in existing methods. We show that ReGenesis achieves superior performance on all in-domain and OOD settings tested compared to existing methods. For six OOD tasks specifically, while previous methods exhibited an average performance decrease of approximately 4.6% after post-training, ReGenesis delivers around 6.1% performance improvement. We also conduct an in-depth analysis of our framework and show that ReGenesis is effective across various LLMs and design choices.

Environment

pip install -r ./requirements_vllm.txt

Pipeline

1. Generate Reasoning Structure (Section `3.1`)

cd ./src/reasoning
CUDA_VISIBLE_DEVICES=0 python reasoning_paths_gen.py
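
For orientation, here is a minimal sketch of the abstract-to-concrete progression described in the abstract: a general guideline is adapted to the task, turned into a reasoning structure, and then expanded into a full reasoning path. It is not the code in reasoning_paths_gen.py. The guideline texts, the prompt wording, the function name `synthesize_paths`, and the `llm` callable are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical seed guidelines; the real ones are defined by the repository, not here.
GENERAL_GUIDELINES = (
    "Break the problem into smaller sub-problems.",
    "Think step by step and verify each step.",
)


def synthesize_paths(
    question: str,
    llm: Callable[[str], str],
    guidelines: Sequence[str] = GENERAL_GUIDELINES,
) -> List[str]:
    """Produce one candidate reasoning path per general guideline.

    `llm` is any function mapping a prompt string to a completion string
    (an assumption; plug in your own model wrapper).
    """
    paths = []
    for guideline in guidelines:
        # Stage 1: general guideline -> task-specific guideline.
        adapted = llm(
            f"Adapt this general guideline to the task.\n"
            f"Guideline: {guideline}\nTask: {question}"
        )
        # Stage 2: task-specific guideline -> reasoning structure (no answer yet).
        structure = llm(
            f"Using the guideline below, outline a reasoning structure "
            f"for the task without solving it.\n"
            f"Guideline: {adapted}\nTask: {question}"
        )
        # Stage 3: reasoning structure -> concrete reasoning path with an answer.
        path = llm(
            f"Follow the reasoning structure to solve the task and state "
            f"the final answer.\nStructure: {structure}\nTask: {question}"
        )
        paths.append(path)
    return paths
```

Each guideline costs three model calls in this sketch, so a question yields as many candidate paths as there are guidelines.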

2. Filter the reasoning paths by exact match (Section `3.2.1`)

cd ./src/reasoning
CUDA_VISIBLE_DEVICES=1 python process_reason.py
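
As a rough illustration of exact-match filtering, the sketch below keeps only the reasoning paths whose final answer equals the ground truth after light normalization. It is not the logic of process_reason.py. The "The answer is" marker, the normalization rules, and the function names are assumptions.

```python
import re
from typing import List, Optional

# Assumed convention: each path ends a line with "The answer is <answer>."
_ANSWER_RE = re.compile(r"[Tt]he answer is\s*(.+?)\s*\.?\s*$", flags=re.MULTILINE)


def extract_answer(path: str) -> Optional[str]:
    """Return the last stated answer in a reasoning path, or None if absent."""
    matches = _ANSWER_RE.findall(path)
    return matches[-1] if matches else None


def normalize(answer: str) -> str:
    """Lowercase, trim, and drop thousands separators before comparing."""
    return answer.strip().lower().replace(",", "")


def filter_by_exact_match(paths: List[str], truth: str) -> List[str]:
    """Keep the paths whose extracted answer exactly matches the ground truth."""
    target = normalize(truth)
    kept = []
    for path in paths:
        predicted = extract_answer(path)
        if predicted is not None and normalize(predicted) == target:
            kept.append(path)
    return kept
```

Paths that never state an answer are dropped along with the wrong ones, which is what leaves some questions with no surviving path.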

3. Filter out samples that get no reasoning structure in Step 1

cd ./src/reasoning
python filter_questions_cannot_answer.py
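
One way to picture this step is as a split of the question set into questions that still have at least one reasoning path and questions that have none. The sketch below assumes a hypothetical record layout (a dict with an `id` key) and a hypothetical mapping from question id to surviving paths; the actual file formats used by filter_questions_cannot_answer.py may differ.

```python
from typing import Dict, List, Tuple


def split_answered(
    questions: List[dict],
    kept_paths: Dict[str, List[str]],
) -> Tuple[List[dict], List[dict]]:
    """Split questions by whether any reasoning path survived for them.

    `questions` are dicts with an "id" key (assumed layout).
    `kept_paths` maps a question id to its surviving reasoning paths.
    """
    answered, unanswered = [], []
    for question in questions:
        # A missing id and an empty list both count as "no reasoning path".
        if kept_paths.get(question["id"]):
            answered.append(question)
        else:
            unanswered.append(question)
    return answered, unanswered
```

The `unanswered` list is the input that Step 4 would pick up.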

4. Add the ground truth to the prompt and rerun self_discover on the samples filtered out in Step 3

cd ./src/reasoning
CUDA_VISIBLE_DEVICES=0 python truth_convert_reason.py

After this step, go back to Step 2 and filter again to produce step4-2.json.
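
A minimal sketch of adding the ground truth to a prompt is shown below. The hint wording, the record keys (`question`, `answer`), and the function names are assumptions, not what truth_convert_reason.py actually does.

```python
from typing import List


def add_truth_hint(prompt: str, truth: str) -> str:
    """Append the ground-truth answer to a prompt as a hint (assumed wording)."""
    return (
        f"{prompt}\n\n"
        f"Hint: the correct final answer is {truth}. "
        f"Explain the reasoning that leads to it."
    )


def build_hinted_prompts(unanswered: List[dict]) -> List[str]:
    """Build hinted prompts for questions that had no surviving reasoning path.

    Each record is assumed to carry "question" and "answer" keys.
    """
    return [add_truth_hint(q["question"], q["answer"]) for q in unanswered]
```

Because the model sees the answer, the regenerated paths still go through the Step 2 filter before they are used.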

5. Prepare for fine-tuning

cd ./src/finetune_code
python convert_format.py
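
Format conversion of this kind usually means pairing each question with each of its filtered reasoning paths and writing one training record per pair. The sketch below shows that idea with hypothetical field names (`instruction`, `output`) and JSON Lines output; the schema that convert_format.py and ft_mistral.sh actually expect may differ.

```python
import json
from typing import Dict, List


def to_sft_records(
    questions: Dict[str, str],
    kept_paths: Dict[str, List[str]],
) -> List[dict]:
    """Create one (instruction, output) record per surviving reasoning path.

    `questions` maps question id -> question text.
    `kept_paths` maps question id -> filtered reasoning paths.
    """
    records = []
    for question_id, paths in kept_paths.items():
        for path in paths:
            records.append({"instruction": questions[question_id], "output": path})
    return records


def dumps_jsonl(records: List[dict]) -> str:
    """Serialize records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

A question with several correct paths contributes several records, so the training set keeps the diversity of the synthesized reasoning.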

6. Fine-tune

cd ./src/finetune_code
# requires a separate environment
pip install -r requirements.txt
sh ft_mistral.sh

7. Eval Dataset

We use the evaluation code from https://github.com/OpenBMB/Eurus

Citation

@article{peng2024regenesis,
  title={ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement},
  author={Peng, Xiangyu and Xia, Congying and Yang, Xinyi and Xiong, Caiming and Wu, Chien-Sheng and Xing, Chen},
  journal={arXiv preprint arXiv:2410.02108},
  year={2024}
}
