Revolutionary thought-template-augmented LLM reasoning paradigm (ReasonFlux-V1/F1/V2) empowers a 32B model to achieve SOTA-level performance on complex reasoning tasks.
- [2025/6/04] 🎉 We release our co-evolving, RL-optimized coding LLMs, ReasonFlux-Coder-7B and ReasonFlux-Coder-14B, which outperform similarly sized Qwen Coder and DeepSeek Coder models and fit naturally into common test-time scaling and agentic coding pipelines. We also release our Long-CoT model ReasonFlux-Coder-4B, which outperforms Qwen3-4B while achieving 64.8% efficiency in unit test generation.
- [2025/5/26] 🎉 We open-source the model weights and the training & evaluation scripts for ReasonFlux-V2. We will release the ReasonFlux-V2 paper soon.
- [2025/5/26] 🎉 We release ReasonFlux-V2, an effective template-augmented reasoning paradigm that internalizes thought templates through iterative hierarchical reinforcement learning. It achieves SOTA-level performance with lower token consumption.
- [2025/3/24] 🎉 We release ReasonFlux-F1-32B, ReasonFlux-F1-14B, and ReasonFlux-F1-7B, a series of SOTA-level reasoning LLMs trained on template-augmented reasoning trajectories collected from our ReasonFlux-Zero. For the training and evaluation scripts, please refer to reasonflux-f1/README.md for details; a minimal inference sketch follows this list.
- [2025/2/11] 🎉 We release the data and training scripts for the SFT stage, the demo inference code, and the template library of ReasonFlux-V1.
- [2025/2/11] 🎉 We propose ReasonFlux-V1, a hierarchical LLM reasoning framework that significantly enhances complex reasoning capabilities, outperforming SOTA models like o1-preview and DeepSeek-V3 on challenging MATH and AIME benchmarks.
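For quick experimentation, below is a minimal inference sketch using Hugging Face `transformers`. The checkpoint id, chat-template usage, and generation settings are illustrative assumptions rather than the officially supported pipeline; please use the scripts in reasonflux-f1/README.md to reproduce reported results.

```python
# Minimal inference sketch for a ReasonFlux-F1 checkpoint via Hugging Face transformers.
# The repo id below is an assumption for illustration; substitute the released checkpoint you use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gen-Verse/ReasonFlux-F1"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires `accelerate` for multi-GPU placement / offloading
)

messages = [
    {"role": "user", "content": "Find the sum of all integers x with x^2 - 5x + 6 <= 0."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long-CoT models need a generous token budget to finish their reasoning trace.
output_ids = model.generate(input_ids, max_new_tokens=4096, temperature=0.7, do_sample=True)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```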
Here we compare our ReasonFlux series models with frontier LLMs and other open-source reasoning LLMs on challenging benchmarks: MATH-500, AIME 2024, AIME 2025, and GPQA-Diamond. ReasonFlux-V2 achieves state-of-the-art performance among open-source reasoning LLMs of comparable scale across the evaluated tasks.
Model | MATH-500 | AIME 2024 | AIME 2025 | GPQA-Diamond |
---|---|---|---|---|
**Frontier LLMs** | | | | |
OpenAI-o1-2024-12-17 | 94.8 | 74.3 | 79.2 | – |
OpenAI-o3-mini (medium) | 96.8 | 79.6 | 74.8 | 76.8 |
Grok3 Beta | 96.6 | 83.9 | 77.3 | – |
Gemini 2.5-Pro | 98.4 | 92.0 | 86.7 | 84.0 |
**Open-Source Reasoning LLMs** | | | | |
DeepSeek-R1-Distill-7B | 83.3 | 55.5 | 23.3 | 49.1 |
DeepSeek-R1-Distill-14B | 93.9 | 69.7 | 26.7 | 59.1 |
DeepSeek-R1-Distill-32B | 94.3 | 72.6 | 53.3 | 62.1 |
DeepSeek-R1-Distill-70B | 94.5 | 70.0 | 56.7 | 65.2 |
DeepSeek-R1-671B | 97.3 | 79.8 | 70.0 | 71.5 |
QwQ-32B-Preview | 90.6 | 50.0 | 46.7 | 65.2 |
QwQ-32B | 97.6 | 80.0 | 63.3 | 68.2 |
Qwen3-32B | 96.6 | 81.4 | 72.9 | 69.2 |
Qwen3-30B-A3B | 96.8 | 80.4 | 70.9 | 65.8 |
Qwen3-235B-A22B | 97.6 | 85.7 | 81.5 | – |
Sky-T1-32B | 86.4 | 43.3 | 36.7 | 56.8 |
LIMO-32B | 56.7 | 33.3 | 92.2 | 58.8 |
s1.1-32B | 93.1 | 60.0 | 60.0 | 63.1 |
OpenThinker-32B | 94.8 | 63.3 | 46.7 | 60.1 |
Light-R1-32B | 96.2 | 78.1 | 68.0 | 60.1 |
ReasonFlux-V1 (2025-1) | 91.2 | 56.7 | 37.2 | 61.2 |
ReasonFlux-F1 (2025-3) | 96.0 | 76.7 | 53.3 | 67.2 |
ReasonFlux-V2 (2025-5) | 97.8 | 86.7 | 76.7 | 71.2 |
@article{yang2025reasonflux,
title={ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates},
author={Yang, Ling and Yu, Zhaochen and Cui, Bin and Wang, Mengdi},
journal={arXiv preprint arXiv:2502.06772},
year={2025}
}