We welcome everyone to open an issue for any related work we haven't discussed, and we'll try to address it in the next release!
- [2025-11-05] Excited to release our paper list about Memory for Agents, covering breakthroughs in Context Management and Learning from Experience that power self-improving AI agents. Check it out: GitHub
- [2025-10] Honored to give talks at BAAI, Qingke Talk and Tencent Wiztalk! Here are the slides.
- [2025-09-18] We updated the full list of papers to follow the category structure of the survey!
- [2025-09-12] Our survey was ranked #1 Paper of the Day on Hugging Face Daily Papers!
- [2025-09-11] Excited to release our RL for LRMs Survey! We'll be updating the full list of papers with a new category structure soon. Check it out: Paper.
- [2025-08-15] Introducing SSRL: an investigation of agentic search RL without reliance on external search engines. Check it out: GitHub and Paper.
- [2025-05-27] Introducing MARTI: A Framework for LLM-based Multi-Agent Reinforced Training and Inference. Check it out: GitHub.
- [2025-04-23] Introducing TTRL: an open-source solution for online RL on data without ground-truth labels, especially test data. Check it out: GitHub and Paper.
- [2025-03-20] We are excited to introduce a collection of papers and projects on RL for reasoning models!
If you find this survey helpful, please cite our work:
@article{zhang2025survey,
title={A survey of reinforcement learning for large reasoning models},
author={Zhang, Kaiyan and Zuo, Yuxin and He, Bingxiang and Sun, Youbang and Liu, Runze and Jiang, Che and Fan, Yuchen and Tian, Kai and Jia, Guoli and Li, Pengfei and others},
journal={arXiv preprint arXiv:2509.08827},
year={2025}
}
- A Survey of Reinforcement Learning for Large Reasoning Models
  - News
  - Citation
  - Contents
  - Overview
  - Paper List
    - Frontier Models
    - Reward Design
    - Policy Optimization
    - Sampling Strategy
    - Training Resource
      - Static Corpus (Code)
      - Static Corpus (STEM)
      - Static Corpus (Math)
      - Static Corpus (Agent)
      - Static Corpus (Mix)
      - Dynamic Environment (Rule-based)
      - Dynamic Environment (Code-based)
      - Dynamic Environment (Game-based)
      - Dynamic Environment (Model-based)
      - Dynamic Environment (Ensemble-based)
      - RL Infrastructure (Primary)
      - RL Infrastructure (Secondary)
    - Applications
  - Acknowledgment
  - Star History
Our survey provides a comprehensive examination of Reinforcement Learning for Large Reasoning Models.
We organize the survey into five main sections:
- Foundational Components: Reward design, policy optimization, and sampling strategies
- Foundational Problems: Key debates and challenges in RL for LRMs
- Training Resources: Static corpora, dynamic environments, and infrastructure
- Applications: Real-world implementations across diverse domains
- Future Directions: Emerging research opportunities and challenges
This survey is extended and refined from the original Awesome RL Reasoning Recipes repo. We are deeply grateful to all contributors for their efforts, and we sincerely thank everyone for their interest in Awesome RL Reasoning Recipes. The contents of the previous repository are available here.

