
Awesome RL-based Reasoning MLLMs

License: MIT | Awesome

Recent advances in leveraging reinforcement learning to enhance LLM reasoning have yielded remarkably promising results, exemplified by DeepSeek-R1, Kimi k1.5, OpenAI o3-mini, and Grok 3. These exhilarating achievements herald the ascendance of Large Reasoning Models and mark further progress along the thorny path toward Artificial General Intelligence (AGI). The study of LLM reasoning has garnered significant attention within the community, and researchers have concurrently compiled Awesome RL-based LLM Reasoning. More recently, researchers have also collected projects with detailed configurations of Large Reasoning Models in Awesome RL Reasoning Recipes ("Triple R"). Meanwhile, we have observed that remarkably strong work has already been done in the domain of RL-based Reasoning Multimodal Large Language Models (MLLMs). We aim to provide the community with a comprehensive and timely synthesis of this fascinating and promising field, along with some insights into it.

"The senses are the organs by which man perceives the world, and the soul acts through them as through tools."
β€” Leonardo da Vinci

This repository provides a valuable reference for researchers in the field of multimodality. Please start your exploration of RL-based Reasoning MLLMs!

News

🔥🔥🔥 [2025-5-24] We have written the position paper Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models, which summarizes recent advancements in RFT for MLLMs. We focus on answering three questions: (1) What background should researchers interested in this field know? (2) What has the community done? (3) What could the community do next? We hope this position paper provides valuable insights to the community at this pivotal stage in the advancement toward AGI.

📧📧📧 [2025-4-10] Based on existing work in the community, we provide some insights into this field, which you can find in the PowerPoint presentation file.


Figure 1: An overview of the works done on reinforcement fine-tuning (RFT) for multimodal large language models (MLLMs). Works are sorted by release time and are collected up to May 15, 2025.

Papers (Sorted by Release Time) 📄

Vision (Image) 👀

Vision (Video) 📹

Medical Vision 🏥

Embodied Vision 🤖

Multimodal Reward Model 💯

Audio 👂

Omni ☺️

GUI Agent 📲

Web Agent 🌏

Autonomous Driving 🚙

3D & Metaverse 🌠

Benchmarks and Datasets 📊

Open-Source Projects (Repos without Paper) 🌐

Training Framework 🗼

  • EasyR1 💻 (An Efficient, Scalable, Multi-Modality RL Training Framework)

Vision (Image) 👀

Vision (Video) 📹

Agent 👥

Contribution and Acknowledgment ❤️

This is an active repository, and your contributions are always welcome! If you have any questions about this opinionated list, do not hesitate to contact me at [email protected].

I extend my sincere gratitude to all community members who provided valuable supplementary support.

Citation 📑

If you find this repository useful for your research and applications, please star us ⭐ and consider citing:

@misc{sun2025reinforcementfinetuningpowersreasoning,
      title={Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models}, 
      author={Haoyuan Sun and Jiaqi Wu and Bo Xia and Yifu Luo and Yifei Zhao and Kai Qin and Xufei Lv and Tiantian Zhang and Yongzhe Chang and Xueqian Wang},
      year={2025},
      eprint={2505.18536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.18536}, 
}

and

@misc{sun2025RL-Reasoning-MLLMs,
  title={Awesome RL-based Reasoning MLLMs},
  author={Haoyuan Sun and Xueqian Wang},
  year={2025},
  howpublished={\url{https://github.com/Sun-Haoyuan23/Awesome-RL-based-Reasoning-MLLMs}},
  note={GitHub Repository},
}

Star Chart ⭐

Star History Chart
