This repository contains the reference code for the paper *NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation*.
- Release the Training Code.
- Release the Training Data.
- Release the Model Weights.
- Release the Training Scripts.
- Release the Evaluation Code.
Please cite this work with the following BibTeX:
```bibtex
@article{qiu2025noisygrpo,
  title={NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation},
  author={Qiu, Longtian and Ning, Shan and Sun, Jiaxuan and He, Xuming},
  journal={arXiv preprint arXiv:2510.21122},
  year={2025},
}
```
Reinforcement learning (RL) has shown promise in enhancing the general Chain-of-Thought (CoT) reasoning capabilities of multimodal large language models (MLLMs). However, when applied to improve general CoT reasoning, existing RL frameworks often struggle to generalize beyond the training distribution. To address this, we propose NoisyGRPO, a systematic multimodal RL framework that introduces controllable noise into visual inputs for enhanced exploration and explicitly models the advantage estimation process via a Bayesian framework. Specifically, NoisyGRPO improves RL training by: (1) Noise-Injected Exploration Policy: Perturbing visual inputs with Gaussian noise to encourage exploration across a wider range of visual scenarios; and (2) Bayesian Advantage Estimation: Formulating advantage estimation as a principled Bayesian inference problem, where the injected noise level serves as a prior and the observed trajectory reward as the likelihood. This Bayesian modeling fuses both sources of information to compute a robust posterior estimate of trajectory advantage, effectively guiding MLLMs to prefer visually grounded trajectories over noisy ones.
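To make the two components concrete, here is a minimal, self-contained sketch of noise-injected rollouts and a Bayesian-style fusion of the noise prior with the reward likelihood. This is not the paper's exact formulation: the noise schedule, the prior/likelihood forms, and the fusion weight `w_prior` below are all placeholder assumptions.

```python
import torch

def inject_noise(pixel_values: torch.Tensor, sigma: float) -> torch.Tensor:
    """Perturb visual inputs with Gaussian noise to widen exploration."""
    return pixel_values + sigma * torch.randn_like(pixel_values)

def bayesian_advantage(rewards: torch.Tensor, sigmas: torch.Tensor,
                       w_prior: float = 0.5) -> torch.Tensor:
    """Fuse a noise-level prior with a reward likelihood into a posterior advantage.

    Illustrative only: the prior favors trajectories produced under cleaner
    inputs (lower sigma), and the likelihood is the group-normalized reward
    as in vanilla GRPO. The linear blend below stands in for the paper's
    principled posterior.
    """
    likelihood = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    prior = 1.0 - sigmas / (sigmas.max() + 1e-6)  # cleaner input -> higher prior
    prior = prior - prior.mean()                  # center so it acts as an advantage
    return w_prior * prior + (1.0 - w_prior) * likelihood

# Toy usage: a group of four rollouts with different noise levels and rewards.
sigmas = torch.tensor([0.0, 0.1, 0.2, 0.4])
rewards = torch.tensor([1.0, 0.8, 0.9, 0.3])
print(bayesian_advantage(rewards, sigmas))
```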
To create the conda environment named noisygrpo, use the following commands. This environment includes all the packages needed to run the code in this repo.
```bash
conda create -n noisygrpo python=3.10
conda activate noisygrpo
bash setup.sh
```
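After setup completes, a quick sanity check (assuming a CUDA-capable machine) can confirm that PyTorch was installed and sees your GPUs:

```python
import torch

# Print the installed PyTorch version and whether CUDA devices are visible.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```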
You can access the official NoisyGRPO model weights on 🤗 Hugging Face.
The annotations for the training dataset are provided in annotations/mm_rlhf_train13k.json.
Please note that the JSON file stores absolute paths to the images; you may need to change them to fit your own system (see the sketch after the directory layout below).
The images can be downloaded from MM-RLHF. After downloading the .zip files, unzip the images into a single directory and update the image paths in the annotation file accordingly.
The image directory should be laid out as follows:
```
MM_RLHF
├── long
├── mcq
├── safety
└── short
```
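Since the exact annotation schema is not documented here, the following sketch assumes each record stores its image path under an "image" key; adjust the key and both prefixes for your setup:

```python
import json

OLD_PREFIX = "/original/abs/path/MM_RLHF"  # assumption: the prefix baked into the released JSON
NEW_PREFIX = "/your/local/path/MM_RLHF"    # where you unzipped the images

with open("annotations/mm_rlhf_train13k.json") as f:
    records = json.load(f)

for rec in records:
    # Assumption: each record keeps its absolute image path under an "image" key.
    if isinstance(rec.get("image"), str):
        rec["image"] = rec["image"].replace(OLD_PREFIX, NEW_PREFIX)

with open("annotations/mm_rlhf_train13k.json", "w") as f:
    json.dump(records, f, indent=2)
```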
Before starting NoisyGRPO training, make sure the environment is set up and the dataset is downloaded to your local machine. Additionally, update the absolute paths in the functions whose names start with fill_abs_path so that they point to the correct image locations in your configuration.
Once everything is set up, you can launch a training job. To train NoisyGRPO 3B, run:
```bash
cd ./NoisyGRPO
bash scripts/noisy_grpo_3B_8gpu.sh
```
We also provide scripts for vanilla GRPO; all training scripts are under scripts/.
We would like to express our sincere gratitude to DeepSeek, Open-R1, QwenVL, Open-R1-Multimodal and VLM-R1 for providing open-source resources that contributed to the development of this project.
