# CARE: Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Large language models (LLMs) often struggle with **context fidelity**, producing inconsistent or hallucinated answers even when the relevant information is present in the context.
We propose **CARE**, a native retrieval-augmented reasoning framework that integrates in-context evidence directly into the model's reasoning chain.
This work is a step toward making LLMs more accurate, reliable, and efficient on knowledge-intensive tasks.

### Results Overview

<p align="center">
  <img src="assets/retrieval_results.png" alt="CARE Results" width="85%"/>
</p>

### Method Overview

<p align="center">
  <img src="assets/method.png" alt="CARE Method" width="80%"/>
</p>

---

## 🔧 Installation

Requirements:

- Python **3.9+**
- The packages pinned in [requirements.txt](./requirements.txt), including:
  - `transformers>=4.51.0`
  - `flash-attn>=2.4.3`
  - `vllm>=0.8.3`

Clone and install:

```bash
git clone https://github.com/FoundationAgents/CARE
cd CARE
pip install -r requirements.txt
pip install -e .
```
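
To confirm the environment resolved correctly, a quick sanity check (this snippet is our illustration, not a script shipped with the repo):

```bash
# Print the versions of the three pinned core dependencies.
python -c "import transformers; print('transformers', transformers.__version__)"
python -c "import flash_attn; print('flash-attn', flash_attn.__version__)"
python -c "import vllm; print('vllm', vllm.__version__)"
```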

---

## 📥 Data and Model Download

Use the provided helper scripts to download the Qwen models and the datasets (DROP, MuSiQue):

```bash
python CARE/scripts/load_script/load_dataset.py
```

```bash
python CARE/scripts/load_script/load_model.py
```

This will save all resources under `CARE/datasets/`.
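
To see where things landed, a quick listing (the exact directory layout is an assumption based on the note above):

```bash
# Show the top two levels of the download directory.
find CARE/datasets -maxdepth 2 | sort
```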

---

## 🚀 Reinforcement Learning

We provide ready-to-run training examples.
For Qwen2.5-7B with DROP + MuSiQue:

```bash
bash CARE/scripts/training_examples/run_qwen2_5_7b_retrieve_mix_musique.sh
```

Edit the script to change (a sketch of such edits follows the list):

* `MODEL_PATH` → local checkpoint path or Hugging Face repo id.
* `data.train_files` / `data.extra_files` / `data.val_files` → training, extra, and validation datasets.
* `SYSTEM_PROMPT` → the system prompt that sets the reasoning style.
* `trainer.max_steps` / `trainer.n_gpus_per_node` → training length and GPU count.
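
A minimal sketch of what those edits can look like. The `verl.trainer.main` entry point and config keys are assumptions carried over from the verl-based trainer this repo builds on (see `CARE/verl/trainer/`), and the dataset file names are placeholders, not repo-verified paths:

```bash
# Hypothetical excerpt of an edited training script -- names follow the
# bullets above; paths and values are illustrative only.
MODEL_PATH=Qwen/Qwen2.5-7B-Instruct   # local checkpoint path or HF repo id
SYSTEM_PROMPT="Reason step by step and quote supporting evidence from the given context."

# Assumed verl-style launch; the real script may differ.
python3 -m verl.trainer.main \
    data.train_files=CARE/datasets/drop_train.parquet \
    data.extra_files=CARE/datasets/musique_train.parquet \
    data.val_files=CARE/datasets/musique_val.parquet \
    data.system_prompt="${SYSTEM_PROMPT}" \
    worker.actor.model.model_path="${MODEL_PATH}" \
    trainer.max_steps=400 \
    trainer.n_gpus_per_node=8
```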

---

## 📊 Results

---

### Benchmark Comparison

| Model            | Method      | MFQA      | HotpotQA  | 2WikiMQA  | MuSiQue   | Average   |
| ---------------- | ----------- | --------- | --------- | --------- | --------- | --------- |
| **LLaMA-3.1 8B** | Original    | *45.57*   | *54.64*   | 45.87     | 32.08     | 44.54     |
|                  | R1-Searcher | 28.44     | 53.71     | *67.10*   | *41.41*   | *47.67*   |
|                  | **CARE**    | **49.94** | **63.09** | **75.29** | **51.00** | **59.83** |
| **Qwen2.5 7B**   | Original    | 46.94     | *58.47*   | 46.96     | 30.78     | 45.79     |
|                  | R1-Searcher | 28.36     | 55.43     | *65.79*   | *47.09*   | *49.17*   |
|                  | **CARE**    | **48.11** | **63.45** | **70.11** | 45.57     | **56.81** |
| **Qwen2.5 14B**  | Original    | 47.58     | *61.94*   | *59.05*   | *37.99*   | *51.64*   |
|                  | CRAG        | **50.89** | 44.74     | 34.68     | 28.17     | 39.62     |
|                  | **CARE**    | *48.81*   | **67.75** | **78.68** | **51.27** | **61.63** |

---

### Ablation Study

| Setting       | SFT | RL | Retrieval | Curriculum | MFQA      | HotpotQA  | 2WikiMQA  | MuSiQue   | CofCA     | Average   |
| ------------- | --- | -- | --------- | ---------- | --------- | --------- | --------- | --------- | --------- | --------- |
| Baseline      | ✗   | ✗  | ✗         | ✗          | *46.64*   | 58.47     | 46.96     | 30.78     | 58.38     | 48.25     |
| SFT Only      | ✓   | ✗  | ✗         | ✗          | 42.24     | 47.08     | 61.51     | 33.82     | 59.21     | 48.77     |
| No Retrieval  | ✓   | ✓  | ✗         | ✗          | 37.66     | 62.59     | *70.57*   | 43.85     | 57.26     | 54.39     |
| No Curriculum | ✓   | ✓  | ✓         | ✗          | 38.33     | **64.10** | **70.69** | **47.49** | *60.60*   | *56.24*   |
| **CARE**      | ✓   | ✓  | ✓         | ✓          | **48.11** | *63.45*   | 70.11     | *45.57*   | **64.56** | **58.36** |

📌 *Whether curriculum learning is enabled can be controlled in [`CARE/verl/trainer/config.py`](CARE/verl/trainer/config.py).*
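
The exact flag name isn't stated here; a plain grep is enough to locate it:

```bash
# Find the curriculum-related option(s) in the trainer config.
grep -n -i "curriculum" CARE/verl/trainer/config.py
```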