
Commit 248f7d9

Merge pull request #1 from FoundationAgents/release/0.1.0
Release/0.1.0
2 parents 85cfe4d + 0ccfc92 commit 248f7d9

33 files changed

Lines changed: 566 additions & 635 deletions

Dockerfile

Lines changed: 0 additions & 58 deletions
This file was deleted.

README.md

Lines changed: 68 additions & 122 deletions
@@ -1,157 +1,103 @@
-# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework
+# CARE: Improving Context Fidelity via Native Retrieval-Augmented Reasoning

-This project is a clean fork of the original [veRL](https://github.com/volcengine/verl) project to support vision language models; we thank all the authors for providing such a high-performance RL training framework.

-EasyR1 is efficient and scalable due to the design of **[HybridEngine](https://arxiv.org/abs/2409.19256)** and the latest release of **[vLLM](https://github.com/vllm-project/vllm)**'s SPMD mode.

-## Features
+Large language models (LLMs) often struggle with **context fidelity**, producing inconsistent or hallucinated answers even when relevant information is present.
+We propose **CARE**, a native retrieval-augmented reasoning framework that integrates in-context evidence directly into the reasoning chain.
+This work represents a step toward making LLMs more accurate, reliable, and efficient for knowledge-intensive tasks.
+
+### Results Overview
+<p align="center">
+  <img src="assets/retrieval_results.png" alt="CARE Results" width="85%" style="display:inline-block;"/>
+</p>

-- Supported models
-  - Llama3/Qwen2/Qwen2.5 language models
-  - Qwen2/Qwen2.5-VL vision language models
-  - DeepSeek-R1 distill models
+### Method Overview
+<p align="center">
+  <img src="assets/method.png" width="80%">
+</p>
+---

-- Supported algorithms
-  - GRPO
-  - Remax
-  - other RL algorithms (coming soon)

-- Supported datasets
-  - Any text, vision-text dataset in a [specific format](#custom-dataset).
+## 🔧 Installation

-- Supported tricks
-  - Padding-free training
-  - Resuming from checkpoint
-  - Wandb & SwanLab tracking
+Requirements:
+- Python **3.9+**
+- [requirements.txt](./requirements.txt) includes:
+  - `transformers>=4.51.0`
+  - `flash-attn>=2.4.3`
+  - `vllm>=0.8.3`

-## Requirements
+Clone and install:
+```bash
+git clone https://github.com/FoundationAgents/CARE
+cd CARE
+pip install -r requirements.txt
+pip install -e .
+```
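As a quick sanity check after the installation step above, the version floors from the requirements list can be verified with a short script. This is a minimal sketch, not part of the CARE repository; it only assumes the floors stated in the README.

```python
# Sanity-check sketch (not shipped with CARE): compare installed package
# versions against the minimum versions listed in requirements.txt.
from importlib import metadata


def meets_floor(installed: str, floor: tuple) -> bool:
    """True if a dotted version string satisfies a minimum version tuple."""
    parts = []
    for piece in installed.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    # Pad so "4.51" compares like "4.51.0".
    parts += [0] * (len(floor) - len(parts))
    return tuple(parts) >= tuple(floor)


# Version floors taken from the README's requirements list.
FLOORS = {"transformers": (4, 51, 0), "flash-attn": (2, 4, 3), "vllm": (0, 8, 3)}

if __name__ == "__main__":
    for name, floor in FLOORS.items():
        try:
            status = "OK" if meets_floor(metadata.version(name), floor) else "too old"
        except metadata.PackageNotFoundError:
            status = "not installed"
        print(f"{name}: {status}")
```

Running it prints one `OK` / `too old` / `not installed` line per package, which is handy before launching a long training job.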

-### Software Requirements

-- Python 3.9+
-- `transformers>=4.49.0`
-- `flash-attn>=2.4.3`
-- `vllm>=0.7.3`
+---

-We provide a [Dockerfile](./Dockerfile) to easily build environments.
+## 📥 Data and Model Download

-We recommend using the [pre-built docker image](https://hub.docker.com/r/hiyouga/verl) in EasyR1.
+Use the provided helper script to download Qwen models and datasets (DROP, MuSiQue):

```bash
-docker pull hiyouga/verl:ngc-th2.5.1-cu120-vllm0.7.4-hotfix
+python CARE/scripts/load_script/load_dataset.py
```
-
-### Hardware Requirements
-
-\* *estimated*
-
-| Method                | Bits | 1.5B   | 3B     | 7B     |
-| --------------------- | ---- | ------ | ------ | ------ |
-| GRPO Full Fine-Tuning | AMP  | 2*24GB | 4*40GB | 8*40GB |
-
-> [!NOTE]
-> At least 2 GPUs are needed to run EasyR1.
->
-> We are working hard to reduce the VRAM usage in RL training; LoRA support will be integrated in upcoming updates.
-
-## Tutorial: Run Qwen2.5-VL GRPO on the [Geometry3K](https://huggingface.co/datasets/hiyouga/geometry3k) Dataset in Just 3 Steps
-
-![image](assets/qwen2_5_vl_7b_geo.png)
-
-### Installation
-
```bash
-git clone https://github.com/hiyouga/EasyR1.git
-cd EasyR1
-pip install -e .
+python CARE/scripts/load_script/load_model.py
```
+This will save all resources under `CARE/datasets/`.

-### GRPO Training
+---

-```bash
-bash examples/run_qwen2_5_vl_7b_geo.sh
-```
+## 🚀 Reinforcement Learning

-### Merge Checkpoint in Hugging Face Format
+We provide ready-to-run training examples.
+For Qwen2.5-7B with DROP + MuSiQue:

```bash
-python3 scripts/model_merger.py --local_dir path_to_your_last_actor_checkpoint
+bash CARE/scripts/training_examples/run_qwen2_5_7b_retrieve_mix_musique.sh
```

-> [!TIP]
-> If you encounter issues with connecting to Hugging Face, consider using `export HF_ENDPOINT=https://hf-mirror.com`.
->
-> If you want to use the SwanLab logger, consider using `bash examples/run_qwen2_5_vl_7b_geo_swanlab.sh`.
-
-## Custom Dataset
-
-Please refer to the example datasets to prepare your own dataset.
+Edit the script to change:

-- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
-- Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
+* **MODEL_PATH** → local checkpoint path or Hugging Face repo id.
+* **data.train_files / data.extra_files / data.val_files** → datasets.
+* **SYSTEM_PROMPT** → reasoning-style prompt.
+* **trainer.max_steps / trainer.n_gpus_per_node** → training setup.
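The knobs listed above can be sketched as a shell fragment. Every value below is an illustrative placeholder, and the commented invocation is a hypothetical example of the dotted-key style, not the actual command in `run_qwen2_5_7b_retrieve_mix_musique.sh`:

```shell
# Illustrative config fragment; check the training script for real defaults.
MODEL_PATH="Qwen/Qwen2.5-7B-Instruct"   # local checkpoint path or HF repo id
SYSTEM_PROMPT="Answer using evidence retrieved from the given context."

# Hypothetical launch with dotted config keys (file names are placeholders):
#   python -m verl.trainer.main \
#       data.train_files=CARE/datasets/drop_train.parquet \
#       data.val_files=CARE/datasets/musique_dev.parquet \
#       trainer.max_steps=500 \
#       trainer.n_gpus_per_node=8
echo "model: ${MODEL_PATH}"
```

Keeping such overrides at the top of the script makes it easy to swap models or datasets without touching the launch command.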

-> [!TIP]
-> EasyR1 already supports multi-image datasets.
+---

-## How to Understand GRPO in EasyR1
+## 📊 Results

-![image](assets/easyr1_grpo.png)

-- To learn about the GRPO algorithm, you can refer to [Hugging Face's blog](https://huggingface.co/docs/trl/v0.15.2/en/grpo_trainer).
-- Different from TRL's GRPO trainer, our trainer supports mini-batch updates as described in the [original PPO paper](https://arxiv.org/abs/1707.06347).
+---

-## Other Baselines
+### Benchmark Comparison

-We also implemented the following two baselines from the [R1-V](https://github.com/deep-agent/R1-V) project.
-- [CLEVR-70k-Counting](examples/run_qwen2_5_vl_3b_clevr.sh): Train the Qwen2.5-VL-3B-Instruct model on a counting problem.
-- [GeoQA-8k](examples/run_qwen2_5_vl_3b_geoqa8k.sh): Train the Qwen2.5-VL-3B-Instruct model on the GeoQA problem.
+| Model            | Method      | MFQA      | HotpotQA  | 2WikiMQA  | MuSiQue   | Average   |
+| ---------------- | ----------- | --------- | --------- | --------- | --------- | --------- |
+| **LLaMA-3.1 8B** | Original    | *45.57*   | *54.64*   | 45.87     | 32.08     | 44.54     |
+|                  | R1-Searcher | 28.44     | 53.71     | *67.10*   | *41.41*   | *47.67*   |
+|                  | **CARE**    | **49.94** | **63.09** | **75.29** | **51.00** | **59.83** |
+| **Qwen2.5 7B**   | Original    | 46.94     | *58.47*   | 46.96     | 30.78     | 45.79     |
+|                  | R1-Searcher | 28.36     | 55.43     | *65.79*   | *47.09*   | *49.17*   |
+|                  | **CARE**    | **48.11** | **63.45** | **70.11** | 45.57     | **56.81** |
+| **Qwen2.5 14B**  | Original    | 47.58     | *61.94*   | *59.05*   | *37.99*   | *51.64*   |
+|                  | CRAG        | **50.89** | 44.74     | 34.68     | 28.17     | 39.62     |
+|                  | **CARE**    | *48.81*   | **67.75** | **78.68** | **51.27** | **61.63** |

-## Awesome Work using EasyR1
+---

-- MMR1: Advancing the Frontiers of Multimodal Reasoning ([repo](https://github.com/LengSicong/MMR1)).
-- Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models ([paper](https://arxiv.org/abs/2503.06749), [repo](https://github.com/Osilly/Vision-R1)).
+### Ablation Study

-## TODO
+| Setting       | SFT | RL | Retrieval | Curriculum | MFQA      | HotpotQA  | 2WikiMQA  | MuSiQue   | CofCA     | Average   |
+| ------------- | --- | -- | --------- | ---------- | --------- | --------- | --------- | --------- | --------- | --------- |
+| Baseline      | ✗   | ✗  | ✗         | ✗          | _46.64_   | 58.47     | 46.96     | 30.78     | 58.38     | 48.25     |
+| SFT Only      | ✓   | ✗  | ✗         | ✗          | 42.24     | 47.08     | 61.51     | 33.82     | 59.21     | 48.77     |
+| No Retrieval  | ✓   | ✓  | ✗         | ✓          | 37.66     | 62.59     | _70.57_   | 43.85     | 57.26     | 54.39     |
+| No Curriculum | ✓   | ✓  | ✓         | ✗          | 38.33     | **64.10** | **70.69** | **47.49** | _60.60_   | _56.24_   |
+| **CARE**      | ✓   | ✓  | ✓         | ✓          | **48.11** | _63.45_   | 70.11     | _45.57_   | **64.56** | **58.36** |

-- Support PPO, Reinforce++ and RLOO for VLMs.
-- Support ulysses parallelism for VLMs.
-- Support more VLM architectures.
-
-> [!NOTE]
-> We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
-
-### Known bugs
-
-These features are temporarily disabled for now; we plan to fix them one by one in future updates.
-
-- Vision language models are not compatible with ulysses parallelism yet.
-
-## Discussion Group
-
-👋 Join our [WeChat group](assets/wechat.jpg).
-
-## Citation
-
-Core contributors: [Yaowei Zheng](https://github.com/hiyouga), [Junting Lu](https://github.com/AL-377), [Shenzhi Wang](https://github.com/Shenzhi-Wang), [Zhangchi Feng](https://github.com/BUAADreamer), [Dongdong Kuang](https://github.com/Kuangdd01) and Yuwen Xiong
-
-We also thank Guangming Sheng and Chi Zhang for helpful discussions.
-
-```bibtex
-@misc{zheng2025easyr1,
-  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
-  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
-  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
-  year         = {2025}
-}
-```
-
-We recommend also citing the original work.
-
-```bibtex
-@article{sheng2024hybridflow,
-  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
-  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
-  year    = {2024},
-  journal = {arXiv preprint arXiv:2409.19256}
-}
-```
+📌 *Whether to enable curriculum learning can be controlled in*
+[`CARE/verl/trainer/config.py`](CARE/verl/trainer/config.py).
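The curriculum switch mentioned in the new README might look like the following. This is a hypothetical sketch: the class shape and the field name `use_curriculum` are assumptions, not the actual contents of `CARE/verl/trainer/config.py`.

```python
# Hypothetical sketch of a curriculum toggle in a trainer config dataclass;
# the real field name in CARE/verl/trainer/config.py may differ.
from dataclasses import dataclass


@dataclass
class TrainerConfig:
    max_steps: int = 500            # illustrative default
    n_gpus_per_node: int = 8        # illustrative default
    use_curriculum: bool = True     # False reproduces the "No Curriculum" ablation

cfg = TrainerConfig(use_curriculum=False)
print(cfg.use_curriculum)
```

A plain boolean on the config keeps the ablation reproducible from a one-line change rather than a separate script.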

assets/easyr1_grpo.png

-845 KB

assets/intro.pdf

406 KB

assets/intro.png

141 KB

assets/method.pdf

2.96 MB

assets/method.png

1.35 MB

assets/qwen2_5_vl_7b_geo.png

-81 KB

assets/retrieval_results.pdf

18.5 KB

assets/retrieval_results.png

217 KB

0 commit comments