Commit e5bbcfd: Add logo and polish current readme (parent: a72b585)

2 files changed: 211 additions, 52 deletions

README.md
<div align="center">
<img src="imgs/APRIL.png" alt="APRIL Logo" width="400">

# APRIL: Active Partial Rollouts in Reinforcement Learning

**Accelerating LLM Training by Taming Long-tail Generation**

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-orange.svg)](https://pytorch.org/)

</div>

## 🚀 Overview

**APRIL** (Active Partial Rollouts) is a compute-efficient method for accelerating rollout generation in reinforcement learning (RL) training of Large Language Models (LLMs). It tackles the critical "long-tail" problem in RL training, where a few samples with exceptionally long responses stall the entire batch. APRIL delivers:

- **20-35% improvement** in rollout throughput
- **2-5% higher** final model accuracy
- **Faster convergence** during training
- **Hardware agnostic**: supports both NVIDIA and AMD GPUs

### The Problem: Long-tail Generation Bottleneck

In on-policy RL training (RLHF/GRPO/DAPO), the rollout phase dominates runtime, typically accounting for **over 90%** of total training time. Because response lengths vary widely across samples, synchronous training paradigms suffer severe GPU underutilization: faster workers sit idle while the longest-running generations complete.

### Our Solution: Active Partial Rollouts

APRIL improves rollout efficiency with a simple mechanism:

1. **Over-provisioning**: Deliberately initiate more rollout requests than needed (N' > N)
2. **Active interruption**: Once the target batch size is reached, actively stop the remaining unfinished rollouts
3. **Intelligent recycling**: Store partial results in a buffer and resume generation in the next iteration
4. **Seamless integration**: Works with existing RL frameworks without modifying inference kernels
![scheduling](./imgs/partial_scheduling.png)
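To see why stopping at the N-th completion helps, here is a small, self-contained simulation (illustrative only, not slime code): with heavy-tailed generation times, a synchronous round must wait for the slowest of its N requests, while an over-sampled round of N' requests finishes as soon as the N-th completion arrives.

```python
import random

random.seed(0)

def round_time_sync(times, n):
    """Synchronous round: wait for all n requests to finish."""
    return max(times[:n])

def round_time_april(times, n):
    """APRIL-style round: launch len(times) >= n requests and stop at the
    n-th completion; unfinished requests are recycled, not waited on."""
    return sorted(times)[n - 1]

N, N_PRIME = 32, 64
# Heavy-tailed per-request generation times (arbitrary units).
times = [random.paretovariate(1.5) for _ in range(N_PRIME)]

sync = round_time_sync(times, N)    # stalls on the longest of the 32
april = round_time_april(times, N)  # done at the 32nd completion of 64
print(f"sync round: {sync:.2f}, APRIL round: {april:.2f}")
assert april <= sync
```

The APRIL round time can never exceed the synchronous one: at least N of the N' requests finish no later than the slowest of any N-request subset.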
## ✨ Key Features

- **🔥 Plug-and-play**: Enable with just two command-line flags (`--partial-rollout` and `--over-sampling-batch-size`)
- **🎯 Algorithm-agnostic**: Compatible with GRPO, DAPO, GSPO, and other popular RL algorithms
- **🏗️ Framework-ready**: Already integrated into the [slime](https://github.com/THUDM/slime) framework
- **⚡ System-level optimization**: Operates at the scheduling layer, complementary to kernel-level optimizations
- **🔧 Production-tested**: Evaluated on multiple LLMs including DeepSeek-R1, Qwen3, and GLM-4

## 🛠️ Installation

### Quick Start with Docker

#### For AMD GPUs:
```bash
docker run --rm --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -it rlsys/slime:slime_ubuntu22.04_rocm6.3.4-patch-numa-patch_sglang0.4.9_megatron-patch_ray2.47.1_apex_torch-memory-saver0.0.8-patch-vim /bin/bash
```

#### For NVIDIA GPUs:
See the [NVIDIA setup guide](./docs/en/build.md)

### Install APRIL

```bash
git clone https://github.com/RLsys-Foundation/APRIL.git
cd APRIL
pip install -e .
```

## 🚦 Quick Start

### Basic Usage

Run a training example with APRIL enabled:

```bash
# Example: Qwen3-4B with DAPO
bash scripts/partial_rollout/qwen/grpo/run-qwen3-4B-dapo-partial.sh
```

### Key Parameters

```bash
# Enable APRIL optimization
--partial-rollout

# Set over-sampling batch size (should be > rollout_batch_size)
--over-sampling-batch-size 64  # e.g., 2x the rollout_batch_size

# Standard rollout batch size
--rollout-batch-size 32
```
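The relationship between the flags can be sketched with a minimal `argparse` parser. This is a hypothetical stand-in that only mirrors the flag names above; slime's real definitions live in `slime/utils/arguments.py`.

```python
import argparse

# Illustrative parser: same flag names as above, simplified semantics.
parser = argparse.ArgumentParser(description="APRIL flag sketch (illustrative)")
parser.add_argument("--partial-rollout", action="store_true",
                    help="Enable interrupt-and-recycle of unfinished rollouts")
parser.add_argument("--over-sampling-batch-size", type=int, default=64,
                    help="N': number of rollout requests launched per round")
parser.add_argument("--rollout-batch-size", type=int, default=32,
                    help="N: completed samples required before training starts")

args = parser.parse_args(
    ["--partial-rollout",
     "--over-sampling-batch-size", "64",
     "--rollout-batch-size", "32"]
)

# Over-sampling only pays off when N' > N.
assert args.over_sampling_batch_size > args.rollout_batch_size
print(args.partial_rollout, args.over_sampling_batch_size, args.rollout_batch_size)
```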

### Advanced Configuration

For detailed parameter explanations, see [arguments.py](./slime/utils/arguments.py); for more background, consult the upstream [slime](https://github.com/THUDM/slime) repository.

## 📊 Performance Results

### Throughput Improvements

| Dataset       | Model         | Algorithm | Throughput Gain | Accuracy Improvement |
|---------------|---------------|-----------|-----------------|----------------------|
| DAPO-Math-17k | Qwen3-4B      | DAPO      | **+17%**        | +2.3%                |
| DeepScaleR    | Qwen3-4B      | GRPO      | **+21%**        | +3.1%                |
| DeepMath-103K | Qwen3-4B      | GSPO      | **+35%**        | +4.7%                |
| Agent Tasks   | DeepSeek-1.5B | GRPO      | **+23%**        | +2.8%                |

### Convergence Analysis

![evaluation](./imgs/eval_dapo_qwen.png)

APRIL not only improves training efficiency but also achieves:
- **Faster convergence**: Reaches target accuracy 15-20% faster
- **Higher final accuracy**: 2-5% improvement in final model performance
- **Stable training**: No additional instability despite partial off-policy samples

## 🏗️ Architecture

### System Design

```
┌─────────────────────────────────────────────────────┐
│                  Training Pipeline                  │
├─────────────────────────────────────────────────────┤
│                                                     │
│   ┌──────────────┐        ┌──────────────┐          │
│   │   Rollout    │───────▶│    Buffer    │          │
│   │    Engine    │        │  Management  │          │
│   │   (SGLang)   │        └──────────────┘          │
│   └──────────────┘               │                  │
│          ▲                       ▼                  │
│          │              ┌─────────────────┐         │
│          │              │    Training     │         │
│          └──────────────│     Engine      │         │
│                         │ (Megatron/FSDP) │         │
│                         └─────────────────┘         │
└─────────────────────────────────────────────────────┘
```

### Core Components

| Component | Path | Description |
|-----------|------|-------------|
| **Rollout Engine** | `slime/rollout/sglang_example.py` | Manages generation with active interruption |
| **Buffer System** | `slime/ray/buffer.py` | Stores and prioritizes partial rollouts |
| **Scheduler** | `slime/ray/rollout.py` | Orchestrates over-sampling and batch management |
| **Training Backend** | `slime/backends/` | Supports both Megatron and FSDP |

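To illustrate what the buffer layer does, here is a hypothetical sketch (not the actual `slime/ray/buffer.py` API): interrupted rollouts keep their generated prefix and are handed out ahead of fresh prompts when the next batch is assembled.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Rollout:
    prompt: str
    generated: str = ""    # prefix produced before interruption
    finished: bool = False

class PartialRolloutBuffer:
    """Hypothetical sketch: prioritize interrupted rollouts over new prompts."""

    def __init__(self):
        self.partials = deque()

    def put(self, rollout: Rollout):
        # Only unfinished rollouts need recycling.
        if not rollout.finished:
            self.partials.append(rollout)

    def next_batch(self, new_prompts, size):
        """Fill a batch with recycled partials first, then fresh prompts."""
        batch = []
        while self.partials and len(batch) < size:
            batch.append(self.partials.popleft())  # resume where it left off
        for p in new_prompts:
            if len(batch) >= size:
                break
            batch.append(Rollout(prompt=p))
        return batch

buf = PartialRolloutBuffer()
buf.put(Rollout("q1", generated="partial answer...", finished=False))
batch = buf.next_batch(["q2", "q3"], size=2)
print([r.prompt for r in batch])  # recycled "q1" comes before new "q2"
```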
## ❓ FAQ

### Q: Does APRIL affect training stability?

While APRIL introduces ~40% off-policy tokens per iteration, extensive experiments show:
- No significant training instability
- Improved final model accuracy
- Consistent convergence patterns

> **Note**: For extremely long sequences (e.g., multi-turn agent tasks), additional validation may be needed.

### Q: Is APRIL compatible with other optimizations?

Yes! APRIL operates at the **system scheduling layer** and is fully compatible with:
- Kernel optimizations (FlashAttention, continuous batching)
- Inference engines (vLLM, SGLang, TensorRT-LLM)
- Speculative decoding techniques
- Model parallelism strategies

### Q: What hardware is supported?

APRIL is hardware-agnostic and tested on:
- **NVIDIA GPUs**: H100
- **AMD GPUs**: MI300X

## 📁 Repository Structure

```
APRIL/
├── imgs/                        # Documentation images
│   ├── APRIL.png                # Project logo
│   └── partial_scheduling.png   # Architecture diagrams
├── scripts/
│   └── partial_rollout/         # Training scripts
│       ├── deepseek/            # DeepSeek model experiments
│       ├── qwen/                # Qwen model experiments
│       └── README.md            # Script documentation
├── slime/                       # Core framework
│   ├── backends/                # Training backends
│   │   ├── fsdp_utils/          # FSDP implementation
│   │   └── megatron_utils/      # Megatron-LM support
│   ├── rollout/
│   │   ├── sglang_example.py    # Core rollout implementation
│   │   └── rm_hub/              # Reward model integrations
│   ├── ray/                     # Distributed orchestration
│   │   ├── buffer.py            # Partial rollout buffer
│   │   └── rollout.py           # Rollout scheduling
│   └── utils/                   # Utilities and helpers
├── docs/                        # Documentation
│   ├── en/                      # English docs
│   └── zh/                      # Chinese docs
└── tools/                       # Model conversion utilities
```

## 🔬 Technical Details

### How APRIL Works

1. **Over-provisioning Phase**: Request N' = αN rollouts (α typically 1.5-2.0)
2. **Active Monitoring**: Track completion status across all workers
3. **Intelligent Interruption**: Send an abort signal when N samples complete
4. **Buffer Management**: Store partial results with their generation state
5. **Seamless Resumption**: Continue the partial rollouts in the next iteration

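The five steps can be condensed into a toy scheduling loop. This is an illustrative model, not slime's implementation: for determinism it assumes completion order equals launch order, whereas real completions interleave across workers.

```python
def april_round(buffer, prompt_stream, n, alpha=2.0):
    """One illustrative APRIL round (a sketch, not slime's actual code)."""
    n_prime = int(alpha * n)           # 1. over-provision: N' = alpha * N
    # Interrupted rollouts from the previous round are resumed first.
    launched = list(buffer)
    while len(launched) < n_prime and prompt_stream:
        launched.append(prompt_stream.pop(0))
    # 2-3. Collect the first n completions, then abort the rest.
    # (Determinism assumption: completion order == launch order.)
    finished, interrupted = launched[:n], launched[n:]
    # 4. Aborted requests keep their partial generations in the buffer;
    # 5. they resume when the next round drains the buffer first.
    return finished, interrupted

buffer = []
prompts = [f"p{i}" for i in range(200)]
for step in range(3):
    batch, buffer = april_round(buffer, prompts, n=32, alpha=2.0)
    assert len(batch) == 32    # training always receives a full batch
    assert len(buffer) == 32   # N' - N partials carried into the next round
```

Note how from the second round on, every batch consists of rollouts that were interrupted one round earlier, which is where the ~40% off-policy token fraction discussed in the FAQ comes from.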
### Integration with Existing Frameworks

APRIL is designed as a drop-in enhancement for existing RL training pipelines:
- **Minimal code changes**: Enable with command-line flags
- **Framework agnostic**: Works with OpenRLHF, verl, AReaL, and slime
- **Automatic optimization**: Self-tuning based on workload characteristics

## 📚 Citation

If you use APRIL in your research, please cite our paper:

```bibtex
@article{april2025,
  title={APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation},
  author={RLsys Foundation Team},
  journal={arXiv preprint},
  year={2025}
}
```

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

APRIL builds upon the excellent work of:
- [slime](https://github.com/THUDM/slime) - The base RL training framework
- [SGLang](https://github.com/sgl-project/sglang) - High-performance inference backend
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) - Distributed training backend

## 📬 Contact

For questions and support:
- Open an issue on [GitHub](https://github.com/RLsys-Foundation/APRIL/issues)

---

<div align="center">
<sub>Built with ❤️ by the RLsys Foundation Team</sub>
</div>

imgs/APRIL.png (841 KB, binary)