Commit 087a35c

Merge branch 'main' into dev

Merge upstream changes while preserving the CISPO implementation:

- Add CISPO to advantage-estimator choices alongside on_policy_distillation
- Keep compute_cispo_loss function in ppo_utils.py
- Integrate OPSM mask support from main
- Use refactored offload_train/onload_rollout helpers from main

2 parents 585a844 + a4a59ea commit 087a35c

File tree

251 files changed: +18945 −6699 lines changed

.github/workflows/conda-ci.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -77,7 +77,7 @@ jobs:
           micromamba activate slime
           export CUDA_HOME="$CONDA_PREFIX"
-          bash tests/test_qwen3-30B-A3B.sh
+          SLIME_TEST_USE_DEEPEP=0 SLIME_TEST_USE_FP8_ROLLOUT=0 python tests/test_qwen3_30B_A3B.py
        shell: bash

      - name: Cleanup
```
Lines changed: 37 additions & 0 deletions (new file)

```python
from pathlib import Path

import jinja2


def main():
    """
    Generates GitHub workflow YAML files from Jinja2 templates.
    """
    workflows_dir = Path(__file__).parent
    print(f"Scan dir: {workflows_dir}")
    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader(str(workflows_dir)),
        block_start_string="<%",
        block_end_string="%>",
        variable_start_string="<<",
        variable_end_string=">>",
    )

    for template_path in workflows_dir.glob("*.yml.j2"):
        template = env.get_template(template_path.name)
        content = template.render()

        yaml_path = template_path.with_suffix("")
        with open(yaml_path, "w") as f:
            f.write(
                "#" * 80
                + "\n# This file is auto-generated from the .j2 file via generate_github_workflows.py. Do not edit manually.\n"
                + "#" * 80
                + "\n"
            )
            f.write(content)

        print(f"Generated {yaml_path} from {template_path}")


if __name__ == "__main__":
    main()
```
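The generator above swaps Jinja2's default `{% %}`/`{{ }}` delimiters for `<% %>`/`<< >>` so that GitHub Actions' own `${{ ... }}` expressions pass through the templates untouched. A minimal standalone sketch of the same trick (the template string and variable names here are illustrative, not from the commit):

```python
import jinja2

# Non-default delimiters keep GitHub Actions' "${{ ... }}" expressions from
# being parsed as Jinja2 variables; they survive rendering verbatim.
env = jinja2.Environment(
    block_start_string="<%",
    block_end_string="%>",
    variable_start_string="<<",
    variable_end_string=">>",
)

template = env.from_string(
    "<% for t in tests %>run: python tests/<< t >>\n<% endfor %>"
    "key: ${{ secrets.WANDB_API_KEY }}"
)
out = template.render(tests=["a.py", "b.py"])
print(out)
```

With the default delimiters, Jinja2 would try to evaluate `secrets.WANDB_API_KEY` itself (as part of `{{ ... }}`) and fail or emit an empty string; with the custom ones, the Actions expression reaches the generated YAML intact.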

.github/workflows/pr-test.yml

Lines changed: 99 additions & 10 deletions

```diff
@@ -1,3 +1,7 @@
+################################################################################
+# This file is auto-generated from the .j2 file via generate_github_workflows.py. Do not edit manually.
+################################################################################
 name: PR Test

 on:
@@ -7,15 +11,102 @@ on:
   pull_request:
     branches: [main]
     types: [synchronize, labeled]
+  workflow_dispatch:
+    inputs:
+      infinite_run:
+        description: 'Run training infinitely'
+        required: false
+        type: boolean
+        default: false

 concurrency:
   group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
   cancel-in-progress: true

 jobs:
-  e2e-test:
-    # TODO may use run-ci label etc
-    if: github.event.pull_request.draft == false
+
+  e2e-test-short:
+    if: (github.event_name == 'workflow_dispatch') || (github.event.pull_request && contains(github.event.pull_request.labels.*.name, 'run-ci-short'))
+    runs-on: self-hosted
+    container:
+      image: slimerl/slime:latest
+      options: >
+        --gpus all
+        --ipc=host
+        --shm-size=16g
+        --ulimit memlock=-1
+        --ulimit stack=67108864
+        --memory=0
+        --memory-swap=0
+        -v /mnt/nvme0n1/slime_ci:/data/slime_ci
+        -v /mnt/nvme0n1/slime_ci/models:/root/models
+        -v /mnt/nvme0n1/slime_ci/datasets:/root/datasets
+    strategy:
+      fail-fast: false
+      matrix:
+        info: [{"num_gpus": 8, "test_file": "test_quick_start_glm4_9B.py"}, {"num_gpus": 8, "test_file": "test_qwen3_30B_A3B.py"}, {"num_gpus": 8, "test_file": "test_qwen3_4B_ppo.py"}, {"num_gpus": 8, "test_file": "test_moonlight_16B_A3B.py"}, {"num_gpus": 2, "test_file": "test_qwen3_4B_fsdp_true_on_policy.py"}]
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+    env:
+      GITHUB_COMMIT_NAME: ${{ github.sha }}_${{ github.event.pull_request.number || 'non-pr' }}
+      WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
+      SLIME_TEST_ENABLE_INFINITE_RUN: ${{ (github.event_name == 'workflow_dispatch' && github.event.inputs.infinite_run) || 'false' }}
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Install
+        shell: bash
+        run: cd $GITHUB_WORKSPACE && pip install -e .
+
+      - name: Execute
+        shell: bash
+        run: python tests/ci/gpu_lock_exec.py --count ${{ matrix.info.num_gpus }} -- python tests/${{ matrix.info.test_file }}
+
+  e2e-test-long:
+    if: (github.event_name == 'workflow_dispatch') || (github.event.pull_request && contains(github.event.pull_request.labels.*.name, 'run-ci-long'))
+    runs-on: self-hosted
+    container:
+      image: slimerl/slime:latest
+      options: >
+        --gpus all
+        --ipc=host
+        --shm-size=16g
+        --ulimit memlock=-1
+        --ulimit stack=67108864
+        --memory=0
+        --memory-swap=0
+        -v /mnt/nvme0n1/slime_ci:/data/slime_ci
+        -v /mnt/nvme0n1/slime_ci/models:/root/models
+        -v /mnt/nvme0n1/slime_ci/datasets:/root/datasets
+    strategy:
+      fail-fast: false
+      matrix:
+        info: [{"num_gpus": 2, "test_file": "test_qwen2.5_0.5B_gsm8k.py"}, {"num_gpus": 2, "test_file": "test_qwen2.5_0.5B_gsm8k_async.py"}, {"num_gpus": 2, "test_file": "test_qwen3_0.6B_fsdp_colocated_2xGPU.py"}, {"num_gpus": 2, "test_file": "test_qwen3_0.6B_fsdp_distributed.py"}]
+    defaults:
+      run:
+        working-directory: ${{ github.workspace }}
+    env:
+      GITHUB_COMMIT_NAME: ${{ github.sha }}_${{ github.event.pull_request.number || 'non-pr' }}
+      WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
+      SLIME_TEST_ENABLE_INFINITE_RUN: ${{ (github.event_name == 'workflow_dispatch' && github.event.inputs.infinite_run) || 'false' }}
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Install
+        shell: bash
+        run: cd $GITHUB_WORKSPACE && pip install -e .
+
+      - name: Execute
+        shell: bash
+        run: python tests/ci/gpu_lock_exec.py --count ${{ matrix.info.num_gpus }} -- python tests/${{ matrix.info.test_file }}
+
+  e2e-test-precision:
+    if: (github.event_name == 'workflow_dispatch') || (github.event.pull_request && contains(github.event.pull_request.labels.*.name, 'run-ci-precision'))
     runs-on: self-hosted
     container:
       image: slimerl/slime:latest
@@ -28,21 +119,19 @@ jobs:
         --memory=0
         --memory-swap=0
         -v /mnt/nvme0n1/slime_ci:/data/slime_ci
-        -v /mnt/nvme0n1/models:/root/models
-        -v /mnt/nvme0n1/datasets:/root/datasets
+        -v /mnt/nvme0n1/slime_ci/models:/root/models
+        -v /mnt/nvme0n1/slime_ci/datasets:/root/datasets
     strategy:
       fail-fast: false
       matrix:
-        info:
-          - {test_file: test_quick_start_glm4_9B.py}
-          - {test_file: test_qwen3_30B_A3B.py}
-          # TODO use deterministic kernel
+        info: [{"num_gpus": 8, "test_file": "test_qwen3_0.6B_parallel_check.py"}]
     defaults:
       run:
         working-directory: ${{ github.workspace }}
     env:
       GITHUB_COMMIT_NAME: ${{ github.sha }}_${{ github.event.pull_request.number || 'non-pr' }}
       WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
+      SLIME_TEST_ENABLE_INFINITE_RUN: ${{ (github.event_name == 'workflow_dispatch' && github.event.inputs.infinite_run) || 'false' }}

     steps:
       - name: Checkout repository
@@ -54,4 +143,4 @@ jobs:

       - name: Execute
         shell: bash
-        run: python tests/${{ matrix.info.test_file }}
+        run: python tests/ci/gpu_lock_exec.py --count ${{ matrix.info.num_gpus }} -- python tests/${{ matrix.info.test_file }}
```
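Every `Execute` step now runs through `tests/ci/gpu_lock_exec.py`, which reserves `--count` GPUs on the shared self-hosted runner before launching the command after `--`. The wrapper's implementation is not part of this commit; below is a hypothetical sketch of how such a tool could work, using per-GPU `flock` lock files (the lock-file paths, `TOTAL_GPUS`, and retry interval are all assumptions, not the real script):

```python
import fcntl
import os
import subprocess
import sys
import time

TOTAL_GPUS = 8  # assumption: the runner exposes 8 GPUs


def acquire_gpus(count, total=TOTAL_GPUS):
    """Try to lock `count` GPU lock files; return held (gpu, file) pairs, or None."""
    held = []
    for gpu in range(total):
        f = open(f"/tmp/ci_gpu_{gpu}.lock", "w")
        try:
            # Non-blocking exclusive lock: fails immediately if another job holds it.
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            held.append((gpu, f))
            if len(held) == count:
                return held
        except BlockingIOError:
            f.close()  # this GPU is busy in another job
    # Not enough free GPUs: release everything so the caller can retry.
    for _, f in held:
        f.close()
    return None


def main():
    count = int(sys.argv[sys.argv.index("--count") + 1])
    cmd = sys.argv[sys.argv.index("--") + 1 :]
    while (held := acquire_gpus(count)) is None:
        time.sleep(5)  # wait for GPUs to free up
    # Pin the child process to the reserved GPUs; locks are held for its lifetime.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g, _ in held))
    sys.exit(subprocess.call(cmd, env=env))


if __name__ == "__main__":
    main()
```

The matrix entries above pair each test file with its `num_gpus`, so a 2-GPU job can share the runner with other jobs instead of monopolizing all 8 GPUs.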

.github/workflows/pr-test.yml.j2

Lines changed: 90 additions & 0 deletions (new file)

```jinja
<% set jobs = {
    'e2e-test-short': {
        'label': 'run-ci-short',
        'tests': [
            {'test_file': 'test_quick_start_glm4_9B.py', 'num_gpus': 8},
            {'test_file': 'test_qwen3_30B_A3B.py', 'num_gpus': 8},
            {'test_file': 'test_qwen3_4B_ppo.py', 'num_gpus': 8},
            {'test_file': 'test_moonlight_16B_A3B.py', 'num_gpus': 8},
            {'test_file': 'test_qwen3_4B_fsdp_true_on_policy.py', 'num_gpus': 2},
        ],
    },
    'e2e-test-long': {
        'label': 'run-ci-long',
        'tests': [
            {'test_file': 'test_qwen2.5_0.5B_gsm8k.py', 'num_gpus': 2},
            {'test_file': 'test_qwen2.5_0.5B_gsm8k_async.py', 'num_gpus': 2},
            {'test_file': 'test_qwen3_0.6B_fsdp_colocated_2xGPU.py', 'num_gpus': 2},
            {'test_file': 'test_qwen3_0.6B_fsdp_distributed.py', 'num_gpus': 2},
        ],
    },
    'e2e-test-precision': {
        'label': 'run-ci-precision',
        'tests': [
            {'test_file': 'test_qwen3_0.6B_parallel_check.py', 'num_gpus': 8},
        ],
    },
} %>
name: PR Test

on:
  # Do not run CI on push to reduce CI time
  # push:
  #   branches: [main]
  pull_request:
    branches: [main]
    types: [synchronize, labeled]
  workflow_dispatch:
    inputs:
      infinite_run:
        description: 'Run training infinitely'
        required: false
        type: boolean
        default: false

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
<% for job_name, config in jobs.items() %>
  << job_name >>:
    if: (github.event_name == 'workflow_dispatch') || (github.event.pull_request && contains(github.event.pull_request.labels.*.name, '<< config.label >>'))
    runs-on: self-hosted
    container:
      image: slimerl/slime:latest
      options: >
        --gpus all
        --ipc=host
        --shm-size=16g
        --ulimit memlock=-1
        --ulimit stack=67108864
        --memory=0
        --memory-swap=0
        -v /mnt/nvme0n1/slime_ci:/data/slime_ci
        -v /mnt/nvme0n1/slime_ci/models:/root/models
        -v /mnt/nvme0n1/slime_ci/datasets:/root/datasets
    strategy:
      fail-fast: false
      matrix:
        info: << config.tests | tojson >>
    defaults:
      run:
        working-directory: ${{ github.workspace }}
    env:
      GITHUB_COMMIT_NAME: ${{ github.sha }}_${{ github.event.pull_request.number || 'non-pr' }}
      WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
      SLIME_TEST_ENABLE_INFINITE_RUN: ${{ (github.event_name == 'workflow_dispatch' && github.event.inputs.infinite_run) || 'false' }}

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install
        shell: bash
        run: cd $GITHUB_WORKSPACE && pip install -e .

      - name: Execute
        shell: bash
        run: python tests/ci/gpu_lock_exec.py --count ${{ matrix.info.num_gpus }} -- python tests/${{ matrix.info.test_file }}
<% endfor %>
```
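The `<< config.tests | tojson >>` expression is what flattens each job's list of test dicts into the single-line JSON matrix seen in the generated `pr-test.yml`. A small standalone check of the `tojson` filter with the same custom variable delimiters (the template string here is illustrative):

```python
import jinja2

env = jinja2.Environment(variable_start_string="<<", variable_end_string=">>")

# `tojson` serializes the Python list of dicts into the inline JSON form
# that GitHub Actions accepts as a matrix definition.
tmpl = env.from_string("info: << tests | tojson >>")
out = tmpl.render(tests=[{"test_file": "a.py", "num_gpus": 2}])
print(out)
```

Keeping the test lists as Python-style dicts in the `.j2` header and serializing them with `tojson` avoids hand-maintaining the long one-line JSON matrices in three places.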

.gitignore

Lines changed: 2 additions & 1 deletion

```diff
@@ -190,4 +190,5 @@ local/

 glm/
 _examples_synced/
-.env
+.env
+.DS_Store
```

.pre-commit-config.yaml

Lines changed: 6 additions & 0 deletions

```diff
@@ -17,6 +17,12 @@ repos:
       args: ['--maxkb=1000']
     - id: requirements-txt-fixer

+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.14.7
+    hooks:
+      - id: ruff-check
+        args: [ --fix ]
+
   - repo: https://github.com/PyCQA/autoflake
     rev: v2.0.2
     hooks:
```

README.md

Lines changed: 10 additions & 2 deletions

````diff
@@ -51,6 +51,14 @@ We also provide examples for some use cases not covered in the quick start guide

 slime has powered several novel research projects and production systems. Here are some notable examples:

+### ⚛️ P1: Mastering Physics Olympiads with Reinforcement Learning
+
+[**P1**](https://prime-rl.github.io/P1/) is a family of open-source physics reasoning models trained entirely through reinforcement learning. P1 uses slime as its RL post-training framework and introduces a multi-stage RL training algorithm that progressively enhances reasoning ability through adaptive learnability adjustment and stabilization mechanisms. Empowered by this training paradigm, P1 delivers breakthrough performance in open-source physics reasoning.
+
+### 📈 RLVE: Scaling LM RL with Adaptive Verifiable Environments
+
+[**RLVE**](https://github.com/Zhiyuan-Zeng/RLVE) introduces verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards to scale up RL for language models (LMs). With joint training across 400 verifiable environments, RLVE lets each environment dynamically adapt its problem-difficulty distribution to the policy model's capabilities as training progresses.
+
 ### ⚡ TritonForge: Agentic RL Training Framework for Kernel Generation

 [**TritonForge**](https://github.com/RLsys-Foundation/TritonForge) leverages slime's SFT & RL capabilities to train LLMs that automatically generate optimized GPU kernels. By using a two-stage training approach—supervised fine-tuning followed by reinforcement learning with multi-turn compilation feedback—TritonForge achieves remarkable results in converting PyTorch operations into high-performance Triton kernels.
@@ -65,7 +73,7 @@ These projects showcase slime's versatility—from training code-generation mode

 Arguments in slime are divided into three categories:

-1. **Megatron arguments**: slime reads all arguments set in Megatron via `PYTHONPATH`. You can configure Megatron by passing arguments like `--tensor-model-parallel-size 2`.
+1. **Megatron arguments**: slime reads all arguments in Megatron. You can configure Megatron by passing arguments like `--tensor-model-parallel-size 2`.
 2. **SGLang arguments**: All arguments for the installed SGLang are supported. These arguments must be prefixed with `--sglang-`. For example, `--mem-fraction-static` should be passed as `--sglang-mem-fraction-static`.
 3. **slime-specific arguments**: Please refer to: [slime/utils/arguments.py](slime/utils/arguments.py)

@@ -93,7 +101,7 @@ pre-commit run --all-files --show-diff-on-failure --color=always
 - Special thanks to the following projects & communities: SGLang, Megatron‑LM, mbridge, OpenRLHF, veRL, Pai-Megatron-Patch and others.
 - To quote slime, please use:

-```bibtext
+```bibtex
 @misc{slime_github,
   author = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
   title = {slime: An LLM post-training framework for RL Scaling},
````

README_zh.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -82,7 +82,7 @@ slime 是 [GLM-4.5](https://z.ai/blog/glm-4.5) 与 [GLM-4.6](https://z.ai/blog/g
 - Special thanks to the following projects & communities: SGLang, Megatron‑LM, mbridge, OpenRLHF, veRL, Pai-Megatron-Patch, and others.

 - To cite slime, please use:
-```bibtext
+```bibtex
 @misc{slime_github,
   author = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
   title = {slime: An LLM post-training framework for RL Scaling},
````
