We also provide examples for some use cases not covered in the quick start guide.

slime has powered several novel research projects and production systems. Here are some notable examples:
### ⚛️ P1: Mastering Physics Olympiads with Reinforcement Learning
[**P1**](https://prime-rl.github.io/P1/) is a family of open-source physics reasoning models trained entirely through reinforcement learning. P1 uses slime as its RL post-training framework and introduces a multi-stage RL training algorithm that progressively enhances reasoning ability through adaptive learnability adjustment and stabilization mechanisms. Empowered by this training paradigm, P1 delivers breakthrough performance in open-source physics reasoning.
### 📈 RLVE: Scaling LM RL with Adaptive Verifiable Environments
[**RLVE**](https://github.com/Zhiyuan-Zeng/RLVE) scales up RL for language models (LMs) using verifiable environments: environments that procedurally generate problems and provide algorithmically verifiable rewards. With joint training across 400 verifiable environments, RLVE enables each environment to dynamically adapt its problem difficulty distribution to the policy model's capabilities as training progresses.
### ⚡ TritonForge: Agentic RL Training Framework for Kernel Generation
[**TritonForge**](https://github.com/RLsys-Foundation/TritonForge) leverages slime's SFT & RL capabilities to train LLMs that automatically generate optimized GPU kernels. By using a two-stage training approach—supervised fine-tuning followed by reinforcement learning with multi-turn compilation feedback—TritonForge achieves remarkable results in converting PyTorch operations into high-performance Triton kernels.
Arguments in slime are divided into three categories:
1. **Megatron arguments**: slime reads all arguments in Megatron. You can configure Megatron by passing arguments like `--tensor-model-parallel-size 2`.
2. **SGLang arguments**: All arguments for the installed SGLang are supported. These arguments must be prefixed with `--sglang-`. For example, `--mem-fraction-static` should be passed as `--sglang-mem-fraction-static`.
3. **slime-specific arguments**: Please refer to [slime/utils/arguments.py](slime/utils/arguments.py).
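The `--sglang-` prefixing convention above can be sketched as a small argument demultiplexer. This is a hypothetical illustration of the convention only, not slime's actual implementation; the function name `split_args` and the handling of bare values are assumptions:

```python
def split_args(argv):
    """Split a flat CLI argument list into SGLang arguments (prefixed with
    --sglang-) and everything else, mirroring the convention described above.
    Illustrative only; not slime's real argument parser."""
    sglang_args, other_args = [], []
    current_bucket = None  # the bucket the most recent flag went into
    for token in argv:
        if token.startswith("--sglang-"):
            # Strip the prefix to recover the native SGLang flag name.
            sglang_args.append("--" + token[len("--sglang-"):])
            current_bucket = sglang_args
        elif token.startswith("--"):
            other_args.append(token)
            current_bucket = other_args
        else:
            # A bare value: attach it to whichever flag came last.
            (current_bucket if current_bucket is not None else other_args).append(token)
    return sglang_args, other_args

sglang, rest = split_args(
    ["--tensor-model-parallel-size", "2", "--sglang-mem-fraction-static", "0.7"]
)
print(sglang)  # ['--mem-fraction-static', '0.7']
print(rest)    # ['--tensor-model-parallel-size', '2']
```

With this convention, one flat command line can carry Megatron, SGLang, and slime-specific options without name collisions.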
- Special thanks to the following projects & communities: SGLang, Megatron-LM, mbridge, OpenRLHF, veRL, Pai-Megatron-Patch and others.
- To cite slime, please use:
```bibtex
@misc{slime_github,
author = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
title = {slime: An LLM post-training framework for RL Scaling},
}
```