Commit d20c8ff
Add GRPO and Support RLVR for PPO (#6186)
* add grpo, support rlvr
* add grpo, support rlvr
* tested deepseek r1 pipeline
* add ci
* verify grpo r1
* verify grpo r1
* update readme, remove unused code
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove path
* clean code
* fix circular import
* fix ci OOM
* fix ci OOM
* skip kto tp, fix qwen generation
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>1 parent ce0ec40 commit d20c8ff
File tree
39 files changed
+1993
-275
lines changed- .github/workflows
- applications/ColossalChat
- coati
- dataset
- experience_buffer
- experience_maker
- models
- trainer
- utils/reward_score
- conversation_template
- examples
- data_preparation_scripts
- training_scripts
- tests
39 files changed
+1993
-275
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
| 64 | + | |
64 | 65 | | |
65 | 66 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
158 | 158 | | |
159 | 159 | | |
160 | 160 | | |
| 161 | + | |
161 | 162 | | |
162 | 163 | | |
163 | 164 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
141 | 141 | | |
142 | 142 | | |
143 | 143 | | |
144 | | - | |
| 144 | + | |
145 | 145 | | |
146 | 146 | | |
147 | 147 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
155 | 155 | | |
156 | 156 | | |
157 | 157 | | |
| 158 | + | |
158 | 159 | | |
159 | 160 | | |
160 | 161 | | |
161 | 162 | | |
162 | 163 | | |
163 | 164 | | |
164 | | - | |
| 165 | + | |
165 | 166 | | |
166 | 167 | | |
167 | 168 | | |
| |||
Lines changed: 15 additions & 8 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
147 | 147 | | |
148 | 148 | | |
149 | 149 | | |
150 | | - | |
151 | 150 | | |
152 | 151 | | |
153 | 152 | | |
| |||
167 | 166 | | |
168 | 167 | | |
169 | 168 | | |
170 | | - | |
171 | 169 | | |
172 | 170 | | |
173 | 171 | | |
| |||
185 | 183 | | |
186 | 184 | | |
187 | 185 | | |
188 | | - | |
189 | | - | |
190 | | - | |
191 | | - | |
192 | | - | |
193 | | - | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
194 | 201 | | |
195 | 202 | | |
196 | 203 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
| 30 | + | |
| 31 | + | |
30 | 32 | | |
31 | 33 | | |
32 | 34 | | |
| |||
40 | 42 | | |
41 | 43 | | |
42 | 44 | | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
43 | 48 | | |
44 | 49 | | |
45 | 50 | | |
| |||
52 | 57 | | |
53 | 58 | | |
54 | 59 | | |
55 | | - | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
56 | 64 | | |
57 | 65 | | |
58 | 66 | | |
| |||
0 commit comments