Commit 095208d
feat: Add CISPO (Clipped IS-weight Policy Optimization)
Add support for the CISPO algorithm from the MiniMax-M1 paper, which addresses
the tendency of PPO/GRPO clipping to drop low-probability reasoning tokens.
Changes:
- Add compute_cispo_loss() in slime/utils/ppo_utils.py
- Add 'cispo' to advantage_estimator choices
- Update reward normalization to include CISPO
- Use CISPO loss when advantage_estimator='cispo'
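The changes listed above could be wired together roughly as in the following sketch. It is illustrative only: the flag spelling `--advantage-estimator`, the presence of `grpo` among the choices, the mean-centering form of reward normalization, and the helper names `build_parser`, `normalize_group_rewards`, and `select_loss` are all assumptions, not taken from the diff.

```python
import argparse
from statistics import mean

# Hypothetical choice list; only 'cispo' is confirmed by the commit message.
ESTIMATORS = ["grpo", "cispo"]

# Assumption: these estimators share group-relative reward normalization.
GROUP_NORMALIZED = {"grpo", "cispo"}


def build_parser():
    """Build a parser with an (assumed) advantage-estimator flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--advantage-estimator", choices=ESTIMATORS, default="grpo")
    return parser


def normalize_group_rewards(rewards, estimator):
    """Subtract the group mean for estimators that use group-relative rewards."""
    if estimator in GROUP_NORMALIZED:
        baseline = mean(rewards)
        return [r - baseline for r in rewards]
    return list(rewards)


def select_loss(estimator):
    """Return the name of the loss to use for the chosen estimator."""
    return "cispo" if estimator == "cispo" else "ppo_clip"


args = build_parser().parse_args(["--advantage-estimator", "cispo"])
centered = normalize_group_rewards([1.0, 0.0, 1.0, 0.0], args.advantage_estimator)
loss_name = select_loss(args.advantage_estimator)
```

Keeping the estimator name as the single switch means the normalization path and the loss selection cannot drift apart, which is presumably why the commit touches both.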
Key implementation details:
- Token-level importance sampling (IS) with stop-gradient on the clipped ratios
- Explicit log-probability term: ratio_sg * advantages * log_probs
- Upper-only clipping with default eps_clip_high=5.0
- Direct clipfrac calculation: (ratio > eps_clip_high)
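The implementation details above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the code added to slime/utils/ppo_utils.py: the function name `cispo_loss_sketch`, its argument names, the use of `detach()` as the stop-gradient, and the negative sign (turning the objective into a loss to minimize) are assumptions.

```python
import torch


def cispo_loss_sketch(log_probs, old_log_probs, advantages, eps_clip_high=5.0):
    """Per-token CISPO-style loss and clip indicator (illustrative sketch)."""
    # Token-level importance-sampling ratio pi_theta / pi_old.
    ratio = torch.exp(log_probs - old_log_probs)
    # Upper-only clipping; detach() acts as the stop-gradient, so the
    # ratio only weights the gradient and is not differentiated itself.
    ratio_sg = torch.clamp(ratio, max=eps_clip_high).detach()
    # Explicit log-probability term: gradient flows only through log_probs.
    # Assumption: the loss is the negated objective.
    pg_loss = -ratio_sg * advantages * log_probs
    # Direct clip fraction: which tokens had their ratio capped.
    clipfrac = (ratio > eps_clip_high).float()
    return pg_loss, clipfrac


log_probs = torch.tensor([-0.5, -2.0, -0.1], requires_grad=True)
old_log_probs = torch.tensor([-0.6, -4.0, -0.1])
advantages = torch.tensor([1.0, 1.0, -1.0])

pg_loss, clipfrac = cispo_loss_sketch(log_probs, old_log_probs, advantages)
pg_loss.mean().backward()
```

In this example the second token has a ratio of about 7.39, which exceeds the cap of 5.0. It is flagged in `clipfrac`, yet it still receives a nonzero gradient weighted by the capped ratio. A PPO-style clipped objective would instead give that token zero gradient, which is the limitation the commit message describes.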
Reference: MiniMax-M1 paper (arXiv:2506.13585)
1 parent a4a59ea commit 095208d
File tree
4 files changed (+61, -4 lines)
- slime
  - backends/megatron_utils
  - ray
  - utils