Commit 0edbebf
committed
feat: Add CISPO (Clipped IS-weight Policy Optimization)
Add support for CISPO algorithm from MiniMax-M1 paper, which addresses
PPO/GRPO's limitation of clipping out low-probability reasoning tokens.
Changes:
- Add compute_cispo_loss() in slime/utils/ppo_utils.py
- Add 'cispo' to advantage_estimator choices
- Update reward normalization to include CISPO
- Use CISPO loss when advantage_estimator='cispo'
Key implementation details:
- Token-level IS with stop-gradient on clipped ratios
- Explicit log probability: ratio_sg * advantages * log_probs
- Upper-only clipping with default eps_clip_high=5.0
- Direct clipfrac calculation: (ratio > eps_clip_high)
Reference: MiniMax-M1 paper (arxiv:2506.13585)1 parent 16b3919 commit 0edbebf
File tree
5 files changed
+63
-7
lines changed- slime
- backends
- fsdp_utils
- megatron_utils
- ray
- utils
5 files changed
+63
-7
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
484 | 484 | | |
485 | 485 | | |
486 | 486 | | |
487 | | - | |
| 487 | + | |
488 | 488 | | |
489 | 489 | | |
490 | 490 | | |
| |||
554 | 554 | | |
555 | 555 | | |
556 | 556 | | |
557 | | - | |
| 557 | + | |
558 | 558 | | |
559 | 559 | | |
560 | 560 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
| 13 | + | |
13 | 14 | | |
14 | 15 | | |
15 | 16 | | |
| |||
236 | 237 | | |
237 | 238 | | |
238 | 239 | | |
239 | | - | |
| 240 | + | |
240 | 241 | | |
241 | 242 | | |
242 | 243 | | |
| |||
416 | 417 | | |
417 | 418 | | |
418 | 419 | | |
419 | | - | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
420 | 425 | | |
421 | 426 | | |
422 | 427 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
180 | 180 | | |
181 | 181 | | |
182 | 182 | | |
183 | | - | |
| 183 | + | |
184 | 184 | | |
185 | 185 | | |
186 | 186 | | |
| |||
193 | 193 | | |
194 | 194 | | |
195 | 195 | | |
196 | | - | |
| 196 | + | |
197 | 197 | | |
198 | 198 | | |
199 | 199 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
672 | 672 | | |
673 | 673 | | |
674 | 674 | | |
675 | | - | |
| 675 | + | |
676 | 676 | | |
677 | 677 | | |
678 | 678 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
72 | 72 | | |
73 | 73 | | |
74 | 74 | | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
75 | 126 | | |
76 | 127 | | |
77 | 128 | | |
| |||
0 commit comments