Commit b418d34
[fix] use
# What does this PR do?
Switches back to `token_mean_legacy` for DAPO scripts. For reasoning RL,
the legacy token mean reduction has better performance because it biases
towards larger response lengths, leading to longer reasoning chains.
This is also the reduction used for the original DAPO runs on VeRL. (And
[our own
](https://wandb.ai/sky-posttraining-uc-berkeley/skyrl-train-dapo-aime/reports/SkyRL-Train-DAPO--VmlldzoxNDkyOTc0MQ?accessToken=vxzk3zov4kehgbfe7j897syz1vv5u01wdfnzajnr6hymeh1ne4hrnav4q9d0h2kk)
reproduction runs from January)
---------
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>token_mean_legacy loss reduction for DAPO scripts (#1637)1 parent 2df8f51 commit b418d34
8 files changed
Lines changed: 10 additions & 15 deletions
Lines changed: 1 addition & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
20 | | - | |
| 19 | + | |
21 | 20 | | |
22 | 21 | | |
23 | 22 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
| 18 | + | |
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
23 | | - | |
| 23 | + | |
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
24 | | - | |
| 24 | + | |
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
20 | | - | |
| 19 | + | |
21 | 20 | | |
22 | 21 | | |
23 | 22 | | |
| |||
Lines changed: 1 addition & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
20 | | - | |
| 19 | + | |
21 | 20 | | |
22 | 21 | | |
23 | 22 | | |
| |||
Lines changed: 2 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
22 | | - | |
| 21 | + | |
23 | 22 | | |
24 | 23 | | |
25 | 24 | | |
| |||
126 | 125 | | |
127 | 126 | | |
128 | 127 | | |
129 | | - | |
| 128 | + | |
Lines changed: 2 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
22 | | - | |
| 21 | + | |
23 | 22 | | |
24 | 23 | | |
25 | 24 | | |
| |||
121 | 120 | | |
122 | 121 | | |
123 | 122 | | |
124 | | - | |
| 123 | + | |
0 commit comments