Commit dea9da5
Agentic GRPO improvements: sampler-IS correction, eval fix, flash attn
Key changes:
- Fix PeftTrainer early exit when is_managed_externally=True (prevents
  spurious max_steps-triggered break inside the externally-managed
  agentic training loop)
- Fix eval deduplication in agentic_rl_learner: eval at a step boundary
  fired grad_accum_steps times instead of once; guard with
  _last_eval_train_step to skip repeat evals within the same train_step
- Add token-level truncated importance-sampling (TIS) correction in
  agentic GRPO to account for sampler-trainer log-probability drift in
  multi-turn rollouts (sampler_is='token', configurable threshold)
- Add sampler_is_weights field to TrainExample; apply in GRPO loss before
  aggregation
- Log sampler-trainer prob_diff and Pearson correlation every training
  step for diagnosing numerical alignment between rollout and trainer
- Qwen3 model: thread segment_ids through flash-attention (splash kernel)
  forward pass so left-padded prompts do not contaminate attention output
- vllm_sampler: fix EOS token handling to prevent off-by-one in
  completion mask construction
- trajectory_collect_engine: include sampler logprobs in trajectory
  output so the GRPO learner can apply the TIS correction without a
  second forward pass; fix conversation mask alignment
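
For readers unfamiliar with the TIS correction and the drift diagnostics listed above, the following is a minimal sketch in plain Python, not the code from this commit. It assumes the weight is the trainer-to-sampler probability ratio clipped from above at a threshold, and that prob_diff is the mean absolute probability difference over unmasked tokens. The function names, the default threshold of 2.0, and the masked-mean aggregation are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def tis_weights(
    trainer_logps: Sequence[float],
    sampler_logps: Sequence[float],
    mask: Sequence[int],
    threshold: float = 2.0,  # assumed default; the commit only says "configurable"
) -> List[float]:
    """Per-token weight min(exp(trainer - sampler), threshold); 0 where masked."""
    weights = []
    for lp_train, lp_samp, m in zip(trainer_logps, sampler_logps, mask):
        ratio = math.exp(lp_train - lp_samp)
        weights.append(min(ratio, threshold) if m else 0.0)
    return weights


def weighted_token_mean(
    token_losses: Sequence[float], weights: Sequence[float], mask: Sequence[int]
) -> float:
    """Apply the weights to per-token losses before aggregating (masked mean)."""
    total = sum(w * l for w, l, m in zip(weights, token_losses, mask) if m)
    count = sum(1 for m in mask if m)
    return total / count if count else 0.0


def drift_diagnostics(
    trainer_logps: Sequence[float],
    sampler_logps: Sequence[float],
    mask: Sequence[int],
) -> Tuple[float, float]:
    """Mean |p_trainer - p_sampler| and Pearson r over unmasked tokens."""
    p_t = [math.exp(a) for a, m in zip(trainer_logps, mask) if m]
    p_s = [math.exp(b) for b, m in zip(sampler_logps, mask) if m]
    n = len(p_t)
    if n == 0:
        return float("nan"), float("nan")
    prob_diff = sum(abs(a - b) for a, b in zip(p_t, p_s)) / n
    mean_t, mean_s = sum(p_t) / n, sum(p_s) / n
    cov = sum((a - mean_t) * (b - mean_s) for a, b in zip(p_t, p_s))
    var_t = sum((a - mean_t) ** 2 for a in p_t)
    var_s = sum((b - mean_s) ** 2 for b in p_s)
    denom = math.sqrt(var_t * var_s)
    pearson = cov / denom if denom > 0 else float("nan")
    return prob_diff, pearson
```

Clipping only from above bounds the variance contributed by tokens the sampler considered unlikely, while leaving ratios below the threshold unchanged. A prob_diff near 0 together with a Pearson correlation near 1 would suggest the sampler and trainer are numerically aligned.
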
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 90bb1c5 commit dea9da5
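
The eval deduplication fix described in the commit message can be illustrated with a small, self-contained sketch. It is not the agentic_rl_learner code: only the name _last_eval_train_step comes from the commit message, while the class, the eval_every_n_steps parameter, and the loop that derives train_step from micro-steps are hypothetical and exist only to reproduce the repeated-eval situation.

```python
from typing import List


class EvalGuard:
    """Allows at most one eval per train_step, however often it is queried."""

    def __init__(self, eval_every_n_steps: int):
        self.eval_every_n_steps = eval_every_n_steps  # hypothetical config name
        self._last_eval_train_step = -1

    def should_eval(self, train_step: int) -> bool:
        if train_step <= 0 or train_step % self.eval_every_n_steps != 0:
            return False
        if train_step == self._last_eval_train_step:
            # Already evaluated at this train_step during an earlier micro-step.
            return False
        self._last_eval_train_step = train_step
        return True


def run_loop(num_micro_steps: int, grad_accum_steps: int, eval_every: int) -> List[int]:
    """Simulates a loop where train_step advances once per grad_accum_steps."""
    guard = EvalGuard(eval_every)
    eval_steps = []
    for micro_step in range(1, num_micro_steps + 1):
        train_step = micro_step // grad_accum_steps
        if guard.should_eval(train_step):
            eval_steps.append(train_step)
    return eval_steps
```

In this simulation the same train_step value is seen on several consecutive micro-steps, so without the guard each eval boundary would trigger once per micro-step; with it, each boundary triggers a single eval.
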
8 files changed
Lines changed: 419 additions & 79 deletions
File tree
- tunix
- generate
- models/qwen3
- rl
- agentic
- trajectory