Highlights
New example trainer
Weights are shared between vLLM and the trainer, no comms needed to sync weights, and memory saved by using only one copy of the weights!
On Policy/Self Distillation Support
Now support logprobs from a teacher/prompted endpoint, fully supporting on policy distillation/self distillation!
OpenAI Endpoint for managed server
Launch an openai endpoint and collect rollouts from any program that takes in an openai endpoint!
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #215
- Interleaved Tool-Use Within Reasoning Blocks by @interstellarninja in #195
- Pairwise Judgement Environment - improve dataloading, ctx len by @teknium1 in #218
- Add Word Hunt environment by @Aboozle1 in #220
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #222
- qwen tokenizer wrapper & fixed jinja template for tool handling by @shannonsands in #224
- Add arena-hard v1 environment by @teknium1 in #219
- Textworld minimal by @shannonsands in #225
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #228
- Diplomacy trainer env by @shannonsands in #227
- build: update checkout action to v5 by @rejected-l in #233
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #231
- fix: division-by-zero in gradient calculation by @brawncode in #236
- add error logging to collect_trajectories so they don't fail silently by @dmahan93 in #237
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #238
- Update bibtex by @hjc-puro in #235
- Refusalbench v2 by @J-SUPHA in #239
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #241
- Refusalbench v2 by @J-SUPHA in #242
- Fix multiple scored data groups by @shannonsands in #223
- Revert "Fix multiple scored data groups" by @shannonsands in #243
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #246
- fix typo in variable name by @prestoalvarez in #245
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #248
- Multi-Turn Tool-Use RL Environment by @interstellarninja in #160
- WIP: Environments/bleuberi by @aniemerg in #175
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #249
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #251
- refactor(api): improve attribute checking and remove hardcoded values by @DeVikingMark in #250
- fix: correct typos in documentation and comments by @viktorking7 in #254
- [Environment]: smolagents by @aniemerg in #104
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #255
- SmolAgent Env Linting Fixes by @ropresearch in #256
- group temps, sample temps, and logprob api params by @ropresearch in #253
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #257
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #259
- docs: minor fixes to follow code standards by @andrewshab3 in #261
- GZip Compression by @ropresearch in #263
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #266
- docs: few minor fixes by @letmehateu in #265
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #268
- add sglang specific token level logprob handling and server manager/b… by @dmahan93 in #264
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #269
- fix: correct typo and improve code quality by @bobtajson in #267
- add managed vllm server by @dmahan93 in #273
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #275
- refactor: Refactor scored data handling into reusable helper by @ninastef in #272
- feat: dump evaluate subcommand config to YAML in env save dir by @dhyaneesh in #274
- fix some issues by @teknium1 in #279
- docs: fix dead links by @kseniaeremekno in #277
- Convert Environments to ManagedServer for Tinker Integrations by @teknium1 in #278
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #281
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #284
- README updates for Tinker Integration by @samherring99 in #286
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #287
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #288
- docs: fix dead links by @juleennn in #283
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #291
- fix: fix broken links to files by @tonnycro in #292
- Port many benchmarks into atropos by @teknium1 in #294
- Olympiad Coding Environment and LCB Eval by @JoeLi12345 in #296
- big update for letter counting by @teknium1 in #298
- chore: bump license year to 2026 by @rejected-l in #299
- MT-GRPO Turn-Level Advantage Environment by @interstellarninja in #162
- Fix missing logprob by @JustKitting in #293
- Add reversed text environment by @teknium1 in #234
- add eval runner by @dmahan93 in #290
- Feat/sql query env by @PLippmann in #301
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #315
- fix: multiple typos of different importance by @crStiv in #318
- Add support for reasoning models and their variety of providers/endpo… by @teknium1 in #297
- fix: handle nested message format in jsonl2html.py by @Savage890 in #317
- Prevent hangs in kernel evaluation by bounding worker waits by @GHOryy5 in #289
- fix: typo in max_token_length by @windlgrass in #327
- Verifiers Integration by @alt-glitch in #305
- fix: correct typos in instructions.py by @windlgrass in #329
- fix: multiple typos of different importance by @crStiv in #325
- fix: use correct prefix for gradient quantiles with NaN/Inf by @DeVikingMark in #324
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #323
- fix: remove duplicate code in instruction files by @windlgrass in #330
- Fix typos in SLURM.md by @HusseinAdeiza in #334
- fix: initialize current_item in init to prevent AttributeError by @windlgrass in #338
- chore: fix typos by @VolodymyrBg in #339
- Add dummy openai managed server by @dmahan93 in #359
- fix duplicate code + add safety checks by @alireza78a in #370
- add tokenizer name config to set the vllm/sglang tokenizer by @dmahan93 in #373
- [docs] Clarify prerequisites, fix Python version inconsistency, and add troubleshooting section by @Ridwannurudeen in #355
- fix: replace debug print statements with logger by @alireza78a in #365
- Add regression test for TRL vLLM completion wrapper by @ansulx in #362
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #375
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #382
- chore: remove redundant inline comments from swe_rl_env.py imports by @victlop in #377
- Add regex generation environment for community by @johnh4098 in #378
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #391
- fix: add try/finally to guarantee gym environment cleanup by @VolodymyrBg in #390
- fix: handle validation without training by @CreeptoGengar in #389
- fix: pass num_steps to register_to_api by @Ocheretovich in #392
- refactor: replace print statements with self.logger in reasoning_gym_environment.py by @milord12345 in #388
- docs: fix typo by @prestoalvarez in #400
- Opd filtered by @J-SUPHA in #387
- add code-spell and secrects precommit by @dmahan93 in #402
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #404
- Pipeline rl by @J-SUPHA in #322
- fix: use sys.executable instead of hardcoded "python" in tests by @0xbyt4 in #399
- Unified get_logprobs interface across the server stack by @J-SUPHA in #406
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #409
- add tool call parsing based on vllm impl and an openai server endpoint by @dmahan93 in #405
New Contributors
- @Aboozle1 made their first contribution in #220
- @rejected-l made their first contribution in #233
- @brawncode made their first contribution in #236
- @prestoalvarez made their first contribution in #245
- @aniemerg made their first contribution in #175
- @DeVikingMark made their first contribution in #250
- @viktorking7 made their first contribution in #254
- @andrewshab3 made their first contribution in #261
- @letmehateu made their first contribution in #265
- @bobtajson made their first contribution in #267
- @ninastef made their first contribution in #272
- @dhyaneesh made their first contribution in #274
- @kseniaeremekno made their first contribution in #277
- @samherring99 made their first contribution in #286
- @juleennn made their first contribution in #283
- @tonnycro made their first contribution in #292
- @JustKitting made their first contribution in #293
- @Savage890 made their first contribution in #317
- @GHOryy5 made their first contribution in #289
- @windlgrass made their first contribution in #327
- @alt-glitch made their first contribution in #305
- @HusseinAdeiza made their first contribution in #334
- @VolodymyrBg made their first contribution in #339
- @alireza78a made their first contribution in #370
- @Ridwannurudeen made their first contribution in #355
- @ansulx made their first contribution in #362
- @victlop made their first contribution in #377
- @johnh4098 made their first contribution in #378
- @CreeptoGengar made their first contribution in #389
- @Ocheretovich made their first contribution in #392
- @milord12345 made their first contribution in #388
- @0xbyt4 made their first contribution in #399
Full Changelog: v0.3.0...v0.4.0