Release v0.4.0 · NousResearch/atropos

Highlights

New example trainer

Weights are shared between vLLM and the trainer, no comms needed to sync weights, and memory saved by using only one copy of the weights!

On Policy/Self Distillation Support

Now support logprobs from a teacher/prompted endpoint, fully supporting on policy distillation/self distillation!

OpenAI Endpoint for managed server

Launch an openai endpoint and collect rollouts from any program that takes in an openai endpoint!

What's Changed

[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #215
Interleaved Tool-Use Within Reasoning Blocks by @interstellarninja in #195
Pairwise Judgement Environment - improve dataloading, ctx len by @teknium1 in #218
Add Word Hunt environment by @Aboozle1 in #220
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #222
qwen tokenizer wrapper & fixed jinja template for tool handling by @shannonsands in #224
Add arena-hard v1 environment by @teknium1 in #219
Textworld minimal by @shannonsands in #225
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #228
Diplomacy trainer env by @shannonsands in #227
build: update checkout action to v5 by @rejected-l in #233
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #231
fix: division-by-zero in gradient calculation by @brawncode in #236
add error logging to collect_trajectories so they don't fail silently by @dmahan93 in #237
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #238
Update bibtex by @hjc-puro in #235
Refusalbench v2 by @J-SUPHA in #239
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #241
Refusalbench v2 by @J-SUPHA in #242
Fix multiple scored data groups by @shannonsands in #223
Revert "Fix multiple scored data groups" by @shannonsands in #243
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #246
fix typo in variable name by @prestoalvarez in #245
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #248
Multi-Turn Tool-Use RL Environment by @interstellarninja in #160
WIP: Environments/bleuberi by @aniemerg in #175
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #249
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #251
refactor(api): improve attribute checking and remove hardcoded values by @DeVikingMark in #250
fix: correct typos in documentation and comments by @viktorking7 in #254
[Environment]: smolagents by @aniemerg in #104
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #255
SmolAgent Env Linting Fixes by @ropresearch in #256
group temps, sample temps, and logprob api params by @ropresearch in #253
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #257
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #259
docs: minor fixes to follow code standards by @andrewshab3 in #261
GZip Compression by @ropresearch in #263
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #266
docs: few minor fixes by @letmehateu in #265
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #268
add sglang specific token level logprob handling and server manager/b… by @dmahan93 in #264
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #269
fix: correct typo and improve code quality by @bobtajson in #267
add managed vllm server by @dmahan93 in #273
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #275
refactor: Refactor scored data handling into reusable helper by @ninastef in #272
feat: dump evaluate subcommand config to YAML in env save dir by @dhyaneesh in #274
fix some issues by @teknium1 in #279
docs: fix dead links by @kseniaeremekno in #277
Convert Environments to ManagedServer for Tinker Integrations by @teknium1 in #278
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #281
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #284
README updates for Tinker Integration by @samherring99 in #286
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #287
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #288
docs: fix dead links by @juleennn in #283
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #291
fix: fix broken links to files by @tonnycro in #292
Port many benchmarks into atropos by @teknium1 in #294
Olympiad Coding Environment and LCB Eval by @JoeLi12345 in #296
big update for letter counting by @teknium1 in #298
chore: bump license year to 2026 by @rejected-l in #299
MT-GRPO Turn-Level Advantage Environment by @interstellarninja in #162
Fix missing logprob by @JustKitting in #293
Add reversed text environment by @teknium1 in #234
add eval runner by @dmahan93 in #290
Feat/sql query env by @PLippmann in #301
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #315
fix: multiple typos of different importance by @crStiv in #318
Add support for reasoning models and their variety of providers/endpo… by @teknium1 in #297
fix: handle nested message format in jsonl2html.py by @Savage890 in #317
Prevent hangs in kernel evaluation by bounding worker waits by @GHOryy5 in #289
fix: typo in max_token_length by @windlgrass in #327
Verifiers Integration by @alt-glitch in #305
fix: correct typos in instructions.py by @windlgrass in #329
fix: multiple typos of different importance by @crStiv in #325
fix: use correct prefix for gradient quantiles with NaN/Inf by @DeVikingMark in #324
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #323
fix: remove duplicate code in instruction files by @windlgrass in #330
Fix typos in SLURM.md by @HusseinAdeiza in #334
fix: initialize current_item in init to prevent AttributeError by @windlgrass in #338
chore: fix typos by @VolodymyrBg in #339
Add dummy openai managed server by @dmahan93 in #359
fix duplicate code + add safety checks by @alireza78a in #370
add tokenizer name config to set the vllm/sglang tokenizer by @dmahan93 in #373
[docs] Clarify prerequisites, fix Python version inconsistency, and add troubleshooting section by @Ridwannurudeen in #355
fix: replace debug print statements with logger by @alireza78a in #365
Add regression test for TRL vLLM completion wrapper by @ansulx in #362
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #375
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #382
chore: remove redundant inline comments from swe_rl_env.py imports by @victlop in #377
Add regex generation environment for community by @johnh4098 in #378
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #391
fix: add try/finally to guarantee gym environment cleanup by @VolodymyrBg in #390
fix: handle validation without training by @CreeptoGengar in #389
fix: pass num_steps to register_to_api by @Ocheretovich in #392
refactor: replace print statements with self.logger in reasoning_gym_environment.py by @milord12345 in #388
docs: fix typo by @prestoalvarez in #400
Opd filtered by @J-SUPHA in #387
add code-spell and secrects precommit by @dmahan93 in #402
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #404
Pipeline rl by @J-SUPHA in #322
fix: use sys.executable instead of hardcoded "python" in tests by @0xbyt4 in #399
Unified get_logprobs interface across the server stack by @J-SUPHA in #406
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #409
add tool call parsing based on vllm impl and an openai server endpoint by @dmahan93 in #405

New Contributors

@Aboozle1 made their first contribution in #220
@rejected-l made their first contribution in #233
@brawncode made their first contribution in #236
@prestoalvarez made their first contribution in #245
@aniemerg made their first contribution in #175
@DeVikingMark made their first contribution in #250
@viktorking7 made their first contribution in #254
@andrewshab3 made their first contribution in #261
@letmehateu made their first contribution in #265
@bobtajson made their first contribution in #267
@ninastef made their first contribution in #272
@dhyaneesh made their first contribution in #274
@kseniaeremekno made their first contribution in #277
@samherring99 made their first contribution in #286
@juleennn made their first contribution in #283
@tonnycro made their first contribution in #292
@JustKitting made their first contribution in #293
@Savage890 made their first contribution in #317
@GHOryy5 made their first contribution in #289
@windlgrass made their first contribution in #327
@alt-glitch made their first contribution in #305
@HusseinAdeiza made their first contribution in #334
@VolodymyrBg made their first contribution in #339
@alireza78a made their first contribution in #370
@Ridwannurudeen made their first contribution in #355
@ansulx made their first contribution in #362
@victlop made their first contribution in #377
@johnh4098 made their first contribution in #378
@CreeptoGengar made their first contribution in #389
@Ocheretovich made their first contribution in #392
@milord12345 made their first contribution in #388
@0xbyt4 made their first contribution in #399

Full Changelog: v0.3.0...v0.4.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.4.0

Choose a tag to compare

Sorry, something went wrong.