Releases: NovaSky-AI/SkyRL
SkyRL-Train: v0.3.0
Highlights
Asynchronous training: We now support fully asynchronous training in SkyRL, enabling higher throughput for agentic RL: https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html
Dependency Upgrades:
- Upgraded vLLM to 0.11.0 and Ray to 2.51.1
- Megatron: Migrated from mbridge to the newer Megatron-Bridge library, which is expected to see more active development and support from NVIDIA.
The updated installation instructions can be found here.
Recipes: We've consolidated a list of end-to-end SkyRL recipes here, with reference runs on math, Text2SQL, and search tasks.
SkyRL on Managed Platforms: Guides for running SkyRL on managed platforms such as Anyscale, RunPod, and SkyPilot can be found here.
Miscellaneous: Support for GPT-OSS, integration with PyTorch's OpenEnv, support for IPv6 clusters, and more!
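The IPv6 cluster support mentioned above comes down to bracketing bare IPv6 literals when constructing TCP URLs, since an unbracketed colon is ambiguous with the `host:port` separator. A minimal sketch of the idea; the helper name is illustrative, not SkyRL's actual API:

```python
def tcp_url(host: str, port: int) -> str:
    """Build a tcp:// URL, bracketing bare IPv6 literals.

    Without brackets, the colons inside an IPv6 address would be
    indistinguishable from the host:port separator.
    """
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"tcp://{host}:{port}"

print(tcp_url("10.0.0.1", 29500))  # tcp://10.0.0.1:29500
print(tcp_url("fe80::1", 29500))   # tcp://[fe80::1]:29500
```

Already-bracketed hosts pass through unchanged, so the helper is safe to call on either form.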
What's Changed
- [Examples][Step wise] Support thinking models like Qwen 3 by @SumanthRH in #468
- Modal Integration by @benji-cannot-code in #444
- [fix] abort all requests before sleep by @vutrung96 in #458
- TerminalBenchGenerator: logprobs + session ID by @li-boxuan in #448
- Divide-by-Zero when setting NUMA affinity patch by @matthambrecht in #457
- [bug] run linter for t-bench generator by @erictang000 in #476
- Bump vLLM version to 0.11.0 by @tyler-griggs in #481
- [Sequence parallel][train] Support sequence parallelism without sample packing by @SumanthRH in #480
- [fix] Resolve timeout and cleanup issues in GPU CI pipeline by @tyler-griggs in #483
- Increase timeout for GPU CI by @tyler-griggs in #485
- Skypilot: Update Doc by @lynnliu030 in #484
- Fix GPU CI Test Failures: Migrating Tests, NCCL P2P Access Errors, and Test Fixture Issues by @devpatelio in #477
- [Fix] Fix entropy calculation without sample packing by @SumanthRH in #490
- Skypilot: Multi-Node Test by @lynnliu030 in #493
- Support exporting environment-specific metrics by @vibha-ctrl in #386
- Fix broken import by @tyler-griggs in #500
- Revert "Bump vLLM version to 0.11.0" by @erictang000 in #501
- Fix broken entropy metric by @tyler-griggs in #504
- [fix] Resolve double ray.init() call by @tyler-griggs in #506
- [lora] fix lora with vllm offline engine by @erictang000 in #513
- Increase GPU CI Timeout to Pass All Tests by @devpatelio in #512
- [train] Increase default timeout for placement groups to 180s by @SumanthRH in #525
- [dependencies] fix some flash-rl dependency issues by @erictang000 in #530
- Add implementation of CISPO loss by @vutrung96 in #523
- [skyrl-train] assert that the policy loss type is regular/dual clip for tis by @erictang000 in #546
- [Fix] Fix `fsdp2_load_state_dict` with HSDP by @SumanthRH in #554
- [skyrl-train] update defaults for CISPO by @erictang000 in #553
- [GPTOSS] Integrate Unsloth's flex attention implementation for attention sink by @SumanthRH in #515
- [skyrl-train][logging] rename loss/avg_raw_rewards to loss/avg_final_rewards for clarity by @erictang000 in #544
- [Integrations] Support PyTorch OpenEnv by @lynnliu030 in #543
- [Docs] Fix image in OpenEnv doc by @SumanthRH in #562
- Remove truncation logic, fix corresponding tests by @devpatelio in #508
- [megatron][bug fix] reset dist checkpointing `AsyncCallsQueue` to allow freeing memory by @erictang000 in #565
- [dependencies] separate vllm + megatron + bump vllm back to 0.11.0 + pin minimum uv version for extra-build-dependencies by @erictang000 in #528
- [skyrl-train] Enable Inference Engine pipeline parallelism by @pandyamarut in #555
- [fix] Broken method call in test by @tyler-griggs in #571
- [AsyncRL][1/N] Add abort_generation to vllm engine and pause/continue generation to client by @CharlieFRuan in #537
- Update README.md about SkyRL-v0 reproduction by @caoshiyi in #573
- [AsyncRL][2/N] Implement /chat/completion with retry on aborted sub requests by @CharlieFRuan in #557
- [train][Logging] Set loguru default to INFO, and customizable by LOG_LEVEL by @CharlieFRuan in #578
- [skyrl-train][Fix] Fix epoch counter after resuming from checkpoint by @SumanthRH in #589
- [skyrl-train] Enforce eager by default by @SumanthRH in #569
- [skyrl-train][Fix] sleep only if colocated by @SumanthRH in #595
- Fix: Megatron Autograd Warning for Broadcast Kernel by @devpatelio in #588
- Comment by @devpatelio in #596
- Comment upda by @devpatelio in #597
- Cleanup stray doc by @SumanthRH in #599
- [skyrl-train] Make `libnuma` optional for training by @SumanthRH in #601
- [skyrl-train][Examples] Support truncated importance sampling for `StepWiseGenerator` by @SumanthRH in #570
- Add YaRN support for vLLM and HF by @sergeypastukhov-ddog in #561
- [Docs] Refactor documentation for running SkyRL on managed platforms by @SumanthRH in #608
- [train] Remove train_batch_size from fsdp/deepspeed strategy by @CharlieFRuan in #617
- [skyrl-train] add option to specify ref model path by @erictang000 in #623
- [skyrl-train] Add DAPO 7B recipe, and 32B training script by @erictang000 in #532
- [skyrl-train][recipes] add dapo qwen3 1.7b and 4b scripts by @erictang000 in #625
- Fix table formatting in DAPO README by @erictang000 in #631
- [train][utils] Aggregate rollout metrics and validate output in concat GeneratorOutput by @CharlieFRuan in #620
- [skyrl-train] Add example for on-policy distillation by @erictang000 in #585
- Support IPv6 addresses in TCP URL construction by @mayavkrishnan25 in #612
- [train][TBench] Cherrypick Terminus integration and use Harbor by @CharlieFRuan in #637
- [megatron] Added non-CUDA-IPC weight sync to Megatron workers by @nikhilbarhate99 in #635
- [docs] Add build instructions to README.md by @CharlieFRuan in #648
- Fix in README.md by @nrghosh in #653
- [skyrl-train][Fix] Fix FSDP1 module wrap policy for `HFModelWrapper` by @SumanthRH in #654
- Return init_prompts in generate_batched by @ebronstein in #652
- [Docs] Fix model placement docs by @SumanthRH in #663
- [skyrl-train] Support older vllm versions till 0.9.2 by @SumanthRH in #671
- [lora] enforce_eager=true slows down generation time dramatically with LoRA by @devpatelio in #665
- Conditionally add the generation prompt to the multi-turn chat template by @ebronstein in #676
- Add entropy loss by @pbokc in #622
- [skyrl-train] Upgrade Ray to 2.51.1 by @SumanthRH in #633
- [Docs] Add a recipes page consolidating all E2E recipes by @SumanthRH in #679
- [skyrl-train][docs] Add commit for dapo to recipes and add megatron search-r1 results by @erictang000 in #689
- [megatron] upgrade from mbridge -> Megatron-Bridge (breaking change) by @erictang000 in #453
- [update] Updated RoPE Configuration for HF Model...
SkyRL-Train: v0.2.0
Highlights
This release contains 163 commits from 22 contributors, including 11 new contributors!
Megatron Backend: SkyRL now has full support for the Megatron training backend with 5D parallelism and strong support for large-scale MoE training. Learn more in our Megatron guide and examples.
LoRA Support: SkyRL now supports LoRA training with the FSDP backend and vLLM inference engine. Learn more in our LoRA guide and examples. We will continue aggressively improving LoRA support and performance, tracked in #449.
OpenAI API Compatibility: SkyRL has standardized on the OpenAI API for inference, so agents and agent scaffolds can call into the inference engine over the OpenAI API. SkyRL manages the inference engines and provides a base_url for an OpenAI-API-compatible endpoint.
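Because the managed engines speak the OpenAI API, any OpenAI-compatible client can target the provided base_url. A minimal sketch using only the standard library; the base_url and model name below are placeholders, not values SkyRL guarantees:

```python
import json
from urllib import request

def chat_completion_request(base_url: str, model: str, messages: list) -> request.Request:
    # Build a POST in the OpenAI /chat/completions shape against the given base_url.
    url = f"{base_url.rstrip('/')}/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode()
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder endpoint; at runtime SkyRL supplies the real base_url.
req = chat_completion_request(
    "http://127.0.0.1:8000/v1",
    "Qwen/Qwen3-4B",
    [{"role": "user", "content": "2 + 2 = ?"}],
)
# resp = request.urlopen(req)   # an agent scaffold (or the `openai` client
# print(json.load(resp))        # pointed at base_url) would send it like this
```

In practice a scaffold would simply pass the base_url to its existing OpenAI client rather than hand-rolling requests; the point is that no SkyRL-specific client code is required.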
Integrations: Building on our standardization on the OpenAI API, we integrated several popular environment and agentic projects. A couple of highlights:
- Prime Intellect's Environments Hub: A guide and examples can be found here.
- mini-swe-agent: A popular, minimal SWE agent implementation. A guide and examples can be found here.
What's Changed
- [Doc] LLM-judge by @lynnliu030 in #167
- [Gym] Add AIME by @lynnliu030 in #148
- [DAPO] Fix data preprocess script by @lynnliu030 in #172
- [FlashRL 3/N] Add example for FP8 training with FlashRL by @SumanthRH in #169
- get chat_end_index after env init by @etnbrd in #184
- [FlashRL N/N] Support Int-8 Rollouts by @SumanthRH in #176
- [Generator] Support token-in-token-out rollout by @CharlieFRuan in #152
- [Fix] Set skip_special_tokens to True in default sampling params by @pgasawa in #192
- [docs] Update libnuma installation instructions by @tyler-griggs in #191
- [docs][trivial] Add libnuma error in installation page by @CharlieFRuan in #196
- [cleanup] Make agent_loop output a dataclass by @tyler-griggs in #194
- [docs] Simplify installation appearance by @tyler-griggs in #200
- 1/N GPU CI Migration by @tyler-griggs in #195
- Increase GPU CI timeout to 1hr by @tyler-griggs in #205
- Clarify ObsType and the observation number by @etnbrd in #207
- [fix] Skip special tokens for sglang when decoding token ids by @CharlieFRuan in #210
- [doc fix] Correcting `ObsType` in `BaseTextEnv` by @tyler-griggs in #209
- [Doc] Add a doc page for SkyRLGymGenerator, multi-turn rollout/tokenization by @CharlieFRuan in #186
- [tests] Fix gpu offload test by @erictang000 in #215
- [Fix][Generator] Correct chat history length for retokenize codepath with env.init by @CharlieFRuan in #214
- [Generator][Env] Add stop str, remove need for post-processed action in search and txt2sql by @CharlieFRuan in #190
- [Examples] Minor fix after #190 by @SumanthRH in #216
- [InfEngines] Lift tokenization up to InferenceEngineClient by @tyler-griggs in #217
- [logs] Change print warnings to logger.warning by @CharlieFRuan in #219
- [2/N GPU CI Migration] Fix broken tests by @tyler-griggs in #220
- Bump vllm to 0.10.1.1 by @tyler-griggs in #225
- [Fix][CI] Fix GPU CI where search needs stop param by @CharlieFRuan in #232
- [Generator][HTTP] Add OpenAI API inference HTTP server for generator, support /chat/completions by @CharlieFRuan in #230
- [cleanup] Remove redundant sampling params code path by @tyler-griggs in #234
- [Fix][Generator] Use custom_chat_template in each step retokenization by @CharlieFRuan in #233
- [Fix] Fix failing sglang test after #234 by @SumanthRH in #237
- [Generator] Run env methods in threadpool executor by @alex-dr in #240
- [datasets] Support huggingface datasets by @tyler-griggs in #235
- [Generator] Support turn-level rewards in SkyRLGymGenerator by @tyler-griggs in #226
- [fix] finish mlflow run by @etnbrd in #243
- [trainer] Initial Megatron TP + PP Support by @erictang000 in #223
- 1/N Terminal Bench Integration by @tyler-griggs in #239
- [HTTP][Generator] Let vllm python engine handle OAI request, remove openai_api_protocol by @CharlieFRuan in #238
- [fix] Add missing required argument to OpenAIServingChat by @tyler-griggs in #247
- Cloud Checkpointing by @tyler-griggs in #248
- [fix] Extra argument in `cleanup_old_checkpoints` by @tyler-griggs in #254
- [fix] Broken call to `io.exists` by @tyler-griggs in #255
- Environments Hub integration by @tyler-griggs in #241
- [fix] Resolve CI error due to import error by @tyler-griggs in #256
- [fix] Bring back pretty log formatting by @tyler-griggs in #250
- Update verifiers readme by @tyler-griggs in #258
- Update README.md by @tyler-griggs in #259
- Revert "[fix] Bring back pretty log formatting" by @CharlieFRuan in #261
- [CI][Fix] Fix sglang import error by correcting pytest -m condition by @CharlieFRuan in #262
- [Algorithms] Add Clip-Cov and KL-Cov loss functions by @SumanthRH in #251
- [trainer][megatron] Fix for DP slower convergence by @erictang000 in #245
- [GPU CI] Skip integration tests if uv add fails by @SumanthRH in #264
- [uv] Ignore .venv by @SumanthRH in #263
- [trainer][megatron] Override fused attention with flash-attn when flash_attn=True for Megatron by @erictang000 in #265
- [Fix][CI] Address generator CI test fails when model stop reason is length by @CharlieFRuan in #269
- [tiny fix] Remove the example launch command for main files by @tyler-griggs in #268
- [InfEngineClient] Extract out routing logic to a helper by @CharlieFRuan in #267
- Reuse mlflow if exists by @sergeypastukhov-ddog in #257
- [trainer][megatron] Sequence packing + Context Parallel for Megatron by @erictang000 in #274
- [trainer][megatron] make megatron config directly accessible through trainer field by @erictang000 in #275
- [Examples] Add an example for training on SWEBench task with Mini-SWE-Agent by @SumanthRH in #222
- [Examples][Fix] Minor cleanup for the Mini-SWE-Agent example by @SumanthRH in #281
- Remove simplecoder example by @pcmoritz in #282
- [Fix] Fix pass_at_k missing for SkyRLGymGenerator.agent_loop flow due to token-level rewards by @CharlieFRuan in #271
- Add config validation for train batch size vs dp size by @tyler-griggs in #229
- Add configurable timeout for placement groups by @SumanthRH in #276
- [Feature] Eval Only Entry Point by @benji-cannot-code in #171
- [fix] Fix the pretty trainer logging by @tyler-griggs in #270
- [Examples] Add a README for the Mini-SWE-Agent example by @SumanthRH in #287
- [MoE] Add support for vLLM inference expert parallelism by @tyler-griggs in #159
- [Generator][HTTP] Support /completion endpoint for token-in-token-out by @CharlieFRuan in #260
- Wandb same-step logging by @tyler-griggs in #289
- [GPU CI][E2E] Increase timeout for ...
SkyRL-Train: v0.1.0
What's Changed
- SkyRL-Train + SkyGym code by @SumanthRH in #27
- Add ReadTheDocs deployment yaml by @SumanthRH in #28
- [fix] Update fsdp_config.max_norm to optimizer_config.max_grad_norm by @CharlieFRuan in #30
- Rename `skgym` to `skyrl-gym` by @tyler-griggs in #31
- Improve installation instructions in README and docs by @SumanthRH in #32
- fixing minor issues with examples by @erictang000 in #29
- [docs] Fix git clone link by @CharlieFRuan in #34
- [Docs] Add more examples to the docs by @erictang000 in #37
- [Docs] Add gym tools doc by @tyler-griggs in #36
- Use `skyrl-gym` from PyPI for easier dependency management by @SumanthRH in #35
- Doc fixes by @tyler-griggs in #33
- Update READMEs by @tyler-griggs in #40
- [Docs] Add more detailed example for Text2SQL by @erictang000 in #39
- [sglang] Add patch for sglang by @SumanthRH in #42
- [Docs] Add ray runtime env hook for uv to docs by @SumanthRH in #41
- [Docs] Improve docs sidebar by @SumanthRH in #43
- Rename docs from SkyRL-Train to SkyRL by @tyler-griggs in #44
- Minor edits by @SumanthRH in #45
- add sql repro to skyrl-train readme by @erictang000 in #46
- [Docs] Minor fixes to venv creation in the docs by @SumanthRH in #47
- [fix] In docs and scripts, test->validation.parquet, and ["x"] -> "['x']" by @CharlieFRuan in #48
- [Docs] add system overview doc (pt. 1) by @tyler-griggs in #50
- [docs] add a short guide on evaluation, and behavior of having multiple eval datasets by @CharlieFRuan in #49
- [trainer] Fix fsdp warmup steps + move warmup to optimizer config by @erictang000 in #52
- Make deepspeed optional, so it is not initialized if FSDP backend is used by @pcmoritz in #59
- fix: Updated utils.py to fix stop token issue by @AtakanTekparmak in #56
- [eval] fix eval batch < dp_size edge case by @erictang000 in #62
- [Fix] Minor SQL fixes by @tyler-griggs in #61
- [generator] Add `max_env_workers` argument for generator thread pool executor by @erictang000 in #53
- Add Apache 2.0 License by @tyler-griggs in #63
- [Cleanup] Many small fixes and improvements by @tyler-griggs in #64
- Minor cleanup pt 2. by @tyler-griggs in #67
- Revert "Minor cleanup pt 2." by @tyler-griggs in #69
- [Cleanup] Remove unused `normalize_reward` codepath in CriticModel by @SumanthRH in #51
- [Installation] Use skyrl-gym as a symlink for easier development; Add a developer guide by @SumanthRH in #71
- SearchR1 reproduction fixes by @erictang000 in #65
- [Data] Seed the dataloader for reproducibility by @tyler-griggs in #77
- [skyrl-gym] search env cleanup by @erictang000 in #75
- Minor cleanup by @tyler-griggs in #78
- fix: move `stop_reason` checking logic before list truncation by @hank0316 in #81
- Very simple local coding sandbox example by @pcmoritz in #80
- [Dependencies] Upgrade to torch 2.7 by @SumanthRH in #73
- Update README.md by @tyler-griggs in #85
- Add docs badge to README by @CharlieFRuan in #86
- [Installation] Update docs to include `libnuma` installation by @SumanthRH in #89
- Support token-level loss, make default by @tyler-griggs in #90
- [Installation] Fix Dockerfile after CUDA 12.8 upgrade by @SumanthRH in #91
- [Cleanup] Remove unwanted NCCL env vars by @SumanthRH in #92
- Add pre-commit hook for gitleaks by @SumanthRH in #93
- [FIX] Garbage collect temp buffers after checkpoint by @tyler-griggs in #94
- [Bugfix] Disable vllm compilation cache due to compilation issues by @SumanthRH in #95
- [Bugfix] Fix env vars after #95 by @SumanthRH in #98
- [GPU CI 1/N] Init GPU CI on Anyscale by @SumanthRH in #102
- [Docs] Add docs for running on an existing ray cluster by @SumanthRH in #105
- [Trainer] Support per-token rewards in trainer by @SumanthRH in #109
- Add check for whether p2p access is supported - allows code to run on L4/L40S after #73 upgrade to cuda 12.8 by @erictang000 in #108
- [dependencies] Upgrade ray to 2.48.0 by @erictang000 in #106
- fix issue with #108 that broke gpu ci by @erictang000 in #112
- Add warning for certain uv versions due to `uv run --with` regression by @SumanthRH in #113
- [GPU CI] Only trigger workflow for relevant changes in `skyrl-train` by @SumanthRH in #114
- [bug] Loading saved HF weights errors by @erictang000 in #118
- [DAPO] Add support for overlong filtering by @tyler-griggs in #111
- [skyrl-gym] GSM8k - LLM Judge example by @lynnliu030 in #74
- Fix MLFlow logging by @bthecohen in #121
- [Trainer] Support registering custom advantage estimators by @tyler-griggs in #115
- [checkpointing] Add HF model config and tokenizer config to checkpoint folder by @erictang000 in #124
- Fix discord link by @tyler-griggs in #125
- Fix broken link by @tyler-griggs in #128
- [Trainer/Algorithm] Support registering custom policy loss functions + refactor adv estimator registry to allow registration outside ray workers by @erictang000 in #126
- [trainer] add more robust generation output validation by @erictang000 in #132
- [Trainer] GSPO support by @bthecohen in #120
- [trainer/algorithm] Implement DAPO and Polaris style dynamic sampling + add DAPO docs + example by @erictang000 in #130
- [algorithm] Support Dr. GRPO + refactor where policy/critic loss functions are set by @erictang000 in #133
- [fix] move algorithm folder -> algorithms by @erictang000 in #136
- Forward mlflow env vars to ray runtime env by @etnbrd in #135
- [Fix] Add NCCL_CUMEM_ENABLE=0 for vllm to address weight sync error by @CharlieFRuan in #143
- [Generator] Support non-remote (e.g. colocated) SGLang engine by @CharlieFRuan in #68
- [GPU CI] Add new workflow for GSM8k e2e convergence test by @erictang000 in #146
- [algorithm] add rloo and reinforce++ advantage estimators + improve KL penalty handling by @erictang000 in #137
- [FlashRL 1/N] Add support for truncated importance sampling by @SumanthRH in #145
- [Fix] Reset registries in BaseFunctionRegistry.sync_with_actor() when needed and fix registry reset by @CharlieFRuan in #144
- Optimize tuple unpacking in skyrl_gym_generator.generate() by @davenpi in #138
- [Fix] revert "Optimize tuple unpacking in skyrl_gym_generator.generate() " by @SumanthRH in #150
- [Gemini] Disable PR summaries for Gemini by @SumanthRH in #151
- [fix] Fix the multi-engine launch script by @tyler-griggs in #157
- [fix] Formatting fixes to silence linter by @tyler-griggs in #158
- Support Flexible Scheduler by @benji-cannot-code in #142
- [FlashRL 2/N] Support list of weights during weight sync for colocated training b...