Highlights
Asynchronous training: We now support fully asynchronous training in SkyRL, enabling higher throughput for agentic RL: https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html
Dependency Upgrades:
- Upgraded vLLM to 0.11.0, Ray to 2.51.1
- Megatron: Migrated from mbridge to the newer Megatron-Bridge library, which is expected to see more active development and support from NVIDIA.
The updated installation instructions can be found here.
Recipes: We've consolidated a list of end-to-end recipes with SkyRL here for reference runs on math, Text2SQL and search tasks.
SkyRL on Managed Platforms: Guides for running SkyRL on managed platforms such as Anyscale, Runpod and SkyPilot can be found here.
Miscellaneous: Support for GPT-OSS, integration with PyTorch's OpenEnv, support for IPv6 clusters, and more!
What's Changed
- [Examples][Step wise] Support thinking models like Qwen 3 by @SumanthRH in #468
- Modal Integration by @benji-cannot-code in #444
- [fix] abort all requests before sleep by @vutrung96 in #458
- TerminalBenchGenerator: logprobs + session ID by @li-boxuan in #448
- Divide-by-Zero when setting NUMA affinity patch by @matthambrecht in #457
- [bug] run linter for t-bench generator by @erictang000 in #476
- Bump vLLM version to 0.11.0 by @tyler-griggs in #481
- [Sequence parallel][train] Support sequence parallelism without sample packing by @SumanthRH in #480
- [fix] Resolve timeout and cleanup issues in GPU CI pipeline by @tyler-griggs in #483
- Increase timeout for GPU CI by @tyler-griggs in #485
- Skypilot: Update Doc by @lynnliu030 in #484
- Fix GPU CI Test Failures: Migrating Tests, NCCL P2P Access Errors, and Test Fixture Issues by @devpatelio in #477
- [Fix] Fix entropy calculation without sample packing by @SumanthRH in #490
- Skypilot: Multi-Node Test by @lynnliu030 in #493
- Support exporting environment-specific metrics by @vibha-ctrl in #386
- Fix broken import by @tyler-griggs in #500
- Revert "Bump vLLM version to 0.11.0" by @erictang000 in #501
- Fix broken entropy metric by @tyler-griggs in #504
- [fix] Resolve double ray.init() call by @tyler-griggs in #506
- [lora] fix lora with vllm offline engine by @erictang000 in #513
- Increase GPU CI Timeout to Pass All Tests by @devpatelio in #512
- [train] Increase default timeout for placement groups to 180s by @SumanthRH in #525
- [dependencies] fix some flash-rl dependency issues by @erictang000 in #530
- Add implementation of CISPO loss by @vutrung96 in #523
- [skyrl-train] assert that the policy loss type is regular/dual clip for tis by @erictang000 in #546
- [Fix] Fix `fsdp2_load_state_dict` with HSDP by @SumanthRH in #554
- [skyrl-train] update defaults for CISPO by @erictang000 in #553
- [GPTOSS] Integrate Unsloth's flex attention implementation for attention sink by @SumanthRH in #515
- [skyrl-train][logging] rename loss/avg_raw_rewards to loss/avg_final_rewards for clarity by @erictang000 in #544
- [Integrations] Support PyTorch OpenEnv by @lynnliu030 in #543
- [Docs] Fix image in OpenEnv doc by @SumanthRH in #562
- Remove truncation logic, fix corresponding tests by @devpatelio in #508
- [megatron][bug fix] reset dist checkpointing asynccallsqueue to allow freeing memory by @erictang000 in #565
- [dependencies] separate vllm + megatron + bump vllm back to 0.11.0 + pin minimum uv version for extra-build-dependencies by @erictang000 in #528
- [skyrl-train] Enable Inference Engine pipeline parallelism by @pandyamarut in #555
- [fix] Broken method call in test by @tyler-griggs in #571
- [AsyncRL][1/N] Add abort_generation to vllm engine and pause/continue generation to client by @CharlieFRuan in #537
- Update README.md about SkyRL-v0 reproduction by @caoshiyi in #573
- [AsyncRL][2/N] Implement /chat/completion with retry on aborted sub requests by @CharlieFRuan in #557
- [train][Logging] Set loguru default to INFO, and customizable by LOG_LEVEL by @CharlieFRuan in #578
- [skyrl-train][Fix] Fix epoch counter after resuming from checkpoint by @SumanthRH in #589
- [skyrl-train] Enforce eager by default by @SumanthRH in #569
- [skyrl-train][Fix] sleep only if colocated by @SumanthRH in #595
- Fix: Megatron Autograd Warning for Broadcast Kernel by @devpatelio in #588
- Comment by @devpatelio in #596
- Comment upda by @devpatelio in #597
- Cleanup stray doc by @SumanthRH in #599
- [skyrl-train] Make `libnuma` optional for training by @SumanthRH in #601
- [skyrl-train][Examples] Support truncated importance sampling for `StepWiseGenerator` by @SumanthRH in #570
- Add YaRN support for VLLM and HF by @sergeypastukhov-ddog in #561
- [Docs] Refactor documentation for running SkyRL on managed platforms by @SumanthRH in #608
- [train] Remove train_batch_size from fsdp/deepspeed strategy by @CharlieFRuan in #617
- [skyrl-train] add option to specify ref model path by @erictang000 in #623
- [skyrl-train] Add DAPO 7B recipe, and 32B training script by @erictang000 in #532
- [skyrl-train][recipes] add dapo qwen3 1.7b and 4b scripts by @erictang000 in #625
- Fix table formatting in DAPO README by @erictang000 in #631
- [train][utils] Aggregate rollout metrics and validate output in concat GeneratorOutput by @CharlieFRuan in #620
- [skyrl-train] Add example for on-policy distillation by @erictang000 in #585
- Support IPv6 addresses in TCP URL construction by @mayavkrishnan25 in #612
- [train][TBench] Cherrypick Terminus integration and use Harbor by @CharlieFRuan in #637
- [megatron] Added non cuda ipc wt sync to megatron workers by @nikhilbarhate99 in #635
- [docs] Add build instructions to README.md by @CharlieFRuan in #648
- Fix in README.md by @nrghosh in #653
- [skyrl-train][Fix] Fix FSDP1 module wrap policy for `HFModelWrapper` by @SumanthRH in #654
- Return init_prompts in generate_batched by @ebronstein in #652
- [Docs] Fix model placement docs by @SumanthRH in #663
- [skyrl-train] Support older vllm versions till 0.9.2 by @SumanthRH in #671
- [lora] enforce_eager=true slows down generation time dramatically with LoRA by @devpatelio in #665
- Conditionally add the generation prompt to the multi-turn chat template by @ebronstein in #676
- Add entropy loss by @pbokc in #622
- [skyrl-train] Upgrade Ray to 2.51.1 by @SumanthRH in #633
- [Docs] Add a recipes page consolidating all E2E recipes by @SumanthRH in #679
- [skyrl-train][docs] Add commit for dapo to recipes and add megatron search-r1 results by @erictang000 in #689
- [megatron] upgrade from mbridge -> Megatron-Bridge (breaking change) by @erictang000 in #453
- [update] Updated RoPE Configuration for HF Models (transformers) w. backward-compatible support for vLLM by @devpatelio in #690
- Revert "[skyrl-train] Updated RoPE Configuration for HF Models transformers) w. backward-compatible support for vLLM (#690)" by @SumanthRH in #695
- [skyrl-train][megatron] Remove use of PYTHONPATH for getting around transformer-engine installation by @erictang000 in #697
- [megatron] improving weight syncing - bucketed param gather + cuda ipc flattening by @erictang000 in #487
- [megatron] separate offloading gradients from offloading params for megatron by @erictang000 in #563
- Update trainer docstrings that values has shape batch_size x seqlen by @ebronstein in #687
- [skyrl-train][step-wise] 1/N - Support step-wise training with `step_wise_training` flag by @SumanthRH in #694
- Revert "[skyrl-train][step-wise] 1/N - Support step-wise training with `step_wise_training` flag" by @CharlieFRuan in #706
- [AsyncRL][3/N] Support fully async training for any generator by @CharlieFRuan in #579
- [AsyncRL][4/N] Support in-flight weight update for generate() by @CharlieFRuan in #656
- [train][TBench][MiniSwe] Fix custom generator loss masking by @CharlieFRuan in #710
- [skyrl-train] fix docs link to async off by one by @erictang000 in #712
- SkyRL-Agent Release Part 1 by @caoshiyi in #713
New Contributors
- @vutrung96 made their first contribution in #458
- @li-boxuan made their first contribution in #448
- @matthambrecht made their first contribution in #457
- @cwwarren made their first contribution in #466
- @vibha-ctrl made their first contribution in #386
- @atemaguer made their first contribution in #486
- @pandyamarut made their first contribution in #555
- @caoshiyi made their first contribution in #573
- @pbokc made their first contribution in #586
- @mayavkrishnan25 made their first contribution in #612
- @nikhilbarhate99 made their first contribution in #635
- @hezyin made their first contribution in #636
- @nrghosh made their first contribution in #653
- @taroigarashi2001 made their first contribution in #681
Full Changelog: skyrl_train-v0.2.0...skyrl_train-v0.3.0