SkyRL-Train: v0.3.0

@SumanthRH released this 03 Dec 17:07

Highlights

Asynchronous training: SkyRL now supports fully asynchronous training, enabling higher throughput for agentic RL. See the tutorial: https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html
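
For intuition, here is a minimal conceptual sketch of the fully-async pattern, in which generation and training overlap via a bounded trajectory queue. All names here are illustrative assumptions, not SkyRL's actual API; see the tutorial above for the real configuration.

```python
import asyncio

# Conceptual sketch only: generation and training run concurrently,
# coupled by a bounded queue. Names are illustrative, not SkyRL's API.
async def generator_loop(queue: asyncio.Queue, rollout_fn):
    while True:
        traj = await rollout_fn()  # sample with current (possibly stale) weights
        await queue.put(traj)      # blocks when full, bounding staleness

async def trainer_loop(queue: asyncio.Queue, train_step_fn, sync_weights_fn,
                       batch_size: int = 8):
    while True:
        batch = [await queue.get() for _ in range(batch_size)]
        train_step_fn(batch)       # one optimizer step on slightly off-policy data
        await sync_weights_fn()    # push fresh weights without pausing generation
```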

Dependency Upgrades:

  • Upgraded vLLM to 0.11.0 and Ray to 2.51.1
  • Megatron: Migrated from mbridge to the newer Megatron-Bridge library, which is expected to receive more active development and support from NVIDIA. Note that this is a breaking change (#453).

The updated installation instructions can be found here.
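
As a quick sanity check after installing, the upgraded versions can be verified from Python. A minimal sketch; the version strings are the ones stated above, and the linked instructions remain authoritative:

```python
# Sanity-check the upgraded dependencies from this release's notes.
import ray
import vllm

assert vllm.__version__ == "0.11.0", f"unexpected vLLM: {vllm.__version__}"
assert ray.__version__ == "2.51.1", f"unexpected Ray: {ray.__version__}"
print("vLLM and Ray match the v0.3.0 release pins")
```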

Recipes: We've consolidated a list of end-to-end SkyRL recipes here, with reference runs on math, Text2SQL, and search tasks.

SkyRL on Managed Platforms: Guides for running SkyRL on managed platforms such as Anyscale, Runpod, and SkyPilot can be found here.

Miscellaneous: Support for GPT-OSS, integration with PyTorch's OpenEnv, support for IPv6 clusters, and more!
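
On IPv6 support (#612): IPv6 literals must be bracketed when embedded in URLs, per RFC 3986. A minimal sketch of that rule, not the exact helper from the PR:

```python
import ipaddress

def tcp_url(host: str, port: int) -> str:
    # IPv6 literals must be wrapped in brackets inside URLs (RFC 3986),
    # e.g. tcp://[::1]:1234; IPv4 addresses and hostnames pass through.
    try:
        if isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address):
            return f"tcp://[{host}]:{port}"
    except ValueError:
        pass  # not an IP literal (e.g. a hostname)
    return f"tcp://{host}:{port}"

assert tcp_url("::1", 1234) == "tcp://[::1]:1234"
assert tcp_url("10.0.0.1", 1234) == "tcp://10.0.0.1:1234"
```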

What's Changed

  • [Examples][Step wise] Support thinking models like Qwen 3 by @SumanthRH in #468
  • Modal Integration by @benji-cannot-code in #444
  • [fix] abort all requests before sleep by @vutrung96 in #458
  • TerminalBenchGenerator: logprobs + session ID by @li-boxuan in #448
  • Divide-by-Zero when setting NUMA affinity patch by @matthambrecht in #457
  • [bug] run linter for t-bench generator by @erictang000 in #476
  • Bump vLLM version to 0.11.0 by @tyler-griggs in #481
  • [Sequence parallel][train] Support sequence parallelism without sample packing by @SumanthRH in #480
  • [fix] Resolve timeout and cleanup issues in GPU CI pipeline by @tyler-griggs in #483
  • Increase timeout for GPU CI by @tyler-griggs in #485
  • Skypilot: Update Doc by @lynnliu030 in #484
  • Fix GPU CI Test Failures: Migrating Tests, NCCL P2P Access Errors, and Test Fixture Issues by @devpatelio in #477
  • [Fix] Fix entropy calculation without sample packing by @SumanthRH in #490
  • Skypilot: Multi-Node Test by @lynnliu030 in #493
  • Support exporting environment-specific metrics by @vibha-ctrl in #386
  • Fix broken import by @tyler-griggs in #500
  • Revert "Bump vLLM version to 0.11.0" by @erictang000 in #501
  • Fix broken entropy metric by @tyler-griggs in #504
  • [fix] Resolve double ray.init() call by @tyler-griggs in #506
  • [lora] fix lora with vllm offline engine by @erictang000 in #513
  • Increase GPU CI Timeout to Pass All Tests by @devpatelio in #512
  • [train] Increase default timeout for placement groups to 180s by @SumanthRH in #525
  • [dependencies] fix some flash-rl dependency issues by @erictang000 in #530
  • Add implementation of CISPO loss by @vutrung96 in #523 (a sketch of the objective appears after this list)
  • [skyrl-train] assert that the policy loss type is regular/dual clip for tis by @erictang000 in #546
  • [Fix] Fix fsdp2_load_state_dict with HSDP by @SumanthRH in #554
  • [skyrl-train] update defaults for CISPO by @erictang000 in #553
  • [GPTOSS] Integrate Unsloth's flex attention implementation for attention sink by @SumanthRH in #515
  • [skyrl-train][logging] rename loss/avg_raw_rewards to loss/avg_final_rewards for clarity by @erictang000 in #544
  • [Integrations] Support PyTorch OpenEnv by @lynnliu030 in #543
  • [Docs] Fix image in OpenEnv doc by @SumanthRH in #562
  • Remove truncation logic, fix corresponding tests by @devpatelio in #508
  • [megatron][bug fix] reset dist checkpointing asynccallsqueue to allow freeing memory by @erictang000 in #565
  • [dependencies] separate vllm + megatron + bump vllm back to 0.11.0 + pin minimum uv version for extra-build-dependencies by @erictang000 in #528
  • [skyrl-train] Enable Inference Engine pipeline parallelism by @pandyamarut in #555
  • [fix] Broken method call in test by @tyler-griggs in #571
  • [AsyncRL][1/N] Add abort_generation to vllm engine and pause/continue generation to client by @CharlieFRuan in #537
  • Update README.md about SkyRL-v0 reproduction by @caoshiyi in #573
  • [AsyncRL][2/N] Implement /chat/completion with retry on aborted sub requests by @CharlieFRuan in #557
  • [train][Logging] Set loguru default to INFO, and customizable by LOG_LEVEL by @CharlieFRuan in #578
  • [skyrl-train][Fix] Fix epoch counter after resuming from checkpoint by @SumanthRH in #589
  • [skyrl-train] Enforce eager by default by @SumanthRH in #569
  • [skyrl-train][Fix] sleep only if colocated by @SumanthRH in #595
  • Fix: Megatron Autograd Warning for Broadcast Kernel by @devpatelio in #588
  • Comment by @devpatelio in #596
  • Comment update by @devpatelio in #597
  • Cleanup stray doc by @SumanthRH in #599
  • [skyrl-train] Make libnuma optional for training by @SumanthRH in #601
  • [skyrl-train][Examples] Support truncated importance sampling for StepWiseGenerator by @SumanthRH in #570
  • Add YaRN support for VLLM and HF by @sergeypastukhov-ddog in #561
  • [Docs] Refactor documentation for running SkyRL on managed platforms by @SumanthRH in #608
  • [train] Remove train_batch_size from fsdp/deepspeed strategy by @CharlieFRuan in #617
  • [skyrl-train] add option to specify ref model path by @erictang000 in #623
  • [skyrl-train] Add DAPO 7B recipe, and 32B training script by @erictang000 in #532
  • [skyrl-train][recipes] add dapo qwen3 1.7b and 4b scripts by @erictang000 in #625
  • Fix table formatting in DAPO README by @erictang000 in #631
  • [train][utils] Aggregate rollout metrics and validate output in concat GeneratorOutput by @CharlieFRuan in #620
  • [skyrl-train] Add example for on-policy distillation by @erictang000 in #585
  • Support IPv6 addresses in TCP URL construction by @mayavkrishnan25 in #612
  • [train][TBench] Cherrypick Terminus integration and use Harbor by @CharlieFRuan in #637
  • [megatron] Added non-CUDA-IPC weight sync to Megatron workers by @nikhilbarhate99 in #635
  • [docs] Add build instructions to README.md by @CharlieFRuan in #648
  • Fix in README.md by @nrghosh in #653
  • [skyrl-train][Fix] Fix FSDP1 module wrap policy for HFModelWrapper by @SumanthRH in #654
  • Return init_prompts in generate_batched by @ebronstein in #652
  • [Docs] Fix model placement docs by @SumanthRH in #663
  • [skyrl-train] Support older vllm versions till 0.9.2 by @SumanthRH in #671
  • [lora] enforce_eager=true slows down generation time dramatically with LoRA by @devpatelio in #665
  • Conditionally add the generation prompt to the multi-turn chat template by @ebronstein in #676
  • Add entropy loss by @pbokc in #622
  • [skyrl-train] Upgrade Ray to 2.51.1 by @SumanthRH in #633
  • [Docs] Add a recipes page consolidating all E2E recipes by @SumanthRH in #679
  • [skyrl-train][docs] Add commit for dapo to recipes and add megatron search-r1 results by @erictang000 in #689
  • [megatron] upgrade from mbridge -> Megatron-Bridge (breaking change) by @erictang000 in #453
  • [update] Updated RoPE Configuration for HF Models (transformers) w. backward-compatible support for vLLM by @devpatelio in #690
  • Revert "[skyrl-train] Updated RoPE Configuration for HF Models transformers) w. backward-compatible support for vLLM (#690)" by @SumanthRH in #695
  • [skyrl-train][megatron] Remove use of PYTHONPATH for getting around transformer-engine installation by @erictang000 in #697
  • [megatron] improving weight syncing - bucketed param gather + cuda ipc flattening by @erictang000 in #487
  • [megatron] separate offloading gradients from offloading params for megatron by @erictang000 in #563
  • Update trainer docstrings that values has shape batch_size x seqlen by @ebronstein in #687
  • [skyrl-train][step-wise] 1/N - Support step-wise training with step_wise_training flag by @SumanthRH in #694
  • Revert "[skyrl-train][step-wise] 1/N - Support step-wise training with step_wise_training flag" by @CharlieFRuan in #706
  • [AsyncRL][3/N] Support fully async training for any generator by @CharlieFRuan in #579
  • [AsyncRL][4/N] Support in-flight weight update for generate() by @CharlieFRuan in #656
  • [train][TBench][MiniSwe] Fix custom generator loss masking by @CharlieFRuan in #710
  • [skyrl-train] fix off-by-one in docs link for async by @erictang000 in #712
  • SkyRL-Agent Release Part 1 by @caoshiyi in #713
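
On the CISPO loss added in #523: CISPO clips the (stop-gradiented) importance-sampling weight inside a REINFORCE-style objective, rather than clipping the PPO surrogate, so every token retains a gradient through its log-probability. Below is a minimal sketch of that formulation; the function name, defaults, and masking are illustrative assumptions, not SkyRL's actual implementation:

```python
import torch

def cispo_loss(logprobs, old_logprobs, advantages,
               eps_low: float = 1.0, eps_high: float = 0.2, mask=None):
    # Importance-sampling ratio between the current and behavior policies.
    ratio = torch.exp(logprobs - old_logprobs)
    # CISPO clips the IS weight itself and stops its gradient, so the update
    # is clipped_weight * A * grad(logprob) for every token (no dead tokens).
    weight = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    loss = -(weight * advantages * logprobs)
    if mask is not None:
        return (loss * mask).sum() / mask.sum().clamp(min=1.0)
    return loss.mean()
```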

New Contributors

Full Changelog: skyrl_train-v0.2.0...skyrl_train-v0.3.0