Releases: NovaSky-AI/SkyRL
SkyRL-Train: v0.3.0
Highlights
Asynchronous training: We now support fully asynchronous training in SkyRL, enabling higher throughput for agentic RL: https://skyrl.readthedocs.io/en/latest/tutorials/fully_async.html
Dependency Upgrades:
- Upgraded vLLM to 0.11.0 and Ray to 2.51.1
- Megatron: Migrated from mbridge to the newer Megatron-Bridge library, which is expected to see more active development and support from NVIDIA.
The updated installation instructions can be found here.
Recipes: We've consolidated a list of end-to-end SkyRL recipes here, with reference runs on math, Text2SQL, and search tasks.
SkyRL on Managed Platforms: Guides for running SkyRL on managed platforms such as Anyscale, RunPod, and SkyPilot can be found here.
Miscellaneous: Support for GPT-OSS, integration with PyTorch's OpenEnv, support for IPv6 clusters, and more!
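The IPv6 cluster support mentioned above comes down to bracketing bare IPv6 literals when constructing TCP URLs, since an unbracketed colon is ambiguous with the `host:port` separator. A minimal sketch of the idea; the helper name is illustrative, not SkyRL's actual API:

```python
def tcp_url(host: str, port: int) -> str:
    """Build a tcp:// URL, bracketing bare IPv6 literals.

    Without brackets, the colons inside an IPv6 address would be
    indistinguishable from the host:port separator.
    """
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"tcp://{host}:{port}"

print(tcp_url("10.0.0.1", 29500))  # tcp://10.0.0.1:29500
print(tcp_url("fe80::1", 29500))   # tcp://[fe80::1]:29500
```

Already-bracketed hosts pass through unchanged, so the helper is safe to call on either form.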
What's Changed
- [Examples][Step wise] Support thinking models like Qwen 3 by @SumanthRH in #468
- Modal Integration by @benji-cannot-code in #444
- [fix] abort all requests before sleep by @vutrung96 in #458
- TerminalBenchGenerator: logprobs + session ID by @li-boxuan in #448
- Divide-by-Zero when setting NUMA affinity patch by @matthambrecht in #457
- [bug] run linter for t-bench generator by @erictang000 in #476
- Bump vLLM version to 0.11.0 by @tyler-griggs in #481
- [Sequence parallel][train] Support sequence parallelism without sample packing by @SumanthRH in #480
- [fix] Resolve timeout and cleanup issues in GPU CI pipeline by @tyler-griggs in #483
- Increase timeout for GPU CI by @tyler-griggs in #485
- Skypilot: Update Doc by @lynnliu030 in #484
- Fix GPU CI Test Failures: Migrating Tests, NCCL P2P Access Errors, and Test Fixture Issues by @devpatelio in #477
- [Fix] Fix entropy calculation without sample packing by @SumanthRH in #490
- Skypilot: Multi-Node Test by @lynnliu030 in #493
- Support exporting environment-specific metrics by @vibha-ctrl in #386
- Fix broken import by @tyler-griggs in #500
- Revert "Bump vLLM version to 0.11.0" by @erictang000 in #501
- Fix broken entropy metric by @tyler-griggs in #504
- [fix] Resolve double ray.init() call by @tyler-griggs in #506
- [lora] fix lora with vllm offline engine by @erictang000 in #513
- Increase GPU CI Timeout to Pass All Tests by @devpatelio in #512
- [train] Increase default timeout for placement groups to 180s by @SumanthRH in #525
- [dependencies] fix some flash-rl dependency issues by @erictang000 in #530
- Add implementation of CISPO loss by @vutrung96 in #523
- [skyrl-train] assert that the policy loss type is regular/dual clip for tis by @erictang000 in #546
- [Fix] Fix `fsdp2_load_state_dict` with HSDP by @SumanthRH in #554
- [skyrl-train] update defaults for CISPO by @erictang000 in #553
- [GPTOSS] Integrate Unsloth's flex attention implementation for attention sink by @SumanthRH in #515
- [skyrl-train][logging] rename loss/avg_raw_rewards to loss/avg_final_rewards for clarity by @erictang000 in #544
- [Integrations] Support PyTorch OpenEnv by @lynnliu030 in #543
- [Docs] Fix image in OpenEnv doc by @SumanthRH in #562
- Remove truncation logic, fix corresponding tests by @devpatelio in #508
- [megatron][bug fix] reset dist checkpointing `AsyncCallsQueue` to allow freeing memory by @erictang000 in #565
- [dependencies] separate vllm + megatron + bump vllm back to 0.11.0 + pin minimum uv version for extra-build-dependencies by @erictang000 in #528
- [skyrl-train] Enable Inference Engine pipeline parallelism by @pandyamarut in #555
- [fix] Broken method call in test by @tyler-griggs in #571
- [AsyncRL][1/N] Add abort_generation to vllm engine and pause/continue generation to client by @CharlieFRuan in #537
- Update README.md about SkyRL-v0 reproduction by @caoshiyi in #573
- [AsyncRL][2/N] Implement /chat/completion with retry on aborted sub requests by @CharlieFRuan in #557
- [train][Logging] Set loguru default to INFO, and customizable by LOG_LEVEL by @CharlieFRuan in #578
- [skyrl-train][Fix] Fix epoch counter after resuming from checkpoint by @SumanthRH in #589
- [skyrl-train] Enforce eager by default by @SumanthRH in #569
- [skyrl-train][Fix] sleep only if colocated by @SumanthRH in #595
- Fix: Megatron Autograd Warning for Broadcast Kernel by @devpatelio in #588
- Comment by @devpatelio in #596
- Comment upda by @devpatelio in #597
- Cleanup stray doc by @SumanthRH in #599
- [skyrl-train] Make `libnuma` optional for training by @SumanthRH in #601
- [skyrl-train][Examples] Support truncated importance sampling for `StepWiseGenerator` by @SumanthRH in #570
- Add YaRN support for vLLM and HF by @sergeypastukhov-ddog in #561
- [Docs] Refactor documentation for running SkyRL on managed platforms by @SumanthRH in #608
- [train] Remove train_batch_size from fsdp/deepspeed strategy by @CharlieFRuan in #617
- [skyrl-train] add option to specify ref model path by @erictang000 in #623
- [skyrl-train] Add DAPO 7B recipe, and 32B training script by @erictang000 in #532
- [skyrl-train][recipes] add dapo qwen3 1.7b and 4b scripts by @erictang000 in #625
- Fix table formatting in DAPO README by @erictang000 in #631
- [train][utils] Aggregate rollout metrics and validate output in concat GeneratorOutput by @CharlieFRuan in #620
- [skyrl-train] Add example for on-policy distillation by @erictang000 in #585
- Support IPv6 addresses in TCP URL construction by @mayavkrishnan25 in #612
- [train][TBench] Cherrypick Terminus integration and use Harbor by @CharlieFRuan in #637
- [megatron] Added non-CUDA-IPC weight sync to Megatron workers by @nikhilbarhate99 in #635
- [docs] Add build instructions to README.md by @CharlieFRuan in #648
- Fix in README.md by @nrghosh in #653
- [skyrl-train][Fix] Fix FSDP1 module wrap policy for `HFModelWrapper` by @SumanthRH in #654
- Return init_prompts in generate_batched by @ebronstein in #652
- [Docs] Fix model placement docs by @SumanthRH in #663
- [skyrl-train] Support older vllm versions till 0.9.2 by @SumanthRH in #671
- [lora] enforce_eager=true slows down generation time dramatically with LoRA by @devpatelio in #665
- Conditionally add the generation prompt to the multi-turn chat template by @ebronstein in #676
- Add entropy loss by @pbokc in #622
- [skyrl-train] Upgrade Ray to 2.51.1 by @SumanthRH in #633
- [Docs] Add a recipes page consolidating all E2E recipes by @SumanthRH in #679
- [skyrl-train][docs] Add commit for dapo to recipes and add megatron search-r1 results by @erictang000 in #689
- [megatron] upgrade from mbridge -> Megatron-Bridge (breaking change) by @erictang000 in #453
- [update] Updated RoPE Configuration for HF Model...
SkyRL-Train: v0.2.0
Highlights
This release contains 163 commits from 22 contributors, including 11 new contributors!
Megatron Backend: SkyRL now has full support for the Megatron training backend with 5D parallelism and strong support for large-scale MoE training. Learn more in our Megatron guide and examples.
LoRA Support: SkyRL now supports LoRA training with the FSDP backend and vLLM inference engine. Learn more in our LoRA guide and examples. We will continue aggressively improving LoRA support and performance, tracked in #449.
OpenAI API Compatibility: SkyRL has standardized on the OpenAI API for inference, so agents and agent scaffolds can call into the inference engine over the OpenAI API. SkyRL manages the inference engines and provides a base_url for an OpenAI-API-compatible endpoint.
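Because the managed engines speak the OpenAI API, any OpenAI-compatible client can target the provided base_url. A minimal sketch using only the standard library; the base_url and model name below are placeholders, not values SkyRL guarantees:

```python
import json
from urllib import request

def chat_completion_request(base_url: str, model: str, messages: list) -> request.Request:
    # Build a POST in the OpenAI /chat/completions shape against the given base_url.
    url = f"{base_url.rstrip('/')}/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode()
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder endpoint; at runtime SkyRL supplies the real base_url.
req = chat_completion_request(
    "http://127.0.0.1:8000/v1",
    "Qwen/Qwen3-4B",
    [{"role": "user", "content": "2 + 2 = ?"}],
)
# resp = request.urlopen(req)   # an agent scaffold (or the `openai` client
# print(json.load(resp))        # pointed at base_url) would send it like this
```

In practice a scaffold would simply pass the base_url to its existing OpenAI client rather than hand-rolling requests; the point is that no SkyRL-specific client code is required.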
Integrations: Building on our standardization on the OpenAI API, we integrated several popular environment and agentic projects. A couple of highlights:
- Prime Intellect's Environments Hub: A guide and examples can be found here.
- mini-swe-agent: A popular, minimal SWE agent implementation. A guide and examples can be found here.
What's Changed
- [Doc] LLM-judge by @lynnliu030 in #167
- [Gym] Add AIME by @lynnliu030 in #148
- [DAPO] Fix data preprocess script by @lynnliu030 in #172
- [FlashRL 3/N] Add example for FP8 training with FlashRL by @SumanthRH in #169
- get chat_end_index after env init by @etnbrd in #184
- [FlashRL N/N] Support Int-8 Rollouts by @SumanthRH in #176
- [Generator] Support token-in-token-out rollout by @CharlieFRuan in #152
- [Fix] Set skip_special_tokens to True in default sampling params by @pgasawa in #192
- [docs] Update libnuma installation instructions by @tyler-griggs in #191
- [docs][trivial] Add libnuma error in installation page by @CharlieFRuan in #196
- [cleanup] Make agent_loop output a dataclass by @tyler-griggs in #194
- [docs] Simplify installation appearance by @tyler-griggs in #200
- 1/N GPU CI Migration by @tyler-griggs in #195
- Increase GPU CI timeout to 1hr by @tyler-griggs in #205
- Clarify ObsType and the observation number by @etnbrd in #207
- [fix] Skip special tokens for sglang when decoding token ids by @CharlieFRuan in #210
- [doc fix] Correcting `ObsType` in `BaseTextEnv` by @tyler-griggs in #209
- [Doc] Add a doc page for SkyRLGymGenerator, multi-turn rollout/tokenization by @CharlieFRuan in #186
- [tests] Fix gpu offload test by @erictang000 in #215
- [Fix][Generator] Correct chat history length for retokenize codepath with env.init by @CharlieFRuan in #214
- [Generator][Env] Add stop str, remove need for post-processed action in search and txt2sql by @CharlieFRuan in #190
- [Examples] Minor fix after #190 by @SumanthRH in #216
- [InfEngines] Lift tokenization up to InferenceEngineClient by @tyler-griggs in #217
- [logs] Change print warnings to logger.warning by @CharlieFRuan in #219
- [2/N GPU CI Migration] Fix broken tests by @tyler-griggs in #220
- Bump vllm to 0.10.1.1 by @tyler-griggs in #225
- [Fix][CI] Fix GPU CI where search needs stop param by @CharlieFRuan in #232
- [Generator][HTTP] Add OpenAI API inference HTTP server for generator, support /chat/completions by @CharlieFRuan in #230
- [cleanup] Remove redundant sampling params code path by @tyler-griggs in #234
- [Fix][Generator] Use custom_chat_template in each step retokenization by @CharlieFRuan in #233
- [Fix] Fix failing sglang test after #234 by @SumanthRH in #237
- [Generator] Run env methods in threadpool executor by @alex-dr in #240
- [datasets] Support huggingface datasets by @tyler-griggs in #235
- [Generator] Support turn-level rewards in SkyRLGymGenerator by @tyler-griggs in #226
- [fix] finish mlflow run by @etnbrd in #243
- [trainer] Initial Megatron TP + PP Support by @erictang000 in #223
- 1/N Terminal Bench Integration by @tyler-griggs in #239
- [HTTP][Generator] Let vllm python engine handle OAI request, remove openai_api_protocol by @CharlieFRuan in #238
- [fix] Add missing required argument to OpenAIServingChat by @tyler-griggs in #247
- Cloud Checkpointing by @tyler-griggs in #248
- [fix] Extra argument in `cleanup_old_checkpoints` by @tyler-griggs in #254
- [fix] Broken call to `io.exists` by @tyler-griggs in #255
- Environments Hub integration by @tyler-griggs in #241
- [fix] Resolve CI error due to import error by @tyler-griggs in #256
- [fix] Bring back pretty log formatting by @tyler-griggs in #250
- Update verifiers readme by @tyler-griggs in #258
- Update README.md by @tyler-griggs in #259
- Revert "[fix] Bring back pretty log formatting" by @CharlieFRuan in #261
- [CI][Fix] Fix sglang import error by correcting pytest -m condition by @CharlieFRuan in #262
- [Algorithms] Add Clip-Cov and KL-Cov loss functions by @SumanthRH in #251
- [trainer][megatron] Fix for DP slower convergence by @erictang000 in #245
- [GPU CI] Skip integration tests if uv add fails by @SumanthRH in #264
- [uv] Ignore .venv by @SumanthRH in #263
- [trainer][megatron] Override fused attention with flash-attn when flash_attn=True for Megatron by @erictang000 in #265
- [Fix][CI] Address generator CI test fails when model stop reason is length by @CharlieFRuan in #269
- [tiny fix] Remove the example launch command for main files by @tyler-griggs in #268
- [InfEngineClient] Extract out routing logic to a helper by @CharlieFRuan in #267
- Reuse mlflow if exists by @sergeypastukhov-ddog in #257
- [trainer][megatron] Sequence packing + Context Parallel for Megatron by @erictang000 in #274
- [trainer][megatron] make megatron config directly accessible through trainer field by @erictang000 in #275
- [Examples] Add an example for training on SWEBench task with Mini-SWE-Agent by @SumanthRH in #222
- [Examples][Fix] Minor cleanup for the Mini-SWE-Agent example by @SumanthRH in #281
- Remove simplecoder example by @pcmoritz in #282
- [Fix] Fix pass_at_k missing for SkyRLGymGenerator.agent_loop flow due to token-level rewards by @CharlieFRuan in #271
- Add config validation for train batch size vs dp size by @tyler-griggs in #229
- Add configurable timeout for placement groups by @SumanthRH in #276
- [Feature] Eval Only Entry Point by @benji-cannot-code in #171
- [fix] Fix the pretty trainer logging by @tyler-griggs in #270
- [Examples] Add a README for the Mini-SWE-Agent example by @SumanthRH in #287
- [MoE] Add support for vLLM inference expert parallelism by @tyler-griggs in #159
- [Generator][HTTP] Support /completion endpoint for token-in-token-out by @CharlieFRuan in #260
- Wandb same-step logging by @tyler-griggs in #289
- [GPU CI][E2E] Increase timeout for ...
SkyRL-Train: v0.1.0
What's Changed
- SkyRL-Train + SkyGym code by @SumanthRH in #27
- Add ReadTheDocs deployment yaml by @SumanthRH in #28
- [fix] Update fsdp_config.max_norm to optimizer_config.max_grad_norm by @CharlieFRuan in #30
- Rename `skgym` to `skyrl-gym` by @tyler-griggs in #31
- Improve installation instructions in README and docs by @SumanthRH in #32
- fixing minor issues with examples by @erictang000 in #29
- [docs] Fix git clone link by @CharlieFRuan in #34
- [Docs] Add more examples to the docs by @erictang000 in #37
- [Docs] Add gym tools doc by @tyler-griggs in #36
- Use `skyrl-gym` from PyPI for easier dependency management by @SumanthRH in #35
- Doc fixes by @tyler-griggs in #33
- Update READMEs by @tyler-griggs in #40
- [Docs] Add more detailed example for Text2SQL by @erictang000 in #39
- [sglang] Add patch for sglang by @SumanthRH in #42
- [Docs] Add ray runtime env hook for uv to docs by @SumanthRH in #41
- [Docs] Improve docs sidebar by @SumanthRH in #43
- Rename docs from SkyRL-Train to SkyRL by @tyler-griggs in #44
- Minor edits by @SumanthRH in #45
- add sql repro to skyrl-train readme by @erictang000 in #46
- [Docs] Minor fixes to venv creation in the docs by @SumanthRH in #47
- [fix] In docs and scripts, test->validation.parquet, and ["x"] -> "['x']" by @CharlieFRuan in #48
- [Docs] add system overview doc (pt. 1) by @tyler-griggs in #50
- [docs] add a short guide on evaluation, and behavior of having multiple eval datasets by @CharlieFRuan in #49
- [trainer] Fix fsdp warmup steps + move warmup to optimizer config by @erictang000 in #52
- Make deepspeed optional, so it is not initialized if FSDP backend is used by @pcmoritz in #59
- fix: Updated utils.py to fix stop token issue by @AtakanTekparmak in #56
- [eval] fix eval batch < dp_size edge case by @erictang000 in #62
- [Fix] Minor SQL fixes by @tyler-griggs in #61
- [generator] Add `max_env_workers` argument for generator thread pool executor by @erictang000 in #53
- Add Apache 2.0 License by @tyler-griggs in #63
- [Cleanup] Many small fixes and improvements by @tyler-griggs in #64
- Minor cleanup pt 2. by @tyler-griggs in #67
- Revert "Minor cleanup pt 2." by @tyler-griggs in #69
- [Cleanup] Remove unused `normalize_reward` codepath in CriticModel by @SumanthRH in #51
- [Installation] Use skyrl-gym as a symlink for easier development; Add a developer guide by @SumanthRH in #71
- SearchR1 reproduction fixes by @erictang000 in #65
- [Data] Seed the dataloader for reproducibility by @tyler-griggs in #77
- [skyrl-gym] search env cleanup by @erictang000 in #75
- Minor cleanup by @tyler-griggs in #78
- fix: move `stop_reason` checking logic before list truncation by @hank0316 in #81
- Very simple local coding sandbox example by @pcmoritz in #80
- [Dependencies] Upgrade to torch 2.7 by @SumanthRH in #73
- Update README.md by @tyler-griggs in #85
- Add docs badge to README by @CharlieFRuan in #86
- [Installation] Update docs to include `libnuma` installation by @SumanthRH in #89
- Support token-level loss, make default by @tyler-griggs in #90
- [Installation] Fix Dockerfile after CUDA 12.8 upgrade by @SumanthRH in #91
- [Cleanup] Remove unwanted NCCL env vars by @SumanthRH in #92
- Add pre-commit hook for gitleaks by @SumanthRH in #93
- [FIX] Garbage collect temp buffers after checkpoint by @tyler-griggs in #94
- [Bugfix] Disable vllm compilation cache due to compilation issues by @SumanthRH in #95
- [Bugfix] Fix env vars after #95 by @SumanthRH in #98
- [GPU CI 1/N] Init GPU CI on Anyscale by @SumanthRH in #102
- [Docs] Add docs for running on an existing ray cluster by @SumanthRH in #105
- [Trainer] Support per-token rewards in trainer by @SumanthRH in #109
- Add check for whether p2p access is supported - allows code to run on L4/L40S after #73 upgrade to cuda 12.8 by @erictang000 in #108
- [dependencies] Upgrade ray to 2.48.0 by @erictang000 in #106
- fix issue with #108 that broke gpu ci by @erictang000 in #112
- Add warning for certain uv versions due to `uv run --with` regression by @SumanthRH in #113
- [GPU CI] Only trigger workflow for relevant changes in `skyrl-train` by @SumanthRH in #114
- [bug] Loading saved HF weights errors by @erictang000 in #118
- [DAPO] Add support for overlong filtering by @tyler-griggs in #111
- [skyrl-gym] GSM8k - LLM Judge example by @lynnliu030 in #74
- Fix MLFlow logging by @bthecohen in #121
- [Trainer] Support registering custom advantage estimators by @tyler-griggs in #115
- [checkpointing] Add HF model config and tokenizer config to checkpoint folder by @erictang000 in #124
- Fix discord link by @tyler-griggs in #125
- Fix broken link by @tyler-griggs in #128
- [Trainer/Algorithm] Support registering custom policy loss functions + refactor adv estimator registry to allow registration outside ray workers by @erictang000 in #126
- [trainer] add more robust generation output validation by @erictang000 in #132
- [Trainer] GSPO support by @bthecohen in #120
- [trainer/algorithm] Implement DAPO and Polaris style dynamic sampling + add DAPO docs + example by @erictang000 in #130
- [algorithm] Support Dr. GRPO + refactor where policy/critic loss functions are set by @erictang000 in #133
- [fix] move algorithm folder -> algorithms by @erictang000 in #136
- Forward mlflow env vars to ray runtime env by @etnbrd in #135
- [Fix] Add NCCL_CUMEM_ENABLE=0 for vllm to address weight sync error by @CharlieFRuan in #143
- [Generator] Support non-remote (e.g. colocated) SGLang engine by @CharlieFRuan in #68
- [GPU CI] Add new workflow for GSM8k e2e convergence test by @erictang000 in #146
- [algorithm] add rloo and reinforce++ advantage estimators + improve KL penalty handling by @erictang000 in #137
- [FlashRL 1/N] Add support for truncated importance sampling by @SumanthRH in #145
- [Fix] Reset registries in BaseFunctionRegistry.sync_with_actor() when needed and fix registry reset by @CharlieFRuan in #144
- Optimize tuple unpacking in skyrl_gym_generator.generate() by @davenpi in #138
- [Fix] revert "Optimize tuple unpacking in skyrl_gym_generator.generate() " by @SumanthRH in #150
- [Gemini] Disable PR summaries for Gemini by @SumanthRH in #151
- [fix] Fix the multi-engine launch script by @tyler-griggs in #157
- [fix] Formatting fixes to silence linter by @tyler-griggs in #158
- Support Flexible Scheduler by @benji-cannot-code in #142
- [FlashRL 2/N] Support list of weights during weight sync for colocated training b...