## Highlights
New unified package: This is the first release of the unified `skyrl` package, combining the `skyrl-train` and `skyrl-tx` packages. The unified package brings together the FSDP, Megatron, and JAX backends under the Tinker API, while still retaining the user-facing "frontend" interfaces (e.g., `BasePPOTrainer`, `SkyRLGymGenerator`) from the `skyrl-train` and `skyrl-tx` packages. For details about the migration, please refer to #1145.
Improved API documentation: We've revamped the API documentation pages for SkyRL. The new API documentation pages can be found here: https://docs.skyrl.ai/api-ref
Pythonic Configs: The `skyrl-train` backend has now fully migrated to pythonic dataclasses, replacing the older YAML-based interface. The configuration hierarchy has also been updated, and the CLI no longer relies on Hydra. Please refer to the documentation for the new configuration hierarchy: https://docs.skyrl.ai/docs/api-ref/skyrl/config
SGLang is no longer supported: SkyRL no longer supports the SGLang inference engine, unifying on vLLM.
vLLM 0.16.0 upgrade: This release updates vLLM to 0.16.0.
Qwen 3.5 experimental support: SkyRL now has experimental support for Qwen 3.5 models. This is currently limited to the JAX backend.
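To illustrate the shape of the pythonic config migration, here is a minimal, self-contained sketch. The `AlgorithmConfig` and `TrainerConfig` names, fields, and defaults below are hypothetical stand-ins, not SkyRL's actual classes; see https://docs.skyrl.ai/docs/api-ref/skyrl/config for the real hierarchy.

```python
from dataclasses import dataclass, field, replace

# Hypothetical sketch only: these dataclasses mimic the style of the new
# pythonic config hierarchy; they are NOT SkyRL's actual config classes.

@dataclass
class AlgorithmConfig:
    max_seq_len: int = 4096  # illustrative default, not SkyRL's

@dataclass
class TrainerConfig:
    algorithm: AlgorithmConfig = field(default_factory=AlgorithmConfig)
    train_batch_size: int = 128  # illustrative default, not SkyRL's

# Configs are constructed and overridden directly in Python, rather than
# through YAML files and Hydra CLI overrides:
cfg = TrainerConfig()
cfg = replace(cfg, algorithm=replace(cfg.algorithm, max_seq_len=8192))
print(cfg.algorithm.max_seq_len)  # prints 8192
```

Because overrides are plain dataclass operations, typos in field names fail immediately with a `TypeError` instead of silently creating an unused YAML key.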
## What's Changed
- [tx] Load test weights from HF cache instead of save_pretrained by @raulchen in #1095
- [tx] Fix stacked sharding for flax >= 0.12.4 by @pcmoritz in #1120
- [tx] Stack weights — Llama3 by @raulchen in #1081
- [Harbor] Rename TerminalBenchGenerator -> HarborGenerator by @CharlieFRuan in #1122
- [tx] Stack weights — DeepSeek by @raulchen in #1082
- [Harbor][Docs] Add Harbor docs and make data preparation streamlined by @CharlieFRuan in #1124
- [tx] Per-layer gradient checkpointing by @raulchen in #1083
- [tx] Make apply_lora work with flattened states by @tanmaysachan in #1111
- Port #1079 to skyrl folder by @pcmoritz in #1127
- Fix linting on main branch by @pcmoritz in #1128
- Port #1095 to skyrl folder by @pcmoritz in #1129
- Port #1120 to skyrl folder by @pcmoritz in #1130
- [migration] copy old docs, examples, integrations, scripts by @erictang000 in #1133
- [1/N] Remove `skyrl` folder on main by @erictang000 in #1138
- [2/N] Add back skyrl code to top level and delete skyrl-train and skyrl-tx to get git history by @erictang000 in #1139
- [3/N] Add back skyrl-train and skyrl-tx code by @erictang000 in #1140
- [4/N] Fix CI paths for new `skyrl` code now that we have removed one level of nesting by @erictang000 in #1141
- fix linting error in skyrl-cpu CI (ignore agent and examples) and change CI scope by @erictang000 in #1143
- Update README with repo re-organization notice by @CharlieFRuan in #1146
- Update README about reorganizing by @CharlieFRuan in #1147
- [Reorg] Update examples/train scripts to use new skyrl package by @CharlieFRuan in #1148
- [Harbor] Make the data preparation scripts simpler and use `examples/train` as main entrypoint by @CharlieFRuan in #1150
- [Harbor] Move examples/train/harbor to examples/train_integrations/harbor by @CharlieFRuan in #1151
- Redirect to new Harbor integration documentation by @CharlieFRuan in #1152
- [train][logs] Separate infrastructure logs from training progress by @CharlieFRuan in #1088
- [train][Config] Make `trainer.algorithm.max_seq_len` configurable, for seq_mean_token_sum_norm by @CharlieFRuan in #1153
- [Harbor] Add overlong filtering support and add Dr. GRPO params to script by @CharlieFRuan in #1157
- [Harbor] Expose configs override_timeout_sec and env-specific auto stop by @CharlieFRuan in #1158
- [tx] Pass through the loss function config by @pcmoritz in #1155
- Port #1081 to skyrl folder by @pcmoritz in #1131
- Port #1082 to skyrl folder by @pcmoritz in #1161
- Port #1083 to skyrl folder by @pcmoritz in #1162
- Port #1111 to skyrl folder by @pcmoritz in #1164
- [tx] Fix loss function config keys and add validation by @pcmoritz in #1159
- Add CISPO loss function to skyrl-tx by @tamoghnokandar in #1144
- [Harbor] Move rate limit out of TrialConfig to +generator.rate_limit by @CharlieFRuan in #1165
- [Harbor][Doc] Update Harbor docs by @CharlieFRuan in #1163
- [Docs] Small fix on doc linking by @CharlieFRuan in #1166
- Port #1155 to skyrl folder by @pcmoritz in #1167
- Port #1159 to skyrl folder by @pcmoritz in #1168
- Port #1144 to skyrl folder by @pcmoritz in #1169
- Update readme for blogposts by @CharlieFRuan in #1171
- Finish skyrl-tx migration to skyrl folder by @pcmoritz in #1170
- [tx] Add metrics to optim_step by @pcmoritz in #1142
- [train] Support logprobs, fix generation config defaults and add more generation tests for the new HTTP inference pathway by @SumanthRH in #1038
- [Harbor] Bump harbor version by @CharlieFRuan in #1174
- [ci][megatron] fix megatron CI by including numpy in override-dependencies by @erictang000 in #1175
- [ci] Migrate nightly e2e CI tests to run against `skyrl` directory by @erictang000 in #1176
- [deps] remove numpy from override dependencies by @erictang000 in #1178
- [docs] Add API reference to new docs site and cleanup old docs by @SumanthRH in #1177
- [CI][Megatron] Use `ray_init_fixture` in `test_megatron_policy_weight_sync` by @SumanthRH in #1183
- [Docs] Fix Runpod example to use the new `skyrl` package by @SumanthRH in #1185
- [tests] Fix template path in `test_inference_engine_http_endpoint` by @SumanthRH in #1186
- [tests] Fix remote server startup command and sleep level for engine generation tests by @SumanthRH in #1190
- [tx] Introduce optimization step metrics dataclass by @pcmoritz in #1191
- [Docs] Fix API reference docs for SkyRLTrain backend by @SumanthRH in #1192
- [Harbor] Replace asyncio.gather with TaskGroup for proper cancellation by @CharlieFRuan in #1193
- [tx] General implementation of trainable Hyper Connections by @tanmaysachan in #1008
- Port #1191 to skyrl folder by @pcmoritz in #1197
- [train] Add explicit `finish` calls for tracker when training ends by @SumanthRH in #1198
- [train][CI] Add regression thresholds for E2E CI runs by @SumanthRH in #1199
- [train][tests] Fix hanging test `test_abort_generation_vllm_engine` by @SumanthRH in #1202
- [tests][train] Fix repeated arg in `test_lora` and cleanup logic in `test_skyrl_gym_generator` by @SumanthRH in #1209
- [train] Pythonic Configs 2/N: Switch to new dataclasses in entrypoint scripts; Change instantiation signatures by @SumanthRH in #1187
- [tx] Port #1008 to skyrl folder by @pcmoritz in #1217
- [tx] Update README.md by @pcmoritz in #1223
- [tx] Fix checkpoint loading for MoE models by @pcmoritz in #1224
- [Docs] Migrate to Griffe2md for fumadocs-compatible API reference by @SumanthRH in #1225
- [tx] Fix engine command on TPUs by @pcmoritz in #1181
- [train] Remove deprecated SGLang backend from `skyrl/` by @SumanthRH in #1220
- Deepcopy env_extras dicts for each sample when preparing generator input by @ebronstein in #1189
- [tx] Update README.md and move tx over to skyrl/ folder by @pcmoritz in #1226
- [CI] Fix failing docs build; Fix GPU OOM for new inference codepath; Improve post-processing in Search env by @SumanthRH in #1221
- [Docs] Add source code blocks for methods by @SumanthRH in #1229
- [train] Fix issue with unset `pad_token_id` by @SumanthRH in #1232
- [train][tests] Add tests for `RemoteInferenceClient` by @SumanthRH in #1211
- [agent] Migrate `skyrl-agent` to use the new `skyrl` package by @SumanthRH in #1235
- [agent][Fix] Fix `SkyRLAgentPPOTrainer` after switch to `async` by @SumanthRH in #1237
- [tx] Fix tied embedding closure for TPUs by @pcmoritz in #1233
- Add Daytona to acknowledgements in README by @CharlieFRuan in #1239
- Fix Megatron backend correctness: grad_scale_func, PP seed variation, weight sync pause by @tyler-griggs in #1212
- Add configurable MoE runtime flags to MegatronConfig by @tyler-griggs in #1213
- Register GLM-4.7-Flash bridge and bump megatron-bridge by @tyler-griggs in #1214
- Bump vLLM to 0.16.0 with required dep updates by @tyler-griggs in #1240
- Add GLM-4.7-Flash (30B MoE) training example by @tyler-griggs in #1215
- [train] Cleanup docs references to `skyrl-train` by @SumanthRH in #1236
- [train][examples] Fix 8 broken example scripts from skyrl-train migration by @CharlieFRuan in #1230
- [train][docs] Fix stale test docstrings and remaining docs references from skyrl-train migration by @CharlieFRuan in #1246
- [deps] bump megatron core to 0.16.0 by @erictang000 in #1247
- [train] Fix maximum context length handling and generation tests after 0.16.0 upgrade by @SumanthRH in #1248
- [train][fullyAsync] Fix abort/pause broken after vllm 0.16.0 bump by @CharlieFRuan in #1250
- [train][docs] Remove old docs in `skyrl/train/old_docs` by @SumanthRH in #1253
- [train] Remove deprecated `skyrl-train` package by @SumanthRH in #1249
- [tx] Implement Qwen 3.5 model architecture by @pcmoritz in #1228
- [train][fix] Temporarily rename `/update_weights` endpoint to prevent conflict with native vllm endpoint by @SumanthRH in #1265
- [megatron] refactor megatron param/grad offload to directly use new megatron built in functions by @erictang000 in #1266
- [megatron] replace deprecated megatron checkpointing default, set "distrib_optim_sharding_type" to "dp_reshardable" by @erictang000 in #1268
- [docs] Fix `examples/` paths in the docs after `skyrl-train` -> `skyrl` migration by @SumanthRH in #1269
- [train] Fix `There is no current event loop in thread 'MainThread'.` by @SumanthRH in #1270
- [train] Add per-token hard masking for off-policy correction by @tyler-griggs in #1264
- [tx] replace calls to jax.process_index() to resolve rank ordering issue with multi-host TPUs by @andrewsykim in #1252
- [ci] remove incorrect key check in megatron ci test by @erictang000 in #1275
- [train] Make env vars file common by @SumanthRH in #1276
- [tinker] Propagate all `uv run` args from server startup command to backend workers by @SumanthRH in #1255
- [tx] Implement chunked delta rule for DeltaNet by @pcmoritz in #1272
## New Contributors
- @tamoghnokandar made their first contribution in #1144
- @andrewsykim made their first contribution in #1252
Full Changelog: skyrl_train-v0.4.0...skyrl-v0.1.0