v0.4.0
Pre-release
What's Changed
- fix(config): correct flavor to 405B in torchtitan/llama3.1_405B.yaml by @Xiaoming-AMD in #189
- perf(torchtitan/config): enable compile for Llama-3.1 (8B/70B/405B) by @Xiaoming-AMD in #193
- disable dump_pp_data when pp size is one by @lhzhang333 in #191
- remove turbo token by @wenxie-amd in #197
- feat(async-tp) change gemm_rs_overlap api for multi-stream method by @llying-001 in #171
- Support for torchtitan with Primus-Turbo by @clairesonglee in #188
- chore: update default rocm/megatron-lm image to v25.8_py310 by @Xiaoming-AMD in #198
- perf(aiter): add AITER_JIT_DIR env for cached build to speed up re-compilation by @Xiaoming-AMD in #199
- feat: align primus-turbo fp8 linear's args to megatron by @RuibinCheung in #195
- Add wandb_enable config and Torchtitan unit tests by @zitree in #194
- fix: wrapper turbo quant config in megatron extension by @RuibinCheung in #202
- feat(cli): add Python-based primus entrypoint for PATH installation by @Xiaoming-AMD in #200
- feat(zero-bubble): support zero bubble pipeline parallelism by @ChengYao-amd in #208
- Primus product matrix by @wenxie-amd in #210
- fix: remove MXQuantConfig from titan and add warning msg by @RuibinCheung in #212
- fix 8B perf regression (v25.9) by @wenxie-amd in #215
- feat(zero-bubble): support GroupGemm wgrad split, add debug_scheduler_table flag by @ChengYao-amd in #213
- add support for grok1 by @JohnQinAMD in #216
- improve torch profiling by @wenxie-amd in #218
- supports: userId for request by @weilei0120 in #214
- support mlflow tracking by @wenxie-amd in #219
- feat: Update Megatron-LM to 8477817(20251011) by @Xiaoming-AMD in #221
- test(megatron): add Qwen2.5-7B and Qwen2.5-72B pretrain cases by @Xiaoming-AMD in #222
- feat(CLI): add unified shell entry scripts for Slurm, container, and direct modes by @Xiaoming-AMD in #209
- Add tensor size print for comm op benchmark by @lorri-rao in #223
- fix(megatron): fix bugs for fitting the newest megatron by @ChengYao-amd in #224
- Docker Release v25.9 by @wenxie-amd in #217
- Add grok2 model support by @wenxie-amd in #227
- Use PRIMUS_xxx env, export all envs for slurm by @wenxie-amd in #229
- feat(deepep): add PrimusTurboDeepEPTokenDispatcher and support syncfree moe stage 0-2 by @zhenhuang12 in #220
- upgrade(torchtitan): sync torchtitan to 5fb7cc2e3bbb9b9dc0ab7af34ed5cc58b5f32021 (2025-10-16) by @Xiaoming-AMD in #228
- chore(docker): update default image to rocm/primus:v25.9_gfx942 by @Xiaoming-AMD in #230
- fix(tests): add missing expecttest dependency for distributed tests by @Xiaoming-AMD in #233
- fix(config): use 1.0e-2 for moe_aux_loss_coeff to ensure correct float parsing by @Xiaoming-AMD in #234
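A note on the moe_aux_loss_coeff fix in #234: the sketch below shows one likely reason the spelling matters. It assumes the config is read by a loader that follows YAML 1.1 float rules, where the mantissa must contain a dot; the release notes do not state which parser is involved. The regex is a simplified approximation of that rule and `parse_scalar` is a hypothetical helper, not Primus code.

```python
import re

# Simplified approximation of the YAML 1.1 float rule (assumption: the
# config loader resolves scalars this way). The mantissa must contain a
# ".", and an exponent, if present, must carry an explicit sign.
YAML11_FLOAT = re.compile(r"^[-+]?[0-9][0-9_]*\.[0-9_]*(?:[eE][-+][0-9]+)?$")


def parse_scalar(text):
    """Hypothetical helper: return a float if the text matches the
    pattern above, otherwise return the raw string unchanged."""
    if YAML11_FLOAT.match(text):
        return float(text.replace("_", ""))
    return text


for raw in ("1e-2", "1.0e-2"):
    value = parse_scalar(raw)
    # Show which spelling ends up as a number and which stays a string.
    print(f"{raw!r} -> {type(value).__name__}")
```

Under this assumption, `1e-2` stays a string and can break later arithmetic on the coefficient, while `1.0e-2` is read as a number.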
New Contributors
- @clairesonglee made their first contribution in #188
- @zitree made their first contribution in #194
- @lorri-rao made their first contribution in #223
Full Changelog: v0.2.0...v0.4.0