What's Changed
- fix(config): correct flavor to 405B in torchtitan/llama3.1_405B.yaml by @Xiaoming-AMD in #189
- perf(torchtitan/config): enable compile for Llama-3.1 (8B/70B/405B) by @Xiaoming-AMD in #193
- disable dump_pp_data when pp size is one by @lhzhang333 in #191
- remove turbo token by @wenxie-amd in #197
- feat(async-tp) change gemm_rs_overlap api for multi-stream method by @llying-001 in #171
- Support for torchtitan with Primus-Turbo by @clairesonglee in #188
- chore: update default rocm/megatron-lm image to v25.8_py310 by @Xiaoming-AMD in #198
- perf(aiter): add AITER_JIT_DIR env for cached build to speed up re-compilation by @Xiaoming-AMD in #199
- feat: align primus-turbo fp8 linear's args to megatron by @RuibinCheung in #195
- Add wandb_enable config and Torchtitan unit tests by @zitree in #194
- fix: wrap turbo quant config in megatron extension by @RuibinCheung in #202
- feat(cli): add Python-based primus entrypoint for PATH installation by @Xiaoming-AMD in #200
- feat(zero-bubble): support zero bubble pipeline parallelism by @ChengYao-amd in #208
- Primus product matrix by @wenxie-amd in #210
- fix: remove MXQuantConfig from titan and add warning msg by @RuibinCheung in #212
- fix 8B perf regression (v25.9) by @wenxie-amd in #215
- feat(zero-bubble): support GroupGemm wgrad split, add debug_scheduler_table flag by @ChengYao-amd in #213
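The #191 entry ("disable dump_pp_data when pp size is one") can be illustrated with a minimal Python sketch of such a config guard. Only `dump_pp_data` comes from the note. The `TrainConfig` class, the `pipeline_model_parallel_size` field name and the `normalize_pp_dump` helper are hypothetical, and Primus's actual implementation may differ. The assumed rationale is that with a single pipeline stage there is no pipeline schedule worth dumping.

```python
import warnings
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Hypothetical fields for illustration; not Primus's actual config schema.
    pipeline_model_parallel_size: int = 1
    dump_pp_data: bool = False


def normalize_pp_dump(cfg: TrainConfig) -> TrainConfig:
    """Turn off pipeline-data dumping when there is only one pipeline stage."""
    if cfg.dump_pp_data and cfg.pipeline_model_parallel_size == 1:
        warnings.warn(
            "dump_pp_data has no effect when pipeline parallel size is 1; disabling it."
        )
        cfg.dump_pp_data = False
    return cfg


# Usage: a single-stage run gets dumping switched off, a 4-stage run keeps it.
single = normalize_pp_dump(TrainConfig(pipeline_model_parallel_size=1, dump_pp_data=True))
multi = normalize_pp_dump(TrainConfig(pipeline_model_parallel_size=4, dump_pp_data=True))
```

Normalizing the flag once at config-load time, rather than checking it at every dump site, keeps the guard in one place and surfaces the change to the user as a warning.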
New Contributors
- @clairesonglee made their first contribution in #188
- @zitree made their first contribution in #194
Full Changelog: v0.2.0...v0.3.0