v0.3.0

Latest

wenxie-amd released this 15 Oct 00:47

19 commits to release/v25.9 since this release

e16b27b

What's Changed

fix(config): correct flavor to 405B in torchtitan/llama3.1_405B.yaml by @Xiaoming-AMD in #189
perf(torchtitan/config): enable compile for Llama-3.1 (8B/70B/405B) by @Xiaoming-AMD in #193
disable dump_pp_data when pp size is one by @lhzhang333 in #191
remove turbo token by @wenxie-amd in #197
feat(async-tp): change gemm_rs_overlap api for multi-stream method by @llying-001 in #171
Support for torchtitan with Primus-Turbo by @clairesonglee in #188
chore: update default rocm/megatron-lm image to v25.8_py310 by @Xiaoming-AMD in #198
perf(aiter): add AITER_JIT_DIR env for cached build to speed up re-compilation by @Xiaoming-AMD in #199
feat: align primus-turbo fp8 linear's args to megatron by @RuibinCheung in #195
Add wandb_enable config and Torchtitan unit tests by @zitree in #194
fix: wrap turbo quant config in megatron extension by @RuibinCheung in #202
feat(cli): add Python-based primus entrypoint for PATH installation by @Xiaoming-AMD in #200
feat(zero-bubble): support zero bubble pipeline parallelism by @ChengYao-amd in #208
Primus product matrix by @wenxie-amd in #210
fix: remove MXQuantConfig from titan and add warning msg by @RuibinCheung in #212
fix 8B perf regression (v25.9) by @wenxie-amd in #215
feat(zero-bubble): support GroupGemm wgrad split, add debug_scheduler_table flag by @ChengYao-amd in #213

New Contributors

@clairesonglee made their first contribution in #188
@zitree made their first contribution in #194

Full Changelog: v0.2.0...v0.3.0

Contributors

clairesonglee, wenxie-amd, and 6 other contributors
