Release v0.4.0 · AMD-AGI/Primus

What's Changed

fix(config): correct flavor to 405B in torchtitan/llama3.1_405B.yaml by @Xiaoming-AMD in #189
perf(torchtitan/config): enable compile for Llama-3.1 (8B/70B/405B) by @Xiaoming-AMD in #193
disable dump_pp_data when pp size is one by @lhzhang333 in #191
remove turbo token by @wenxie-amd in #197
feat(async-tp) change gemm_rs_overlap api for multi-stream method by @llying-001 in #171
Support for torchtitan with Primus-Turbo by @clairesonglee in #188
chore: update default rocm/megatron-lm image to v25.8_py310 by @Xiaoming-AMD in #198
perf(aiter): add AITER_JIT_DIR env for cached build to speed up re-compilation by @Xiaoming-AMD in #199
feat: align primus-turbo fp8 linear's args to megatron by @RuibinCheung in #195
Add wandb_enable config and Torchtitan unit tests by @zitree in #194
fix: wrapper turbo quant config in megatron extension by @RuibinCheung in #202
feat(cli): add Python-based primus entrypoint for PATH installation by @Xiaoming-AMD in #200
feat(zero-bubble): support zero bubble pipeline parallelism by @ChengYao-amd in #208
Primus product matrix by @wenxie-amd in #210
fix: remove MXQuantConfig from titan and add warning msg by @RuibinCheung in #212
fix 8B perf regression (v25.9) by @wenxie-amd in #215
feat(zero-bubble): support GroupGemm wgrad split, add debug_scheduler_table flag by @ChengYao-amd in #213
add support for grok1 by @JohnQinAMD in #216
improve torch profiling by @wenxie-amd in #218
supports: userId for request by @weilei0120 in #214
support mlflow tracking by @wenxie-amd in #219
feat: Update Megatron-LM to 8477817(20251011) by @Xiaoming-AMD in #221
test(megatron): add Qwen2.5-7B and Qwen2.5-72B pretrain cases by @Xiaoming-AMD in #222
feat(CLI): add unified shell entry scripts for Slurm, container, and direct modes by @Xiaoming-AMD in #209
Add tensor size print for comm op benchmark by @lorri-rao in #223
fix(megatron): fix bugs for fitting the newest megatron by @ChengYao-amd in #224
Docker Release v25.9 by @wenxie-amd in #217
Add grok2 model support by @wenxie-amd in #227
Use PRIMUS_xxx env, export all envs for slurm by @wenxie-amd in #229
feat(deepep): add PrimusTurboDeepEPTokenDispatcher and support syncfree moe stage 0-2 by @zhenhuang12 in #220
upgrade(torchtitan): sync torchtitan to 5fb7cc2e3bbb9b9dc0ab7af34ed5cc58b5f32021 (2025-10-16) by @Xiaoming-AMD in #228
chore(docker): update default image to rocm/primus:v25.9_gfx942 by @Xiaoming-AMD in #230
fix(tests): add missing expecttest dependency for distributed tests by @Xiaoming-AMD in #233
fix(config): use 1.0e-2 for moe_aux_loss_coeff to ensure correct float parsing by @Xiaoming-AMD in #234
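
Note on #234: the change from an exponent-only literal to 1.0e-2 is consistent with YAML 1.1 float resolution, where a decimal float needs a dot in the mantissa, so a value like 1e-2 is loaded as a string. The sketch below illustrates that rule using only the standard library. It assumes the config loader follows YAML 1.1 rules (as PyYAML does); the regex is a simplified transcription, and `parse_scalar` is a hypothetical helper, not part of Primus.

```python
import re

# Simplified YAML 1.1 decimal-float pattern (assumption: the loader resolves
# scalars the YAML 1.1 way). The mantissa must contain a dot, and the
# optional exponent must carry an explicit sign.
YAML11_FLOAT = re.compile(
    r"^[-+]?(?:[0-9][0-9_]*\.[0-9_]*|\.[0-9][0-9_]*)(?:[eE][-+][0-9]+)?$"
)


def parse_scalar(text):
    """Hypothetical helper: return a float if text matches the pattern, else the raw string."""
    if YAML11_FLOAT.match(text):
        return float(text.replace("_", ""))
    return text


for raw in ("1e-2", "1.0e-2"):
    value = parse_scalar(raw)
    print(f"{raw!r} -> {type(value).__name__}: {value!r}")
```

Under these assumptions, 1e-2 stays a string while 1.0e-2 becomes a float, which is why writing the coefficient with an explicit dot avoids a silent type mismatch.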

New Contributors

@clairesonglee made their first contribution in #188
@zitree made their first contribution in #194
@lorri-rao made their first contribution in #223

Full Changelog: v0.2.0...v0.4.0
