[AMD] Add Minimax tp8 with ep and update vllm image for MI355x#927
[AMD] Add Minimax tp8 with ep and update vllm image for MI355x#927chunfangamd merged 9 commits intomainfrom
Conversation
Made-with: Cursor # Conflicts: # benchmarks/single_node/minimaxm2.5_fp8_mi355x.sh # perf-changelog.yaml
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you |
|
/sweep test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys minimaxm2.5-fp8-mi355x-vllm |
|
@benenzhu Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23432526160 |
There was a problem hiding this comment.
LGTM — straightforward config and script extension; the only issue is the nit flagged inline (wrong PR link in changelog).
Extended reasoning...
Overview
This PR touches three files: the AMD benchmark config (amd-master.yaml), the MiniMax MI355X benchmark shell script, and perf-changelog.yaml. The config adds { tp: 8, ep: 8, conc-start: 32, conc-end: 256 } to all three seq-len configs for minimaxm2.5-fp8-mi355x-vllm and bumps the image from v0.15.1 to v0.18.0. The shell script adds EP_SIZE as a required env var, conditionally passes --enable-expert-parallel to vllm serve, and removes the previously-present --disable-log-requests flag (consistent with a recent standalone commit c35e700 that removed it). The changelog gets one new entry.
Security Risks
No security-sensitive code paths are touched. This is purely benchmark configuration and a shell launch script for an inference server. No auth, crypto, or permission logic is involved.
Level of Scrutiny
Low scrutiny is warranted. All three changes are mechanical and follow well-established patterns already present in the repo (the EP conditional mirrors the same pattern used in other MiniMax scripts). The image bump and new TP=8/EP=8 search-space entries are additive and consistent with peer configs (e.g., dsr1-fp4-mi355x-atom uses identical ep-based entries).
Other Factors
The only bug found is a nit: the new perf-changelog.yaml entry references /pull/868 instead of /pull/927. This is a documentation/traceability issue with no functional impact. The functional changes are correct and complete. No outstanding human reviewer comments exist in the timeline.
|
/sweep test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys minimaxm2.5-fp8-mi355x-vllm --evals-only |
|
@cquil11 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23438981061 |
|
Once accuracy check looks good then we can merge. |
|
vllm-project/recipes#300 The vllm recipes PR for this one. |
chunfangamd
left a comment
There was a problem hiding this comment.
1K1K: +21.3%
8K1K: +14.3%
Add tp8 with ep for conc 32 - 256 for Minimax in mi355x.