[AMD] Optimize search space and upgrade Image to 0.19.0 for MiniMax-M2.5#1003
Conversation
Running on fewer GPUs reduces inter-GPU communication overhead, and MoE expert parallelism across 2 GPUs is very efficient for this model.
Enable FP8 KV cache + AITER FA for minimaxm2.5-fp8-mi355x-vllm
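The FP8 KV cache and AITER settings mentioned above could be enabled with a launch command along these lines. This is an illustrative sketch only: the model path and the exact flag combination for this recipe are assumptions, not taken from the PR.

```shell
# Hypothetical sketch: serve an FP8 MoE model on 2 MI355X GPUs with an
# FP8 KV cache and AMD AITER kernels enabled. The model name below is an
# assumption for illustration.
export VLLM_ROCM_USE_AITER=1   # enable AMD AITER kernels on ROCm builds

vllm serve MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 2 \
  --kv-cache-dtype fp8
```

`--tensor-parallel-size 2` keeps the model on 2 GPUs (less inter-GPU traffic), and `--kv-cache-dtype fp8` stores the KV cache in FP8 to cut memory use; the concrete values tuned in this PR's search space may differ.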
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
functionstackx left a comment:
lgtm once validation passes
vLLM recipes PR updated: vllm-project/recipes#300
End-to-end test run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23987768210
co-author: @benenzhu