diff --git a/MiniMax/MiniMax-M2.5.md b/MiniMax/MiniMax-M2.5.md
index 17edc5d8..bc83b26d 100644
--- a/MiniMax/MiniMax-M2.5.md
+++ b/MiniMax/MiniMax-M2.5.md
@@ -47,5 +47,45 @@ docker run --gpus all \
   --trust-remote-code
 ```
 
+### AMD MI355X (FP8)
+
+To run on AMD MI355X GPUs, use the vLLM ROCm image. Setting `VLLM_ROCM_USE_AITER=1` enables the AITER backend for optimized performance. The commands below use TP=4 and TP=8; for a two-GPU deployment, set `--tensor-parallel-size 2`.
+
+**TP=2 or TP=4:**
+
+```bash
+docker run --device /dev/kfd --device /dev/dri --group-add video \
+  -p 8000:8000 \
+  --ipc=host \
+  -e VLLM_ROCM_USE_AITER=1 \
+  -e VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4 \
+  -v ~/.cache/huggingface:/root/.cache/huggingface \
+  vllm/vllm-openai-rocm:latest MiniMaxAI/MiniMax-M2.5 \
+  --tensor-parallel-size 4 \
+  --block-size 32 \
+  --tool-call-parser minimax_m2 \
+  --reasoning-parser minimax_m2_append_think \
+  --enable-auto-tool-choice \
+  --trust-remote-code
+```
+
+**TP=8 with EP=8 (expert parallelism for higher throughput):**
+
+```bash
+docker run --device /dev/kfd --device /dev/dri --group-add video \
+  -p 8000:8000 \
+  --ipc=host \
+  -e VLLM_ROCM_USE_AITER=1 \
+  -e VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4 \
+  -v ~/.cache/huggingface:/root/.cache/huggingface \
+  vllm/vllm-openai-rocm:latest MiniMaxAI/MiniMax-M2.5 \
+  --tensor-parallel-size 8 \
+  --enable-expert-parallel \
+  --block-size 32 \
+  --tool-call-parser minimax_m2 \
+  --reasoning-parser minimax_m2_append_think \
+  --enable-auto-tool-choice \
+  --trust-remote-code
+```
 
 ## Benchmarking
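
Before running full benchmarks against a server started with any of the commands above, per-request decode throughput can be derived from the `usage` field in the OpenAI-compatible chat-completions response. A minimal sketch, assuming the standard response schema; the sample payload and timing below are illustrative, not measured results:

```python
import json

# Illustrative body from the OpenAI-compatible /v1/chat/completions endpoint;
# the "usage" block (prompt/completion/total token counts) is part of the
# standard response schema. Numbers here are made up for demonstration.
sample = json.loads("""
{
  "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 48, "total_tokens": 60}
}
""")

def decode_throughput(response: dict, elapsed_s: float) -> float:
    """Generated tokens per second for a single request."""
    return response["usage"]["completion_tokens"] / elapsed_s

print(decode_throughput(sample, 0.6))  # 48 tokens in 0.6 s -> 80.0 tok/s
```

Wrap the request in a wall-clock timer (e.g. `time.perf_counter()`) to obtain `elapsed_s`; for aggregate throughput across concurrent requests, sum `completion_tokens` over all responses and divide by the total elapsed time.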