docs(moonshotai): remove --compilation_config.pass_config.fuse_allreduce_rms from Kimi-K2.5 recipe #325
Conversation
remove --compilation_config.pass_config.fuse_allreduce_rms from Kimi-K2.5 recipe

This flag is no longer needed as of vLLM v0.17: fuse_allreduce_rms is now enabled by default for MoE models on Hopper hardware. Removes the flag from all three command examples (Hopper Docker, Blackwell Docker, and vllm serve).

Closes vllm-project#324

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
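For reference, the simplified `vllm serve` invocation after the removal might look like the following sketch. Only the model name and flags quoted in this PR are shown; any other options the recipe passes are omitted here.

```shell
# Sketch only: flags besides those quoted in this PR are omitted.
vllm serve moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 4 \
  --mm-encoder-tp-mode data
```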
Code Review
This pull request removes the --compilation_config.pass_config.fuse_allreduce_rms flag from the Kimi-K2.5 deployment documentation for various environments. A review comment suggests that while this flag is default on Hopper hardware, it should be retained for Blackwell (aarch64) examples to ensure optimal performance unless its default status on that architecture is confirmed.
```diff
 vllm/vllm-openai:v0.17.0-aarch64-cu130 moonshotai/Kimi-K2.5 \
   --tensor-parallel-size 4 \
   --mm-encoder-tp-mode data \
-  --compilation_config.pass_config.fuse_allreduce_rms true \
```
The justification for removing this flag is that it is enabled by default on Hopper hardware, but this example targets Blackwell (aarch64), a different architecture on which the optimization may not be on by default. Consider retaining the flag here unless it is confirmed to also be enabled by default on Blackwell.
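If the default status on Blackwell is unconfirmed, the flag could be kept explicitly in the Blackwell example. A sketch of the command tail follows; the preceding `docker run` options from the recipe are elided here, and only the flags visible in this diff are shown.

```shell
# Tail of the Blackwell command only; preceding `docker run` options elided.
vllm/vllm-openai:v0.17.0-aarch64-cu130 moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 4 \
  --mm-encoder-tp-mode data \
  --compilation_config.pass_config.fuse_allreduce_rms true
```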
Summary
Removes `--compilation_config.pass_config.fuse_allreduce_rms true` from all three command examples in the Kimi-K2.5 recipe (Hopper Docker, Blackwell Docker, and `vllm serve`).

Closes #324
Test plan