
docs(moonshotai): remove --compilation_config.pass_config.fuse_allreduce_rms from Kimi-K2.5 recipe#325

Open
faradawn wants to merge 1 commit into vllm-project:main from faradawn:fix/kimi-k2.5-remove-fuse-allreduce-flag


Collaborator

@faradawn faradawn commented Apr 9, 2026

Summary

  • Removes --compilation_config.pass_config.fuse_allreduce_rms true from all three command examples in the Kimi-K2.5 recipe (Hopper Docker, Blackwell Docker, and vllm serve)
  • This flag is no longer needed as of vLLM v0.17 — it is now enabled by default for MoE models on Hopper hardware (confirmed by Hanjie Qiu and Wei Zhao)

Closes #324

Test plan

  • Verify commands run correctly on vLLM v0.17+ without the flag
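A quick static check can complement the test plan by confirming the flag no longer appears in the recipe text. The following is a minimal sketch, assuming the recipe is available as a plain string; `find_flag_lines` is a hypothetical helper and not part of vLLM, and the sample fragment is copied from the Blackwell example quoted in the review.

```python
FLAG = "--compilation_config.pass_config.fuse_allreduce_rms"


def find_flag_lines(recipe_text: str, flag: str = FLAG) -> list[int]:
    """Return the 1-based numbers of lines that still mention the flag."""
    return [
        lineno
        for lineno, line in enumerate(recipe_text.splitlines(), start=1)
        if flag in line
    ]


# Fragment of the Blackwell example as it looked before this PR.
before = "\n".join([
    "--tensor-parallel-size 4 \\",
    "--mm-encoder-tp-mode data \\",
    "--compilation_config.pass_config.fuse_allreduce_rms true \\",
])
# The same fragment with the flag line dropped (illustrative only).
after = "\n".join(line for line in before.splitlines() if FLAG not in line)

print(find_flag_lines(before))
print(find_flag_lines(after))
```

This only shows that the flag is absent from the text. Whether the commands actually run on vLLM v0.17+ still has to be verified on real hardware.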

…uce_rms from Kimi-K2.5 recipe

This flag is no longer needed as of vLLM v0.17 — fuse_allreduce_rms is
now enabled by default for MoE models on Hopper hardware. Removes the
flag from all three command examples (Hopper Docker, Blackwell Docker,
and vllm serve).

Closes vllm-project#324

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request removes the --compilation_config.pass_config.fuse_allreduce_rms flag from the Kimi-K2.5 deployment documentation for various environments. A review comment suggests that while this flag is default on Hopper hardware, it should be retained for Blackwell (aarch64) examples to ensure optimal performance unless its default status on that architecture is confirmed.

vllm/vllm-openai:v0.17.0-aarch64-cu130 moonshotai/Kimi-K2.5 \
--tensor-parallel-size 4 \
--mm-encoder-tp-mode data \
--compilation_config.pass_config.fuse_allreduce_rms true \

medium

The justification for removing this flag is that it's enabled by default on Hopper hardware. This change, however, is for a Blackwell (aarch64) example. Since Blackwell is a different architecture, this optimization might not be enabled by default. To ensure optimal performance on Blackwell, it might be better to retain this flag unless it's confirmed to be default on Blackwell as well.
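The suggestion above amounts to a simple rule: pass the flag explicitly until the default is confirmed for the target architecture. A minimal sketch of that rule follows, assuming a hypothetical `build_args` helper and a hypothetical `default_confirmed` input; the argument values are copied from the Blackwell snippet under review, and nothing here reflects how vLLM itself decides the default.

```python
# Arguments copied from the Blackwell snippet under review.
BASE_ARGS = [
    "moonshotai/Kimi-K2.5",
    "--tensor-parallel-size", "4",
    "--mm-encoder-tp-mode", "data",
]
FUSE_FLAG = ["--compilation_config.pass_config.fuse_allreduce_rms", "true"]


def build_args(default_confirmed: bool) -> list[str]:
    """Hypothetical helper: keep the explicit flag until the default is confirmed."""
    args = list(BASE_ARGS)  # copy so the module-level list is never mutated
    if not default_confirmed:
        args.extend(FUSE_FLAG)
    return args


# Hopper: the PR description states the default is confirmed as of v0.17.
print(" ".join(build_args(default_confirmed=True)))
# Blackwell: not confirmed in this thread, so the flag stays explicit.
print(" ".join(build_args(default_confirmed=False)))
```

Passing the flag explicitly where it is already the default should be redundant rather than harmful, which is why the conservative branch keeps it.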


Development

Successfully merging this pull request may close these issues.

Improve out-of-the-box recipe for Kimi-K2.5
