Fix LongCat MLP tensor parallelism#29515

Open
ixxiii wants to merge 4 commits into sgl-project:main from ixxiii:fix_longcat_mlps_tp

@ixxiii ixxiii commented Jun 27, 2026


Motivation

The dense MLP path of LongCat-Flash 2P4D could not run with TP size = 1, because the MLP tensor-parallel code always assumed TP sharding.
This PR fixes the dense MLP path so that it runs correctly without tensor parallelism.

Tests

Successfully ran LongCat-Flash 2P4D with moe_dense_tp_size = 1.


CI States

Latest PR Test (Base): ❌ Run #28291065388
Latest PR Test (Extra): ❌ Run #28291065332

ixxiii and others added 4 commits June 27, 2026 21:38
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request integrates support for fully data-parallel (DP) dense MoE execution in the LongcatFlash model. It updates LongcatFlashMLP to accept tp_rank and tp_size parameters, and configures them based on the status of enable_moe_dense_fully_dp(). When fully DP is enabled, tensor model parallel all-reduce operations are bypassed during the MLP forward pass. There are no review comments, so I have no additional feedback to provide.
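For readers unfamiliar with the pattern described above, here is a minimal pure-Python sketch of why the all-reduce can be skipped when `tp_size` is 1. It is not the code in this PR: `ToyDenseMLP`, `all_reduce_sum`, and the ReLU activation are hypothetical stand-ins chosen for illustration, and the real `LongcatFlashMLP` may differ in layout and activation.

```python
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix w (rows x cols) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


class ToyDenseMLP:
    """Hypothetical toy MLP that takes explicit tp_rank / tp_size."""

    def __init__(self, w_up: Matrix, w_down: Matrix, tp_rank: int, tp_size: int):
        inter = len(w_up)  # intermediate size
        assert inter % tp_size == 0, "intermediate size must divide evenly"
        shard = inter // tp_size
        lo, hi = tp_rank * shard, (tp_rank + 1) * shard
        # Up projection: each rank keeps a slice of the output rows.
        self.w_up = w_up[lo:hi]
        # Down projection: each rank keeps the matching slice of input columns.
        self.w_down = [row[lo:hi] for row in w_down]
        self.tp_size = tp_size

    @property
    def needs_all_reduce(self) -> bool:
        # With tp_size == 1 the local result is already the full output.
        return self.tp_size > 1

    def forward(self, x: List[float]) -> List[float]:
        # Returns a partial sum when tp_size > 1, the full output otherwise.
        return matvec(self.w_down, relu(matvec(self.w_up, x)))


def all_reduce_sum(partials: List[List[float]]) -> List[float]:
    """Stand-in for a collective all-reduce: element-wise sum over ranks."""
    return [sum(vals) for vals in zip(*partials)]


w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
w_down = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
x = [1.0, 2.0]

# tp_size = 1: one rank holds all weights, so no reduction is required.
full_mlp = ToyDenseMLP(w_up, w_down, tp_rank=0, tp_size=1)
full = full_mlp.forward(x)

# tp_size = 2: each rank produces a partial output that must be summed.
ranks = [ToyDenseMLP(w_up, w_down, tp_rank=r, tp_size=2) for r in range(2)]
reduced = all_reduce_sum([m.forward(x) for m in ranks])
```

In this toy setup `reduced` equals `full`, which is the property that makes it safe to bypass the all-reduce when each rank already holds the complete dense MLP weights.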



ixxiii commented Jun 27, 2026

Author

Hi @Fridge003 @ishandhanani @Qiaolin-Yu, this PR fixes LongCat-Flash dense MLP tensor parallelism for TP size = 1. Could you please take a look or help route it to the right reviewer when available?
