[Feature] Add model-side integration for fused operator DispatchGmmCombineDecode.
This commit adds model-side integration for the previously introduced experimental AscendC fused operator DispatchGmmCombineDecode, used in MoE decoding.
The operator implementation itself was added in prior PR #4139.
This change only adapts the model execution path to optionally use the fused operator.
When the environment variable VLLM_ASCEND_ENABLE_FUSED_MC2=1 is set, the original MC2 path composed of multiple operators (A8W8 dispatch → GMM → SwiGLU → GMM → combine) is replaced by the single fused operator DispatchGmmCombineDecode.
By default, the existing multi-operator MC2 implementation is preserved.
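
As a rough illustration of the opt-in switch described above, the following minimal sketch selects between the two paths based on the environment variable. Only `VLLM_ASCEND_ENABLE_FUSED_MC2` and the operator names come from this commit message; the helper names `fused_mc2_enabled` and `select_moe_decode_path` are hypothetical, and the exact-match check on `"1"` is an assumption, not the actual model-side code.

```python
import os


def fused_mc2_enabled(environ=None):
    """Return True only when VLLM_ASCEND_ENABLE_FUSED_MC2 is set to "1".

    Hypothetical helper: the real integration may parse the variable differently.
    """
    env = os.environ if environ is None else environ
    return env.get("VLLM_ASCEND_ENABLE_FUSED_MC2", "0") == "1"


def select_moe_decode_path(environ=None):
    """Return the ordered operator names that the MoE decode step would run.

    Hypothetical helper: it returns labels only and launches no kernels.
    """
    if fused_mc2_enabled(environ):
        # Opt-in path: one fused operator replaces the whole chain.
        return ["DispatchGmmCombineDecode"]
    # Default path: the existing multi-operator MC2 sequence.
    return ["A8W8 dispatch", "GMM", "SwiGLU", "GMM", "combine"]


if __name__ == "__main__":
    print(select_moe_decode_path())
```

With the variable unset, the sketch reports the five-stage MC2 sequence; setting `VLLM_ASCEND_ENABLE_FUSED_MC2=1` before launch makes it report the single fused operator.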
Signed-off-by: wangqiankun <[email protected]>