Changes from 1 commit
@@ -97,7 +97,6 @@ def _make_log_backend(backend: UnquantizedMoeBackend):
flashinfer_cutlass_available = (
has_flashinfer_cutlass_fused_moe()
and use_ep
- and (not use_dp)
and current_platform.has_device_capability(90)
)
Comment on lines 97 to 101
Contributor

high

This change enables the FlashInfer CUTLASS MoE kernel for configurations with Data Parallelism (use_dp=True). However, the corresponding tests in tests/kernels/moe/test_unquantized_backend_selection.py have not been updated to reflect this. The existing test test_select_cuda_flashinfer_cutlass_backend explicitly sets use_dp=False and includes a comment stating that CUTLASS does not support DP. To ensure the correctness of this feature and prevent future regressions, please add a new test case that validates the behavior when use_dp=True.
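A minimal sketch of the kind of test requested here is below. It does not import vLLM: the function `cutlass_moe_available` and its parameters are hypothetical stand-ins that model the `flashinfer_cutlass_available` expression after this change, with the results of `has_flashinfer_cutlass_fused_moe()` and `current_platform.has_device_capability(90)` passed in as plain booleans (assuming the latter means "compute capability 9.0 or newer"). A real test would go through the project's actual backend-selection entry point instead.

```python
# Hypothetical stand-in for the `flashinfer_cutlass_available` expression
# after this change. The real code calls has_flashinfer_cutlass_fused_moe()
# and current_platform.has_device_capability(90); here those results are
# plain arguments so the predicate can be exercised in isolation.
def cutlass_moe_available(
    has_cutlass_kernel: bool,
    use_ep: bool,
    use_dp: bool,
    has_sm90_or_newer: bool,
) -> bool:
    # `use_dp` mirrors the caller's inputs but is intentionally not consulted:
    # the `(not use_dp)` term was removed by this change.
    return has_cutlass_kernel and use_ep and has_sm90_or_newer


def test_cutlass_moe_available_with_dp() -> None:
    # EP + DP on an SM90+ device: CUTLASS should now be selectable.
    assert cutlass_moe_available(True, use_ep=True, use_dp=True, has_sm90_or_newer=True)
    # DP alone does not enable it; EP is still required.
    assert not cutlass_moe_available(True, use_ep=False, use_dp=True, has_sm90_or_newer=True)
    # Devices below capability 9.0 remain excluded regardless of DP.
    assert not cutlass_moe_available(True, use_ep=True, use_dp=True, has_sm90_or_newer=False)


test_cutlass_moe_available_with_dp()
```

Keeping `use_dp` in the signature lets the test state explicitly that DP no longer affects the result, which is the behavior the existing `use_dp=False` test does not cover.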

Contributor Author

Added a test.

flashinfer_trtllm_moe_enabled = (
@@ -161,18 +160,13 @@ def _make_log_backend(backend: UnquantizedMoeBackend):
"to enable it for better performance.",
scope="local",
)
- elif use_ep and (not use_dp):
+ elif use_ep:
logger.info_once(
"FlashInfer MoE is available for EP"
" but not enabled, consider setting"
" VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.",
scope="local",
)
- elif use_dp:
- logger.info_once(
- "FlashInfer CUTLASS MoE is currently not available for DP.",
- scope="local",
- )
backend = UnquantizedMoeBackend.TRITON
if current_platform.is_xpu():
backend = UnquantizedMoeBackend.XPU
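To make the effect of the second hunk on logging concrete, here is a small sketch that models only the `elif` arms visible in the diff, before and after the change. The functions `hint_before` and `hint_after` and their string labels are hypothetical; earlier arms of the real if/elif chain are not shown in the hunk and are omitted, and the labels stand in for the actual `logger.info_once` messages.

```python
from typing import Optional


# Hypothetical model of the visible log-hint arms BEFORE this change.
# Earlier arms of the real if/elif chain are omitted.
def hint_before(use_ep: bool, use_dp: bool) -> Optional[str]:
    if use_ep and (not use_dp):
        return "ep_not_enabled"   # "FlashInfer MoE is available for EP but not enabled ..."
    elif use_dp:
        return "dp_unavailable"   # "FlashInfer CUTLASS MoE is currently not available for DP."
    return None


# Hypothetical model of the same arms AFTER this change: the DP-specific
# arm is gone and the EP hint no longer excludes DP.
def hint_after(use_ep: bool, use_dp: bool) -> Optional[str]:
    if use_ep:
        return "ep_not_enabled"
    return None


# Compare both versions for every (use_ep, use_dp) combination.
for use_ep in (False, True):
    for use_dp in (False, True):
        print(use_ep, use_dp, hint_before(use_ep, use_dp), hint_after(use_ep, use_dp))
```

With EP and DP both enabled, the old chain reached the "not available for DP" message, while the new one suggests setting `VLLM_USE_FLASHINFER_MOE_FP16=1`. With DP but no EP, these arms now log nothing.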