Potentially superfluous check that disables non-gated activations in the CUTLASS fused MoE API #2731

https://github.com/flashinfer-ai/flashinfer/blob/2bb3e9e67aa5f7b05b7f867cac780bba74bf3920/csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_binding.cu#L893-L904

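This check runs for both gated and non-gated activations, but it should only run when a gated activation is used. It requires the second dimension of fc1_weight_block to equal the aligned inter_size times 2, which only makes sense when fc1 holds both the gate and up projections. With a non-gated activation that dimension is presumably the aligned inter_size without the factor of 2, so the check fails and non-gated activations are effectively disabled.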

	TVM_FFI_ICHECK(
	    fc1_weight_block.size(0) == num_experts_on_rank &&
	    fc1_weight_block.size(1) ==
	        TmaWarpSpecializedGroupedGemmInput::alignToSfDim(
	            inter_size, TmaWarpSpecializedGroupedGemmInput::MinNDimAlignmentMXFPX) *
	            2 &&
	    fc1_weight_block.size(2) * FP8_PER_INT32 *
	            TmaWarpSpecializedGroupedGemmInput::MXFPXBlockScaleVectorSize ==
	        TmaWarpSpecializedGroupedGemmInput::alignToSfDim(
	            hidden_size, TmaWarpSpecializedGroupedGemmInput::MinKDimAlignmentMXFPX))
	    << "fc1 weight block size must be (num_experts_on_rank, inter_size * 2, hidden_size // 4 "
	       "// block_scale_vector_size)";

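One possible direction, sketched here as standalone C++ rather than as a patch to the binding, is to make the factor of 2 depend on whether the activation is gated. Everything in this sketch is an assumption for illustration: the helper names (alignUp, fc1BlockShapeOk, BlockShape), the is_gated flag, and the constant values are hypothetical stand-ins, not the identifiers or values used in TmaWarpSpecializedGroupedGemmInput, and alignUp assumes alignToSfDim is a plain round-up to a multiple.

```cpp
#include <cstdint>

// Placeholder constants for illustration only; the real values are defined in
// TmaWarpSpecializedGroupedGemmInput and may differ.
constexpr int64_t kMinNDimAlignment = 128;     // assumed
constexpr int64_t kMinKDimAlignment = 128;     // assumed
constexpr int64_t kBlockScaleVectorSize = 32;  // assumed
constexpr int64_t kFp8PerInt32 = 4;            // four FP8 values packed per int32

// Assumed behaviour of alignToSfDim: round dim up to a multiple of alignment.
constexpr int64_t alignUp(int64_t dim, int64_t alignment) {
  return (dim + alignment - 1) / alignment * alignment;
}

// Hypothetical stand-in for the three sizes read from fc1_weight_block.
struct BlockShape {
  int64_t d0, d1, d2;
};

// Same three conditions as the check quoted above, except that the factor on
// the aligned inter_size is 2 only for gated activations and 1 otherwise.
constexpr bool fc1BlockShapeOk(const BlockShape& s, int64_t num_experts_on_rank,
                               int64_t inter_size, int64_t hidden_size, bool is_gated) {
  return s.d0 == num_experts_on_rank &&
         s.d1 == alignUp(inter_size, kMinNDimAlignment) * (is_gated ? 2 : 1) &&
         s.d2 * kFp8PerInt32 * kBlockScaleVectorSize ==
             alignUp(hidden_size, kMinKDimAlignment);
}

// A non-gated shape is rejected when the factor of 2 is always applied
// (is_gated = true mirrors the current check) and accepted once it is conditional.
static_assert(!fc1BlockShapeOk(BlockShape{8, 1024, 32}, 8, 1024, 4096, true),
              "non-gated shape fails the unconditional check");
static_assert(fc1BlockShapeOk(BlockShape{8, 1024, 32}, 8, 1024, 4096, false),
              "non-gated shape passes the conditional check");
```

In the binding itself the equivalent change would be to apply the multiplier only on the gated path, assuming the activation type is available at that point in the function.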