[MIRROR] Feature Request: Support swizzled_input_sf for cutlass fused moe.

Mirror of https://github.com/flashinfer-ai/flashinfer/issues/2200

https://github.com/NVIDIA/TensorRT-LLM/pull/6231 added a swizzled_input_sf parameter to the cutlass fused moe that specifies whether the input scaling factor is already swizzled. It would be great if this could be integrated into flashinfer.

Currently in sglang, when doing FP4 allgather or FP4 alltoall (quantizing before communication), we have to swizzle the scaling factor after the communication, so the swizzle runs as a separate step that is not fused with anything. With this change, the swizzle would be fused into the moe.
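
To make the extra step concrete, here is a minimal pure-Python sketch of the current flow: scaling factors are communicated in linear (row-major) layout and then swizzled in a separate pass. The names `swizzle_sf` and `allgather_rows` are hypothetical and are not sglang, flashinfer, or TensorRT-LLM APIs. The layout is also an assumption: the sketch uses the 128x4 tiled scaling-factor layout described in the cuBLAS block-scaling documentation, which may not match what the kernel actually expects.

```python
# Assumed layout: 128x4 tiles, each stored as 32 rows of 16 values.
TILE_ROWS, TILE_COLS = 128, 4


def swizzle_sf(sf, rows, cols):
    """Permute row-major scale factors into the assumed 128x4 tiled layout."""
    padded_rows = -(-rows // TILE_ROWS) * TILE_ROWS  # round up to a tile multiple
    padded_cols = -(-cols // TILE_COLS) * TILE_COLS
    col_tiles = padded_cols // TILE_COLS
    out = [0] * (padded_rows * padded_cols)  # padding slots stay zero
    for r in range(rows):
        for c in range(cols):
            tile = (r // TILE_ROWS) * col_tiles + (c // TILE_COLS)
            r_in = r % TILE_ROWS
            # Rows 0..31 of a tile are interleaved with rows 32..63, 64..95, 96..127.
            offset = (r_in % 32) * 16 + (r_in // 32) * 4 + (c % TILE_COLS)
            out[tile * TILE_ROWS * TILE_COLS + offset] = sf[r * cols + c]
    return out


def allgather_rows(per_rank_sf):
    """Toy stand-in for an allgather: concatenate each rank's row-major buffer."""
    gathered = []
    for buf in per_rank_sf:
        gathered.extend(buf)
    return gathered


# Two ranks, each holding 3 tokens with 4 scale factors per token.
cols = 4
rank0 = list(range(0, 12))
rank1 = list(range(12, 24))

linear_sf = allgather_rows([rank0, rank1])  # communicated in linear layout
swizzled_sf = swizzle_sf(linear_sf, rows=6, cols=cols)  # today: separate pass
```

With the requested parameter, the caller would presumably pass `linear_sf` straight to the fused moe with swizzled_input_sf set to false, and the standalone `swizzle_sf` pass would no longer be needed.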


