Reland Cute-DSL FP4 dense GEMM #23590

Open
b8zhong wants to merge 2 commits into main from brayden/cute-dsl-fp4

@b8zhong
Collaborator

@b8zhong b8zhong commented Apr 23, 2026

Motivation

Reland #18801, which was reverted earlier due to a cutlass-dsl compilation failure. The kernel also supports SwapAB internally.
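
For readers unfamiliar with the term, the general swap-AB identity can be sketched in plain Python. This is only an illustration of the math, not the kernel's implementation: the function names `matmul_nt`, `transpose`, and `gemm_swap_ab` are hypothetical, and the small-M motivation in the comments is an assumption rather than something stated in this PR.

```python
def matmul_nt(a, b):
    """Return a @ b^T for row-major nested lists: a is M x K, b is N x K."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def transpose(m):
    """Return the transpose of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


def gemm_swap_ab(a, b):
    """Compute a @ b^T by swapping the operands: (b @ a^T)^T.

    Assumption: a kernel might prefer this form when M is small, so that
    the larger N dimension takes the role of the first operand's rows.
    """
    return transpose(matmul_nt(b, a))


# Small-M example (M=1, N=4, K=3); both paths give the same result.
a = [[1, 2, 3]]
b = [[1, 0, 1], [2, 1, 0], [0, 3, 1], [1, 1, 1]]
assert gemm_swap_ab(a, b) == matmul_nt(a, b)
```

Because the swap is handled inside the kernel, callers would not need to transpose anything themselves; the sketch only shows why the swapped computation yields the same output.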

Speed Tests and Profiling

Below is the speedup on B300, where the kernel is still faster than the CUTLASS and cuDNN backends. B200 numbers show a greater relative speedup.

[image]

Further perf numbers are in flashinfer-ai/flashinfer/pull/2540, and accuracy was validated in https://github.com//pull/18801

@mintlify
Contributor

mintlify Bot commented Apr 23, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| lmsysorg | 🟢 Ready | View Preview | Apr 23, 2026, 11:34 PM |

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@b8zhong b8zhong force-pushed the brayden/cute-dsl-fp4 branch from 12793cf to ae875ee on April 23, 2026 23:33
@b8zhong
Collaborator Author

b8zhong commented Apr 23, 2026

/tag-and-rerun-ci again
