
Add mxfp8_blackwell_attentions benchmark to TritonBench#1012

Open
njriasan wants to merge 2 commits into main from export-D100634593

Conversation

@njriasan
Contributor

Summary: Add a new OSS TritonBench operator for benchmarking the TLX MXFP8 Flash Attention kernel on Blackwell GPUs. The operator imports the kernel directly from the TLX tutorials (blackwell_fa_ws_pipelined_persistent_mxfp8.py) and generates quantized MXFP8 inputs with proper scale tensors.

Differential Revision: D100634593
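For context on what "quantized MXFP8 inputs with proper scale tensors" involves, here is a minimal, hedged sketch of MX-style block quantization in numpy. It assumes the OCP MX convention of one shared power-of-two scale per 32-element block and the fp8 e4m3 max magnitude of 448; the function names are illustrative, not the actual helpers used by the TritonBench operator, and the fp8 payload is kept in float32 rather than a real fp8 dtype.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # max representable magnitude of fp8 e4m3
BLOCK = 32            # MX scaling block: one shared scale per 32 values

def mxfp8_quantize(x: np.ndarray):
    """Quantize a 1-D array (length a multiple of BLOCK).

    Returns (q, scales): q holds the scaled values (a float32 stand-in
    for the fp8 payload), scales holds one power-of-two scale per block,
    mirroring the e8m0 shared exponents in the MX spec.
    """
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # Power-of-two scale chosen so the block max lands near FP8_E4M3_MAX.
    exp = np.ceil(np.log2(np.maximum(amax, 1e-30) / FP8_E4M3_MAX))
    scales = np.exp2(exp)
    q = np.clip(blocks / scales, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q.reshape(x.shape), scales.squeeze(1)

def mxfp8_dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Undo the block scaling: multiply each block by its scale."""
    return (q.reshape(-1, BLOCK) * scales[:, None]).reshape(q.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal(128).astype(np.float32)
q, s = mxfp8_quantize(x)
x_hat = mxfp8_dequantize(q, s)
```

Because the sketch skips the actual rounding to fp8 mantissa precision, dequantization here is lossless; a real MXFP8 path would additionally cast `q` to `float8_e4m3` and accept the rounding error.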

Summary:

The non-persistent fwd-only TLX benchmark is redundant now that the persistent variant (tlx_blackwell_ws_pipelined_persistent) supports both fwd and bwd. Remove the benchmark method, metadata entries, and documentation references.

Differential Revision: D100629607
@meta-codesync

meta-codesync Bot commented Apr 13, 2026

@njriasan has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100634593.

