Add a triton kernel for swizziling #2168

drisspg · 2025-05-03T20:15:52Z

Stacked PRs:

Llama 70B feed-forward with this kernel:
https://fburl.com/125yv8hh
~632 µs
Without:
https://fburl.com/a21gwjmc
~801 µs (so the kernel cuts roughly 21% off the feed-forward time)

BF16 for reference:
https://fburl.com/2lgn9xkx

vLLM

Once I figure out why this isn't always working, I am getting:

============ Serving Benchmark Result ============
Successful requests:                     1024      
Benchmark duration (s):                  11.37     
Total input tokens:                      225502    
Total generated tokens:                  189380    
Request throughput (req/s):              90.09     
Output token throughput (tok/s):         16661.09  
Total Token throughput (tok/s):          36500.08  
---------------Time to First Token----------------
Mean TTFT (ms):                          965.17    
Median TTFT (ms):                        880.49    
P99 TTFT (ms):                           1492.64   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          28.41     
Median TPOT (ms):                        29.23     
P99 TPOT (ms):                           44.61     
---------------Inter-token Latency----------------
Mean ITL (ms):                           20.41     
Median ITL (ms):                         14.86     
P99 ITL (ms):                            49.08     
==================================================

Future

We are also swizzling the scales for all the weights on every call; we can shave off some microseconds if we keep a cached, pre-swizzled variant around.
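The caching idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: `swizzle_scales` and `CachedWeightScale` are hypothetical names, and the index arithmetic assumes the 128x4 blocked scale layout (each 128x4 tile stored as 32 rows of 16 values, with padding slots left at zero), which should be checked against the reference swizzle in `torchao/prototype/mx_formats/utils.py` before relying on it.

```python
def ceil_div(a, b):
    return (a + b - 1) // b


def swizzle_scales(scales):
    """Rearrange a row-major 2-D list of scales into the assumed 128x4 blocked layout.

    Returns a flat list; slots that correspond to padding stay 0.
    """
    rows, cols = len(scales), len(scales[0])
    n_row_blocks = ceil_div(rows, 128)
    n_col_blocks = ceil_div(cols, 4)
    out = [0] * (n_row_blocks * n_col_blocks * 512)  # 128 * 4 = 512 values per tile
    for r in range(rows):
        for c in range(cols):
            # Which 128x4 tile this element belongs to (tiles are row-major).
            block = (r // 128) * n_col_blocks + (c // 4)
            # Inside a tile: 32 rows of 16 values; each group of 32 source rows
            # contributes 4 adjacent values to every output row.
            within = (r % 32) * 16 + ((r % 128) // 32) * 4 + (c % 4)
            out[block * 512 + within] = scales[r][c]
    return out


class CachedWeightScale:
    """Hypothetical holder that swizzles a static weight scale only once."""

    def __init__(self, scales):
        self._scales = scales
        self._swizzled = None
        self.swizzle_calls = 0  # counts how often the swizzle actually ran

    def swizzled(self):
        if self._swizzled is None:
            self._swizzled = swizzle_scales(self._scales)
            self.swizzle_calls += 1
        return self._swizzled


if __name__ == "__main__":
    weight_scale = CachedWeightScale([[r * 4 + c for c in range(4)] for r in range(128)])
    for _ in range(3):  # e.g. three forward passes
        weight_scale.swizzled()
    print(weight_scale.swizzle_calls)
```

Because weight scales do not change during inference, the swizzle only needs to run once per weight, whereas activation scales still have to be swizzled on every call.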

stack-info: PR: #2168, branch: drisspg/stack/53

pytorch-bot · 2025-05-03T20:15:56Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2168

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f69bd4e with merge base 8369268:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg · 2025-05-05T00:13:43Z

Well, that's fun.

No correctness issues w/ TRITON_INTERPRET=1 and sanitize is clean

drisspg · 2025-05-05T17:16:56Z

All good now:

=========================================================================================== warnings summary ===========================================================================================
test/prototype/mx_formats/test_mx_linear.py::test_linear_compile[False-False-mxfp8_emulated-hp_dtype0]
  /home/drisspg/meta/pytorch/torch/_inductor/compile_fx.py:246: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================== 257 passed, 63 skipped, 1 warning in 116.63s (0:01:56) ========================================================================

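TL;DR: don't use can_reorder=True with tl.reshape. It lets the compiler reorder elements during the reshape, which breaks a swizzle that depends on exact element order.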

torchao/prototype/mx_formats/custom_cast.py

torchao/prototype/mx_formats/utils.py

vkuzo

lgtm!

drisspg · 2025-05-08T18:05:57Z

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg added a commit that referenced this pull request May 3, 2025

Add a triton kernel for swizziling

a1b6365

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from 8738dd4 to a1b6365 Compare May 3, 2025 20:15

drisspg mentioned this pull request May 3, 2025

Add subclass based method for inference w/ MXFP8 #2132

Merged

facebook-github-bot added the CLA Signed label May 3, 2025

drisspg added the performance and topic: performance labels May 3, 2025

drisspg changed the base branch from drisspg/stack/50 to main May 3, 2025 20:30

drisspg added a commit that referenced this pull request May 3, 2025

Add a triton kernel for swizziling

1cd3c3c

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from a1b6365 to 1cd3c3c Compare May 3, 2025 20:30

drisspg changed the base branch from main to drisspg/stack/50 May 3, 2025 20:30

drisspg requested review from eellison and vkuzo and removed request for eellison May 3, 2025 20:32

drisspg changed the base branch from drisspg/stack/50 to main May 4, 2025 23:11

drisspg added a commit that referenced this pull request May 4, 2025

Add a triton kernel for swizziling

4d68911

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from 1cd3c3c to 4d68911 Compare May 4, 2025 23:11

drisspg changed the base branch from main to drisspg/stack/50 May 4, 2025 23:11

drisspg changed the base branch from drisspg/stack/50 to main May 5, 2025 00:12

drisspg added a commit that referenced this pull request May 5, 2025

Add a triton kernel for swizziling

5f5b3ef

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from 4d68911 to 5f5b3ef Compare May 5, 2025 00:12

drisspg changed the base branch from main to drisspg/stack/50 May 5, 2025 00:13

drisspg changed the base branch from drisspg/stack/50 to main May 5, 2025 02:09

drisspg added a commit that referenced this pull request May 5, 2025

Add a triton kernel for swizziling

4621f10

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from 5f5b3ef to 4621f10 Compare May 5, 2025 02:09

drisspg changed the base branch from main to drisspg/stack/50 May 5, 2025 02:09

drisspg changed the base branch from drisspg/stack/50 to main May 5, 2025 02:19

drisspg force-pushed the drisspg/stack/53 branch from 4621f10 to 606d13d Compare May 5, 2025 02:19

drisspg changed the base branch from main to drisspg/stack/50 May 5, 2025 17:09

drisspg changed the base branch from drisspg/stack/50 to main May 5, 2025 17:16

drisspg changed the base branch from main to drisspg/stack/50 May 5, 2025 17:16

atalman approved these changes May 6, 2025

drisspg changed the base branch from drisspg/stack/50 to main May 7, 2025 21:02

drisspg added a commit that referenced this pull request May 7, 2025

Add a triton kernel for swizziling

6b5014c

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from 9ea3a01 to 6b5014c Compare May 7, 2025 21:02

drisspg changed the base branch from main to drisspg/stack/50 May 7, 2025 21:02

drisspg changed the base branch from drisspg/stack/50 to main May 7, 2025 21:13

drisspg added a commit that referenced this pull request May 7, 2025

Add a triton kernel for swizziling

2ccf303

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from 6b5014c to 2ccf303 Compare May 7, 2025 21:13

drisspg changed the base branch from main to drisspg/stack/50 May 7, 2025 21:14

drisspg changed the base branch from drisspg/stack/50 to main May 7, 2025 23:08

drisspg added a commit that referenced this pull request May 7, 2025

Add a triton kernel for swizziling

a6c0773

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from 2ccf303 to a6c0773 Compare May 7, 2025 23:08

drisspg changed the base branch from main to drisspg/stack/50 May 7, 2025 23:08

vkuzo reviewed May 8, 2025

torchao/prototype/mx_formats/custom_cast.py (outdated, resolved)

vkuzo reviewed May 8, 2025

torchao/prototype/mx_formats/custom_cast.py (outdated, resolved)

vkuzo reviewed May 8, 2025

torchao/prototype/mx_formats/utils.py (outdated, resolved)

vkuzo approved these changes May 8, 2025

drisspg changed the base branch from drisspg/stack/50 to main May 8, 2025 17:52

drisspg force-pushed the drisspg/stack/53 branch 2 times, most recently from 05171d0 to 809a3db Compare May 8, 2025 18:05

drisspg mentioned this pull request May 8, 2025

Inductor Perf MX to_blocked pytorch/pytorch#153194

Open

Add a triton kernel for swizziling

f69bd4e

stack-info: PR: #2168, branch: drisspg/stack/53

drisspg force-pushed the drisspg/stack/53 branch from 809a3db to f69bd4e Compare May 8, 2025 20:30

drisspg merged commit 81e48a3 into main May 9, 2025
18 checks passed

drisspg mentioned this pull request May 21, 2025

MXFP Inference Tracking Doc #2229

Open

11 tasks
