Add a Triton kernel for swizzling #2168

Merged
drisspg merged 1 commit into main on May 9, 2025

Conversation

drisspg
Contributor

@drisspg drisspg commented May 3, 2025

Stacked PRs:


Llama 70B feed-forward with this kernel:
https://fburl.com/125yv8hh
~632 µs
without:
https://fburl.com/a21gwjmc
~801 µs
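
As a quick sanity check on the two timings above, the relative improvement can be worked out directly. This is only illustrative arithmetic on the approximate numbers quoted in this description; the function name is hypothetical.

```python
def speedup(baseline_us: float, optimized_us: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_us / optimized_us


without_kernel_us = 801.0  # approximate, from the description above
with_kernel_us = 632.0     # approximate, from the description above

ratio = speedup(without_kernel_us, with_kernel_us)
saved_pct = 100.0 * (without_kernel_us - with_kernel_us) / without_kernel_us
print(f"{ratio:.2f}x faster, {saved_pct:.1f}% less time")
```

That works out to roughly a 1.27x speedup, or about 21% less time for the feed-forward block.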

BF16 for reference:
https://fburl.com/2lgn9xkx

vLLM

Once I figure out why this isn't always working, I am getting

============ Serving Benchmark Result ============
Successful requests:                     1024      
Benchmark duration (s):                  11.37     
Total input tokens:                      225502    
Total generated tokens:                  189380    
Request throughput (req/s):              90.09     
Output token throughput (tok/s):         16661.09  
Total Token throughput (tok/s):          36500.08  
---------------Time to First Token----------------
Mean TTFT (ms):                          965.17    
Median TTFT (ms):                        880.49    
P99 TTFT (ms):                           1492.64   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          28.41     
Median TPOT (ms):                        29.23     
P99 TPOT (ms):                           44.61     
---------------Inter-token Latency----------------
Mean ITL (ms):                           20.41     
Median ITL (ms):                         14.86     
P99 ITL (ms):                            49.08     
==================================================
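
The throughput rows in the report above can be re-derived from the request and token counts and the benchmark duration. A minimal sketch, assuming each throughput is simply a count divided by the duration; since the duration is printed rounded to 11.37 s, the recomputed values differ from the reported ones by a small fraction of a percent. The helper name is hypothetical.

```python
def per_second(count: int, duration_s: float) -> float:
    """Convert a total count into a per-second rate."""
    return count / duration_s


requests = 1024
input_tokens = 225502
generated_tokens = 189380
duration_s = 11.37  # rounded in the report, so results are approximate

request_tput = per_second(requests, duration_s)
output_tput = per_second(generated_tokens, duration_s)
total_tput = per_second(input_tokens + generated_tokens, duration_s)

print(f"req/s ~ {request_tput:.2f}")
print(f"output tok/s ~ {output_tput:.2f}")
print(f"total tok/s ~ {total_tput:.2f}")
```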

Future

We are also swizzling the scales for all the weights on every call; we can shave off a few microseconds if we keep a cached variant around.
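
One possible shape for such a cache is sketched below. It is not what this PR implements: the class name and the `swizzle_fn` argument are hypothetical, and the sketch assumes the weight scales do not change after construction, so the swizzled form only needs to be computed once.

```python
class CachedSwizzledScale:
    """Holds a weight scale and lazily caches its swizzled form (hypothetical helper)."""

    def __init__(self, scale, swizzle_fn):
        self._scale = scale
        self._swizzle_fn = swizzle_fn  # stand-in for the real swizzle routine
        self._swizzled = None
        self._cached = False

    def swizzled(self):
        # Compute on first use, then reuse the stored result.
        if not self._cached:
            self._swizzled = self._swizzle_fn(self._scale)
            self._cached = True
        return self._swizzled


# Usage with a toy "swizzle" that counts how often it is invoked.
calls = []

def toy_swizzle(values):
    calls.append(1)
    return list(reversed(values))

cached = CachedSwizzledScale([1, 2, 3], toy_swizzle)
first = cached.swizzled()
second = cached.swizzled()
```

The trade-off is extra memory for the stored copy, and the cache would need invalidating if the weights were ever requantized.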

drisspg added a commit that referenced this pull request May 3, 2025
stack-info: PR: #2168, branch: drisspg/stack/53

pytorch-bot bot commented May 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2168

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f69bd4e with merge base 8369268 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg drisspg force-pushed the drisspg/stack/53 branch from 8738dd4 to a1b6365 Compare May 3, 2025 20:15
@facebook-github-bot facebook-github-bot added the CLA Signed label May 3, 2025
@drisspg drisspg added the performance and topic: performance labels May 3, 2025
@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 3, 2025 20:30
drisspg added a commit that referenced this pull request May 3, 2025
stack-info: PR: #2168, branch: drisspg/stack/53
@drisspg drisspg force-pushed the drisspg/stack/53 branch from a1b6365 to 1cd3c3c Compare May 3, 2025 20:30
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 3, 2025 20:30
@drisspg drisspg requested review from eellison and vkuzo and removed request for eellison May 3, 2025 20:32
@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 4, 2025 23:11
drisspg added a commit that referenced this pull request May 4, 2025
stack-info: PR: #2168, branch: drisspg/stack/53
@drisspg drisspg force-pushed the drisspg/stack/53 branch from 1cd3c3c to 4d68911 Compare May 4, 2025 23:11
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 4, 2025 23:11
@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 5, 2025 00:12
drisspg added a commit that referenced this pull request May 5, 2025
stack-info: PR: #2168, branch: drisspg/stack/53
@drisspg drisspg force-pushed the drisspg/stack/53 branch from 4d68911 to 5f5b3ef Compare May 5, 2025 00:12
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 5, 2025 00:13
@drisspg
Contributor Author

drisspg commented May 5, 2025

[Screenshot 2025-05-04 at 5:13:35 PM]

Well, that's fun.

No correctness issues with TRITON_INTERPRET=1, and the sanitizer is clean.
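
For anyone reproducing this, `TRITON_INTERPRET=1` has to be in the environment before the kernels are launched, so setting it for a child process is the least error-prone route. A minimal sketch; the helper name is hypothetical, and the test path is the one that appears in the pytest output in this thread.

```python
import os


def interpreter_env(base_env=None):
    """Return a copy of the environment with Triton's interpreter mode enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["TRITON_INTERPRET"] = "1"
    return env


command = ["pytest", "test/prototype/mx_formats/test_mx_linear.py"]
env = interpreter_env({"PATH": "/usr/bin"})  # small base env just for illustration

# To actually run the tests in interpreter mode:
#   import subprocess
#   subprocess.run(command, env=interpreter_env(), check=True)
```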

@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 5, 2025 02:09
drisspg added a commit that referenced this pull request May 5, 2025
stack-info: PR: #2168, branch: drisspg/stack/53
@drisspg drisspg force-pushed the drisspg/stack/53 branch from 5f5b3ef to 4621f10 Compare May 5, 2025 02:09
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 5, 2025 02:09
@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 5, 2025 02:19
@drisspg drisspg force-pushed the drisspg/stack/53 branch from 4621f10 to 606d13d Compare May 5, 2025 02:19
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 5, 2025 17:09
@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 5, 2025 17:16
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 5, 2025 17:16
@drisspg
Contributor Author

drisspg commented May 5, 2025

All good now:

=========================================================================================== warnings summary ===========================================================================================
test/prototype/mx_formats/test_mx_linear.py::test_linear_compile[False-False-mxfp8_emulated-hp_dtype0]
  /home/drisspg/meta/pytorch/torch/_inductor/compile_fx.py:246: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================== 257 passed, 63 skipped, 1 warning in 116.63s (0:01:56) ========================================================================

TL;DR: don't use can_reorder=True with tl.reshape
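
The reason this matters for a swizzle is that the whole operation is a fixed permutation: if a reshape is allowed to reorder elements, the output can contain the right values in the wrong places. The pure-Python sketch below illustrates that. It is not the Triton kernel from this PR, and it assumes the 128x4-tile layout commonly used for block-scaled matmul scale factors (the thread does not spell the layout out); all names are hypothetical.

```python
TILE_ROWS, TILE_COLS = 128, 4  # assumed tile shape for the scale layout


def swizzled_offset(row: int, col: int) -> int:
    """Destination offset of (row, col) within one flattened tile (assumed layout)."""
    return (row % 32) * 16 + (row // 32) * 4 + col


def swizzle_tile(tile):
    """Rearrange a 128x4 tile (list of rows) into a flat list of 512 elements."""
    out = [None] * (TILE_ROWS * TILE_COLS)
    for r in range(TILE_ROWS):
        for c in range(TILE_COLS):
            out[swizzled_offset(r, c)] = tile[r][c]
    return out


# Unique values make any misplaced element visible.
tile = [[r * TILE_COLS + c for c in range(TILE_COLS)] for r in range(TILE_ROWS)]
expected = swizzle_tile(tile)

# Simulate a result that has the right elements in the wrong order.
reordered = list(reversed(expected))

same_multiset = sorted(reordered) == sorted(expected)  # weak check still passes
same_order = reordered == expected                     # exact check fails
```

A test that only compares sums or sorted values would miss this class of bug, so reference checks for the kernel should compare element by element.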

@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 7, 2025 21:02
drisspg added a commit that referenced this pull request May 7, 2025
stack-info: PR: #2168, branch: drisspg/stack/53
@drisspg drisspg force-pushed the drisspg/stack/53 branch from 9ea3a01 to 6b5014c Compare May 7, 2025 21:02
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 7, 2025 21:02
@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 7, 2025 21:13
drisspg added a commit that referenced this pull request May 7, 2025
stack-info: PR: #2168, branch: drisspg/stack/53
@drisspg drisspg force-pushed the drisspg/stack/53 branch from 6b5014c to 2ccf303 Compare May 7, 2025 21:13
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 7, 2025 21:14
@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 7, 2025 23:08
drisspg added a commit that referenced this pull request May 7, 2025
stack-info: PR: #2168, branch: drisspg/stack/53
@drisspg drisspg force-pushed the drisspg/stack/53 branch from 2ccf303 to a6c0773 Compare May 7, 2025 23:08
@drisspg drisspg changed the base branch from main to drisspg/stack/50 May 7, 2025 23:08
Contributor

@vkuzo vkuzo left a comment

lgtm!

@drisspg drisspg changed the base branch from drisspg/stack/50 to main May 8, 2025 17:52
@drisspg drisspg force-pushed the drisspg/stack/53 branch 2 times, most recently from 05171d0 to 809a3db Compare May 8, 2025 18:05
@drisspg
Contributor Author

drisspg commented May 8, 2025

[Screenshot 2025-05-08 at 11:05:47 AM]

stack-info: PR: #2168, branch: drisspg/stack/53
@drisspg drisspg force-pushed the drisspg/stack/53 branch from 809a3db to f69bd4e Compare May 8, 2025 20:30
@drisspg drisspg merged commit 81e48a3 into main May 9, 2025
18 checks passed
@drisspg drisspg mentioned this pull request May 21, 2025
11 tasks
Labels
CLA Signed, performance, topic: performance

4 participants