Adding flex_attention benchmark for eager and compile mode #174

Closed
mandroid6 wants to merge 1 commit

Conversation

mandroid6
Contributor

Summary:
Since we are actively adding TritonKernel optimizations to flex_attention through inductor, it's useful to track the perf improvements through tritonbench.

NOTE: The initial version uses fixed sizes for batch_size, seq_len, head_dim, and num_heads, with follow-ups to make them configurable.

Differential Revision: D71137239
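
For context, a minimal sketch of the eager-vs-compile comparison this benchmark captures, assuming a CUDA device and the public torch.nn.attention.flex_attention API; the fixed shapes below are illustrative placeholders, not necessarily the values hardcoded in this PR:

```python
# Hypothetical sketch: compare eager flex_attention against the
# Inductor-compiled version, timing both with Triton's do_bench.
import torch
from torch.nn.attention.flex_attention import flex_attention
from triton.testing import do_bench

# Illustrative fixed sizes standing in for the four hardcoded dimensions.
BATCH_SIZE, NUM_HEADS, SEQ_LEN, HEAD_DIM = 8, 16, 1024, 64

shape = (BATCH_SIZE, NUM_HEADS, SEQ_LEN, HEAD_DIM)
q, k, v = (torch.randn(shape, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Eager mode runs flex_attention as unfused PyTorch ops; compile mode
# lets inductor lower it to a fused Triton kernel.
compiled = torch.compile(flex_attention)

# Sanity-check numerics before timing.
torch.testing.assert_close(flex_attention(q, k, v), compiled(q, k, v),
                           atol=2e-2, rtol=2e-2)

eager_ms = do_bench(lambda: flex_attention(q, k, v))
compiled_ms = do_bench(lambda: compiled(q, k, v))
print(f"eager: {eager_ms:.3f} ms  compiled: {compiled_ms:.3f} ms")
```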

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D71137239

@facebook-github-bot
Contributor

This pull request has been merged in 73c9b75.
