
Conversation

@kyolebu (Contributor) commented Aug 4, 2025

This PR mainly updates the autotuner configs.

New performance on fp32 benchmark (screenshot)

Old performance on fp32 benchmark (screenshot)
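For context, Triton autotuner configs are typically declared as a list of triton.Config candidates attached to the kernel. The sketch below only shows the shape of such a block; the block sizes, warp counts, and the kernel name gemv_kernel are illustrative assumptions, not the actual values changed in this PR.

import triton
import triton.language as tl

# Illustrative only: block sizes, num_warps, and the kernel name are assumptions,
# not the actual configs touched in this PR.
@triton.autotune(
    configs=[
        triton.Config({"BLOCK_M": 16, "BLOCK_K": 128}, num_warps=4),
        triton.Config({"BLOCK_M": 32, "BLOCK_K": 256}, num_warps=8),
        triton.Config({"BLOCK_M": 64, "BLOCK_K": 512}, num_warps=8),
    ],
    key=["M", "K"],  # re-tune whenever the problem shape changes
)
@triton.jit
def gemv_kernel(a_ptr, x_ptr, y_ptr, M, K,
                BLOCK_M: tl.constexpr, BLOCK_K: tl.constexpr):
    ...  # kernel body elided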

The review below refers to this excerpt from the updated benchmark:

)
)
def benchmark_gemv(M, K, provider):
    A = torch.randn((M, K), device=DEVICE, dtype=torch.float16)
Collaborator commented:

It's probably worth keeping the float16 tests as well. Instead of overwriting the test, clone it and have a benchmark for fp16 and fp32.
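A minimal sketch of one way to do that, assuming the file uses triton.testing.perf_report with do_bench, and using gemv as a placeholder name for the kernel wrapper under test (the provider names and x_vals here are also assumptions): build one Benchmark per dtype and thread the dtype through args.

import torch
import triton
import triton.testing

DEVICE = "cuda"

# One Benchmark per dtype so fp16 results are reported alongside the new fp32 ones.
configs = [
    triton.testing.Benchmark(
        x_names=["M", "K"],
        x_vals=[2**i for i in range(8, 14)],
        line_arg="provider",
        line_vals=["torch", "triton"],
        line_names=["Torch", "Triton"],
        ylabel="GB/s",
        plot_name=f"gemv-performance-{name}",
        args={"dtype": dtype},
    )
    for name, dtype in [("fp16", torch.float16), ("fp32", torch.float32)]
]

@triton.testing.perf_report(configs)
def benchmark_gemv(M, K, provider, dtype):
    A = torch.randn((M, K), device=DEVICE, dtype=dtype)
    x = torch.randn((K,), device=DEVICE, dtype=dtype)
    if provider == "torch":
        ms = triton.testing.do_bench(lambda: torch.mv(A, x))
    else:
        ms = triton.testing.do_bench(lambda: gemv(A, x))  # gemv: placeholder for the kernel under test
    # Rough bandwidth estimate: bytes read for A and x plus bytes written for the output.
    gbps = (A.numel() + x.numel() + M) * A.element_size() * 1e-9 / (ms * 1e-3)
    return gbps

Keeping dtype in args rather than duplicating the function means both dtypes share the same benchmark body, and only the generated reports differ.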

@kyolebu (author) replied:

Will do 👍

