Description
Problem Description
Three issues are mentioned in a PR for SageAttention (thu-ml/SageAttention#332):
- Coop API changes in rocWMMA 2+
  - This is well documented, and rwfsmith posted a link to the migration docs. No action required.
- Windows support for rocWMMA
  - Support for this is now available through TheRock builds, and jammm posted a link to the relevant PR. No action required.
- ~25% performance regression for this workload between rocWMMA 1.7 and 2.0
  - This may be worth investigating.
According to this comment: thu-ml/SageAttention#332 (comment), a ~25% performance regression is observed for SageAttention when moving from rocWMMA 1.7 to 2.0. We should try to understand where this regression comes from and what can be done to mitigate it. I am not aware of any significant performance regressions from 1.7 to 2.0 in our benchmarking; however, I am also not aware of any benchmarking against the 9070, which is the device where this regression is observed.
Since there are code and API changes between the two pieces of code being run, it is unclear whether this regression truly originates in rocWMMA or whether something else is going on. A visual comparison of the two implementations may provide insight; otherwise, we will need a self-contained reproducer that unambiguously shows a true performance regression in rocWMMA before we can investigate further.
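As a rough illustration of what the timing side of such a reproducer could look like, here is a minimal host-side sketch using only the C++ standard library. It is not taken from the SageAttention PR or from rocWMMA; the names `time_runs`, `median_ms`, and `relative_slowdown` are hypothetical, and `run` is assumed to be a callable that launches the kernel under test and blocks until the device has finished.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a set of timing samples in milliseconds.
// Takes the vector by value so the caller's samples are left untouched.
inline double median_ms(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return (n % 2 == 1) ? samples[n / 2]
                        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Relative slowdown of the candidate build versus the baseline build.
// A return value of 0.25 means the candidate is 25% slower.
inline double relative_slowdown(double baseline_ms, double candidate_ms) {
    assert(baseline_ms > 0.0);
    return (candidate_ms - baseline_ms) / baseline_ms;
}

// Time `run` for `iters` iterations after `warmup` untimed iterations.
// Assumption: `run` is synchronous, i.e. it does not return until the
// kernel has completed (hypothetical; the caller must ensure this).
template <typename F>
std::vector<double> time_runs(F&& run, int warmup, int iters) {
    for (int i = 0; i < warmup; ++i) run();
    std::vector<double> samples;
    if (iters > 0) samples.reserve(static_cast<std::size_t>(iters));
    for (int i = 0; i < iters; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        run();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return samples;
}
```

The intent would be to build the same kernel source once against rocWMMA 1.7 and once against 2.0, collect `median_ms(time_runs(...))` for each build on the 9070, and pass both medians to `relative_slowdown`. Keeping the kernel source identical between the two builds is what would separate a rocWMMA regression from the code and API changes mentioned above.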
Operating System
Linux
CPU
n/a
GPU
9070
ROCm Version
2.0
ROCm Component
rocWMMA
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response