Commit c287ebb
[AMD] Optimize address increments for buffer loads in loops (#8464)
This PR transfers address computation from offsets in buffer loads to
base pointers, which reuses amount of required computations and lowers
register pressure.
---------
Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>1 parent 3254b53 commit c287ebb
9 files changed
Lines changed: 1256 additions & 0 deletions
File tree
- bin
- python/src
- test/TritonGPU/amd
- third_party/amd
- backend
- include/TritonAMDGPUTransforms
- lib/TritonAMDGPUTransforms
- python
- test
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
122 | 122 | | |
123 | 123 | | |
124 | 124 | | |
| 125 | + | |
125 | 126 | | |
126 | 127 | | |
127 | 128 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | 3 | | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
4 | 9 | | |
5 | 10 | | |
6 | 11 | | |
| |||
Lines changed: 589 additions & 0 deletions
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
281 | 281 | | |
282 | 282 | | |
283 | 283 | | |
| 284 | + | |
284 | 285 | | |
285 | 286 | | |
286 | 287 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
186 | 186 | | |
187 | 187 | | |
188 | 188 | | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
189 | 197 | | |
190 | 198 | | |
191 | 199 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
| 8 | + | |
8 | 9 | | |
9 | 10 | | |
10 | 11 | | |
| |||
0 commit comments