Commit e4980c6
Generic Pool Large Kernel Optimization (#23162)
Ticket
N/A
Problem description
Generic Pool's performance is poor for large kernel sizes.
What's changed
- YoloV4's expected perf has been increased from 87.8 to 93.5 FPS.
- Generic pool now supports 32-row reductions.
- A bug was fixed in the size of the intermediate / partials CB.
- A bug was fixed in the face dimension passed to unpack tilize.
- For Max Pool, the fill_with_val in the loop was eliminated. This is
possible since the junk data left from previous iterations does not
affect the max value.
- in_cb initialization has been added for cases where the number of
intermediate reduction chunks does not exceed the number of
multibuffering chunks. This is necessary since the compute kernel always
processes max_rows_per_reduction rows from the in_cb, which may include
uninitialized data if multibuffering is enabled. However, when there are
enough intermediate reduction chunks, the entire in_cb gets filled with
valid data, which cannot contain values larger than the max, so
initialization is not necessary.
- Clear out tiles is now used for buffer initialization as well as for
Avg Pool's fill_with_val called in the loop, resulting in dramatically
better performance in some cases.
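The difference between Max Pool and Avg Pool with respect to the in-loop fill can be illustrated with a small host-side model. This is only a sketch in plain Python, not the kernel code: `chunked_pool`, `refill_each_chunk`, and the list-based buffer are hypothetical stand-ins for the reused CB and the in-loop fill_with_val. It assumes the "junk" is stale data from earlier chunks of the same window, which has already been folded into the running result.

```python
from operator import add

NEG_INF = float("-inf")


def chunked_pool(window, max_rows, op, init, refill_each_chunk):
    """Reduce `window` in chunks of at most `max_rows` rows through one reused buffer."""
    buf = [init] * max_rows            # one-time initialization before the loop
    acc = init
    for start in range(0, len(window), max_rows):
        chunk = window[start:start + max_rows]
        if refill_each_chunk:
            buf = [init] * max_rows    # models an in-loop fill_with_val
        buf[:len(chunk)] = chunk       # a short last chunk leaves stale rows behind
        for value in buf:              # compute always consumes all max_rows rows
            acc = op(acc, value)
    return acc


window = [3, 9, 1, 7, 5, 2, 8, 4, 6, 0]   # 10 rows, reduced 4 rows at a time

# Max is idempotent: re-reading stale rows cannot change the result.
assert chunked_pool(window, 4, max, NEG_INF, refill_each_chunk=False) == max(window)

# A sum double-counts stale rows, so the average-pool path still needs the fill.
assert chunked_pool(window, 4, add, 0, refill_each_chunk=True) == sum(window)
assert chunked_pool(window, 4, add, 0, refill_each_chunk=False) != sum(window)
```

In this model the stale rows in the last, partially filled chunk are values that were already reduced, which is why skipping the fill is safe for max but not for a sum.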
Note
- Multi buffering does not require in-loop fill_with_val since one CB
only processes a single top left index at a time, and if necessary the
in_cb was initialized before the loop.
- Junk data from previous top left indices is not an issue since all
kernel positions have the same number of elements.
- For both average pool and max pool, we would not need to initialize the
CB with the init value at all, since we know that kernel_HW >
max_rows_per_reduction. However, because we are using multibuffering,
there will usually be some dead space. It may be worth turning off
multibuffering, but more testing is required.
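One way to read the in_cb initialization rule is sketched below in plain Python. This is an assumption-laden model, not the op's host code: the function names, the `slots` parameter (standing in for the number of multibuffering chunks), and the round-robin slot assignment are all hypothetical. The first function states the rule as described above; the second simulates which rows of each slot have been written when the compute side reads a full slot of max_rows_per_reduction rows.

```python
import math


def needs_in_cb_init(kernel_hw, max_rows_per_reduction, slots):
    """Rule from the description: init unless there are more chunks than slots."""
    chunks = math.ceil(kernel_hw / max_rows_per_reduction)
    return chunks <= slots


def reads_unwritten_row(kernel_hw, max_rows_per_reduction, slots):
    """Simulate one window: does compute ever read a row that was never written?"""
    written = [[False] * max_rows_per_reduction for _ in range(slots)]
    starts = range(0, kernel_hw, max_rows_per_reduction)
    for i, start in enumerate(starts):
        slot = written[i % slots]                    # assumed round-robin slot use
        rows = min(max_rows_per_reduction, kernel_hw - start)
        for r in range(rows):
            slot[r] = True
        if not all(slot):                            # compute reads the whole slot
            return True
    return False


# 36 rows, 32-row reductions, 2 slots: the 4-row tail lands in a fresh slot.
assert needs_in_cb_init(36, 32, 2) and reads_unwritten_row(36, 32, 2)

# 70 rows: the 6-row tail lands in a slot already filled by a full chunk.
assert not needs_in_cb_init(70, 32, 2) and not reads_unwritten_row(70, 32, 2)
```

Under these assumptions the rule is conservative: whenever the simulation finds an unwritten row, the rule asks for initialization, though it may also ask for it when kernel_HW happens to be an exact multiple of the reduction size.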
Checklist
- [x] [All post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml) CI passes: https://github.com/tenstorrent/tt-metal/actions/runs/15646661523
- [x] [Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml) CI passes (same failure as main, unrelated to changes): https://github.com/tenstorrent/tt-metal/actions/runs/15646662524
- [x] [Model regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) CI passes (same failure as main, unrelated to changes): https://github.com/tenstorrent/tt-metal/actions/runs/15646665080
- [x] [Device performance regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) CI passes: https://github.com/tenstorrent/tt-metal/actions/runs/15646663480
- [x] [Nightly L2](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml) CI passes: (wormhole) https://github.com/tenstorrent/tt-metal/actions/runs/15646668961 (blackhole) https://github.com/tenstorrent/tt-metal/actions/runs/15646670132
- [x] [Frequent model](https://github.com/tenstorrent/tt-metal/actions/workflows/fast-dispatch-full-regressions-and-models.yaml) CI passes: https://github.com/tenstorrent/tt-metal/actions/runs/15646666798
- [x] New/Existing tests provide coverage for changes

1 parent 613ef42 commit e4980c6
File tree
7 files changed, +139 -56 lines changed
- models
  - demos/yolov4/tests/perf
  - experimental/yolov8s_world/tests
- ttnn/cpp/ttnn/operations/pool/generic/device
  - kernels
    - compute
    - dataflow
Lines changed: 1 addition & 1 deletion
Lines changed: 15 additions & 11 deletions
Lines changed: 26 additions & 0 deletions
Lines changed: 0 additions & 26 deletions
Lines changed: 71 additions & 10 deletions
Lines changed: 25 additions & 7 deletions