Commit dd187a2
Better vector interleaves (#8925)
* Specialized x86 implementation of interleave_vectors
* Update test to be more exhaustive
* Fix comment.
The previous comment reported a time that seemed to have regressed. It
was not 8.2 ms on main; it was more like 11 ms.
* Comment fix
* clang-tidy fixes
* Make variable names more consistent
* Simplify code with helper lambda
* Comment tweaks
* Don't do half-width unpcks
* Use optimization fences in the base class too
Before:
Computing best tile sizes for each type
.................................................
bytes, tile width, tile height, bandwidth (GB/s):
1 8 8 20.9997
1 16 8 20.8329
1 8 16 18.5702
1 8 32 17.2463
1 8 64 14.312
2 8 16 19.2047
2 8 8 18.8368
2 16 8 17.0593
2 8 32 17.0591
2 4 8 15.7681
4 8 8 24.9364
4 4 16 22.9699
4 8 16 22.5743
4 4 32 22.255
4 4 8 20.4468
8 8 8 38.4094
8 16 4 28.4167
8 16 8 27.6184
8 8 4 27.6062
8 8 16 26.8693
After:
Computing best tile sizes for each type
.................................................
bytes, tile width, tile height, bandwidth (GB/s):
1 16 32 34.1921
1 16 16 31.8399
1 8 16 25.575
1 16 64 25.1665
1 32 16 25.0061
2 8 32 28.2635
2 8 16 27.7648
2 16 16 27.2126
2 16 32 23.9034
2 8 8 23.6345
4 8 16 34.5303
4 8 8 28.3653
4 16 8 26.8521
4 8 32 26.084
4 16 16 24.4519
8 8 8 33.7163
8 8 4 29.1339
8 4 16 26.418
8 16 4 25.4663
8 2 8 24.3949
* Use Catanzaro's algorithm for non-power-of-two interleaves
* Support more interleave and deinterleave patterns
* clang-tidy fix
* Handle multiple let injections at same site
Also better algorithm for innermost containing stmt
* better simplification and better handling of composite factors
* Fix innermost_containing_node
* Fix some simd op check failures
* Fix infinite recursion issue and missed case in interleave codegen
* Adjust expectations in stage_strided_loads test
* Allow reversed suffix or not in sve test
* Don't use optimization fences on hexagon
* Fix infinite simplifier loop
* Don't hoist transposes on hexagon
* Make distinct strided load nodes in the IR distinct in memory too
* arm-32 has no vst2 for 64-bit elements
* Windows bad filename fix in simd op check
* Temporary dumping of cpu info to debug github actions issue
* dump cpuinfo in makefile testing workflow
To help diagnose occasional illegal instruction errors
* Address review comments
* Remove duplicate function body
* Use slice of predicate
* clang-format
* SVE fixes
Co-authored-by: Claude Code <noreply@anthropic.com>
* Move optimization_fence back
* Try to thread the needle with webassembly nonsense
* Fix msvc warning
* Skip simd_op_check_sve2 on old llvms
* Skip test on sve2 with llvm 21
* Skip block transpose performance test for sve2 on llvm 21
* Skip sub-test that triggers llvm bug
* Test should hopefully now work with llvm main
* Back out dump of cpuinfo
* Fix bad merge
* Use structured bindings in transpose_idioms.cpp
* Avoid bad simplify rule
These rules were broken for triply-nested ramps. A better version of
these rules will come in a later PR.
---------
Co-authored-by: Martijn Courteaux <courteauxmartijn@gmail.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: Alex Reinking <areinking@adobe.com>
1 parent c936df9 commit dd187a2
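To make the Before/After tables in the commit message easier to compare, the following sketch computes the ratio of the best reported bandwidth per element size. The figures are copied from the top row for each byte width in the two tables; the `BestTile` struct and the `best_tiles` and `speedup` helpers are hypothetical names introduced only for this illustration, and the numbers are single reported runs on an unspecified machine.

```cpp
#include <vector>

// Best (top-row) bandwidth per element size, copied from the
// "Before" and "After" tables in the commit message.
struct BestTile {
    int bytes;           // element size in bytes
    double before_gbps;  // best bandwidth before the change (GB/s)
    double after_gbps;   // best bandwidth after the change (GB/s)
};

// Hypothetical helper: returns the four best-case rows.
std::vector<BestTile> best_tiles() {
    return {
        {1, 20.9997, 34.1921},
        {2, 19.2047, 28.2635},
        {4, 24.9364, 34.5303},
        {8, 38.4094, 33.7163},
    };
}

// Ratio > 1 means the "After" run was faster for that element size.
double speedup(const BestTile &t) {
    return t.after_gbps / t.before_gbps;
}
```

By these numbers the best-case bandwidth improves by roughly 1.63x, 1.47x, and 1.38x for 1-, 2-, and 4-byte elements, while the 8-byte case drops to about 0.88x of its previous best.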
33 files changed
Lines changed: 1838 additions & 188 deletions
File tree
- apps/iir_blur
- src
- test
- correctness
- performance
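For readers unfamiliar with the operation this commit optimizes, here is a minimal scalar reference for what interleaving N equal-length vectors means, including a non-power-of-two count such as three. It is only a sketch of the semantics: `interleave_reference` and `deinterleave_reference` are hypothetical names, not Halide functions, and the code is neither the specialized x86 code path nor Catanzaro's algorithm mentioned in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of Halide): interleave n
// equal-length vectors so that out[i * n + j] == vecs[j][i].
// Viewing the inputs as rows of an n x lanes matrix, this stores the
// transpose in row-major order.
template<typename T>
std::vector<T> interleave_reference(const std::vector<std::vector<T>> &vecs) {
    if (vecs.empty()) {
        return {};
    }
    const std::size_t n = vecs.size();
    const std::size_t lanes = vecs[0].size();
    std::vector<T> out(n * lanes);
    for (std::size_t j = 0; j < n; j++) {
        assert(vecs[j].size() == lanes);  // all inputs must have the same length
        for (std::size_t i = 0; i < lanes; i++) {
            out[i * n + j] = vecs[j][i];
        }
    }
    return out;
}

// Inverse operation: split an interleaved vector back into n vectors.
template<typename T>
std::vector<std::vector<T>> deinterleave_reference(const std::vector<T> &in,
                                                   std::size_t n) {
    assert(n > 0 && in.size() % n == 0);
    const std::size_t lanes = in.size() / n;
    std::vector<std::vector<T>> out(n, std::vector<T>(lanes));
    for (std::size_t i = 0; i < lanes; i++) {
        for (std::size_t j = 0; j < n; j++) {
            out[j][i] = in[i * n + j];
        }
    }
    return out;
}
```

For example, interleaving `{0, 1, 2, 3}`, `{10, 11, 12, 13}`, and `{20, 21, 22, 23}` yields `{0, 10, 20, 1, 11, 21, 2, 12, 22, 3, 13, 23}`. An optimized implementation has to produce the same lane order using the target's shuffle or unpack instructions rather than scalar stores.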