Commit fec942a
bank: restore wasm SIMD autovectorization in hot loop
On wasm32+simd128, `f32::mul_add` lowered to per-lane `fmaf` calls and
defeated autovectorization of the EWMA loop, leaving the "SIMD" build
roughly 10x slower than it should be. Three changes in bank.rs recover the
full speedup:
- `mul_add(a, b, c)` helper: unfused (a*b + c) on wasm32+simd128 to keep
the vector loop; `f32::mul_add` on native where vector FMA exists.
- `process_samples` chunks to the next stabilization boundary so the
per-sample modulo check moves out of the hot path.
- `process_sample_inner` hoists the ten backing `Vec` fields to local
`&mut [f32]` slices of known length `n`, letting LLVM drop bounds checks,
hoist the length-min across slices, and trust disjointness.
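A minimal sketch of how the three changes fit together, not the actual bank.rs code. The `Bank` struct here is hypothetical and has three backing `Vec`s instead of ten; the field names, the `peak` tracker, the `stabilize_every` interval, and the body of `stabilize` (a denormal flush) are all assumptions made for illustration. Only the names `mul_add`, `process_samples`, and `process_sample_inner` come from the message above.

```rust
/// Multiply-add that stays autovectorizable on wasm32+simd128.
/// `cfg!` is a compile-time constant, so the untaken branch is removed.
#[inline(always)]
fn mul_add(a: f32, b: f32, c: f32) -> f32 {
    if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        a * b + c // unfused: two roundings, but plain vector mul + add
    } else {
        a.mul_add(b, c) // fused: single rounding
    }
}

/// Hypothetical bank; the real struct has ten backing `Vec` fields.
pub struct Bank {
    alpha: Vec<f32>, // per-bin smoothing coefficient (assumed name)
    state: Vec<f32>, // per-bin EWMA state (assumed name)
    peak: Vec<f32>,  // per-bin running maximum (assumed name)
    stabilize_every: usize,
    since_stabilize: usize,
    pub stabilize_count: usize,
}

impl Bank {
    pub fn new(bins: usize, alpha: f32, stabilize_every: usize) -> Self {
        assert!(stabilize_every > 0);
        Bank {
            alpha: vec![alpha; bins],
            state: vec![0.0; bins],
            peak: vec![0.0; bins],
            stabilize_every,
            since_stabilize: 0,
            stabilize_count: 0,
        }
    }

    /// Splits the input at each stabilization boundary, so the inner
    /// loop never has to test `count % stabilize_every` per sample.
    pub fn process_samples(&mut self, mut input: &[f32]) {
        while !input.is_empty() {
            let until_boundary = self.stabilize_every - self.since_stabilize;
            let take = until_boundary.min(input.len());
            let (chunk, rest) = input.split_at(take);
            for &x in chunk {
                self.process_sample_inner(x);
            }
            self.since_stabilize += take;
            if self.since_stabilize == self.stabilize_every {
                self.stabilize();
                self.since_stabilize = 0;
            }
            input = rest;
        }
    }

    /// Re-slices every field to the same length `n` up front, so the
    /// optimizer can prove `i < n` covers all of the indexing below.
    #[inline]
    fn process_sample_inner(&mut self, x: f32) {
        let n = self.state.len();
        let alpha = &self.alpha[..n];
        let state = &mut self.state[..n];
        let peak = &mut self.peak[..n];
        for i in 0..n {
            // EWMA update: state += alpha * (x - state)
            state[i] = mul_add(alpha[i], x - state[i], state[i]);
            peak[i] = peak[i].max(state[i]);
        }
    }

    /// Placeholder for the periodic stabilization step (assumed to be
    /// a flush of near-zero values; the real step may differ).
    fn stabilize(&mut self) {
        self.stabilize_count += 1;
        for s in self.state.iter_mut() {
            if s.abs() < 1.0e-30 {
                *s = 0.0;
            }
        }
    }

    pub fn state(&self) -> &[f32] {
        &self.state
    }
}
```

In this sketch, feeding one buffer of seven samples and feeding seven one-sample buffers leave the bank in the same state, because the boundary bookkeeping lives in `process_samples` and not in the per-sample path.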
Browser bench throughput (ns/sample, 48 kHz x 1 s):

  bins   before   after   speedup
    88      730      54     13.5x
   264     2097     143     14.7x
   440     3449     238     14.5x
   880     6888     470     14.7x
Native `cargo bench --bench bank` (aarch64): within noise at every bin count.
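For reading the browser numbers, a small sketch that recomputes the speedup column and converts ns/sample into a share of the real-time budget at 48 kHz. The helper names `speedup` and `realtime_load` are made up for this note, and the conversion assumes the ns/sample figure is wall time per input sample for the whole bank.

```rust
/// Ratio of the old per-sample cost to the new one.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

/// Fraction of one sample period spent processing one sample.
/// At 48 kHz a sample period is 1e9 / 48000, about 20833 ns.
fn realtime_load(ns_per_sample: f64, sample_rate_hz: f64) -> f64 {
    let period_ns = 1.0e9 / sample_rate_hz;
    ns_per_sample / period_ns
}
```

Under that assumption, the 880-bin row goes from about 33% of the real-time budget (6888 ns) to about 2.3% (470 ns).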
Co-authored-by: Pengo Wray <pengowray@users.noreply.github.com>
1 parent 3eee482 commit fec942a
1 file changed
Lines changed: 69 additions & 22 deletions