DSP perf: latch mode/shape enums, fuse matrix mul, skip FB math when … #372

Merged: baconpaul merged 1 commit into main from perf-fixes on May 11, 2026
@baconpaul

Owner

…idle

OpSource now caches all mode/shape enums at attack time (waveform, extended mode, phase-map shape, resonant-sweep window/depth, noise mode/type/LFSR mode). The dispatch and the AUDIO_IN check read the cached values instead of rounding and casting the patch params every block. Heavy state (st waveform, stWindow, extendedLagM/N) was already latched at attack, so the dispatch now matches it.
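For readers who have not seen the pattern, here is a minimal sketch of latching an enum at attack. It is not the OpSource code: `Waveform`, `PatchParam`, `LatchedOp` and `dispatchId` are hypothetical names invented for illustration, and the three-value enum is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for illustration only; not the six-sines types.
enum class Waveform : int { Sin = 0, Saw = 1, AudioIn = 2 };

struct PatchParam
{
    float value{0.f}; // enum stored as a float, as patch params often are
};

struct LatchedOp
{
    const PatchParam &waveParam;
    Waveform cachedWave{Waveform::Sin};

    explicit LatchedOp(const PatchParam &p) : waveParam(p) {}

    // Called once per note: the round + cast happens here and only here.
    void attack()
    {
        const int idx = static_cast<int>(std::round(waveParam.value));
        assert(idx >= 0 && idx <= 2); // sketch assumes the param is in range
        cachedWave = static_cast<Waveform>(idx);
    }

    // Called every block: reads the cached enum, never the float param.
    bool usesAudioIn() const { return cachedWave == Waveform::AudioIn; }
    int dispatchId() const { return static_cast<int>(cachedWave); }
};
```

One consequence of the sketch, under these assumptions: a param edit made mid-note is not seen by the per-block checks until the next `attack()`, which is the same behaviour the already-latched heavy state would have.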

MatrixNodeFrom::applyBlock fuses the prior mul_block + scalar convert two-pass into a single loop that computes modlev * fromOut inline, dropping the intermediate mod[8] buffer. Parenthesisation matches the original multiply order, so the output is bit-exact.
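The shape of that fusion can be sketched as below. This is an illustration under assumptions, not the applyBlock code: `applyTwoPass`, `applyFused`, `kBlock` and `kPhaseScale` are hypothetical, and the float-to-int32 convert with a fixed scale is a guess at what "scalar convert" means here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlock = 8;                            // matches the mod[8] buffer size
constexpr float kPhaseScale = static_cast<float>(1 << 20);   // hypothetical fixed-point scale

// Before: pass 1 multiplies into a temporary, pass 2 scales and converts.
inline void applyTwoPass(const float *modlev, const float *fromOut, int32_t *phaseIn)
{
    float mod[kBlock];
    for (std::size_t i = 0; i < kBlock; ++i)
        mod[i] = modlev[i] * fromOut[i];
    for (std::size_t i = 0; i < kBlock; ++i)
        phaseIn[i] += static_cast<int32_t>(kPhaseScale * mod[i]);
}

// After: one loop, no temporary. The inner parentheses keep the same
// multiply order as the two-pass version, so each float rounding step
// happens at the same point.
inline void applyFused(const float *modlev, const float *fromOut, int32_t *phaseIn)
{
    for (std::size_t i = 0; i < kBlock; ++i)
        phaseIn[i] += static_cast<int32_t>(kPhaseScale * (modlev[i] * fromOut[i]));
}
```

Writing `(kPhaseScale * modlev[i]) * fromOut[i]` instead would still be mathematically equal but could round differently, which is why the grouping matters for a bit-exact claim. The sketch also assumes no fast-math style reassociation is enabled.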

Self-feedback is now gated by a UsesFB template arg on innerLoopImpl. MatrixNodeSelf::applyBlock signals OpSource via hasActiveFeedback; the inner-loop dispatcher branches once at block start and picks the matching template instantiation, which constexpr-skips the fb math and the fbv[] shift when self-feedback is off. The bigger win turns out to be that removing the per-sample backward dependency through fbv lets the compiler reorder/vectorize across the whole inner loop. Result: -40 to -73% on no-feedback scenarios; flat on dense FB. In the same path, std::signbit on int32 is also replaced with a plain int compare.
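A rough sketch of the branch-once, template-gated inner loop follows. Only the names `UsesFB`, `innerLoopImpl`, `hasActiveFeedback` and `fbv` come from the description above; `OpSketch`, `toyWave`, the sawtooth, the single `fbLevel` and the two-sample feedback average are assumptions made to keep the example small. It needs C++17 for `if constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlock = 8;

struct OpSketch
{
    bool hasActiveFeedback{false}; // would be set by the self-feedback node
    int32_t fbLevel{0};            // hypothetical feedback depth
    int32_t fbv[2]{0, 0};          // last two outputs
    uint32_t phase{0};             // unsigned so wrap-around is well defined
    uint32_t dPhase{1u << 20};
    int32_t output[kBlock]{};

    // Toy sawtooth in [-32768, 32767]; stands in for the real waveform lookup.
    static int32_t toyWave(uint32_t ph) { return static_cast<int32_t>(ph >> 16) - 32768; }

    template <bool UsesFB> void innerLoopImpl()
    {
        for (std::size_t i = 0; i < kBlock; ++i)
        {
            uint32_t ph = phase;
            if constexpr (UsesFB)
            {
                // This read of fbv is the per-sample backward dependency.
                const int32_t fb = (fbv[0] + fbv[1]) / 2;
                ph += static_cast<uint32_t>(static_cast<int64_t>(fb) * fbLevel);
            }
            const int32_t out = toyWave(ph);
            if constexpr (UsesFB)
            {
                fbv[1] = fbv[0];
                fbv[0] = out;
            }
            output[i] = out;
            phase += dPhase;
        }
    }

    // One branch per block; each instantiation has the FB code compiled in or out.
    void innerLoop()
    {
        if (hasActiveFeedback)
            innerLoopImpl<true>();
        else
            innerLoopImpl<false>();
    }
};
```

In the `UsesFB == false` instantiation no iteration reads a value written by the previous one except `phase`, which advances by a constant, so the loop body carries no dependency on earlier outputs. That is the property the description credits for the larger speedup.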

Bit-exact audio confirmed on all non-NOISE scenarios via the six-sines-perf harness.
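As an aside on what "bit-exact" means in practice, a byte-level comparison of two rendered buffers could look like the sketch below. `bitExact` is a hypothetical helper, not part of the six-sines-perf harness, whose actual comparison method is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Returns true only if both buffers have the same length and identical bytes.
// Stricter than operator== on floats: +0.0f and -0.0f compare as different.
inline bool bitExact(const std::vector<float> &a, const std::vector<float> &b)
{
    if (a.size() != b.size())
        return false;
    if (a.empty())
        return true;
    return std::memcmp(a.data(), b.data(), a.size() * sizeof(float)) == 0;
}
```

A typical use would be `assert(bitExact(reference, candidate));` after rendering the same scenario with the old and new code paths.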

Assisted-by: Claude Opus 4.7 <noreply@anthropic.com>

@baconpaul merged commit cdec215 into main on May 11, 2026
6 checks passed
@baconpaul deleted the perf-fixes branch on May 13, 2026 22:26