Add Conv-based STFT variants for the ONNX preprocessors by intexcor · Pull Request #124 · istupakov/onnx-asr

intexcor · 2026-05-17T12:37:01Z

ONNX Runtime has no provider-agnostic implementation of op.STFT. TensorRT and DirectML each ship their own lowering (applied at model load), but the CUDA execution provider has no STFT kernel — the STFT node falls back to CPU with host/device copies around it — and the CPU implementation is slow for non-power-of-2 FFT sizes.

This PR does that lowering once, at the ONNX-graph level: the windowed DFT is expressed as a 1-D convolution whose fixed Conv kernel is the cos/sin Fourier basis multiplied by the analysis window. The resulting graph uses only operators that have kernels on every execution provider, so a single ONNX preprocessor runs natively on CPU, CUDA, TensorRT, CoreML, and DirectML.
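To make the idea concrete, here is a minimal pure-Python sketch of a windowed DFT written as a strided 1-D convolution and checked against a direct per-frame DFT. The helper names (`dft_conv_kernel`, `conv1d`, `power_spectrogram_conv`, `power_spectrogram_dft`), the toy sizes, and the Hann window are illustrative assumptions; this is not the PR's `stft_conv_weights()` or `conv_power_spectrogram()`, and it ignores padding and framing details that the real graphs may handle.

```python
import cmath
import math


def dft_conv_kernel(window):
    """Build real/imaginary filters, one per one-sided frequency bin.

    Hypothetical helper for illustration only (not the PR's implementation).
    """
    n_fft = len(window)
    n_bins = n_fft // 2 + 1
    real = [[window[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
            for k in range(n_bins)]
    imag = [[-window[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
            for k in range(n_bins)]
    return real, imag


def conv1d(signal, filters, stride):
    """Strided 'valid' cross-correlation, which is what an ONNX Conv node computes."""
    width = len(filters[0])
    starts = range(0, len(signal) - width + 1, stride)
    return [[sum(f[n] * signal[s + n] for n in range(width)) for s in starts]
            for f in filters]


def power_spectrogram_conv(signal, window, hop):
    """Power spectrogram from two convolutions: real^2 + imag^2 per bin and frame."""
    real, imag = dft_conv_kernel(window)
    re = conv1d(signal, real, hop)
    im = conv1d(signal, imag, hop)
    return [[r * r + i * i for r, i in zip(re_k, im_k)] for re_k, im_k in zip(re, im)]


def power_spectrogram_dft(signal, window, hop):
    """Reference: window each frame, then take a direct DFT of it."""
    n_fft = len(window)
    n_bins = n_fft // 2 + 1
    out = [[] for _ in range(n_bins)]
    for s in range(0, len(signal) - n_fft + 1, hop):
        frame = [window[n] * signal[s + n] for n in range(n_fft)]
        for k in range(n_bins):
            x = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft))
            out[k].append(abs(x) ** 2)
    return out


# Toy sizes (assumed); n_fft is deliberately not a power of two.
n_fft, hop = 20, 8
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]

conv_spec = power_spectrogram_conv(signal, window, hop)
dft_spec = power_spectrogram_dft(signal, window, hop)
max_abs_diff = max(abs(a - b)
                   for row_a, row_b in zip(conv_spec, dft_spec)
                   for a, b in zip(row_a, row_b))
```

Both paths produce a bins-by-frames grid, and `max_abs_diff` should sit at floating-point rounding level. In an ONNX graph the two filter banks would typically be stacked into a single Conv weight tensor, but that layout is a design choice this sketch does not try to reproduce.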

Changes

preprocessors/stft.py — shared helper: stft_conv_weights() builds the Conv kernel, conv_power_spectrogram() is the Conv-based STFT subgraph.
Conv-based variants of every STFT preprocessor: gigaam_v2/v3, nemo80/128, whisper80/128, kaldi, built as <name>_conv.onnx.
use_conv_preprocessors flag in PreprocessorRuntimeConfig — selects the Conv variants; auto-enabled for CUDA / TensorRT providers. When enabled, the CUDA provider is no longer excluded from the preprocessor session.
Preprocessor tests parametrized over the Conv variants; Manager tests cover the new flag.

Numerical equivalence

The Conv graphs are numerically equivalent to the STFT graphs — the existing tests/preprocessors checks pass for the Conv variants against the torchaudio / kaldi-native-fbank references, with the existing tolerances.
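One way to see why the two graphs can agree within the existing tolerances is that the Conv formulation computes the same sum as the windowed DFT, only grouped differently. The notation below is assumed for illustration and does not come from the PR: frame index m, frequency bin k, FFT size N, hop H, analysis window w. Padding and window zero-padding details are ignored.

```latex
\begin{aligned}
X[k, m] &= \sum_{n=0}^{N-1} w[n]\, x[mH + n]\, e^{-2\pi i k n / N} \\
        &= \sum_{n=0}^{N-1} \underbrace{w[n]\cos\!\left(\tfrac{2\pi k n}{N}\right)}_{c_k[n]}\, x[mH + n]
           \;-\; i \sum_{n=0}^{N-1} \underbrace{w[n]\sin\!\left(\tfrac{2\pi k n}{N}\right)}_{s_k[n]}\, x[mH + n], \\
|X[k, m]|^2 &= \Big(\sum_{n=0}^{N-1} c_k[n]\, x[mH + n]\Big)^2 + \Big(\sum_{n=0}^{N-1} s_k[n]\, x[mH + n]\Big)^2 .
\end{aligned}
```

Each inner sum is a stride-H 1-D convolution (cross-correlation) of x with a fixed filter, so any remaining difference between the two graphs should come from floating-point summation order rather than from the math.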

Benchmark (16 s audio, batch 1)

CPU — preprocessor latency:

preprocessor	NumPy	Conv ONNX	STFT ONNX
gigaam_v2	1.5 ms	2.9 ms	39 ms
whisper80	3.3 ms	5.4 ms	74 ms
nemo80	2.9 ms	4.7 ms	11 ms

On CPU the existing NumPy preprocessors are the fastest option and remain the default — this PR does not change that.

CUDA (RTX 3090 Ti, onnxruntime-gpu 1.26):

preprocessor	STFT ONNX	Conv ONNX
gigaam_v2	156 ms	0.96 ms
whisper80	292 ms	1.38 ms
nemo80	35 ms	2.74 ms

On the CUDA EP the STFT graph runs the STFT node on CPU with memcpy nodes around it; the Conv graph runs entirely on the GPU. This lets the preprocessor stay on-device in a GPU pipeline instead of using the NumPy/CPU fallback.
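For a quick sense of scale, the CUDA numbers above can be turned into speedup ratios. The snippet only divides the latencies reported in the table; the dictionary layout and variable names are illustrative.

```python
# Latencies in milliseconds, copied from the CUDA table: (STFT ONNX, Conv ONNX).
cuda_ms = {
    "gigaam_v2": (156.0, 0.96),
    "whisper80": (292.0, 1.38),
    "nemo80": (35.0, 2.74),
}

# Speedup of the Conv graph over the STFT graph on the CUDA EP.
speedup = {name: stft / conv for name, (stft, conv) in cuda_ms.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.1f}x")
```

This works out to roughly 160x for gigaam_v2, 210x for whisper80, and 13x for nemo80.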

Scope

The value of the Conv variant is running the preprocessor on an accelerator (CUDA EP) — on CPU the NumPy preprocessors are faster, and this PR keeps them as the CPU default.
TensorRT and DirectML already lower op.STFT internally, so the Conv variant is performance-neutral there; it is auto-enabled for CUDA/TensorRT so that a single ONNX graph runs across every provider.
wespeaker is left unchanged — it uses op.DFT + op.Scan (the slow/accurate preprocessor path), which is a separate case.

op.STFT has no kernel in the onnxruntime CUDA execution provider: a preprocessor graph that uses it gets split, and the STFT node runs on CPU with host/device copies around it. Accelerators such as CoreML do not support it either, and for non-power-of-2 FFT sizes it is slow on CPU. Add a shared preprocessors/stft.py helper that expresses the windowed DFT as a 1d convolution with a fixed kernel, plus Conv-based variants of every STFT-using preprocessor: gigaam_v2/v3, nemo80/128, whisper80/128 and kaldi. The new graphs use only operators with kernels on every execution provider, so they run fully on GPU; they are numerically equivalent to the STFT graphs.

PreprocessorRuntimeConfig gains a use_conv_preprocessors flag that selects the Conv-based ONNX preprocessor variants. It defaults to auto: enabled when a CUDA or TensorRT execution provider is used, disabled otherwise. When the Conv preprocessors are used the CUDA provider is no longer excluded from the preprocessor session (op.STFT has no CUDA kernel, the Conv graph does), so preprocessing runs on the GPU instead of falling back to a NumPy/CPU implementation.
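The auto behaviour described above could look roughly like the sketch below. Everything here is a hypothetical illustration, not the code in this PR: the helper names, the use of `None` to mean "auto", and the `<name>.onnx` naming for the non-Conv file are assumptions. Only the `<name>_conv.onnx` suffix and the ONNX Runtime provider names are taken as given.

```python
from typing import Optional, Sequence, Tuple, Union

# A provider entry is either a name or a (name, options) pair, as ONNX Runtime accepts.
Provider = Union[str, Tuple[str, dict]]

# Providers for which the Conv variants are auto-enabled.
GPU_PROVIDERS = {"CUDAExecutionProvider", "TensorrtExecutionProvider"}


def resolve_use_conv(use_conv_preprocessors: Optional[bool],
                     providers: Sequence[Provider]) -> bool:
    """Hypothetical helper: an explicit True/False wins, None means 'auto'."""
    if use_conv_preprocessors is not None:
        return use_conv_preprocessors
    names = (p if isinstance(p, str) else p[0] for p in providers)
    return any(name in GPU_PROVIDERS for name in names)


def preprocessor_file(name: str, use_conv: bool) -> str:
    """Hypothetical helper: pick the Conv variant file when requested."""
    return f"{name}_conv.onnx" if use_conv else f"{name}.onnx"


use_conv = resolve_use_conv(None, [("CUDAExecutionProvider", {}), "CPUExecutionProvider"])
selected = preprocessor_file("nemo80", use_conv)
```

With a CUDA provider in the list and the flag left on auto, `selected` is `nemo80_conv.onnx`; with only the CPU provider it would fall back to the non-Conv name.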

Parametrize the preprocessor tests over the Conv variants, update the build file counts, and cover use_conv_preprocessors selection in the Manager and preprocessor-option tests.

intexcor added 3 commits May 17, 2026 15:25

Test Conv-based preprocessor variants

ffc1ed0


intexcor wants to merge 3 commits into istupakov:main from intexcor:conv-stft-preprocessors
