feat(nemotron): offline CoreML/ANE encoder (drop-in for the MLX Conformer) by beshkenadze · Pull Request #13 · beshkenadze/mlx-audio-swift

beshkenadze · 2026-06-09T16:14:56Z

Stacked on Blaizzy#199 (Parakeet CoreML/ANE encoder) — base feat/parakeet-coreml-ane-encoder; reuses the generic ConformerCoreMLEncoder it introduces.

What

Runs the Nemotron 3.5 ASR FastConformer encoder on the ANE via CoreML for the offline (decode) path. The FastConformer has the same fixed-shape I/O as Parakeet's Conformer, so the offline encoder is the same ConformerCoreMLEncoder (typealias, not duplicated). The prompt MLP + RNN-T decoder stay in MLX.

Changes

- NemotronASRModel: optional coreMLEncoder; decode() runs it (falls back to MLX on any failure); generate() auto-clamps chunkDuration to the model's fixed length so overlap-merge stitches arbitrary-length audio.
- enableCoreMLEncoder(modelURL:) + --coreml-encoder <path> for Nemotron in the STT CLI.
- ParakeetCoreMLEncoder.fixedFrames made public (used for the clamp).
- CI-safe tests (NemotronCoreMLEncoderTests).
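
The clamp-then-stitch behaviour of generate() can be illustrated with a minimal Python sketch. It is not the Swift implementation: the function names, the 10 ms frame shift, and the 2 s overlap are assumptions chosen for illustration; only the idea (chunks may not exceed the encoder's fixed length, and overlapping windows cover arbitrary-length audio) comes from the description above.

```python
def clamp_chunk_ms(requested_ms: int, fixed_frames: int, frame_shift_ms: int = 10) -> int:
    """Clamp a requested chunk length to what a fixed-shape encoder accepts.

    frame_shift_ms=10 is an assumption (one feature frame per 10 ms).
    """
    max_ms = fixed_frames * frame_shift_ms
    return min(requested_ms, max_ms)


def plan_windows(total_ms: int, chunk_ms: int, overlap_ms: int) -> list[tuple[int, int]]:
    """Return (start, end) windows in ms that cover [0, total_ms] with overlap."""
    if not 0 <= overlap_ms < chunk_ms:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    hop = chunk_ms - overlap_ms
    windows = []
    start = 0
    while True:
        end = min(start + chunk_ms, total_ms)
        windows.append((start, end))
        if end >= total_ms:
            break
        start += hop
    return windows


# Hypothetical values: a 30 s request against a 1000-frame encoder, 25 s of audio.
chunk = clamp_chunk_ms(30_000, fixed_frames=1000)
windows = plan_windows(25_000, chunk, overlap_ms=2_000)
```

The final window is shorter than the chunk length, so a fixed-shape encoder would need it padded before inference; the overlap regions are what an overlap-merge step reconciles between neighbouring transcripts.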

Measured (M1 Max, Python hybrid via `coremltools.predict` on the ANE)

Encoder on the ANE is ≈2× faster than MLX fp32 (43 vs 80 ms per 10 s of audio); ~1.31× faster end-to-end offline; GPU power drops ~9× because the encoder moves off the GPU. MLX-Nemotron runs in fp32 (no bf16 path), which is why the ANE wins here (filed beshkenadze/mlx-audio#25 to add bf16).
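
As a rough sanity check on how the encoder and end-to-end figures relate, the sketch below applies Amdahl's law to the reported numbers. It assumes everything outside the encoder runs at unchanged speed, which the PR does not state; the helper names are illustrative.

```python
def speedup(baseline_ms: float, new_ms: float) -> float:
    """Ratio of baseline latency to new latency."""
    return baseline_ms / new_ms


def implied_fraction(component_speedup: float, overall_speedup: float) -> float:
    """Share of baseline runtime spent in the accelerated component.

    Solves Amdahl's law, overall = 1 / ((1 - p) + p / component), for p.
    """
    return (1 - 1 / overall_speedup) / (1 - 1 / component_speedup)


encoder_speedup = speedup(80, 43)                          # reported encoder latencies
encoder_share = implied_fraction(encoder_speedup, 1.31)    # reported end-to-end gain

print(f"encoder speedup: {encoder_speedup:.2f}x")
print(f"implied encoder share of offline runtime: {encoder_share:.0%}")
```

Under that assumption the reported latencies give an encoder speedup of about 1.86× (rounded to ≈2× above), and the 1.31× end-to-end figure would correspond to the encoder taking roughly half of the MLX-only offline runtime.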

Conversion

Convert the nvidia/nemotron-speech-streaming-en-0.6b (or 3.5) encoder to a CoreML .mlpackage with tools/coreml-ane/convert_encoder.py --model <id> (99% ANE-native ops), then pass the resulting package to --coreml-encoder.

Follow-ups

- Prebuilt HF .mlpackage artifact + --ane auto-download (the published encoder must match the MLX weights; the 3.5 multilingual encoder needs NeMo main to convert).
- Streaming CoreML encoder (cache-aware, functional cache I/O): validated convertible (98% ANE), Swift integration pending.

Production-safe: public MLModel + MLComputeUnits only (macOS 14+), no private APIs.

Add `--palettize N` to convert_encoder.py: `8` = 8-bit uniform palettize, `-1` = per-channel linear int8 (robust to weight outliers). Smaller model (~2x) + faster ANE compute, accuracy validated per-model.

Also port the `aten::Int` coremltools patch (the converter otherwise breaks on torch >= 2.8: "only 0-dimensional arrays can be converted to Python scalars").

Findings:

- 8-bit works cleanly for RNN-T encoders.
- Parakeet's TDT decoder is more quant-sensitive: 8-bit uniform crushes its outlier-heavy weights (encoder cosine 0.21), while per-channel linear int8 recovers it (word-identical per-window).
- Long audio needs a smaller chunk (TDT + padded final chunk + quant).

Runners: _remote_parakeet_linear.sh, _remote_offline_palettize.sh.
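
Why a few outliers can hurt uniform palettization but not per-channel linear int8 can be seen in a toy example. The sketch below is a pure-Python illustration, not what convert_encoder.py or coremltools does internally: the weights are synthetic, and both quantizers are simplified stand-ins (a global evenly spaced 256-level grid versus one symmetric int8 scale per row).

```python
import math


def uniform_palettize(weights, nbits=8):
    """Snap every weight to one of 2**nbits evenly spaced levels over the global [min, max]."""
    flat = [w for row in weights for w in row]
    lo, hi = min(flat), max(flat)
    step = (hi - lo) / ((1 << nbits) - 1) or 1.0
    return [[lo + round((w - lo) / step) * step for w in row] for row in weights]


def per_channel_linear_int8(weights):
    """Symmetric int8 quantization with a separate scale for each row (output channel)."""
    out = []
    for row in weights:
        scale = max(abs(w) for w in row) / 127 or 1.0
        out.append([max(-127, min(127, round(w / scale))) * scale for w in row])
    return out


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def row_cosines(original, quantized):
    return [cosine(o, q) for o, q in zip(original, quantized)]


# Synthetic weights: seven rows of small values plus one row holding two large outliers.
small_rows = [[0.01 * math.sin(i + j) for j in range(64)] for i in range(1, 8)]
outlier_row = [50.0, -20.0] + [0.01 * math.sin(j) for j in range(62)]
weights = small_rows + [outlier_row]

worst_uniform = min(row_cosines(weights, uniform_palettize(weights)))
worst_linear = min(row_cosines(weights, per_channel_linear_int8(weights)))
print(f"worst row cosine, global uniform 8-bit: {worst_uniform:.3f}")
print(f"worst row cosine, per-channel int8:     {worst_linear:.3f}")
```

In this toy setup the outliers stretch the global grid so far that every small weight lands on the same level, so the small rows lose their direction entirely, whereas a per-row scale keeps them intact. Real models are less extreme, which is why accuracy still has to be validated per model.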

…rmer)

Run the Nemotron 3.5 ASR FastConformer encoder on the ANE via CoreML for the offline (decode) path, mirroring the Parakeet CoreML/ANE encoder. The FastConformer has the same fixed-shape I/O, so the generic ConformerCoreMLEncoder is reused (typealias, not duplicated). The prompt MLP + RNN-T decoder stay in MLX.

- NemotronASRModel: optional coreMLEncoder; decode() runs it (falls back to MLX on any failure); generate() auto-clamps chunkDuration to the model's fixed length so overlap-merge stitches long audio.
- enableCoreMLEncoder(modelURL:) + --coreml-encoder <path> for Nemotron.
- ParakeetCoreMLEncoder.fixedFrames made public (used for the clamp).
- CI-safe tests.

Streaming CoreML is a follow-up.

…oder

Adds enableCoreMLEncoder(repo:) + defaultANEEncoderRepo (the matched 3.5 .mlpackage published on HF), reusing Parakeet's downloader (the encoder package is generic). mlx-audio-swift-stt --ane now auto-downloads + runs the Nemotron encoder on the ANE (no manual --coreml-encoder path). Verified end-to-end: downloads + transcribes.

Re-upload the offline Nemotron 3.5 encoder palettized to 8-bit (564 MB, ~2x smaller than fp16). The transcript is word-identical to MLX; the only remaining differences are at the int8/fp16-vs-bf16 numerical floor. The package keeps the existing production name, so `--ane` is unchanged. Adds the card + upload script. Convert with `convert_encoder.py --model nvidia/nemotron-3.5-asr-streaming-0.6b --frames 1000 --palettize 8`.
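
For scripting the conversion, a small wrapper can assemble the command shown above. This is a hedged sketch: it uses only the flags that appear in this PR (`--model`, `--frames`, `--palettize`), assumes the script lives at tools/coreml-ane/convert_encoder.py relative to the working directory, and the helper name is hypothetical. It only builds the argument list; actually running it requires a checkout with the converter's dependencies installed.

```python
import sys


def build_convert_command(model_id, frames=None, palettize=None,
                          script="tools/coreml-ane/convert_encoder.py"):
    """Build the argv for convert_encoder.py from the flags mentioned in this PR."""
    cmd = [sys.executable, script, "--model", model_id]
    if frames is not None:
        cmd += ["--frames", str(frames)]
    if palettize is not None:
        # 8 = 8-bit uniform palettize, -1 = per-channel linear int8
        cmd += ["--palettize", str(palettize)]
    return cmd


cmd = build_convert_command(
    "nvidia/nemotron-3.5-asr-streaming-0.6b", frames=1000, palettize=8
)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would launch the conversion; the output location is determined by the script itself and is not assumed here.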

Machine-specific conversion runners + HF upload/card scripts are dev scaffolding, not part of the upstream feature (the converter convert_encoder.py + the Swift encoder are).

beshkenadze · 2026-06-10T17:26:41Z

Superseded by upstream Blaizzy#202 (offline Nemotron ANE).

beshkenadze mentioned this pull request Jun 9, 2026

feat(nemotron): cache-aware streaming CoreML/ANE encoder #14

Closed

beshkenadze force-pushed the feat/nemotron-coreml-ane-encoder branch from 34dc197 to dcf49b8 Compare June 10, 2026 13:50

beshkenadze force-pushed the feat/parakeet-coreml-ane-encoder branch from 02976b7 to cb58a8f Compare June 10, 2026 14:22

beshkenadze force-pushed the feat/nemotron-coreml-ane-encoder branch from dcf49b8 to f1db220 Compare June 10, 2026 14:22

beshkenadze added 3 commits June 10, 2026 17:26

beshkenadze force-pushed the feat/parakeet-coreml-ane-encoder branch from cb58a8f to c332b3b Compare June 10, 2026 14:26

beshkenadze force-pushed the feat/nemotron-coreml-ane-encoder branch from f1db220 to 1343501 Compare June 10, 2026 14:26

chore(coreml-ane): drop dev-only tooling from the PR

ae4e672


beshkenadze marked this pull request as draft June 10, 2026 14:47

beshkenadze closed this Jun 10, 2026

beshkenadze wants to merge 5 commits into feat/parakeet-coreml-ane-encoder from feat/nemotron-coreml-ane-encoder
