Run the Parakeet Conformer encoder on the Apple Neural Engine (CoreML) by beshkenadze · Pull Request #12 · beshkenadze/mlx-audio-swift

beshkenadze · 2026-06-08T10:56:02Z

Summary

Adds an optional CoreML/ANE path for the Parakeet Conformer encoder, behind a flag.
The encoder (≈95% of the compute) runs on the Apple Neural Engine via CoreML while the
TDT decoder and chunking stay in MLX. The transcript is near-identical, power draw is
much lower, and there is a small speedup on top.

It plugs into the existing EncoderExecutionImplementation hook in ParakeetModel
(new .coreML case + enableCoreMLEncoder(modelURL:)), so the decode path is untouched.
It falls back to the MLX encoder if CoreML is unavailable. Public MLModel +
MLComputeUnits only — no private ANE APIs.
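
A minimal sketch of the fallback behaviour described above, not the PR's actual code. `EncoderBackend` and `selectEncoderBackend(packageURL:)` are hypothetical names, and `.all` compute units is an assumption (the PR does not say which `MLComputeUnits` value it sets). Only public CoreML calls are used, and the CoreML branch is compiled only where the framework exists.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Hypothetical result type for this sketch: which encoder backend was selected.
enum EncoderBackend: Equatable {
    case coreML
    case mlx
}

/// Tries to load a CoreML encoder package; any failure selects the MLX encoder.
/// `packageURL` would point at the .mlpackage; nil means the flag is off.
func selectEncoderBackend(packageURL: URL?) -> EncoderBackend {
    guard let url = packageURL,
          FileManager.default.fileExists(atPath: url.path) else {
        return .mlx  // flag off or package missing
    }
    #if canImport(CoreML)
    let config = MLModelConfiguration()
    config.computeUnits = .all  // assumption: let CoreML schedule onto the ANE when it can
    do {
        // An .mlpackage has to be compiled before it can be loaded.
        let compiledURL = try MLModel.compileModel(at: url)
        _ = try MLModel(contentsOf: compiledURL, configuration: config)
        return .coreML
    } catch {
        return .mlx  // compile or load failed
    }
    #else
    return .mlx  // CoreML is not available on this platform
    #endif
}

let backend = selectEncoderBackend(packageURL: nil)
print(backend)  // mlx
```

Because the decision is made once at load time, the decode loop never needs to know which backend produced the encoder output.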

# auto-downloads the prebuilt encoder from Hugging Face
mlx-audio-swift-stt --model beshkenadze/parakeet-tdt-0.6b-v3-mlx-fp16 \
    --audio in.wav --output-path out --ane --chunk-duration 9.95

// .off (default) · .on (default HF repo) · .repo("id") · .package(localURL)
let model = try await ParakeetModel.fromPretrained(repo, aneEncoder: .on)
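
For illustration, the four options could resolve to a package source roughly as follows. The enum cases and the default repo id come from this PR; `EncoderSource` and `resolveSource(_:)` are hypothetical helpers for this sketch and may not match the real implementation.

```swift
import Foundation

/// Mirrors the four options named in the PR description.
enum ANEEncoder {
    case off
    case on
    case repo(String)
    case package(URL)

    /// Default Hugging Face repo named in this PR.
    static let defaultRepo = "beshkenadze/parakeet-tdt-0.6b-v3-coreml-ane"
}

/// Hypothetical: where the encoder .mlpackage would come from.
enum EncoderSource: Equatable {
    case disabled              // stay on the MLX encoder
    case hub(repoID: String)   // download (and cache) from Hugging Face
    case local(URL)            // use a package already on disk
}

func resolveSource(_ option: ANEEncoder) -> EncoderSource {
    switch option {
    case .off: return .disabled
    case .on: return .hub(repoID: ANEEncoder.defaultRepo)
    case .repo(let id): return .hub(repoID: id)
    case .package(let url): return .local(url)
    }
}

print(resolveSource(.on))
```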

Why

The ANE has no public API: CoreML is the only sanctioned route, and MLX targets the CPU and GPU (Metal), not the ANE.
Splitting the graph at the encoder boundary (static feed-forward → CoreML/ANE; the
autoregressive TDT loop → MLX) is a clean, lossless way to reach the ANE. The payoff is
mostly power/thermal (the encoder leaves the GPU), with a speedup as a bonus.

Results

Measured on M1 Max, parakeet-tdt-0.6b-v3, a 20.8-min TED-LIUM 3 talk, chunk 9.95s.

| Metric | all-MLX | hybrid (CoreML/ANE encoder) |
| --- | --- | --- |
| ANE residency (encoder) | n/a | 100% (0 CPU / 0 GPU ops, 0 graph interruptions) |
| WER vs reference | 7.28% | 7.11% · agreement 1.07% |
| RTF (Swift release, interleaved) | ~95× | ~131× (~1.38× faster) |
| GPU power (sustained) | 17.3 W | 3.0 W (÷5.8) |
| Package power | 23.4 W | 10.3 W (÷2.3); ANE encoder ≈ 0.9 W |

The transcript is reproduced ~1:1 (2786 vs 2802 words); the residual is the
fp16-vs-bf16 difference (CoreML-fp16 is actually closer to fp32 than the shipped
MLX-bf16 encoder).
Residency holds at the 1.1b variant too (100%, 0 interruptions).
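
The derived ratios in the Results figures can be re-checked from the raw numbers. This sketch assumes the "×" values are real-time multiples (audio seconds processed per wall-clock second); the wall-time estimates are derived here and were not measured separately.

```swift
import Foundation

/// Rounds to a given number of decimal places.
func rounded(_ value: Double, places: Int) -> Double {
    let scale = pow(10.0, Double(places))
    return (value * scale).rounded() / scale
}

// Figures copied from the Results section.
let rtfMLX = 95.0
let rtfHybrid = 131.0
let audioSeconds = 20.8 * 60.0  // 20.8-minute talk

let speedup = rounded(rtfHybrid / rtfMLX, places: 2)   // 1.38
let gpuRatio = rounded(17.3 / 3.0, places: 1)          // 5.8
let packageRatio = rounded(23.4 / 10.3, places: 1)     // 2.3

// Implied wall-clock time for the whole talk (derived, assuming RTF = audio / wall time).
let wallMLX = audioSeconds / rtfMLX        // roughly 13.1 s
let wallHybrid = audioSeconds / rtfHybrid  // roughly 9.5 s

print(speedup, gpuRatio, packageRatio)
print(wallMLX, wallHybrid)
```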

How it works

Fixed input shape (a fixed mel-frame count, e.g. 1000 = 10s) — required for ANE
residency; a dynamic (RangeDim) time axis drops it to 0%. The Swift wrapper pads each
chunk's mel to the fixed length and crops the output via the subsampling formula. Keep
--chunk-duration ≤ frames·10ms.
The ANE output MLMultiArray is stride-padded, so the wrapper reads it by strides.
Conversion (NeMo → .mlpackage) lives in tools/coreml-ane/ (convert_encoder.py +
convert_traced.py); see the README. --fp16-io gives 100% ANE / 0 CPU ops.
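
The pad-then-crop and stride-aware read can be sketched with plain arrays. This is an illustration under assumptions, not the wrapper's code: `padMel` and `readRows` are hypothetical names, the time-major layout is assumed, and a flat `[Float]` stands in for the `MLMultiArray` backing store (a real wrapper would take the row stride from `MLMultiArray.strides` and handle fp16 data).

```swift
/// Zero-pads a row-major [frames x features] mel buffer up to `fixedFrames` rows.
/// The time-major layout is an assumption made for this sketch.
func padMel(_ mel: [Float], frames: Int, features: Int, fixedFrames: Int) -> [Float] {
    precondition(mel.count == frames * features, "buffer size must match frames * features")
    precondition(frames <= fixedFrames, "chunk exceeds the fixed model input length")
    var padded = [Float](repeating: 0, count: fixedFrames * features)
    padded.replaceSubrange(0..<mel.count, with: mel)
    return padded
}

/// Copies the first `validFrames` rows of a [frames x dim] tensor out of a buffer
/// whose rows start `rowStride` elements apart (rowStride > dim when rows are padded).
func readRows(_ buffer: [Float], validFrames: Int, dim: Int, rowStride: Int) -> [Float] {
    precondition(rowStride >= dim, "row stride cannot be smaller than the row width")
    var out = [Float]()
    out.reserveCapacity(validFrames * dim)
    for t in 0..<validFrames {
        let rowStart = t * rowStride
        out.append(contentsOf: buffer[rowStart..<(rowStart + dim)])
    }
    return out
}

// Two real frames padded to a fixed length of three.
let padded = padMel([1, 2, 3, 4], frames: 2, features: 2, fixedFrames: 3)
print(padded)   // [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]

// Output rows are 3 wide but stored 4 apart; -1 marks stride padding to skip.
let raw: [Float] = [1, 2, 3, -1, 4, 5, 6, -1, 7, 8, 9, -1]
let cropped = readRows(raw, validFrames: 2, dim: 3, rowStride: 4)
print(cropped)  // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Reading `dim` contiguous elements per row and skipping the rest avoids interpreting stride padding as encoder output, and cropping to `validFrames` discards the rows produced by the zero-padded tail.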

Scope

Swift: ParakeetCoreMLEncoder.swift, ParakeetModel.swift (the .coreML case),
App.swift (the --coreml-encoder <path> and --ane flags).
Tooling: tools/coreml-ane/ converter + README.

Limitations / follow-ups

The .mlpackage is not bundled (it's large). A prebuilt one is hosted on Hugging Face
(beshkenadze/parakeet-tdt-0.6b-v3-coreml-ane);
users can also convert it via the tooling.
MLX↔CoreML marshaling currently uses CPU copies; a zero-copy IOSurface-backed
MLMultiArray would lift the Swift RTF further (the power win is independent).
RTF numbers are M1 Max; newer ANE generations should do better.

Testing

New CI-safe unit tests (Tests/ParakeetCoreMLEncoderTests.swift, swift-testing): the
output-length math matches the dw-striding formula, and a missing .mlpackage throws
(→ MLX fallback). No ANE/model/network needed; swift test: 2/2 pass.
Builds clean (release); transcript parity verified against the all-MLX path on the
full talk.
The decode path is unchanged (the encoder is swapped behind the existing hook).
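
The output-length check can be illustrated as below. The parameters are assumptions based on a NeMo-style dw-striding front end with 8× subsampling (three stride-2 convolutions, kernel 3, padding 1); the real `subsampledLength` helper may be parameterised differently. The 10 ms hop and the 1000-frame / 9.95 s figures come from this PR.

```swift
/// Output length after `stages` strided convolutions, each mapping
/// L -> floor((L + 2 * padding - kernel) / stride) + 1.
/// Defaults are assumptions for an 8x dw-striding subsampler.
func subsampledLength(_ frames: Int, stages: Int = 3, kernel: Int = 3,
                      stride: Int = 2, padding: Int = 1) -> Int {
    precondition(frames >= 1, "need at least one input frame")
    var length = frames
    for _ in 0..<stages {
        // The numerator stays non-negative here, so integer division equals floor.
        length = (length + 2 * padding - kernel) / stride + 1
    }
    return length
}

let fixedFrames = 1000  // fixed model input length
let hopMs = 10          // one mel frame per 10 ms
let chunkMs = 9950      // --chunk-duration 9.95

// The chunk must fit in the fixed input: 9950 ms <= 1000 * 10 ms.
let fits = chunkMs <= fixedFrames * hopMs

print(fits)                            // true
print(subsampledLength(fixedFrames))   // 125 under the assumed parameters
print(subsampledLength(chunkMs / hopMs))
```

A test of this shape needs no model, ANE, or network, which is what makes it safe to run in CI.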

…aming) (Blaizzy#193)

* perf(sortformer): single bulk readback in predsToSegments

  predsToSegments built its result with a per-frame .item() GPU->CPU read (one synchronous round-trip per frame, per speaker, per call). On long streaming runs that is ~95k syncs and dominates the non-encoder time. Replace it with a single bulk asArray() readback followed by pure-Swift change detection. Output is bit-identical (verified 0.0% DER on a 32-min 2-speaker file vs the previous implementation; the existing SortformerPostprocessingTests cover basic / empty / min-duration / merge-gap / sorted cases). ~1.8x faster streaming end to end.

* test(sortformer): pin predsToSegments boundary times incl. trailing segment

  The existing post-processing tests assert only segment counts/order, not exact times, and never exercise a speaker active through the final frame (the tail branch). Add predsToSegmentsExactBoundaries, which pins the start edge, the inactive-close edge, and the active-to-last-frame case, locking the exact frame->time mapping the bulk-readback refactor preserves.

Co-authored-by: vanch <vanchye@outlook.com>

…aizzy#196)

- Fix blocking weight-load crash: prompt_kernel used integer @ModuleInfo keys (0/2) -> MLX-swift array misinterpretation. Remap to linear0/linear2.
- Add cache-aware streaming (NemotronASRStreaming.swift): per-layer attention + causal-conv caches + incremental causal subsampling (16-frame mel cache); generateStream now streams O(n) with no recompute.

Validated vs NeMo CUDA reference (FLEURS en-US 200u): offline 9.62%, streaming 9.43% (CUDA 9.58%); single-clip token-exact.

ParakeetCoreMLEncoder is a drop-in for the MLX encoder that runs the Conformer on the Apple Neural Engine via CoreML, wired through the existing EncoderExecutionImplementation hook (.coreML case + enableCoreMLEncoder). Decoder and chunking stay in MLX. The model is fixed-shape (ANE requirement): chunk mel is padded to the fixed length and the output cropped via the subsampling formula; the stride-padded ANE output is read by strides. Falls back to the MLX encoder if CoreML is unavailable. CLI: --coreml-encoder <path>. Public MLModel + MLComputeUnits only.

convert_encoder.py traces the Conformer encoder at a fixed shape; convert_traced.py runs the coremltools conversion in an isolated numpy<2 env (coremltools 9.0 + numpy>=2 fails on a folded aten::Int const). Produces the fp16 MLProgram .mlpackage for --coreml-encoder. README documents conversion + usage.

Expose subsampledLength as a static helper and verify it matches the dw-striding output-length formula; assert a missing .mlpackage throws (model then falls back to MLX). No ANE/model/network needed -> runs in CI. swift test: 2/2 pass.

Add ANEEncoder enum (.off default / .on / .repo(String) / .package(URL)) and ParakeetModel.fromPretrained(aneEncoder:) so callers just flip it on. .on/.repo download the .mlpackage from Hugging Face (default beshkenadze/parakeet-tdt-0.6b-v3-coreml-ane) via the existing HubClient, cached. CLI gains --ane. Verified end-to-end (download + transcribe, 2802 words, ~105x RT); swift test 3/3.

beshkenadze · 2026-06-08T13:24:21Z

Superseded by upstream PR Blaizzy#199 (rebased onto upstream/main).

beshkenadze and others added 3 commits June 5, 2026 09:13

Add Nemotron ASR streaming model support (Blaizzy#195)

2766d9b

Co-authored-by: vanch <vanchye@outlook.com>

beshkenadze force-pushed the feat/parakeet-coreml-ane-encoder branch from 648dac3 to bb8ac70 Compare June 8, 2026 10:58

beshkenadze added 5 commits June 8, 2026 16:23

docs(coreml-ane): link prebuilt .mlpackage on Hugging Face

5a6fb55

beshkenadze force-pushed the feat/parakeet-coreml-ane-encoder branch from 6b0cd38 to 1a7b407 Compare June 8, 2026 13:24

beshkenadze closed this Jun 8, 2026

beshkenadze wants to merge 8 commits into main from feat/parakeet-coreml-ane-encoder
