[Perf] Run the Parakeet Conformer encoder on the Apple Neural Engine (CoreML) by beshkenadze · Pull Request #199 · Blaizzy/mlx-audio-swift

beshkenadze · 2026-06-08T13:24:06Z

Run the Parakeet Conformer encoder on the Apple Neural Engine (CoreML)

Summary

Adds an optional CoreML/ANE path for the Parakeet Conformer encoder, behind a flag.
The encoder (≈95% of the compute) runs on the Apple Neural Engine via CoreML while the
TDT decoder and chunking stay in MLX. Same transcript, lower power, and a small speedup
on top.

It plugs into the existing EncoderExecutionImplementation hook in ParakeetModel
(new .coreML case + enableCoreMLEncoder(modelURL:)), so the decode path is untouched.
It falls back to the MLX encoder if CoreML is unavailable. Public MLModel +
MLComputeUnits only — no private ANE APIs.

# auto-downloads the prebuilt encoder from Hugging Face
mlx-audio-swift-stt --model beshkenadze/parakeet-tdt-0.6b-v3-mlx-fp16 \
    --audio in.wav --output-path out --ane --chunk-duration 9.95

// .off (default) · .on (default HF repo) · .repo("id") · .package(localURL)
let model = try await ParakeetModel.fromPretrained(repo, aneEncoder: .on)
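
The four options in the comment above can be read as a small enum that resolves to a source for the encoder package. The following is only a sketch of that idea: `ANEEncoderOption`, `EncoderSource`, and `resolve(_:defaultRepo:)` are hypothetical names for illustration, and the type behind `aneEncoder:` in this PR may be shaped differently.

```swift
import Foundation

/// Hypothetical mirror of the four options listed above (not the PR's actual type).
enum ANEEncoderOption: Equatable {
    case off               // default: all-MLX path
    case on                // use the default Hugging Face repo
    case repo(String)      // use a specific Hugging Face repo id
    case package(URL)      // use a local .mlpackage
}

/// Where the encoder package would come from (illustrative only).
enum EncoderSource: Equatable {
    case mlxOnly
    case huggingFace(repo: String)
    case local(URL)
}

/// Maps an option to a source; `defaultRepo` is supplied by the caller.
func resolve(_ option: ANEEncoderOption, defaultRepo: String) -> EncoderSource {
    switch option {
    case .off:
        return .mlxOnly
    case .on:
        return .huggingFace(repo: defaultRepo)
    case .repo(let id):
        return .huggingFace(repo: id)
    case .package(let url):
        return .local(url)
    }
}
```

Keeping `.off` as the default case matches the opt-in behavior described in the summary: nothing changes for callers who do not pass the argument.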

Why

ANE has no public API — CoreML is the only sanctioned route, and MLX is GPU/Metal only.
Splitting the graph at the encoder boundary (static feed-forward → CoreML/ANE; the
autoregressive TDT loop → MLX) is a clean, lossless way to reach the ANE. The payoff is
mostly power/thermal (the encoder leaves the GPU), with a speedup as a bonus.
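
As a rough illustration of the public route described above, the sketch below loads a package with `MLModelConfiguration.computeUnits = .cpuAndNeuralEngine` and falls back to MLX when loading fails. `EncoderBackend`, `loadANEEncoder(at:)`, and `chooseBackend(packageURL:)` are hypothetical names, not the PR's code. Note that `computeUnits` is a preference: Core ML still decides where each op actually runs. The synchronous `MLModel.compileModel(at:)` is used for brevity; newer SDKs mark it deprecated in favor of the async variant.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Which encoder implementation ends up being used (illustrative).
enum EncoderBackend: Equatable { case coreML, mlx }

#if canImport(CoreML)
/// Compiles the .mlpackage and loads it, asking Core ML to prefer CPU + Neural Engine.
@available(macOS 13.0, iOS 16.0, *)
func loadANEEncoder(at packageURL: URL) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine   // keeps the encoder off the GPU
    let compiledURL = try MLModel.compileModel(at: packageURL)
    return try MLModel(contentsOf: compiledURL, configuration: config)
}
#endif

/// Returns .coreML only if the package loads; otherwise falls back to MLX.
func chooseBackend(packageURL: URL?) -> EncoderBackend {
    #if canImport(CoreML)
    if #available(macOS 13.0, iOS 16.0, *),
       let url = packageURL,
       (try? loadANEEncoder(at: url)) != nil {
        return .coreML
    }
    #endif
    return .mlx
}
```

A real implementation would keep the loaded `MLModel` instead of discarding it; the sketch only shows the selection and fallback logic.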

Results

Measured on M1 Max, parakeet-tdt-0.6b-v3, a 20.8-min TED-LIUM 3 talk, chunk 9.95s.

Metric	all-MLX	hybrid (CoreML/ANE encoder)
ANE residency (encoder)	—	100% (0 CPU / 0 GPU ops, 0 graph interruptions)
WER vs reference	7.28%	7.11% · agreement 1.07%
RTF (Swift release, interleaved)	~95×	~131× (≈1.38× faster)
GPU power (sustained)	17.3 W	3.0 W (÷5.8)
Package power	23.4 W	10.3 W (÷2.3) — ANE encoder ≈ 0.9 W

The transcript is reproduced ~1:1 (2786 vs 2802 words); the residual is the
fp16-vs-bf16 difference (CoreML-fp16 is actually closer to fp32 than the shipped
MLX-bf16 encoder).
Residency holds for the 1.1b variant too (100%, 0 interruptions).

How it works

Fixed input shape (a fixed mel-frame count, e.g. 1000 = 10s) — required for ANE
residency; a dynamic (RangeDim) time axis drops it to 0%. The Swift wrapper pads each
chunk's mel to the fixed length and crops the output via the subsampling formula. Keep
--chunk-duration ≤ frames·10ms.
The ANE output MLMultiArray is stride-padded, so the wrapper reads it by strides.
Conversion (NeMo → .mlpackage) lives in tools/coreml-ane/ (convert_encoder.py +
convert_traced.py); see the README. --fp16-io gives 100% ANE / 0 CPU ops.
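
The pad-and-crop arithmetic for the fixed input shape can be sketched as follows. This assumes the NeMo-style dw-striding layout of three stride-2, kernel-3, padding-1 convolutions (8× subsampling) and a 10 ms mel hop. The function names and the zero pad value are illustrative assumptions, not taken from the wrapper.

```swift
import Foundation

/// Output frames after `stages` stride-2 convs (kernel 3, padding 1 on each side).
/// Each stage applies floor((n + 2*pad - kernel) / stride) + 1.
func subsampledLength(_ frames: Int, stages: Int = 3) -> Int {
    guard frames > 0 else { return 0 }
    var n = frames
    for _ in 0..<stages {
        n = (n + 2 - 3) / 2 + 1
    }
    return n
}

/// Pads a [frames][bins] mel chunk with zero frames up to the fixed length.
/// Returns nil if the chunk is empty or longer than the fixed shape allows.
func padToFixedLength(_ mel: [[Float]], fixedFrames: Int) -> [[Float]]? {
    guard let bins = mel.first?.count, mel.count <= fixedFrames else { return nil }
    let padding = Array(repeating: [Float](repeating: 0, count: bins),
                        count: fixedFrames - mel.count)
    return mel + padding
}

/// Longest chunk (in seconds) that fits the fixed shape at a 10 ms hop.
func maxChunkSeconds(fixedFrames: Int, hopSeconds: Double = 0.010) -> Double {
    Double(fixedFrames) * hopSeconds
}
```

With a fixed length of 1000 frames, a chunk of `n` real frames is padded to 1000, run through the encoder, and only the first `subsampledLength(n)` output frames are kept.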

Scope

Swift: ParakeetCoreMLEncoder.swift, ParakeetModel.swift (the .coreML case),
App.swift (the --coreml-encoder flag).
Tooling: tools/coreml-ane/ converter + README.

Limitations / follow-ups

The .mlpackage is not bundled (it's large). A prebuilt one is hosted on Hugging Face
(beshkenadze/parakeet-tdt-0.6b-v3-coreml-ane);
users can also convert it via the tooling.
MLX↔CoreML marshaling currently uses CPU copies; a zero-copy IOSurface-backed
MLMultiArray would lift the Swift RTF further (the power win is independent).
RTF numbers are M1 Max; newer ANE generations should do better.
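
The CPU-copy marshaling mentioned above has to respect the stride-padded layout of the output buffer. A minimal sketch of such a copy, written against plain arrays so it runs without Core ML: in practice `shape` and `strides` would come from the `MLMultiArray` properties of the same names, and `compactCopy` is a hypothetical helper rather than the wrapper's actual function.

```swift
import Foundation

/// Copies a logically dense tensor out of storage whose axes may be padded.
/// `strides[d]` is the element distance between consecutive indices on axis d.
func compactCopy(from storage: [Float], shape: [Int], strides: [Int]) -> [Float] {
    precondition(shape.count == strides.count, "shape and strides must have the same rank")
    let count = shape.reduce(1, *)
    var out = [Float]()
    out.reserveCapacity(count)
    var index = [Int](repeating: 0, count: shape.count)
    for _ in 0..<count {
        // Physical offset is the dot product of the multi-index and the strides.
        let offset = zip(index, strides).reduce(0) { $0 + $1.0 * $1.1 }
        out.append(storage[offset])
        // Advance the multi-index in row-major order (last axis fastest).
        var axis = shape.count - 1
        while axis >= 0 {
            index[axis] += 1
            if index[axis] < shape[axis] { break }
            index[axis] = 0
            axis -= 1
        }
    }
    return out
}
```

Reading with a flat `0..<count` loop instead would pull the padding elements into the result, which is why the wrapper reads by strides.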

Testing

New CI-safe unit tests (Tests/ParakeetCoreMLEncoderTests.swift, swift-testing): the
output-length math matches the dw-striding formula, and a missing .mlpackage throws
(→ MLX fallback). No ANE/model/network needed; swift test: 2/2 pass.
Builds clean (release); transcript parity verified against the all-MLX path on the
full talk.
The decode path is unchanged (the encoder is swapped behind the existing hook).

lucasnewman · 2026-06-12T17:02:28Z

@beshkenadze It seems like the CoreML export should live in the python repo? Having python tools in this repo doesn't make a lot of sense to me as the toolchains are very distinct. Thoughts?

beshkenadze · 2026-06-14T17:09:24Z

Agreed. I'll move tools/coreml-ane/ out and co-locate each converter with its .mlpackage in the HF model repo — artifact + the exact script that produced it. Sound good?

beshkenadze mentioned this pull request Jun 8, 2026

Run the Parakeet Conformer encoder on the Apple Neural Engine (CoreML) beshkenadze/mlx-audio-swift#12

Closed

beshkenadze changed the title ~~Run the Parakeet Conformer encoder on the Apple Neural Engine (CoreML)~~ [Perf] Run the Parakeet Conformer encoder on the Apple Neural Engine (CoreML) Jun 8, 2026

beshkenadze mentioned this pull request Jun 9, 2026

feat(nemotron): offline CoreML/ANE encoder (drop-in for the MLX Conformer) beshkenadze/mlx-audio-swift#13

Closed

beshkenadze force-pushed the feat/parakeet-coreml-ane-encoder branch 2 times, most recently from cb58a8f to c332b3b Compare June 10, 2026 14:26

This was referenced Jun 10, 2026

feat(nemotron): offline CoreML/ANE encoder (drop-in for the MLX Conformer) #202

Draft

feat(nemotron): cache-aware streaming CoreML/ANE encoder #203

Draft

feat(parakeet): optional CoreML/ANE Conformer encoder

81f284d

Run the Parakeet Conformer encoder on the Apple Neural Engine via CoreML — the .mlpackage is auto-downloaded from Hugging Face while the TDT decoder stays in MLX. Opt-in encoder API (aneEncoder:) + CI-safe tests.

beshkenadze force-pushed the feat/parakeet-coreml-ane-encoder branch from c931113 to 81f284d Compare June 14, 2026 20:43

beshkenadze wants to merge 1 commit into Blaizzy:main from beshkenadze:feat/parakeet-coreml-ane-encoder