[Perf] Run the Parakeet Conformer encoder on the Apple Neural Engine (CoreML) #199
Open
beshkenadze wants to merge 1 commit into
cb58a8f to
c332b3b
Compare
This was referenced Jun 10, 2026
Collaborator
@beshkenadze It seems like the CoreML export should live in the python repo? Having python tools in this repo doesn't make a lot of sense to me as the toolchains are very distinct. Thoughts?
Contributor
Author
Agreed. I'll move
Run the Parakeet Conformer encoder on the Apple Neural Engine via CoreML — the .mlpackage is auto-downloaded from Hugging Face while the TDT decoder stays in MLX. Opt-in encoder API (aneEncoder:) + CI-safe tests.
c931113 to
81f284d
Compare
Run the Parakeet Conformer encoder on the Apple Neural Engine (CoreML)
Summary
Adds an optional CoreML/ANE path for the Parakeet Conformer encoder, behind a flag.
The encoder (≈95% of the compute) runs on the Apple Neural Engine via CoreML while the
TDT decoder and chunking stay in MLX. Same transcript, lower power, and a small speedup
on top.
It plugs into the existing EncoderExecutionImplementation hook in ParakeetModel (new .coreML case + enableCoreMLEncoder(modelURL:)), so the decode path is untouched.
It falls back to the MLX encoder if CoreML is unavailable.
Public MLModel + MLComputeUnits only; no private ANE APIs.

    # auto-downloads the prebuilt encoder from Hugging Face
    mlx-audio-swift-stt --model beshkenadze/parakeet-tdt-0.6b-v3-mlx-fp16 \
      --audio in.wav --output-path out --ane --chunk-duration 9.95

    // .off (default) · .on (default HF repo) · .repo("id") · .package(localURL)
    let model = try await ParakeetModel.fromPretrained(repo, aneEncoder: .on)

Why
ANE has no public API — CoreML is the only sanctioned route, and MLX is GPU/Metal only.
Splitting the graph at the encoder boundary (static feed-forward → CoreML/ANE; the
autoregressive TDT loop → MLX) is a clean, lossless way to reach the ANE. The payoff is
mostly power/thermal (the encoder leaves the GPU), with a speedup as a bonus.
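The backend choice with the MLX fallback mentioned in the Summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `EncoderBackend`, `MissingPackageError`, `validatePackage(at:)` and `resolveBackend(requestedPackage:)` are hypothetical names, and a file-existence check stands in for an actual CoreML model load.

```swift
import Foundation

// Hypothetical names throughout; not the PR's actual types.
enum EncoderBackend: Equatable {
    case mlx
    case coreML(packageURL: URL)
}

struct MissingPackageError: Error {
    let url: URL
}

/// Throws when nothing exists at `url` (a stand-in for a real CoreML load failing).
func validatePackage(at url: URL) throws {
    guard FileManager.default.fileExists(atPath: url.path) else {
        throw MissingPackageError(url: url)
    }
}

/// Picks the CoreML backend when a package is requested and present,
/// otherwise falls back to MLX so the decode path keeps working.
func resolveBackend(requestedPackage: URL?) -> EncoderBackend {
    guard let url = requestedPackage else { return .mlx }
    do {
        try validatePackage(at: url)
        return .coreML(packageURL: url)
    } catch {
        return .mlx
    }
}
```

Keeping the decision in one small function means callers only ever see a resolved backend, so a missing or unloadable package never surfaces as a transcription failure.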
Results
Measured on M1 Max, parakeet-tdt-0.6b-v3, a 20.8-min TED-LIUM 3 talk, chunk 9.95 s.
fp16-vs-bf16 difference (CoreML-fp16 is actually closer to fp32 than the shipped MLX-bf16 encoder).
How it works
residency; a dynamic (RangeDim) time axis drops it to 0%. The Swift wrapper pads each chunk's mel to the fixed length and crops the output via the subsampling formula. Keep --chunk-duration ≤ frames·10ms.
MLMultiArray is stride-padded, so the wrapper reads it by strides.
.mlpackage) lives in tools/coreml-ane/ (convert_encoder.py + convert_traced.py); see the README.
--fp16-io gives 100% ANE / 0 CPU ops.
Scope
ParakeetCoreMLEncoder.swift, ParakeetModel.swift (the .coreML case), App.swift (the --coreml-encoder flag).
tools/coreml-ane/ converter + README.
Limitations / follow-ups
.mlpackage is not bundled (it's large). A prebuilt one is hosted on Hugging Face (beshkenadze/parakeet-tdt-0.6b-v3-coreml-ane); users can also convert it via the tooling.
MLMultiArray would lift the Swift RTF further (the power win is independent).
Testing
Tests/ParakeetCoreMLEncoderTests.swift, swift-testing): the output-length math matches the dw-striding formula, and a missing .mlpackage throws (→ MLX fallback). No ANE/model/network needed; swift test: 2/2 pass.
release); transcript parity verified against the all-MLX path on the full talk.
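For reviewers, the pad, crop and stride-aware read described under How it works, together with the output-length math the unit test checks, might look roughly like the sketch below. It is written under stated assumptions rather than taken from the PR: the subsampling is assumed to be three stride-2 convolutions with kernel 3 and padding 1 (an 8x dw-striding front end), the mel buffer is assumed to be a flat frames-by-bins array, and `subsampledLength`, `padMel` and `cropStrided` are hypothetical helper names. Plain arrays stand in for MLMultiArray.

```swift
/// Output frames after `stages` stride-2 convolutions (kernel 3, padding 1).
/// The constants are assumptions modelled on 8x depthwise-striding subsampling.
func subsampledLength(_ frames: Int, stages: Int = 3) -> Int {
    guard frames > 0 else { return 0 }
    var n = frames
    for _ in 0..<stages {
        // floor((n + 2*padding - kernel) / stride) + 1 with padding 1, kernel 3, stride 2
        n = (n + 2 * 1 - 3) / 2 + 1
    }
    return n
}

/// Zero-pads a flat [frames x bins] mel buffer up to `fixedFrames` rows.
/// The layout is an assumption; the real model input may be ordered differently.
func padMel(_ mel: [Float], bins: Int, fixedFrames: Int) -> [Float] {
    precondition(bins > 0 && mel.count % bins == 0, "mel must hold whole frames")
    let frames = mel.count / bins
    precondition(frames <= fixedFrames, "chunk is longer than the fixed encoder input")
    return mel + [Float](repeating: 0, count: (fixedFrames - frames) * bins)
}

/// Gathers the first `validRows` rows of a [rows x cols] buffer whose rows start
/// `rowStride` elements apart, skipping any padding at the end of each row.
func cropStrided(_ buffer: [Float], cols: Int, rowStride: Int, validRows: Int) -> [Float] {
    precondition(rowStride >= cols, "row stride cannot be smaller than the row width")
    var out: [Float] = []
    out.reserveCapacity(validRows * cols)
    for row in 0..<validRows {
        let start = row * rowStride
        out.append(contentsOf: buffer[start..<(start + cols)])
    }
    return out
}
```

In use, the wrapper would pad the chunk with `padMel`, run the fixed-shape encoder, then keep `subsampledLength(realFrames)` rows via `cropStrided`. Under the assumed constants a 995-frame chunk maps to 125 encoder frames.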