feat(nemotron): cache-aware streaming CoreML/ANE encoder by beshkenadze · Pull Request #203 · Blaizzy/mlx-audio-swift

beshkenadze · 2026-06-10T17:26:26Z

Cache-aware streaming CoreML/ANE encoder for Nemotron

Runs the Nemotron 3.5 streaming FastConformer encoder on the ANE in generateStream
(validated with uniform-121 feeding and manual cache threading; the prompt MLP and RNN-T
decode stay in MLX). The model is 8-bit palettized (~28% faster on the ANE, transcript
word-identical). Enable with --stream --ane (auto-download) or --coreml-stream-encoder.
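
To make "uniform feeding" and "manual cache threading" concrete, here is a minimal sketch in plain Swift. It is not the PR's code: `EncoderCache`, `StreamingEncoder`, `ToyEncoder`, `uniformChunks`, and `streamEncode` are hypothetical names, the features are flattened to one scalar per frame, and reading "uniform-121" as fixed 121-frame chunks is an assumption. The idea it illustrates is that every call sees the same input shape, and the cache returned by one call is passed explicitly into the next.

```swift
// Hypothetical types for illustration only; none of these names come from the PR.
struct EncoderCache {
    var steps: Int = 0  // stand-in for the real cache tensors
}

protocol StreamingEncoder {
    // Consumes one fixed-size chunk plus the previous cache,
    // returns the chunk's output and the cache for the next call.
    func encode(chunk: [Float], cache: EncoderCache) -> (output: [Float], cache: EncoderCache)
}

// Toy encoder: outputs the chunk mean and counts how many chunks it has seen.
struct ToyEncoder: StreamingEncoder {
    func encode(chunk: [Float], cache: EncoderCache) -> (output: [Float], cache: EncoderCache) {
        let mean = chunk.reduce(0, +) / Float(chunk.count)
        return ([mean], EncoderCache(steps: cache.steps + 1))
    }
}

/// Splits `frames` into chunks of exactly `chunkSize`, zero-padding the last one
/// so that every encoder call receives the same input shape.
func uniformChunks(_ frames: [Float], chunkSize: Int) -> [[Float]] {
    precondition(chunkSize > 0)
    var chunks: [[Float]] = []
    var start = 0
    while start < frames.count {
        let end = min(start + chunkSize, frames.count)
        var chunk = Array(frames[start..<end])
        chunk.append(contentsOf: [Float](repeating: 0, count: chunkSize - chunk.count))
        chunks.append(chunk)
        start = end
    }
    return chunks
}

/// Feeds uniform chunks through the encoder, threading the cache by hand.
func streamEncode<E: StreamingEncoder>(
    _ frames: [Float], with encoder: E, chunkSize: Int = 121
) -> (outputs: [[Float]], cache: EncoderCache) {
    var cache = EncoderCache()  // empty cache before the first chunk
    var outputs: [[Float]] = []
    for chunk in uniformChunks(frames, chunkSize: chunkSize) {
        let result = encoder.encode(chunk: chunk, cache: cache)
        outputs.append(result.output)
        cache = result.cache  // the output cache becomes the next input cache
    }
    return (outputs, cache)
}
```

With 300 input frames and the default chunk size, `streamEncode` makes three encoder calls (121, 121, and 58 frames padded to 121). In a CoreML-backed version the cache would presumably be a set of explicit input and output tensors rather than a counter.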

This is a power/thermal feature, not a speed one, and honestly only a partial GPU offload:
only the encoder moves to the ANE; the decoder still runs on the GPU for each chunk. For
realtime streaming the speed difference versus MLX is moot (RTF ~45–58×); the value is in
freeing the GPU for concurrent work and in lower power draw and cooler operation for
always-on or battery-powered streaming.
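
A quick worked calculation shows why speed is moot at these figures. This sketch assumes the quoted RTF means seconds of audio processed per second of compute (the description does not define it), and `processingSeconds` and `idleFraction` are hypothetical helper names.

```swift
/// Compute time for a chunk of audio, assuming RTF = audio seconds per compute second.
func processingSeconds(chunkSeconds: Double, rtf: Double) -> Double {
    precondition(rtf > 0)
    return chunkSeconds / rtf
}

/// Fraction of wall-clock time the pipeline sits idle when streaming in real time.
func idleFraction(rtf: Double) -> Double {
    precondition(rtf >= 1)
    return 1 - 1 / rtf
}

// One second of audio at the two ends of the quoted range.
let slow = processingSeconds(chunkSeconds: 1.0, rtf: 45)  // about 0.022 s
let fast = processingSeconds(chunkSeconds: 1.0, rtf: 58)  // about 0.017 s
```

Under that assumption, one second of audio costs roughly 17 to 22 ms of compute, so a realtime stream is idle about 98% of the time at either end of the range, and further speedups change little.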

⚠️ Stacked on #199 + the offline Nemotron PR. Diff includes those until they merge.
Kept as draft until then.

beshkenadze mentioned this pull request Jun 10, 2026

feat(nemotron): cache-aware streaming CoreML/ANE encoder beshkenadze/mlx-audio-swift#14

Closed

beshkenadze added 3 commits June 14, 2026 23:42

feat(parakeet): optional CoreML/ANE Conformer encoder

81f284d

Run the Parakeet Conformer encoder on the Apple Neural Engine via CoreML — the .mlpackage is auto-downloaded from Hugging Face while the TDT decoder stays in MLX. Opt-in encoder API (aneEncoder:) + CI-safe tests.

feat(nemotron): offline CoreML/ANE encoder (drop-in for the MLX Conformer)

e7f12d4

Offline CoreML/ANE encoder for Nemotron 3.5 ASR, a drop-in for the MLX Conformer encoder, with the .mlpackage auto-downloaded from Hugging Face. Stacks on the Parakeet CoreML/ANE encoder.

feat(nemotron): cache-aware streaming CoreML/ANE encoder

6907cee

Cache-aware streaming FastConformer encoder on the ANE via CoreML (caches as explicit in/out tensors); the .mlpackage is auto-downloaded from Hugging Face. Stacks on the offline CoreML/ANE encoder.

beshkenadze force-pushed the feat/nemotron-coreml-ane-streaming branch from 74a580e to 6907cee Compare June 14, 2026 20:44

beshkenadze wants to merge 3 commits into Blaizzy:main from beshkenadze:feat/nemotron-coreml-ane-streaming
