feat(nemotron): cache-aware streaming CoreML/ANE encoder#203

Draft
beshkenadze wants to merge 3 commits into
Blaizzy:mainfrom
beshkenadze:feat/nemotron-coreml-ane-streaming

@beshkenadze

Contributor

Cache-aware streaming CoreML/ANE encoder for Nemotron

Runs the Nemotron 3.5 streaming FastConformer encoder on the ANE in generateStream
(validated uniform-121 feeding + manual cache threading; the prompt MLP and RNN-T decode stay
in MLX). The encoder is 8-bit palettized (~28% faster on the ANE, transcript word-identical).
Enable with --stream --ane (auto-download) or --coreml-stream-encoder.
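
For readers unfamiliar with the pattern, here is a minimal, framework-free sketch of what "uniform feeding + manual cache threading" could look like. It is not the code in this PR. It assumes "uniform-121" means every encoder call receives exactly 121 feature frames, and it assumes the final short chunk is zero-padded. `EncoderCache`, `toy_encoder_step`, `pad_to_uniform` and `stream_encode` are hypothetical names, and the toy step stands in for the real CoreML call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed reading of "uniform-121": every encoder call gets exactly 121 frames.
CHUNK_FRAMES = 121

Frame = List[float]


@dataclass(frozen=True)
class EncoderCache:
    """Hypothetical cache state; a real model would carry cache tensors here."""
    last_frames: Tuple[Tuple[float, ...], ...] = ()
    chunks_seen: int = 0


EncoderStep = Callable[[Sequence[Frame], EncoderCache], Tuple[List[float], EncoderCache]]


def toy_encoder_step(chunk: Sequence[Frame], cache: EncoderCache) -> Tuple[List[float], EncoderCache]:
    """Stand-in for one stateless encoder call: (chunk, cache_in) -> (output, cache_out)."""
    context = list(cache.last_frames) + [tuple(f) for f in chunk]
    # Toy "encoding": mean of the first feature over cached context + current chunk.
    encoded = [sum(f[0] for f in context) / len(context)]
    new_cache = EncoderCache(
        last_frames=tuple(tuple(f) for f in chunk[-2:]),  # keep 2 frames of left context
        chunks_seen=cache.chunks_seen + 1,
    )
    return encoded, new_cache


def pad_to_uniform(chunk: Sequence[Frame], size: int, feature_dim: int) -> List[Frame]:
    """Zero-pad a short final chunk so the input shape never changes (an assumption)."""
    padding = [[0.0] * feature_dim for _ in range(size - len(chunk))]
    return list(chunk) + padding


def stream_encode(
    frames: Sequence[Frame],
    step: EncoderStep = toy_encoder_step,
    chunk_frames: int = CHUNK_FRAMES,
) -> Tuple[List[List[float]], EncoderCache]:
    """Feed fixed-size chunks and thread the cache from each call into the next."""
    if not frames:
        return [], EncoderCache()
    feature_dim = len(frames[0])
    cache = EncoderCache()  # empty cache for the first chunk
    outputs: List[List[float]] = []
    for start in range(0, len(frames), chunk_frames):
        chunk = pad_to_uniform(frames[start:start + chunk_frames], chunk_frames, feature_dim)
        encoded, cache = step(chunk, cache)  # cache_out becomes the next cache_in
        outputs.append(encoded)
    return outputs, cache


if __name__ == "__main__":
    demo_frames = [[float(i)] for i in range(300)]
    demo_outputs, final_cache = stream_encode(demo_frames)
    print(len(demo_outputs), final_cache.chunks_seen)
```

The point of the sketch is that the encoder call itself holds no state: the caller owns the cache and passes it back in on every chunk, which is what allows a fixed-shape model to run a stream of arbitrary length.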

This is a power/thermal feature, not a speed one, and honestly only a partial GPU offload:
only the encoder moves to the ANE; the decoder still runs on the GPU for each chunk. For realtime
streaming the speed difference versus MLX is moot (RTF ~45–58×); the value is in freeing the GPU
for concurrent work and in lower power draw and cooler operation for always-on or battery-powered
streaming.
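
To make the "moot" point concrete, here is a small worked calculation. It assumes the RTF figures above mean "seconds of audio processed per wall-clock second" (as the × suffix suggests) and uses a hypothetical 1-second chunk; neither the chunk length nor the function names come from this PR.

```python
def per_chunk_compute_seconds(chunk_seconds: float, rtf: float) -> float:
    """Wall-clock time spent on one chunk, given a speed factor in x-realtime."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return chunk_seconds / rtf


def realtime_headroom(chunk_seconds: float, rtf: float) -> float:
    """Fraction of each chunk interval during which the pipeline sits idle."""
    return 1.0 - per_chunk_compute_seconds(chunk_seconds, rtf) / chunk_seconds


if __name__ == "__main__":
    CHUNK_SECONDS = 1.0  # hypothetical chunk length, for illustration only
    for rtf in (45.0, 58.0):
        busy_ms = per_chunk_compute_seconds(CHUNK_SECONDS, rtf) * 1000.0
        idle = realtime_headroom(CHUNK_SECONDS, rtf)
        print(f"RTF {rtf:.0f}x: {busy_ms:.1f} ms of compute per chunk, {idle:.1%} idle")
```

Under these assumptions each chunk costs roughly 17 to 22 ms of compute, so a live stream is idle for about 98% of the time at either end of the range, and a further speedup changes little for the listener.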

⚠️ Stacked on #199 + the offline Nemotron PR. Diff includes those until they merge.
Kept as draft until then.

Run the Parakeet Conformer encoder on the Apple Neural Engine via CoreML — the
.mlpackage is auto-downloaded from Hugging Face while the TDT decoder stays in
MLX. Opt-in encoder API (aneEncoder:) + CI-safe tests.
…rmer)

Offline CoreML/ANE encoder for Nemotron 3.5 ASR — a drop-in for the MLX Conformer
encoder, with the .mlpackage auto-downloaded from Hugging Face. Stacks on the
Parakeet CoreML/ANE encoder.

Cache-aware streaming FastConformer encoder on the ANE via CoreML (caches as
explicit in/out tensors); the .mlpackage is auto-downloaded from Hugging Face.
Stacks on the offline CoreML/ANE encoder.
@beshkenadze beshkenadze force-pushed the feat/nemotron-coreml-ane-streaming branch from 74a580e to 6907cee Compare June 14, 2026 20:44