
Add Whisper model family #192

Merged
Blaizzy merged 4 commits into Blaizzy:main from JacobLinCool:add-whisper on Jun 12, 2026

@JacobLinCool

Contributor

Summary

  • New WhisperModel in Sources/MLXAudioSTT/Models/Whisper/ conforming to STTGenerationModel.
  • Supports every openai/whisper-* size (tiny → large-v3-turbo, plus .en) and mlx-community/whisper-* mirrors. Both the HuggingFace transformers layout and the OpenAI / mlx-whisper layout (encoder.blocks.*, n_audio_state, etc.) load through the same path.
  • Long audio is decoded as non-overlapping 30 s chunks. Streaming yields decode-and-diff deltas: the accumulated BPE tokens are re-decoded and only the newly appeared text is emitted, so non-Latin scripts come out as clean UTF-8.
  • mlx-community Whisper repos ship weights only — the loader fetches the matching tokenizer from the sibling openai/whisper-* repo on demand (no weight re-download).
  • Renames GLMASR's WhisperConfig / WhisperAttention / WhisperEncoder / WhisperEncoderLayer to GLMASRWhisper* (the SmartTurn module already follows this prefix style) to free up the canonical names.
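Loading both checkpoint layouts through one path implies normalizing config names at some point. The Swift sketch that follows shows one way to do that at the config level, assuming the OpenAI/mlx-whisper `config.json` uses the `ModelDimensions`-style names (`n_mels`, `n_audio_state`, and so on) and translating them to the HuggingFace `transformers` `WhisperConfig` names. `openAIToHFConfigKeys` and `normalizeWhisperConfig` are hypothetical, not the API added in this PR, and only integer fields are handled. Weight keys (for example `encoder.blocks.*` vs. `model.encoder.layers.*`) would need a similar remapping, which is not shown.

```swift
/// Hypothetical table: OpenAI / mlx-whisper dimension names -> HuggingFace
/// `WhisperConfig` names. `n_text_state` is left out because HF stores a
/// single `d_model` (encoder and decoder widths are equal in released models).
let openAIToHFConfigKeys: [String: String] = [
    "n_mels": "num_mel_bins",
    "n_vocab": "vocab_size",
    "n_audio_ctx": "max_source_positions",
    "n_audio_state": "d_model",
    "n_audio_head": "encoder_attention_heads",
    "n_audio_layer": "encoder_layers",
    "n_text_ctx": "max_target_positions",
    "n_text_head": "decoder_attention_heads",
    "n_text_layer": "decoder_layers",
]

/// Return integer config fields under HF names, whichever layout came in.
func normalizeWhisperConfig(_ raw: [String: Int]) -> [String: Int] {
    // An HF-layout config has no `n_audio_state`; pass it through unchanged.
    guard raw["n_audio_state"] != nil else { return raw }
    var normalized: [String: Int] = [:]
    for (key, value) in raw {
        // Keys without a mapping (e.g. `n_text_state`) keep their original name.
        normalized[openAIToHFConfigKeys[key] ?? key] = value
    }
    return normalized
}

// whisper-tiny dimensions in the OpenAI layout.
let tinyOpenAI: [String: Int] = [
    "n_mels": 80, "n_vocab": 51865, "n_audio_ctx": 1500, "n_audio_state": 384,
    "n_audio_head": 6, "n_audio_layer": 4, "n_text_ctx": 448, "n_text_state": 384,
    "n_text_head": 6, "n_text_layer": 4,
]
let tinyHF = normalizeWhisperConfig(tinyOpenAI)
print(tinyHF["d_model"] ?? -1, tinyHF["encoder_layers"] ?? -1)
```

Detecting the layout from a single marker key keeps the HF path a no-op; a real loader would likely also check `model_type` or the weight key prefixes.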

Test plan

  • xcodebuild build -scheme MLXAudioSTT — clean
  • xcodebuild test … -only-testing:MLXAudioTests/WhisperTests -only-testing:MLXAudioTests/GLMASRModuleSetupTests — 26 tests pass
  • CLI: openai/whisper-tiny, openai/whisper-tiny.en, and mlx-community/whisper-large-v3-turbo on short clips and 66 s tiled clips, in English, French, and Chinese (中文, via the Breeze-ASR-25 fine-tune)
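A 66 s clip exercises the long-audio path from the summary (non-overlapping 30 s chunks). A minimal Swift sketch of that split, assuming Whisper's standard 16 kHz mono input so one window is 16,000 × 30 = 480,000 samples. `whisperChunks` is a hypothetical helper, and zero-padding the final window to full length is an assumption that mirrors reference Whisper's pad-to-30 s input.

```swift
/// Whisper consumes 30 s windows of 16 kHz mono audio.
let whisperSampleRate = 16_000
let whisperWindowSamples = whisperSampleRate * 30  // 480,000 samples

/// Hypothetical helper: split `samples` into consecutive, non-overlapping
/// 30 s windows. The last window is zero-padded to full length (assumption,
/// matching reference Whisper's pad-to-30 s input).
func whisperChunks(_ samples: [Float], window: Int = whisperWindowSamples) -> [[Float]] {
    var chunks: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + window, samples.count)
        var chunk = Array(samples[start..<end])
        if chunk.count < window {
            chunk.append(contentsOf: repeatElement(Float(0), count: window - chunk.count))
        }
        chunks.append(chunk)
        start = end  // no overlap: the next window starts where this one ended
    }
    return chunks
}

// A 66 s clip becomes three windows: 30 s, 30 s, and 6 s padded to 30 s.
let clip = [Float](repeating: 0.1, count: 66 * whisperSampleRate)
let chunks = whisperChunks(clip)
print(chunks.count, chunks.map { $0.count })
```

Fixed windows are the simplest scheme. Reference `openai/whisper` instead advances its window using predicted timestamp tokens, which avoids cutting a word at a hard 30 s boundary at the cost of a more involved decode loop.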

JacobLinCool and others added 2 commits May 29, 2026 02:34
Supports every openai/whisper-* size (tiny → large-v3-turbo, plus .en
variants) in both HuggingFace and OpenAI/mlx-whisper checkpoint layouts.
Streaming via decode-and-diff, 30-second chunked decoding for long audio,
tokenizer fallback fetch when mlx-community repos ship weights only.

GLMASR's Whisper* layer types are renamed to GLMASRWhisper* to free the
canonical names for the new model.
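Decode-and-diff streaming matters because byte-level BPE can split one multi-byte character (for example 中, three UTF-8 bytes) across tokens, so decoding each token on its own can emit broken text. The Swift sketch that follows illustrates the idea under stated assumptions: `ToyByteTokenizer` is a hypothetical stand-in for the real Whisper tokenizer, and holding back any decode that ends in U+FFFD is one common way to wait for a character to complete, not necessarily what `WhisperModel` does.

```swift
/// Hypothetical stand-in for a byte-level BPE tokenizer: each token id maps to
/// raw UTF-8 bytes, so one token may carry only part of a character.
struct ToyByteTokenizer {
    let vocab: [Int: [UInt8]]

    /// Decode a whole token sequence; ill-formed or truncated UTF-8 becomes U+FFFD.
    func decode(_ tokens: [Int]) -> String {
        let bytes = tokens.flatMap { vocab[$0] ?? [] }
        return String(decoding: bytes, as: UTF8.self)
    }
}

/// Decode-and-diff streamer: re-decode every token seen so far and yield only
/// the Unicode scalars that were not yielded before.
struct DecodeAndDiffStreamer {
    let tokenizer: ToyByteTokenizer
    private(set) var tokens: [Int] = []
    private var emittedScalarCount = 0
    private let replacement: Unicode.Scalar = "\u{FFFD}"

    init(tokenizer: ToyByteTokenizer) { self.tokenizer = tokenizer }

    /// Feed one token and return the new text (possibly empty).
    mutating func push(_ token: Int) -> String {
        tokens.append(token)
        let text = tokenizer.decode(tokens)
        // Assumption: a trailing U+FFFD means the last character is still
        // incomplete, so hold it back until more bytes arrive.
        if text.unicodeScalars.last == replacement { return "" }
        return takeDelta(from: text)
    }

    /// Flush whatever is still held back once decoding has finished.
    mutating func finish() -> String {
        takeDelta(from: tokenizer.decode(tokens))
    }

    private mutating func takeDelta(from text: String) -> String {
        let scalars = text.unicodeScalars
        var delta = String.UnicodeScalarView()
        delta.append(contentsOf: scalars.dropFirst(emittedScalarCount))
        emittedScalarCount = scalars.count
        return String(delta)
    }
}

let tokenizer = ToyByteTokenizer(vocab: [
    0: Array("Hi ".utf8),
    1: [0xE4, 0xB8],  // first two UTF-8 bytes of "中"
    2: [0xAD],        // last byte of "中"
    3: Array("文".utf8),
])
var streamer = DecodeAndDiffStreamer(tokenizer: tokenizer)
let deltas = [0, 1, 2, 3].map { streamer.push($0) }
print(deltas, streamer.finish().isEmpty)
```

Here the token carrying only the first two bytes of 中 yields an empty delta, and the next token completes the character so it is emitted whole. Diffing by Unicode scalar count rather than by `Character` keeps offsets stable even when a later scalar (a combining mark, say) would merge into the previous grapheme cluster.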
lucasnewman previously approved these changes Jun 5, 2026

@lucasnewman left a comment

Collaborator


Thanks, this looks great!

Signed-off-by: Lucas Newman <lucas@future.fit>
@Blaizzy merged commit 0c71eba into Blaizzy:main on Jun 12, 2026
1 check passed


3 participants