Releases: soniqo/speech-swift
v0.0.20
What's Changed
New
- Nemotron-3.5 ASR Streaming (multilingual): 40 language-locales, native punctuation and capitalization, cache-aware FastConformer-RNN-T (600 M) on the Apple Neural Engine. New `NemotronStreamingASR` target. Default model is the CoreML INT8 bundle (612 MB, RTF ≈ 0.07 on M5 Pro). Language is a BCP-47 tag (`en-US`, `de-DE`, `ja-JP`, ...). #294
- English-only Nemotron bundle still supported through the same target: the runtime introspects the encoder's input description for `language_mask` and adapts. Older bundles continue to work without code changes. #295
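The bundle detection described above can be pictured as a feature check on the encoder's declared input names. This is a minimal sketch, not the package's implementation: it assumes the input names have already been collected into a set (with Core ML they would come from `MLModel.modelDescription.inputDescriptionsByName`), and `EncoderMode`, `encoderMode(forInputs:)`, and the `audio_signal` input name are hypothetical.

```swift
/// Hypothetical: which kind of Nemotron bundle was loaded.
enum EncoderMode: Equatable {
    case multilingual   // encoder declares a `language_mask` input
    case englishOnly    // older bundle without that input
}

/// Picks the mode from the encoder's declared input names.
func encoderMode(forInputs inputNames: Set<String>) -> EncoderMode {
    inputNames.contains("language_mask") ? .multilingual : .englishOnly
}

// `audio_signal` is an assumed input name used only for illustration.
let newBundle: Set<String> = ["audio_signal", "language_mask"]
let oldBundle: Set<String> = ["audio_signal"]
print(encoderMode(forInputs: newBundle)) // multilingual
print(encoderMode(forInputs: oldBundle)) // englishOnly
```

Keying the behavior on the model's own description rather than on a version flag is what lets older bundles load without any caller-side change.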
Performance & quality
- Qwen3-ASR CoreML encoder rebuild — chunked-attention encoder + bench scaffold for cross-engine ASR comparison (Qwen3-ASR, Parakeet TDT, Nemotron, Whisper). #292
- Parakeet TDT CoreML: default to a single fixed-shape 30 s export (drop `EnumeratedShapes`) to fix ANE-compile hangs on iOS. #290
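A fixed-shape export implies that every input must be brought to exactly one length before inference. The sketch below shows one way to do that with zero padding; the 16 kHz sample rate and the helper name `fitToFixedWindow` are assumptions, and how the project actually handles audio longer than 30 s is not stated in these notes (truncation here is only for illustration).

```swift
/// Pads with trailing zeros or truncates so the result has exactly
/// `seconds * sampleRate` samples.
/// Assumption: 16 kHz mono input; the real sample rate is not given above.
func fitToFixedWindow(_ samples: [Float],
                      seconds: Int = 30,
                      sampleRate: Int = 16_000) -> [Float] {
    let target = seconds * sampleRate
    if samples.count >= target {
        // Longer input: keep only the first window.
        return Array(samples.prefix(target))
    }
    // Shorter input: append silence up to the fixed length.
    return samples + [Float](repeating: 0, count: target - samples.count)
}

let halfSecond = [Float](repeating: 0.1, count: 8_000)
print(fitToFixedWindow(halfSecond).count) // 480000
```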
Docs
- CoreML encoder interface description refreshed; M2 Max baselines retired in favour of current M-series numbers. #293
CLI
brew upgrade soniqo/tap/speech
speech transcribe recording.wav --engine nemotron --language en-US
speech transcribe meeting.wav --engine nemotron --language de-DE
speech transcribe interview.wav --engine nemotron --language ja-JP

`--engine nemotron` defaults to the multilingual CoreML INT8 bundle from `aufklarer/Nemotron-3.5-ASR-Streaming-0.6B-CoreML-INT8` (downloaded on first use, cached under `~/Library/Caches/qwen3-speech/`).
v0.0.19
What's Changed
New
- HTDemucs (Demucs v4): higher-quality music source separation, +3.01 dB SDR over Open-Unmix (largest gains on bass/drums). `speech separate --engine htdemucs`. #288
- OpenAI-compatible transcription: `/v1/audio/transcriptions` server endpoint. #273
- Qwen3-TTS on CoreML: full Apple Neural Engine routing + chunked decode. #269
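Because the endpoint is described as OpenAI-compatible, a client would presumably send the same `multipart/form-data` shape that the public OpenAI transcription API uses (a `file` part plus a `model` field). The sketch below only builds that request body; the host, port, and model value are placeholders, and whether this server requires or honors the `model` field is an assumption.

```swift
import Foundation

/// Builds a multipart/form-data body with a `model` field and a `file`
/// part, following the public OpenAI transcription request shape.
func multipartBody(audio: Data, filename: String,
                   model: String, boundary: String) -> Data {
    var body = Data()
    // Text field: model name (placeholder value in this sketch).
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"model\"\r\n\r\n".utf8))
    body.append(Data("\(model)\r\n".utf8))
    // File field: raw audio bytes.
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"file\"; filename=\"\(filename)\"\r\n".utf8))
    body.append(Data("Content-Type: audio/wav\r\n\r\n".utf8))
    body.append(audio)
    // Closing boundary ends the multipart message.
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))
    return body
}

let boundary = "Boundary-\(UUID().uuidString)"
let body = multipartBody(audio: Data([0x52, 0x49, 0x46, 0x46]),
                         filename: "recording.wav",
                         model: "placeholder-model",
                         boundary: boundary)
// POST `body` to http://<host>:<port>/v1/audio/transcriptions with the header
// Content-Type: multipart/form-data; boundary=<boundary>
print(body.count)
```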
Performance & quality
- 7× faster CoreML ASR — split + batched-prefill decoder, ANE-safe. #282
- Mastering-grade resampler — drains the filter tail, exact output length, phase-aligned stereo; mastering SRC for music/upsampling, standard for speech. #284
- Qwen3-TTS bf16 / non-quantized support + ICL stability. #272
- CosyVoice bf16 bundle support. #280
Fixes
- Abort wedged Hugging Face downloads instead of hanging. #283
- Stride-aware, NaN-safe argmax in the CoreML ASR decoder. #278
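To illustrate what "stride-aware, NaN-safe" can mean for an argmax, here is a minimal sketch over a flat buffer. It is not the decoder's actual code: the function name, its parameters, and the choice to return `nil` when every value is NaN are assumptions made for the example.

```swift
/// Returns the index (0..<count) of the largest non-NaN value among
/// `count` logical elements that start at `offset` and lie `stride`
/// apart in a flat buffer, or nil when no usable value is found.
func argmax(_ buffer: [Float], offset: Int = 0,
            stride: Int = 1, count: Int) -> Int? {
    precondition(stride > 0 && offset >= 0 && count >= 0)
    var bestIndex: Int? = nil
    var bestValue = -Float.infinity
    for i in 0..<count {
        let position = offset + i * stride
        guard position < buffer.count else { break } // stay in bounds
        let value = buffer[position]
        if value.isNaN { continue }                  // never select a NaN
        if bestIndex == nil || value > bestValue {
            bestIndex = i
            bestValue = value
        }
    }
    return bestIndex
}

// Logical elements are at positions 0, 2, 4: NaN, 0.3, 0.2.
let logits: [Float] = [.nan, 0, 0.3, 0, 0.2, 0]
print(argmax(logits, stride: 2, count: 3) as Any) // Optional(1)
```

A naive loop that seeds the running maximum with the first element would return index 0 here, because every comparison against NaN is false; skipping NaN explicitly avoids that.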
CLI
speech separate song.wav --engine htdemucs # Demucs v4 (higher quality)
speech separate song.wav # UMX (default)
v0.0.18
What's Changed
- StableAudio3MusicGen: Stable Audio 3 Medium INT8 as the new default music-gen engine for `speech compose`. Bit-perfect parity with Stability AI's pure-MLX Python reference, ~16× realtime, stereo 44.1 kHz. MAGNeT stays available via `--engine magnet`. #270
CLI
```bash
speech compose "lofi house loop" # SA3 (default)
speech compose "ambient pad" --engine magnet # old engine
speech compose "..." --sa3-variant medium-int4 --seconds 30
```
v0.0.17
What's Changed
- release: ship all .bundle resources (fixes compose/magnet on brew) by @ivan-digital in #267
Full Changelog: v0.0.16...v0.0.17
v0.0.16
What's Changed
- ci(tests): drop stale homebrew-lint job by @ivan-digital in #246
- feat(speak/cosyvoice): --seed for reproducible synthesis by @ivan-digital in #245
- cosyvoice: native zero-shot voice cloning via prompt_token + prompt_feat by @ivan-digital in #247
- docs: link READMEs to use-case hubs on soniqo.audio by @ivan-digital in #250
- Add native VoxCPM2 TTS backend by @DrMaks22 in #249
- docs: link YouTube overview video from every README by @ivan-digital in #251
- tests: auto-download or bundle resources so 47 E2E tests stop skipping by @ivan-digital in #252
- fix: weightsExist recognises CoreML bundle layouts (.mlmodelc, .mlpac… by @johnsacco in #253
- SourceSeparation: MLX.compile fused LSTM step by @ivan-digital in #258
- SourceSeparation: end-to-end MLX (gate fusion + compile + Wiener + iSTFT) by @ivan-digital in #257
- docs: add Product Hunt badge to README and translations by @ivan-digital in #259
- MAGNeT text-to-music (Meta MAGNeT Small/Medium, MLX INT4/INT8) by @ivan-digital in #260
- FlashSR: add audio super-resolution module (MLX INT4/INT8) by @ivan-digital in #261
- MagpieTTS: add 9-language TTS (NVIDIA Magpie 357M, MLX INT4/INT8) by @ivan-digital in #263
- deps: pin swift-websocket to 1.5.x to unblock release CI by @ivan-digital in #266
Full Changelog: v0.0.15...v0.0.16
v0.0.15
Maintenance release
Wraps up the audio → speech rename from v0.0.14 and wires up the Homebrew tap publishing pipeline.
What's Changed
- docs(readme): fix Homebrew install command across all translations by @ivan-digital in #243. The previous snippet pointed at a tap repo that didn't exist. The canonical install is now `brew install soniqo/tap/speech`, against the live tap at https://github.com/soniqo/homebrew-tap.
- ci(release): publish Homebrew formula bumps to soniqo/homebrew-tap by @ivan-digital in #244. Release tarballs now auto-update the tap via a short-lived installation token minted from the `soniqo-release-bot` GitHub App. `brew update && brew upgrade speech` picks up new versions automatically with no manual formula edit.
Notes
No user-facing CLI / API changes. Same binaries, same flags, same tarball layout as v0.0.14. The release tarball is still speech-macos-arm64.tar.gz.
Full Changelog: v0.0.14...v0.0.15
v0.0.14
Highlights
🎉 Binary renamed: audio → speech (#242)
The CLI binary audio is renamed to speech to make room for an eventual brew install speech (the old name is too generic for homebrew-core). The old name still works as a deprecated alias that prints a one-line stderr warning — no scripts break. The release tarball is now speech-macos-arm64.tar.gz. Aliases will be removed in a future major release.
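A deprecated alias of this kind is typically implemented by inspecting the name the process was started under. The sketch below shows that pattern only; the function name and the warning text are hypothetical, not what the tool prints.

```swift
import Foundation

/// Returns a one-line warning when the tool was started under its old
/// name, or nil otherwise. `argv0` is the first process argument.
func deprecationWarning(argv0: String) -> String? {
    // Keep only the last path component, e.g. "/opt/homebrew/bin/audio" -> "audio".
    let name = argv0.split(separator: "/").last.map { String($0) } ?? argv0
    guard name == "audio" else { return nil }
    // Hypothetical wording; the real message is not quoted in these notes.
    return "warning: 'audio' is deprecated; use 'speech' instead"
}

if let warning = deprecationWarning(argv0: CommandLine.arguments[0]) {
    // Write to stderr so stdout stays clean for piped output.
    FileHandle.standardError.write(Data((warning + "\n").utf8))
}
```

Sending the notice to stderr is what keeps existing pipelines working: anything that parses the command's stdout sees the same output as before.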
🚀 VibeVoice 1.5B cold-start ~270× faster (#240)
The autoregressive LM step is now shapeless-compiled, eliminating per-token recompilation overhead on first synthesis.
What's Changed
- Rename binary `audio` → `speech` (with backward-compat alias) by @ivan-digital in #242
- fix(vibevoice 0.5b): fail fast when minting voice cache from encoder-less checkpoint by @ivan-digital in #241
- perf(vibevoice 1.5b): shapeless-compile the autoregressive LM step by @ivan-digital in #240
- fix(vibevoice): CLI routes 1.5B model id correctly + downloader honors offlineMode by @ivan-digital in #239
Notes
- Release asset is now `speech-macos-arm64.tar.gz` (was `audio-macos-arm64.tar.gz`). Anyone with a hardcoded download URL needs to update; `brew install soniqo/tap/speech` users are unaffected because the formula auto-bumps.
Full Changelog: v0.0.13...v0.0.14
v0.0.13
What's Changed
- audio align: forward --language to forced aligner by @ivan-digital in #224
- audio align: lift implicit 2-min ASR cap by @ivan-digital in #225
- audio align: preserve punctuation on aligned words by @ivan-digital in #226
- KokoroTTS: use .process for Resources so iOS bundle layout is flat by @HiroProt in #228
- Qwen3ASR: double-buffer asyncEval greedy decode by @hhh2210 in #230
- test+docs(Qwen3ASR): greedy determinism snapshot + asyncEval doc note by @ivan-digital in #231
- fix(ParakeetASR): make iOS-5s model work in iOS Simulator by @ivan-digital in #232
- Trim Kokoro trailing artifacts; pin iOSEchoDemo speaker route by @ivan-digital in #235
- Chunk long audio in audio align so trailing words don't pile up by @ivan-digital in #237
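The chunking mentioned in the last item can be sketched as splitting the sample range into consecutive windows that are aligned independently. The 30 s chunk length and 16 kHz sample rate below are assumptions, as is the helper name; the actual implementation may well cut at pauses or word boundaries instead of fixed offsets.

```swift
/// Splits `totalSamples` into consecutive half-open ranges of at most
/// `chunkSamples` samples each; the last range may be shorter.
func chunkRanges(totalSamples: Int, chunkSamples: Int) -> [Range<Int>] {
    precondition(chunkSamples > 0)
    var ranges: [Range<Int>] = []
    var start = 0
    while start < totalSamples {
        let end = min(start + chunkSamples, totalSamples)
        ranges.append(start..<end)
        start = end
    }
    return ranges
}

// 70 s of audio at an assumed 16 kHz, cut into assumed 30 s chunks.
let ranges = chunkRanges(totalSamples: 70 * 16_000, chunkSamples: 30 * 16_000)
print(ranges.count)        // 3
print(ranges.last!.count)  // 160000
```

Word timestamps produced inside a chunk are relative to that chunk, so each one would need the chunk's start offset added back before the results are merged.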
Full Changelog: v0.0.12...v0.0.13
v0.0.12
New
- MADLAD-400 translation, on-device, 400+ languages (#222). New `MADLADTranslation` module: T5 v1.1 encoder-decoder via MLX, INT4 / INT8 quantized, Apache 2.0. Greedy decode by default; temperature / top-k / top-p sampling available. The source is encoded once and the cross-attention KV cache is reused across decode steps. The streaming decoder yields suffix diffs against the accumulated decoded text so SentencePiece word boundaries materialize correctly.

      import MADLADTranslation
      let translator = try await MADLADTranslator.fromPretrained()
      let es = try translator.translate("Hello, how are you?", to: "es")
      // → "Hola, ¿cómo estás?"

  CLI: `audio translate "..." --to es` with `--quantization int4|int8`, `--json`, `--stream`, and stdin pipe support: `audio transcribe x.wav | audio translate --to es`. Default repo: `aufklarer/MADLAD400-3B-MT-MLX`.
- Discord: join our server for questions, support, model requests, and updates. Linked from every README and the landing page.
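The suffix-diff streaming described for the MADLAD decoder can be sketched as follows: re-decode the whole token sequence at each step, then emit only the part of the text that has not been emitted yet. This is an illustration under assumptions, not the module's code; `SuffixDiffer` and `push(fullText:)` are hypothetical names, and emitting nothing when the new text does not extend the old one is a simplification.

```swift
/// Tracks the text emitted so far and returns only the new suffix each
/// time the full decoded text grows.
struct SuffixDiffer {
    private(set) var emitted = ""

    mutating func push(fullText: String) -> String {
        // If the new decode does not extend what was already emitted,
        // emit nothing for this step (simplification for the sketch).
        guard fullText.hasPrefix(emitted) else { return "" }
        let suffix = String(fullText.dropFirst(emitted.count))
        emitted = fullText
        return suffix
    }
}

var differ = SuffixDiffer()
// Full decoded text after each decode step (illustrative values).
let steps = ["Hola", "Hola,", "Hola, ¿cómo", "Hola, ¿cómo estás?"]
let pieces = steps.map { differ.push(fullText: $0) }
print(pieces) // ["Hola", ",", " ¿cómo", " estás?"]
```

Decoding tokens one at a time tends to lose the leading space that SentencePiece attaches to word-initial pieces, which is why diffing against the accumulated text gives correct word boundaries in the stream.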
Upgrading
brew upgrade soniqo/tap/speech
v0.0.11
Fixes
- Kokoro TTS: all 54 voice presets now usable out of the box (#212). `fromPretrained` was only downloading the default `af_heart` voice, so `--list-voices` reported a single entry and any other voice (`jf_alpha`, `ff_siwis`, `zf_xiaobei`, …) failed with `voiceNotFound`. The loader now pulls the full voice catalog on first run.
- Kokoro TTS: French / Portuguese / Hindi no longer crash on Homebrew installs (#212). The SwiftPM resource bundle that carries the pronunciation dictionaries was missing from the release tarball, so `--language fr|pt|hi` hit a fatal `Bundle.module` error. The bundle is now shipped alongside the binaries and installed into `libexec` by the formula.
Upgrading
brew upgrade soniqo/tap/speech