Releases · soniqo/speech-swift

04 Jun 16:30

ivan-digital

v0.0.20

27cef95

v0.0.20 Latest

What's Changed

New

Nemotron-3.5 ASR Streaming (multilingual) — 40 language-locales, native punctuation and capitalization, cache-aware FastConformer-RNN-T (600 M) on the Apple Neural Engine. New NemotronStreamingASR target. Default model is the CoreML INT8 bundle (612 MB, RTF ≈ 0.07 on M5 Pro). Language is a BCP-47 tag (en-US, de-DE, ja-JP, ...). #294
English-only Nemotron bundle still supported through the same target — the runtime introspects the encoder's input description for language_mask and adapts. Older bundles continue to work without code changes. #295
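To illustrate how one target could serve both bundles, here is a minimal sketch of that kind of check. It assumes the only signal is whether the encoder declares an input named language_mask. The BundleKind type, the helper names, and the sample input names are hypothetical and not the project's API.

```swift
#if canImport(CoreML)
import CoreML

// On Apple platforms the declared input names can be read from the model description.
func declaredInputNames(of model: MLModel) -> Set<String> {
    Set(model.modelDescription.inputDescriptionsByName.keys)
}
#endif

// Hypothetical classification of a bundle, based only on its input names.
enum BundleKind {
    case multilingual   // encoder accepts a language_mask input
    case englishOnly    // older bundle without language_mask
}

func bundleKind(forInputNames names: Set<String>) -> BundleKind {
    names.contains("language_mask") ? .multilingual : .englishOnly
}

// Illustrative input names; the real bundles may use different ones.
let newer = bundleKind(forInputNames: ["audio_signal", "language_mask"])
let older = bundleKind(forInputNames: ["audio_signal"])
print(newer, older) // multilingual englishOnly
```

With a check like this, callers pass a language tag unconditionally and the runtime ignores it when the bundle has no such input.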

Performance & quality

Qwen3-ASR CoreML encoder rebuild — chunked-attention encoder + bench scaffold for cross-engine ASR comparison (Qwen3-ASR, Parakeet TDT, Nemotron, Whisper). #292
Parakeet TDT CoreML — default to a single fixed-shape 30 s export (drop EnumeratedShapes) to fix ANE-compile hangs on iOS. #290
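A fixed-shape export means every call must supply exactly one window of the exported length. The sketch below shows one way a caller could prepare audio for such a model, assuming 16 kHz mono input and zero padding of the final window. Both are assumptions, and the function name is hypothetical; the notes do not say how the project handles longer or shorter clips.

```swift
// Split samples into windows of exactly `windowLength`, zero-padding the last one.
// `validCount` records how many samples in each window are real audio.
func fixedShapeWindows(
    _ samples: [Float],
    windowLength: Int
) -> [(window: [Float], validCount: Int)] {
    precondition(windowLength > 0, "windowLength must be positive")
    var result: [(window: [Float], validCount: Int)] = []
    var start = 0
    while start < samples.count {
        let end = min(start + windowLength, samples.count)
        var window = Array(samples[start..<end])
        let valid = window.count
        // Pad with silence so the shape always matches the export.
        window.append(contentsOf: repeatElement(Float(0), count: windowLength - valid))
        result.append((window: window, validCount: valid))
        start = end
    }
    return result
}

// 30 s at an assumed 16 kHz sample rate.
let windowLength = 16_000 * 30
let clip = [Float](repeating: 0.1, count: 16_000 * 35) // a 35 s clip
let windows = fixedShapeWindows(clip, windowLength: windowLength)
print(windows.count, windows[1].validCount) // 2 80000
```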

Docs

CoreML encoder interface description refreshed; M2 Max baselines retired in favour of current M-series numbers. #293

CI

CoreML targets pinned to CPU+GPU compute units in CI to avoid ANE-compile hangs. #286 #287 #289

CLI

brew upgrade soniqo/tap/speech

speech transcribe recording.wav --engine nemotron --language en-US
speech transcribe meeting.wav   --engine nemotron --language de-DE
speech transcribe interview.wav --engine nemotron --language ja-JP

--engine nemotron defaults to the multilingual CoreML INT8 bundle from aufklarer/Nemotron-3.5-ASR-Streaming-0.6B-CoreML-INT8 (downloaded on first use, cached under ~/Library/Caches/qwen3-speech/).

28 May 12:25

ivan-digital

v0.0.19

10aef25

What's Changed

New

HTDemucs (Demucs v4) — higher-quality music source separation, +3.01 dB SDR over Open-Unmix (largest gains on bass/drums). speech separate --engine htdemucs. #288
OpenAI-compatible transcription — /v1/audio/transcriptions server endpoint. #273
Qwen3-TTS on CoreML — full Apple Neural Engine routing + chunked decode. #269
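Because the new endpoint follows the OpenAI transcription shape, a client can post a multipart form with a file part and a model field. The sketch below only builds such a request. The base URL, port, model value, and function name are assumptions for illustration, and the server may accept or require different fields.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Build a multipart/form-data POST for /v1/audio/transcriptions.
func makeTranscriptionRequest(
    baseURL: URL,
    fileName: String,
    audio: Data,
    model: String,
    boundary: String = "Boundary-\(UUID().uuidString)"
) -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/audio/transcriptions"))
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)",
                     forHTTPHeaderField: "Content-Type")

    var body = Data()
    func append(_ text: String) { body.append(Data(text.utf8)) }

    // Text field: model name (hypothetical value supplied by the caller).
    append("--\(boundary)\r\n")
    append("Content-Disposition: form-data; name=\"model\"\r\n\r\n\(model)\r\n")

    // File field: raw audio bytes.
    append("--\(boundary)\r\n")
    append("Content-Disposition: form-data; name=\"file\"; filename=\"\(fileName)\"\r\n")
    append("Content-Type: application/octet-stream\r\n\r\n")
    body.append(audio)
    append("\r\n--\(boundary)--\r\n")

    request.httpBody = body
    return request
}

// Assumed local server address; adjust to wherever the server is listening.
let request = makeTranscriptionRequest(
    baseURL: URL(string: "http://localhost:8080")!,
    fileName: "recording.wav",
    audio: Data(),            // replace with the contents of the audio file
    model: "default"          // placeholder model name
)
print(request.url?.absoluteString ?? "")
```

The resulting request can be sent with URLSession like any other POST.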

Performance & quality

7× faster CoreML ASR — split + batched-prefill decoder, ANE-safe. #282
Mastering-grade resampler — drains the filter tail, exact output length, phase-aligned stereo; mastering SRC for music/upsampling, standard for speech. #284
Qwen3-TTS bf16 / non-quantized support + ICL stability. #272
CosyVoice bf16 bundle support. #280

Fixes

Abort wedged Hugging Face downloads instead of hanging. #283
Stride-aware, NaN-safe argmax in the CoreML ASR decoder. #278
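The argmax fix is easiest to see in code. The following is a generic sketch of a stride-aware, NaN-safe argmax over a flat buffer, not the project's implementation. The function name and signature are hypothetical.

```swift
// Returns the index (0..<count) of the largest non-NaN value, reading elements
// at buffer[offset + i * stride]. Returns nil if every candidate is NaN or count is 0.
func argmax(_ buffer: [Float], offset: Int = 0, count: Int, stride: Int = 1) -> Int? {
    precondition(stride > 0 && offset >= 0 && count >= 0, "invalid layout")
    precondition(count == 0 || offset + (count - 1) * stride < buffer.count,
                 "layout exceeds buffer")
    var bestIndex: Int? = nil
    var bestValue = -Float.infinity
    for i in 0..<count {
        let value = buffer[offset + i * stride]
        // Skip NaN explicitly: a NaN that becomes the running best would win
        // every later comparison, because comparisons with NaN are false.
        if value.isNaN { continue }
        if bestIndex == nil || value > bestValue {
            bestIndex = i
            bestValue = value
        }
    }
    return bestIndex
}

// Logits interleaved with unrelated values (stride 2), one of them NaN.
let buffer: [Float] = [0.1, 9, .nan, 9, 0.7, 9, 0.3, 9]
print(argmax(buffer, count: 4, stride: 2) as Any) // Optional(2)
```

Ignoring the stride would read the interleaved 9s as logits, and a naive loop seeded with a NaN first element would always return index 0.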

CLI

speech separate song.wav --engine htdemucs    # Demucs v4 (higher quality)
speech separate song.wav                       # Open-Unmix / UMX (default)

25 May 15:32

ivan-digital

v0.0.18

9d355b5

What's Changed

StableAudio3MusicGen: Stable Audio 3 Medium INT8 as the new default music-gen engine for `speech compose`. Bit-perfect parity with Stability AI's pure-MLX Python reference, ~16× realtime, stereo 44.1 kHz. MAGNeT stays available via `--engine magnet`. #270
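As a rough guide to what "~16× realtime" means in practice, the arithmetic can be sketched as below. This assumes the figure means 16 seconds of audio produced per second of compute, and that a real-time factor (RTF) is its reciprocal. The function names are illustrative.

```swift
// Wall-clock seconds needed to produce `audioSeconds` of audio
// at a given multiple of realtime.
func processingSeconds(audioSeconds: Double, realtimeMultiple: Double) -> Double {
    audioSeconds / realtimeMultiple
}

// Real-time factor: compute time divided by audio duration.
func realTimeFactor(realtimeMultiple: Double) -> Double {
    1.0 / realtimeMultiple
}

print(processingSeconds(audioSeconds: 30, realtimeMultiple: 16)) // 1.875
print(realTimeFactor(realtimeMultiple: 16))                      // 0.0625
```

Under these assumptions a 30-second clip takes a little under two seconds to generate. Actual timings depend on the hardware and the variant.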

CLI

```bash
speech compose "lofi house loop" # SA3 (default)
speech compose "ambient pad" --engine magnet # old engine
speech compose "..." --sa3-variant medium-int4 --seconds 30
```

24 May 06:01

ivan-digital

v0.0.17

6af1620

What's Changed

release: ship all .bundle resources (fixes compose/magnet on brew) by @ivan-digital in #267

Full Changelog: v0.0.16...v0.0.17

Contributors

ivan-digital

24 May 05:39

ivan-digital

v0.0.16

6d5db08

What's Changed

ci(tests): drop stale homebrew-lint job by @ivan-digital in #246
feat(speak/cosyvoice): --seed for reproducible synthesis by @ivan-digital in #245
cosyvoice: native zero-shot voice cloning via prompt_token + prompt_feat by @ivan-digital in #247
docs: link READMEs to use-case hubs on soniqo.audio by @ivan-digital in #250
Add native VoxCPM2 TTS backend by @DrMaks22 in #249
docs: link YouTube overview video from every README by @ivan-digital in #251
tests: auto-download or bundle resources so 47 E2E tests stop skipping by @ivan-digital in #252
fix: weightsExist recognises CoreML bundle layouts (.mlmodelc, .mlpac… by @johnsacco in #253
SourceSeparation: MLX.compile fused LSTM step by @ivan-digital in #258
SourceSeparation: end-to-end MLX (gate fusion + compile + Wiener + iSTFT) by @ivan-digital in #257
docs: add Product Hunt badge to README and translations by @ivan-digital in #259
MAGNeT text-to-music (Meta MAGNeT Small/Medium, MLX INT4/INT8) by @ivan-digital in #260
FlashSR: add audio super-resolution module (MLX INT4/INT8) by @ivan-digital in #261
MagpieTTS: add 9-language TTS (NVIDIA Magpie 357M, MLX INT4/INT8) by @ivan-digital in #263
deps: pin swift-websocket to 1.5.x to unblock release CI by @ivan-digital in #266
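The --seed flag from #245 makes CosyVoice synthesis repeatable. The general idea, sketched here under the assumption that token sampling draws from a seedable generator, is that the same seed yields the same sequence of random choices. SplitMix64 is a stand-in generator and sampleTokenIDs is a hypothetical helper, not what the project necessarily uses.

```swift
// A small deterministic generator (SplitMix64) conforming to Swift's
// RandomNumberGenerator, so it can drive the standard random APIs.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Draw `count` token ids uniformly from 0..<vocabularySize using a seeded generator.
// Real samplers draw from model probabilities; uniform draws keep the sketch short.
func sampleTokenIDs(seed: UInt64, count: Int, vocabularySize: Int) -> [Int] {
    var rng = SplitMix64(seed: seed)
    return (0..<count).map { _ in Int.random(in: 0..<vocabularySize, using: &rng) }
}

let first = sampleTokenIDs(seed: 42, count: 8, vocabularySize: 1_000)
let second = sampleTokenIDs(seed: 42, count: 8, vocabularySize: 1_000)
print(first == second) // true
```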

New Contributors

@DrMaks22 made their first contribution in #249

Full Changelog: v0.0.15...v0.0.16

Contributors

johnsacco, ivan-digital, and DrMaks22

11 May 20:04

ivan-digital

v0.0.15

10bcb46

Maintenance release

Wraps up the audio → speech rename from v0.0.14 and wires up the Homebrew tap publishing pipeline.

What's Changed

docs(readme): fix Homebrew install command across all translations by @ivan-digital in #243 — the previous snippet pointed at a tap repo that didn't exist. The canonical install is now brew install soniqo/tap/speech, against the live tap at https://github.com/soniqo/homebrew-tap.
ci(release): publish Homebrew formula bumps to soniqo/homebrew-tap by @ivan-digital in #244 — release tarballs now auto-update the tap via a short-lived installation token minted from the soniqo-release-bot GitHub App. brew update && brew upgrade speech picks up new versions automatically with no manual formula edit.

Notes

No user-facing CLI / API changes. Same binaries, same flags, same tarball layout as v0.0.14. The release tarball is still speech-macos-arm64.tar.gz.

Full Changelog: v0.0.14...v0.0.15

Contributors

ivan-digital

11 May 18:26

ivan-digital

v0.0.14

e85a78b

Highlights

🎉 Binary renamed: audio → speech (#242)

The CLI binary audio is renamed to speech to make room for an eventual brew install speech (the old name is too generic for homebrew-core). The old name still works as a deprecated alias that prints a one-line warning to stderr, so existing scripts keep working. The release tarball is now speech-macos-arm64.tar.gz. The alias will be removed in a future major release.
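A backward-compatible alias of this kind usually inspects the name the binary was invoked under. The sketch below shows the pattern. The warning text and the function name are hypothetical, and the project's actual mechanism may differ.

```swift
import Foundation

// Returns a one-line warning when the tool was started under the old name,
// or nil when it was started as `speech` (or anything else).
func deprecationWarning(invokedAs path: String) -> String? {
    let name = URL(fileURLWithPath: path).lastPathComponent
    guard name == "audio" else { return nil }
    return "warning: 'audio' is deprecated and will be removed; use 'speech' instead."
}

// At startup: write the warning to stderr so stdout stays clean for pipelines.
if let warning = deprecationWarning(invokedAs: CommandLine.arguments[0]) {
    FileHandle.standardError.write(Data((warning + "\n").utf8))
}
```

Writing to stderr rather than stdout is what keeps existing pipelines that parse the tool's output unaffected.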

🚀 VibeVoice 1.5B cold-start ~270× faster (#240)

The autoregressive LM step is now shapeless-compiled, eliminating per-token recompilation overhead on first synthesis.

What's Changed

Rename binary audio → speech (with backward-compat alias) by @ivan-digital in #242
fix(vibevoice 0.5b): fail fast when minting voice cache from encoder-less checkpoint by @ivan-digital in #241
perf(vibevoice 1.5b): shapeless-compile the autoregressive LM step by @ivan-digital in #240
fix(vibevoice): CLI routes 1.5B model id correctly + downloader honors offlineMode by @ivan-digital in #239

Notes

Release asset is now speech-macos-arm64.tar.gz (was audio-macos-arm64.tar.gz). Anyone with a hardcoded download URL needs to update — brew install soniqo/tap/speech users are unaffected because the formula auto-bumps.

Full Changelog: v0.0.13...v0.0.14

Contributors

ivan-digital

07 May 20:18

ivan-digital

v0.0.13

d8eba23

What's Changed

audio align: forward --language to forced aligner by @ivan-digital in #224
audio align: lift implicit 2-min ASR cap by @ivan-digital in #225
audio align: preserve punctuation on aligned words by @ivan-digital in #226
KokoroTTS: use .process for Resources so iOS bundle layout is flat by @HiroProt in #228
Qwen3ASR: double-buffer asyncEval greedy decode by @hhh2210 in #230
test+docs(Qwen3ASR): greedy determinism snapshot + asyncEval doc note by @ivan-digital in #231
fix(ParakeetASR): make iOS-5s model work in iOS Simulator by @ivan-digital in #232
Trim Kokoro trailing artifacts; pin iOSEchoDemo speaker route by @ivan-digital in #235
Chunk long audio in audio align so trailing words don't pile up by @ivan-digital in #237

New Contributors

@HiroProt made their first contribution in #228
@hhh2210 made their first contribution in #230

Full Changelog: v0.0.12...v0.0.13

Contributors

HiroProt, ivan-digital, and hhh2210

26 Apr 20:47

ivan-digital

v0.0.12

5117333

New

MADLAD-400 translation, on-device, 400+ languages (#222). New MADLADTranslation module — T5 v1.1 encoder-decoder via MLX, INT4 / INT8 quantized, Apache 2.0. Greedy decode by default; temperature / top-k / top-p sampling available. Encode source once, reuse cross-attention KV cache across decode steps. Streaming decoder yields suffix diffs against the accumulated decoded text so SentencePiece word boundaries materialize correctly.
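```swift
import MADLADTranslation

// Load the default translation model.
let translator = try await MADLADTranslator.fromPretrained()

// Translate English text to Spanish ("es" is the target language code).
let es = try translator.translate("Hello, how are you?", to: "es")
// → "Hola, ¿cómo estás?"
```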
CLI: audio translate "..." --to es with --quantization int4|int8, --json, --stream, and stdin pipe support — audio transcribe x.wav | audio translate --to es.

Default repo: aufklarer/MADLAD400-3B-MT-MLX.
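The suffix-diff approach to streaming can be sketched as follows. The idea is to re-decode the accumulated tokens on every step and emit only the text that was not already sent, so the space implied by a SentencePiece word-start marker appears in the right place. The decode function is a toy stand-in for a real SentencePiece decoder, and all names are hypothetical.

```swift
// Toy decoder: join pieces, turn the word-start marker "▁" into a space,
// and drop the single leading space at the start of the text.
func decode(pieces: [String]) -> String {
    let spaced = String(pieces.joined().map { (c: Character) -> Character in
        c == "▁" ? " " : c
    })
    return spaced.hasPrefix(" ") ? String(spaced.dropFirst()) : spaced
}

// Tracks what has been emitted and returns only the new suffix each step.
struct SuffixDiffer {
    private(set) var emitted = ""

    mutating func push(decodedSoFar: String) -> String {
        // Length of the shared prefix between the old and new decoded text.
        let common = zip(emitted, decodedSoFar)
            .prefix(while: { pair in pair.0 == pair.1 })
            .count
        let delta = String(decodedSoFar.dropFirst(common))
        emitted = decodedSoFar
        return delta
    }
}

let pieces = ["▁Hola", ",", "▁¿", "cómo"]
var differ = SuffixDiffer()
var deltas: [String] = []
for step in 1...pieces.count {
    let text = decode(pieces: Array(pieces.prefix(step)))
    deltas.append(differ.push(decodedSoFar: text))
}
print(deltas) // ["Hola", ",", " ¿", "cómo"]
```

Decoding each piece in isolation would lose the space before "¿", which is why the diff is taken against the full accumulated text.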
Discord — join our server for questions, support, model requests, and updates. Linked from every README and the landing page.

Upgrading

brew upgrade soniqo/tap/speech

19 Apr 06:39

ivan-digital

v0.0.11

451ec24

Fixes

Kokoro TTS: all 54 voice presets now usable out of the box (#212). fromPretrained was only downloading the default af_heart voice, so --list-voices reported a single entry and any other voice (jf_alpha, ff_siwis, zf_xiaobei, …) failed with voiceNotFound. The loader now pulls the full voice catalog on first run.
Kokoro TTS: French / Portuguese / Hindi no longer crash on Homebrew installs (#212). The SwiftPM resource bundle that carries the pronunciation dictionaries was missing from the release tarball, so --language fr|pt|hi hit a fatal Bundle.module error. The bundle is now shipped alongside the binaries and installed into libexec by the formula.

Upgrading

brew upgrade soniqo/tap/speech
