Releases: soniqo/speech-swift

v0.0.20

04 Jun 16:30
27cef95

What's Changed

New

  • Nemotron-3.5 ASR Streaming (multilingual) — 40 language-locales, native punctuation and capitalization, cache-aware FastConformer-RNN-T (600 M) on the Apple Neural Engine. New NemotronStreamingASR target. Default model is the CoreML INT8 bundle (612 MB, RTF ≈ 0.07 on M5 Pro). Language is a BCP-47 tag (en-US, de-DE, ja-JP, ...). #294
  • English-only Nemotron bundle still supported through the same target — the runtime introspects the encoder's input description for language_mask and adapts. Older bundles continue to work without code changes. #295
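
These notes do not spell out how the runtime inspects the encoder. A minimal sketch of the idea, assuming the check reduces to whether the encoder declares an input named language_mask. The enum, the helper name, and the audio_signal input name are hypothetical, not the project's API. With Core ML, the input names could be read from model.modelDescription.inputDescriptionsByName.keys.

```swift
// Hypothetical classification of a Nemotron bundle by its encoder inputs.
enum NemotronBundleKind {
    case multilingual
    case englishOnly
}

// Assumption: a multilingual encoder exposes a `language_mask` input and an
// English-only encoder does not. Only the name `language_mask` comes from
// the release notes; everything else here is illustrative.
func bundleKind(encoderInputNames: Set<String>) -> NemotronBundleKind {
    encoderInputNames.contains("language_mask") ? .multilingual : .englishOnly
}

// Example input sets (names other than `language_mask` are made up).
let newBundle = bundleKind(encoderInputNames: ["audio_signal", "language_mask"])
let oldBundle = bundleKind(encoderInputNames: ["audio_signal"])
```

Branching on the model's own interface rather than on a version flag is what would let older bundles keep working without code changes.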

Performance & quality

  • Qwen3-ASR CoreML encoder rebuild — chunked-attention encoder + bench scaffold for cross-engine ASR comparison (Qwen3-ASR, Parakeet TDT, Nemotron, Whisper). #292
  • Parakeet TDT CoreML — default to a single fixed-shape 30 s export (drop EnumeratedShapes) to fix ANE-compile hangs on iOS. #290

Docs

  • CoreML encoder interface description refreshed; M2 Max baselines retired in favour of current M-series numbers. #293

CI

  • CoreML targets pinned to CPU+GPU compute units in CI to avoid ANE-compile hangs. #286 #287 #289

CLI

brew upgrade soniqo/tap/speech

speech transcribe recording.wav --engine nemotron --language en-US
speech transcribe meeting.wav   --engine nemotron --language de-DE
speech transcribe interview.wav --engine nemotron --language ja-JP

--engine nemotron defaults to the multilingual CoreML INT8 bundle from aufklarer/Nemotron-3.5-ASR-Streaming-0.6B-CoreML-INT8 (downloaded on first use, cached under ~/Library/Caches/qwen3-speech/).
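
To put the RTF figure quoted for the INT8 bundle in context: real-time factor is conventionally processing time divided by audio duration, so values below 1 mean faster than real time. The sketch below assumes the release notes use that convention. The function names are illustrative and not part of the library.

```swift
/// Real-time factor: seconds of compute per second of audio.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    precondition(audioSeconds > 0, "audio duration must be positive")
    return processingSeconds / audioSeconds
}

/// Rough processing-time estimate for a clip at a given RTF.
func estimatedProcessingSeconds(audioSeconds: Double, rtf: Double) -> Double {
    audioSeconds * rtf
}

// At RTF 0.07, one minute of audio would take roughly 4.2 s to process.
let estimate = estimatedProcessingSeconds(audioSeconds: 60, rtf: 0.07)
```

The quoted figure is for an M5 Pro; other hardware will differ.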

v0.0.19

28 May 12:25
10aef25

What's Changed

New

  • HTDemucs (Demucs v4) — higher-quality music source separation, +3.01 dB SDR over Open-Unmix (largest gains on bass/drums). speech separate --engine htdemucs. #288
  • OpenAI-compatible transcription: /v1/audio/transcriptions server endpoint. #273
  • Qwen3-TTS on CoreML — full Apple Neural Engine routing + chunked decode. #269

Performance & quality

  • 7× faster CoreML ASR — split + batched-prefill decoder, ANE-safe. #282
  • Mastering-grade resampler — drains the filter tail, exact output length, phase-aligned stereo; mastering SRC for music/upsampling, standard for speech. #284
  • Qwen3-TTS bf16 / non-quantized support + ICL stability. #272
  • CosyVoice bf16 bundle support. #280

Fixes

  • Abort wedged Hugging Face downloads instead of hanging. #283
  • Stride-aware, NaN-safe argmax in the CoreML ASR decoder. #278
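
The argmax fix in #278 is only named here, not shown. As an illustration of the general technique (not the project's implementation), the sketch below reads logits from a flat buffer with an explicit element stride and skips NaN entries. A naive argmax seeded with a NaN first element never updates, because every comparison against NaN is false. The function name and signature are hypothetical.

```swift
/// Argmax over `count` values read from `buffer`, starting at `offset` and
/// stepping by `stride` elements. NaN entries are ignored.
/// Returns nil when `count` is zero or every candidate is NaN.
func strideAwareArgmax(_ buffer: [Float], offset: Int = 0, stride: Int = 1, count: Int) -> Int? {
    precondition(offset >= 0 && stride > 0 && count >= 0, "invalid layout")
    var bestIndex: Int? = nil
    var bestValue = -Float.infinity
    for i in 0..<count {
        let value = buffer[offset + i * stride]
        if value.isNaN { continue }          // NaN never wins
        if bestIndex == nil || value > bestValue {
            bestIndex = i                    // logical index, not buffer position
            bestValue = value
        }
    }
    return bestIndex
}

// Contiguous logits with one NaN: index 2 holds the largest valid value.
let contiguous = strideAwareArgmax([0.1, .nan, 0.9, 0.3], count: 4)

// Every second element belongs to the row of interest: 1, 3, 2.
let strided = strideAwareArgmax([1, 9, 3, 9, 2, 9], offset: 0, stride: 2, count: 3)
```

Returning an optional makes the all-NaN case explicit, so the caller can decide how to recover instead of silently emitting token 0.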

CLI

speech separate song.wav --engine htdemucs    # Demucs v4 (higher quality)
speech separate song.wav                       # UMX (default)

v0.0.18

25 May 15:32
9d355b5

What's Changed

  • StableAudio3MusicGen: Stable Audio 3 Medium INT8 as the new default music-gen engine for `speech compose`. Bit-perfect parity with Stability AI's pure-MLX Python reference, ~16× realtime, stereo 44.1 kHz. MAGNeT stays available via `--engine magnet`. #270

CLI

```bash
speech compose "lofi house loop" # SA3 (default)
speech compose "ambient pad" --engine magnet # old engine
speech compose "..." --sa3-variant medium-int4 --seconds 30
```

v0.0.17

24 May 06:01
6af1620

What's Changed

  • release: ship all .bundle resources (fixes compose/magnet on brew) by @ivan-digital in #267

Full Changelog: v0.0.16...v0.0.17

v0.0.16

24 May 05:39
6d5db08

Full Changelog: v0.0.15...v0.0.16

v0.0.15

11 May 20:04
10bcb46

Maintenance release

Wraps up the audio → speech rename from v0.0.14 and wires up the Homebrew tap publishing pipeline.

What's Changed

  • docs(readme): fix Homebrew install command across all translations by @ivan-digital in #243 — the previous snippet pointed at a tap repo that didn't exist. The canonical install is now brew install soniqo/tap/speech, against the live tap at https://github.com/soniqo/homebrew-tap.
  • ci(release): publish Homebrew formula bumps to soniqo/homebrew-tap by @ivan-digital in #244 — release tarballs now auto-update the tap via a short-lived installation token minted from the soniqo-release-bot GitHub App. brew update && brew upgrade speech picks up new versions automatically with no manual formula edit.

Notes

No user-facing CLI / API changes. Same binaries, same flags, same tarball layout as v0.0.14. The release tarball is still speech-macos-arm64.tar.gz.

Full Changelog: v0.0.14...v0.0.15

v0.0.14

11 May 18:26
e85a78b

Highlights

🎉 Binary renamed: audio → speech (#242)

The CLI binary audio is renamed to speech to make room for an eventual brew install speech (the old name is too generic for homebrew-core). The old name still works as a deprecated alias that prints a one-line stderr warning — no scripts break. The release tarball is now speech-macos-arm64.tar.gz. Aliases will be removed in a future major release.

🚀 VibeVoice 1.5B cold-start ~270× faster (#240)

The autoregressive LM step is now shapeless-compiled, eliminating per-token recompilation overhead on first synthesis.

What's Changed

  • Rename binary audio → speech (with backward-compat alias) by @ivan-digital in #242
  • fix(vibevoice 0.5b): fail fast when minting voice cache from encoder-less checkpoint by @ivan-digital in #241
  • perf(vibevoice 1.5b): shapeless-compile the autoregressive LM step by @ivan-digital in #240
  • fix(vibevoice): CLI routes 1.5B model id correctly + downloader honors offlineMode by @ivan-digital in #239

Notes

  • Release asset is now speech-macos-arm64.tar.gz (was audio-macos-arm64.tar.gz). Anyone with a hardcoded download URL needs to update — brew install soniqo/tap/speech users are unaffected because the formula auto-bumps.

Full Changelog: v0.0.13...v0.0.14

v0.0.13

07 May 20:18
d8eba23

What's Changed

  • audio align: forward --language to forced aligner by @ivan-digital in #224
  • audio align: lift implicit 2-min ASR cap by @ivan-digital in #225
  • audio align: preserve punctuation on aligned words by @ivan-digital in #226
  • KokoroTTS: use .process for Resources so iOS bundle layout is flat by @HiroProt in #228
  • Qwen3ASR: double-buffer asyncEval greedy decode by @hhh2210 in #230
  • test+docs(Qwen3ASR): greedy determinism snapshot + asyncEval doc note by @ivan-digital in #231
  • fix(ParakeetASR): make iOS-5s model work in iOS Simulator by @ivan-digital in #232
  • Trim Kokoro trailing artifacts; pin iOSEchoDemo speaker route by @ivan-digital in #235
  • Chunk long audio in audio align so trailing words don't pile up by @ivan-digital in #237

Full Changelog: v0.0.12...v0.0.13

v0.0.12

26 Apr 20:47
5117333

New

  • MADLAD-400 translation, on-device, 400+ languages (#222). New MADLADTranslation module — T5 v1.1 encoder-decoder via MLX, INT4 / INT8 quantized, Apache 2.0. Greedy decode by default; temperature / top-k / top-p sampling available. Encode source once, reuse cross-attention KV cache across decode steps. Streaming decoder yields suffix diffs against the accumulated decoded text so SentencePiece word boundaries materialize correctly.

    import MADLADTranslation
    
    let translator = try await MADLADTranslator.fromPretrained()
    let es = try translator.translate("Hello, how are you?", to: "es")
    // → "Hola, ¿cómo estás?"

    CLI: audio translate "..." --to es, with --quantization int4|int8, --json, --stream, and stdin pipe support, e.g. audio transcribe x.wav | audio translate --to es.

    Default repo: aufklarer/MADLAD400-3B-MT-MLX.

  • Discord: join our server for questions, support, model requests, and updates. Linked from every README and the landing page.
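
The suffix-diff streaming mentioned for the MADLAD decoder can be illustrated without the model. SentencePiece marks word boundaries on the token itself, so decoding tokens one at a time tends to lose or misplace spaces. Re-decoding the whole sequence and emitting only what is new avoids that. The type below is a hypothetical sketch of that idea, fed with already-decoded strings. It is not the module's actual API.

```swift
/// Emits only the text appended since the previous call.
struct SuffixDiffer {
    private var emitted = ""

    init() {}

    /// `fullText` is the decode of all tokens generated so far.
    mutating func push(_ fullText: String) -> String {
        if fullText.hasPrefix(emitted) {
            // Common case: the new decode extends what was already emitted.
            let suffix = String(fullText.dropFirst(emitted.count))
            emitted = fullText
            return suffix
        }
        // Assumption: if earlier text changed, re-emit the whole string and
        // let the caller decide how to redraw.
        emitted = fullText
        return fullText
    }
}

var differ = SuffixDiffer()
let first = differ.push("Hola")            // "Hola"
let second = differ.push("Hola,")          // ","
let third = differ.push("Hola, ¿cómo")     // " ¿cómo"
```

Concatenating the emitted pieces reproduces the final decode, so a consumer of the stream never needs to see token boundaries.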

Upgrading

brew upgrade soniqo/tap/speech

v0.0.11

19 Apr 06:39
451ec24

Fixes

  • Kokoro TTS: all 54 voice presets now usable out of the box (#212). fromPretrained was only downloading the default af_heart voice, so --list-voices reported a single entry and any other voice (jf_alpha, ff_siwis, zf_xiaobei, …) failed with voiceNotFound. The loader now pulls the full voice catalog on first run.
  • Kokoro TTS: French / Portuguese / Hindi no longer crash on Homebrew installs (#212). The SwiftPM resource bundle that carries the pronunciation dictionaries was missing from the release tarball, so --language fr|pt|hi hit a fatal Bundle.module error. The bundle is now shipped alongside the binaries and installed into libexec by the formula.

Upgrading

brew upgrade soniqo/tap/speech