Releases: second-state/cohere_transcribe_rs
v0.1.1
MLX backend optimizations
- GPU-side argmax: Decoder now computes argmax on GPU via `mlx_argmax_axis`, transferring a single i32 per step instead of the full 16,384-element logits vector (64 KB → 4 bytes per token)
- GPU-native batch norm: Replaced CPU round-trip `sqrt`/`recip` with `mlx_rsqrt`, eliminating 96 GPU→CPU→GPU transfers per encoder forward pass (48 layers × 2 ops)
- O(1) weight cloning: `shallow_clone()` uses `mlx_array_set` ref-counted sharing instead of a full CPU round-trip, reducing encoder construction time from ~75s to near-instant
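The figures in the first two items can be checked with a little arithmetic. The sketch below is a plain-Rust CPU reference, not the project's MLX code: it assumes the logits are stored as `f32` (which is what makes 16,384 elements come to 64 KB), and the function names `argmax` and `batch_norm_rsqrt` are hypothetical helpers written for illustration. It shows what a per-step argmax returns and how a batch-norm step can be written with a single reciprocal square root in place of a separate `sqrt` and reciprocal.

```rust
/// Vocabulary size quoted in the release notes.
const VOCAB_SIZE: usize = 16_384;
/// Encoder depth and per-layer op count quoted in the release notes.
const ENCODER_LAYERS: usize = 48;
const OPS_PER_LAYER: usize = 2;

/// Bytes read back per decode step if the whole logits vector is copied.
/// Assumption: logits are f32 (4 bytes each).
fn full_logits_bytes() -> usize {
    VOCAB_SIZE * std::mem::size_of::<f32>()
}

/// Bytes read back per decode step if only the argmax index (an i32) is copied.
fn argmax_result_bytes() -> usize {
    std::mem::size_of::<i32>()
}

/// Round trips saved per encoder forward pass: layers × ops per layer.
fn round_trips_removed() -> usize {
    ENCODER_LAYERS * OPS_PER_LAYER
}

/// CPU reference for argmax: index of the largest logit, or None if empty.
/// In this sketch, ties resolve to the lowest index.
fn argmax(logits: &[f32]) -> Option<i32> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in logits.iter().enumerate() {
        match best {
            // Keep the current best when the new value is not strictly larger.
            Some((_, b)) if v <= b => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i as i32)
}

/// Batch-norm inference step using one reciprocal square root:
/// y = (x - mean) * rsqrt(var + eps) * gamma + beta.
fn batch_norm_rsqrt(x: &[f32], mean: f32, var: f32, gamma: f32, beta: f32, eps: f32) -> Vec<f32> {
    let inv_std = 1.0 / (var + eps).sqrt(); // stands in for a fused rsqrt
    x.iter().map(|&v| (v - mean) * inv_std * gamma + beta).collect()
}
```

Under these assumptions, `full_logits_bytes()` comes to 65,536 bytes (64 KB), `argmax_result_bytes()` to 4 bytes, and `round_trips_removed()` to 96, matching the numbers in the notes.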
Other changes
- README updated with clearer title and streamlined intro
- Added key learnings 13–19 to CLAUDE.md
v0.1.0
Cohere Transcribe RS v0.1.0
First release — pure-Rust CLI and OpenAI-compatible API server for the CohereLabs/cohere-transcribe-03-2026 speech recognition model.
Features
- Two binaries: `transcribe` (CLI) and `transcribe-server` (HTTP API)
- OpenAI Whisper API compatible: drop-in replacement, works with any OpenAI client
- No Python or PyTorch at runtime — fully self-contained binaries
- 14 languages: English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Greek, Arabic, Japanese, Chinese, Vietnamese, Korean
- Multiple audio formats: WAV, FLAC, MP3, AAC, OGG (via symphonia)
- Long audio support: automatic chunking with overlap for files > 35s
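The release notes do not say how long audio is split, so the following is only a minimal sketch of chunking with overlap. The function name `chunk_ranges`, the choice to measure lengths in samples, and any concrete chunk or overlap length are assumptions made for illustration, not the project's actual parameters. How the overlapping transcripts are merged afterwards is also not covered here.

```rust
/// Split `total_samples` into overlapping half-open windows `[start, end)`.
/// Each window is at most `chunk_len` samples long and consecutive windows
/// share `overlap` samples. Hypothetical helper, not the project's API.
fn chunk_ranges(total_samples: usize, chunk_len: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(
        chunk_len > 0 && overlap < chunk_len,
        "overlap must be smaller than chunk_len"
    );
    if total_samples == 0 {
        return Vec::new();
    }
    // Short inputs need no chunking at all.
    if total_samples <= chunk_len {
        return vec![(0, total_samples)];
    }
    let step = chunk_len - overlap; // how far each window advances
    let mut ranges = Vec::new();
    let mut start = 0;
    loop {
        let end = (start + chunk_len).min(total_samples);
        ranges.push((start, end));
        if end == total_samples {
            break; // the last window reaches the end of the audio
        }
        start += step;
    }
    ranges
}
```

For example, with an assumed 16 kHz sample rate, a 35 s window, and a 5 s overlap, the call would be `chunk_ranges(total_samples, 35 * 16_000, 5 * 16_000)`. The overlap gives the model context on both sides of each cut, at the cost of transcribing the shared region twice.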
Platforms
| Asset | Platform | Backend |
|---|---|---|
| `transcribe-linux-x86_64.zip` | Linux x86_64 (CPU) | libtorch |
| `transcribe-linux-x86_64-cuda.zip` | Linux x86_64 (CUDA 12.6) | libtorch |
| `transcribe-linux-aarch64.zip` | Linux aarch64 (CPU, SVE) | libtorch |
| `transcribe-linux-aarch64-cuda.zip` | Linux aarch64 (CUDA 12.6) | libtorch |
| `transcribe-macos-aarch64.zip` | macOS Apple Silicon | MLX (Metal GPU) |
Each zip contains both binaries, `vocab.json`, and platform-specific runtime libraries (`libtorch/` on Linux, `mlx.metallib` on macOS). No `LD_LIBRARY_PATH` is needed: RPATH is baked in.
Quick Start
See the README for setup instructions.