All notable changes to OmniVoice Studio.
The format is loosely based on Keep a Changelog.
Versions track the desktop app (tauri.conf.json + frontend/src-tauri/Cargo.toml).
The bundled TTS model package (pyproject.toml) is versioned independently.
- Frameless dictation widget. Global dictation upgraded from an in-app FAB to a true OS-level floating widget that hovers over any application. Transparent, decorations-free, always-on-top secondary Tauri window activated by
⌘+⇧+Space. Auto-hides 2.5 s after a successful paste. - Standalone
CaptureWidgetcomponent. RefactoredCaptureButtonintoCaptureWidget, running on an isolated route (/?window=widget). - Social preview image. Added
social-preview.pngfor GitHub SEO.
- README overhaul. Compact 3-column feature grid, reorganized Quickstart (one-command install, Docker, Desktop App tips), updated comparison table, roadmap, and footer CTA.
- Relicensed Studio under Functional Source License (FSL-1.1-ALv2). Free for personal, educational, internal-team, and non-commercial use. Each release converts automatically to Apache License, Version 2.0 on the second anniversary of its publication.
- The bundled
omnivoice/Python TTS model package remains separately licensed under Apache 2.0 by its upstream authors — not relicensed here. - In-app Commercial License page no longer publishes pricing tiers. Pricing is being finalized; the page now invites quote requests and links the FSL terms.
- Single-instance enforcement. Launching a second copy now focuses the existing window instead of starting a second backend that races for port 3900. Powered by
tauri-plugin-single-instance. - Close-to-tray. Clicking the window X (or
Cmd+Won macOS) now hides the window and keeps the backend + tray menu alive. The tray "Quit" item is the only path that fully exits and shuts down the Python backend (cleanup moved toRunEvent::ExitRequested). - Recording-state tray icon. Tray icon flips to a red-dot variant while a dictation recording is active and reverts when it stops or errors out.
- Customizable global dictation hotkey. New Settings → Capture tab. Record any modifier-plus-key combo, save it, and it's persisted in
config.jsonand re-registered on every launch. Failed registrations (combo already taken by the OS) roll back to the previously-working binding instead of leaving the user with no shortcut. - WebSocket-final dictation path. Capture now treats the streaming
finalmessage as the source of truth and skips the duplicate HTTPPOST /transcribethat used to run on every dictation. Audio is transcribed once instead of twice — typical dictation latency roughly halved. New EOF text-frame protocol (server also accepts an empty binary frame as EOF). HTTP POST kept as fallback for WS error / timeout / WS-never-opened. - Chunk queueing during WS handshake. The first 250 ms of audio is no longer dropped from the server's
finaltranscript.MediaRecorderchunks captured while the WebSocket is still inCONNECTINGstate are queued and drained inws.onopen.
- Docker default bind is loopback.
docker-compose.ymlnow publishes127.0.0.1:3900:3900instead of3900:3900— the API is no longer reachable from the LAN out of the box. To expose it deliberately, change the mapping to0.0.0.0:3900:3900. README documents the trade-off and recommends a reverse proxy with auth (Caddybasic_auth, nginx + htpasswd, Tailscale) for any non-loopback exposure. - Donate page trimmed. Removed Patreon and the Bitcoin / Ethereum / Solana cryptocurrency cards. Removed the bundled
qrcode.reactdependency. The "Commercial License" CTA moves from the bottom of the page to the top-right of the page header. - WS dictation hostname now derived from the configured
API_BASEinstead of a hardcodedlocalhost:3900, so deployments behind reverse proxies route correctly. - HTTP POST fallback timeout scales with recording length (
max(15s, recordedMs + 10s)) so long-form dictations don't trip the fallback and run the model twice.
- Backend was killed on every window close even if the user only intended to dismiss the window. Backend shutdown now fires only on real-quit (
RunEvent::ExitRequested), not on the close-to-hide path. - Hotkey rollback.
set_dictation_shortcutpreviously left the user with no global shortcut ifregister(new)failed afterunregister(old)succeeded. The previous binding is now restored on failure. - WebSocket dictation pipeline lost the first audio chunk.
MediaRecorderwas started before the WebSocket finished its handshake, so the first 250 ms chunk — which carries the WebM EBML header — was dropped from the WS stream. Every subsequent server-side ffmpeg conversion then failed withexit status 183("Invalid data found when processing input"), partials never appeared, and the HTTP fallback only fired after the full timeout. The WebSocket is now constructed before the recorder, every chunk is queued throughwsPendingRefuntilws.onopendrains it, and a servererrormessage (or unexpectedoncloseafter the recorder has stopped) fires the HTTP fallback immediately instead of waiting out the timeout. - Microphone access prompt on macOS. Added an
Info.plistwithNSMicrophoneUsageDescription(andNSCameraUsageDescriptionfor forward-compat) so getUserMedia no longer fails silently on macOS 10.14+ TCC. Tauri's bundler auto-merges the file at bundle time. Mic-denial toasts now also include platform-specific recovery hints (Settings paths for macOS/Windows, audio-group check for Linux).
- uv bundled per-platform. Release installers now ship the
uvbinary as a Tauri sidecar (bundle.externalBin). First launch no longer requires network access for the uv-download step — bootstrap uses the bundled binary directly. Adds ~12-15 MB per platform installer; falls back to PATH lookup, then standalone download, when the bundled file isn't present (dev builds, future targets). Pinned atUV_VERSION = "0.11.7"; bump the constant in lib.rs and the matching env var in release.yml together to refresh. - ffmpeg fetch removed from Tauri bootstrap. The redundant download from
eugeneware/ffmpeg-static(saved toapp_data/bin/) was never used by the backend, which already resolves ffmpeg viaimageio_ffmpeg.get_ffmpeg_exe()from the pip wheel pulled byuv sync. Net effect: one fewer first-run network round-trip, one fewer splash-screen stage, and the splash no longer shows the misleading "Downloading ffmpeg…" line. - CI cross-platform check. PRs now run
cargo checkagainst the Tauri shell on macOS (Apple Silicon), Windows, and Linux in parallel — surfaces platform-specific Rust regressions before tag push without paying the full ~15 min/platform tauri-bundle cost (full bundling stays inrelease.ymlon tag push). - Release notes from CHANGELOG.
release.ymlnow extracts the matching## [X.Y.Z]section fromCHANGELOG.mdand uses it as the GitHub Release body, replacing the prior placeholder "Auto-generated release. See commit log for changes." - Tests:
tests/test_capture_ws.py(3 cases) covers the EOF text-frame, empty-binary-frame, and legacy disconnect-finalize paths for/ws/transcribe.
- New Tauri commands:
quit_app,set_tray_recording,get_dictation_shortcut,set_dictation_shortcut. - New Tauri state:
AppFlags { quitting },TrayHandle { tray },DictationShortcutState { current }. - New deps:
tauri-plugin-single-instance2.x,tauri/image-pngfeature flag (enablesImage::from_bytesfor in-memory tray-icon swap).
Region selector, realtime download speed, retry buttons, recheck top-right, HF mirror support, splash bootstrap-log backfill. See git log v0.2.4..v0.2.5 for the full set.
See GitHub Releases for prior versions.