feat: single daemon with local OpenAI-compatible STT service #245
Draft
krystophny wants to merge 43 commits into main
Platform support:
- CGEventTap-based global hotkey detection (FN/Globe key)
- CGEvent text injection with osascript fallback
- pbcopy clipboard integration
- Native menu bar and system notifications
- Hammerspoon integration for advanced users

Build and distribution:
- Universal binary build script (x86_64 + arm64)
- Code signing and notarization scripts
- DMG packaging with drag-to-Applications
- Homebrew formula
- LaunchAgent for auto-start

Fixes:
- CGEvent modifier flags: prevent Caps Lock causing random capitalization
- Metal backend crash: align audio_ctx to multiple of 8
- Whisper hallucination: set no_context=true to prevent phrase repetition

Co-authored-by: Christopher Albert <albert@tugraz.at>

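The audio_ctx alignment fix above can be illustrated with a minimal sketch. The function name `align_audio_ctx` is hypothetical, and rounding up (rather than down) is an assumption; the commit message only says the value is aligned to a multiple of 8.

```rust
/// Round a non-negative `audio_ctx` up to the next multiple of 8.
/// Assumption: the fix rounds up; the commit message does not state the direction.
fn align_audio_ctx(audio_ctx: i32) -> i32 {
    const ALIGN: i32 = 8;
    // Integer division truncates, so adding ALIGN - 1 first rounds up.
    (audio_ctx + ALIGN - 1) / ALIGN * ALIGN
}
```

Values that are already multiples of 8 pass through unchanged, so applying the function twice gives the same result as applying it once.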
- Add native SwiftUI setup app (macos/VoxtypeSetup/)
- Setup wizard: permissions, model download, LaunchAgent
- Preferences panel for ongoing configuration
- Calls voxtype CLI for actual operations
- Improve menubar and notification handling
- Enhance macOS setup CLI as fallback
- Update config and error handling for macOS

- CONFIGURATION.md: Restore CLI backend docs (backend="cli", whisper_cli_path)
- TROUBLESHOOTING.md: Restore X11, keyboard layout, and FFI crash sections
- INSTALL.md: Restore Fedora ydotool system service notes

- Update version for macOS release candidate
- Add Christopher Albert, André Silva, goodroot, Chmouel Boudjnah, Alexander Bosu-Kellett, ayoahha, and Thinh Vu to authors
- Add Homebrew formula for macOS installation

Stale build artifacts can cause GPU support to silently fail at runtime even when:
- The build succeeds without errors
- Binary size and checksum differ from previous builds
- Version reports correctly

Added warnings about running cargo clean before building with different feature sets, and a new "Functional Verification" section explaining how to verify GPU builds actually detect the GPU at runtime.

The rdev crate (for global hotkeys) and tray-icon crate require X11 and GTK development libraries on Linux.

Added:
- libx11-dev, libxi-dev, libxtst-dev (X11 input)
- libgtk-3-dev, libglib2.0-dev, libappindicator3-dev (system tray)

This fixes Docker builds after the macOS merge, which added these dependencies via the tray-icon crate.

- Fix model selection to use static model lists (prevents infinite re-render loops)
- Add real-time download progress bar that monitors file size on disk
- Fix permissions flow to use manual confirmation (Open Settings + Done buttons)
- Fix LaunchAgent detection to check for success message
- Fix CLI command syntax for model downloads and engine switching
- Tighten layouts to prevent button clipping on all screens
- Add proper wizard completion state tracking
- Fix PreferencesView to avoid @StateObject render loops
- Add entitlements for AppleScript automation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

macOS builds now always include Parakeet support via --features parakeet.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Detect quantized model files (.int8.onnx) in addition to standard .onnx
- Skip Whisper model check when using Parakeet engine
- Show helpful message that Whisper model is not required for Parakeet

The setup check was failing to find parakeet-tdt-0.6b-v3-int8 because it only looked for encoder-model.onnx and decoder_joint-model.onnx, not the quantized variants with .int8.onnx extension.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

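The detection fix described above could look roughly like the following sketch. The helper names `has_onnx_variant` and `parakeet_model_present` are hypothetical, and the quantized file names (encoder-model.int8.onnx, decoder_joint-model.int8.onnx) are inferred from the commit message rather than taken from the actual code.

```rust
use std::path::Path;

/// True if `dir` contains `<stem>.onnx` or the quantized `<stem>.int8.onnx`.
fn has_onnx_variant(dir: &Path, stem: &str) -> bool {
    [".onnx", ".int8.onnx"]
        .iter()
        .any(|ext| dir.join(format!("{stem}{ext}")).is_file())
}

/// A model directory counts as present when both the encoder and the
/// decoder/joint file exist, each in either the standard or quantized variant.
fn parakeet_model_present(dir: &Path) -> bool {
    has_onnx_variant(dir, "encoder-model") && has_onnx_variant(dir, "decoder_joint-model")
}
```

Checking each file independently means a directory that mixes standard and quantized files is also accepted; whether the real check allows that is not stated in the commit.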
Cask approach:
- Installs prebuilt DMG to /Applications/Voxtype.app
- Creates CLI symlink in $(brew --prefix)/bin
- Works around Homebrew sandbox restrictions
- Adds caveat for xattr quarantine removal if needed

Formula updates:
- Creates app bundle in Homebrew prefix during post_install
- Symlinks to ~/Applications for permission grants
- Adds service support via brew services
- Updated caveats with permission grant instructions

The Cask is the recommended installation method for most users. The Formula remains available for building from source.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Rebuilt the macOS DMG after the Parakeet model detection fix was applied. The new DMG contains a binary that correctly detects quantized .int8.onnx model files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Install LaunchAgent plist during brew install
- Load the agent so daemon starts immediately
- Auto-restart daemon if it crashes (KeepAlive)
- Unload and remove LaunchAgent on uninstall
- Updated caveats to reflect auto-start behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Run xattr -cr on app bundle to remove Gatekeeper quarantine
- Simplify caveats now that xattr is automatic
- Users no longer need manual steps to bypass "damaged app" error

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove /tmp/voxtype during install to prevent stale lock issues
- Clarify first-time setup steps in caveats:
  1. Click "Open Anyway" for unsigned app
  2. Download model
  3. Grant Input Monitoring permission
- Emphasize Input Monitoring is required for hotkey

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add note that macOS support is in beta with unsigned binaries
- Document Homebrew Cask as primary install method
- Add first-time security setup steps (Open Anyway, Input Monitoring)
- Update config path to ~/Library/Application Support/voxtype
- Document Right Option as default hotkey
- Add troubleshooting for common issues
- Simplify and focus on current workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Separate Swift app for menu bar functionality:
- Shows mic icon that changes based on daemon state
- Dropdown menu with recording controls
- Settings submenu (engine, output mode, hotkey mode)
- Open setup, restart daemon, view logs actions
- Reads state from /tmp/voxtype/state
- Communicates with daemon via voxtype CLI

This keeps macOS-specific GUI code out of the main Rust binary.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add "Voxtype" header at top of menu
- Change "Open Setup..." to "Settings..." that launches VoxtypeSetup app
- Search multiple locations for VoxtypeSetup
- Change "Quit Menu Bar" to "Quit Voxtype Menu Bar"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Redesigned VoxtypeSetup from a sequential wizard to a proper macOS settings app with sidebar navigation:
- General: Engine selection, hotkey config, daemon status
- Models: View installed models, download new ones
- Output: Output mode, typing delay, auto-submit
- Permissions: Check and grant required permissions
- Advanced: Open config file, logs, auto-start toggle

Removed old wizard files (WelcomeView, SetupWizardView, etc.) in favor of non-sequential, easy-to-navigate settings panels.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Settings app fixes:
- Add ConfigManager for section-aware TOML updates (fixes config corruption)
- All settings views now use ConfigManager.shared instead of broken regex
- Add restart banner when engine/hotkey settings change
- Add new settings sections: Hotkey, Audio, Whisper, Remote Whisper, Text Processing, Notifications

Menubar app fixes:
- Detect daemon via PID file when not running via launchd
- Fix VoxtypeCLI binary path detection

Notification improvements:
- Use terminal-notifier for custom icons (bundled in app)
- Remove redundant app icon when engine emoji is shown
- Add engine-specific emoji to notification titles

Homebrew Cask:
- Bundle terminal-notifier in Voxtype.app during install
- Require Parakeet support in all macOS builds

Documentation:
- Add MACOS_ARCHITECTURE.md with component overview
- Add MACOS_TROUBLESHOOTING.md with debugging checklist

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

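To illustrate why a section-aware update avoids the corruption a global regex replace can cause, here is a minimal line-based sketch. It is written in Rust for consistency with the main binary, although ConfigManager itself lives in the Swift settings app. The function name `set_in_section` and the section and key names in the usage note are hypothetical, and this is not the actual ConfigManager logic.

```rust
/// Set `key = value` inside `[section]`, leaving same-named keys in other
/// sections untouched. Appends the key (and the section) if missing.
fn set_in_section(toml: &str, section: &str, key: &str, value: &str) -> String {
    let header = format!("[{section}]");
    let mut out: Vec<String> = Vec::new();
    let mut in_section = false;
    let mut done = false;
    for line in toml.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            // Leaving the target section without having seen the key: add it here.
            if in_section && !done {
                out.push(format!("{key} = {value}"));
                done = true;
            }
            in_section = trimmed == header;
        } else if in_section && !done {
            if let Some((k, _)) = trimmed.split_once('=') {
                if k.trim() == key {
                    // Replace only the matching key inside the target section.
                    out.push(format!("{key} = {value}"));
                    done = true;
                    continue;
                }
            }
        }
        out.push(line.to_string());
    }
    if !done {
        // Target section was last (or absent): append at the end of the file.
        if !in_section {
            out.push(header);
        }
        out.push(format!("{key} = {value}"));
    }
    out.join("\n") + "\n"
}
```

For example, with a config that has a `model` key under both a `[whisper]` and a `[parakeet]` section, `set_in_section(cfg, "parakeet", "model", "\"v3\"")` rewrites only the second one, whereas a pattern that matches any `model = ...` line would clobber both.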
- Menubar: Compact single-line header with Label (icon + "Voxtype · Status")
- Settings: Combined Whisper local/remote into single view with animated transitions
- Settings: Use ConfigManager for section-aware config updates (fixes corruption)
- Settings: Redesigned Models view with unified list and inline progress
- Notifications: Engine-specific icons via terminal-notifier contentImage
- Added engine icon assets (parakeet.png, whisper.png)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added macOS/Homebrew card to package manager section
- Updated macOS system requirements

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Update DMG build script to bundle VoxtypeMenubar and VoxtypeSetup apps
- Include engine notification icons (parakeet.png, whisper.png)
- Auto-run setup and download model during brew install
- Launch menubar app automatically after installation
- Update caveats with cleaner instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- VoxtypeSetup opens to Permissions tab on first launch
- Cask starts daemon and opens Settings for permission granting
- Updated caveats to guide user through permission setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- New toggle in General settings to show/hide menubar icon
- Launches VoxtypeMenubar.app when enabled
- Quits menubar app when disabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Update version to 0.6.0-rc.2
- Document all 7 required Linux binaries: avx2, avx512, vulkan, parakeet-avx2, parakeet-avx512, parakeet-cuda, parakeet-rocm
- Update build instructions to include parakeet-cuda (Docker/NVIDIA) and parakeet-rocm (local/AMD) builds
- Regenerate Cargo.lock

- Fix dtolnay/rust-action -> dtolnay/rust-toolchain (non-existent action)
- Add chmod 755 in Dockerfiles to fix permission errors on mounted volumes
- Remove redundant chmod from build-linux.yml (Docker handles it now)
- Add missing GTK/X11 dev dependencies to test-packages.yml and AVX-512 builds
- Add protobuf deps for Parakeet AVX-512 build

Creates /Applications/Voxtype.app bundle and adds to Login Items.

This is the recommended way to run voxtype on macOS because:
- App bundles can be granted Accessibility, Input Monitoring, and Microphone permissions properly
- Login Items inherit these permissions (launchd services don't)
- Clean auto-start on login without extra wrapper scripts
- Runs both daemon and menu bar icon

Usage:
  voxtype setup app-bundle              # Install
  voxtype setup app-bundle --status     # Check status
  voxtype setup app-bundle --uninstall

The launchd option is kept for users who don't need microphone (e.g., using remote transcription), but app-bundle is recommended.

- macos.rs wizard now delegates to app_bundle::create_app_bundle() instead of duplicating app bundle creation with a different bundle ID
- macos.rs autostart now uses Login Items (via app_bundle) instead of launchd, which does not receive Microphone permissions
- launchd.rs install() warns that it lacks mic permissions on macOS and recommends app-bundle instead
- run_setup() "Next steps" are now platform-aware: macOS shows app-bundle/macos wizard instructions, Linux shows compositor/systemd
- Default hotkey suggestion in wizard changed from rightalt to fn
- Removed install_launchd_with_app_bundle() and duplicate constants

Fix CI workflow issues
Add FN/Function/Globe key support for macOS
Add macOS app bundle setup with Login Items autostart
…release

Merge main into feature/macos-release and resolve conflicts

…ion handling

- Fix self-copy corruption: skip copy when source == dest, use temp file + atomic rename otherwise to prevent truncation when updating from within the app bundle itself
- Fix code signing: sign voxtype-bin individually before signing the bundle so the Mach-O gets proper code page hashes (was SIGKILL'd)
- Replace wrapper script with direct binary launch: set voxtype-bin as CFBundleExecutable and add hidden AppLaunch command that starts daemon in background + menubar in foreground. The wrapper script's exec() broke macOS Control Center's XPC scene registration, preventing the menubar icon from appearing
- Auto-launch after install: run open Voxtype.app at end of setup
- Implement real Accessibility permission check using CGEventTapCreate (not AXIsProcessTrusted which caches per-process) with auto-restart: daemon polls every 2s and restarts itself when permission is granted
- Prompt for Accessibility via AXIsProcessTrustedWithOptions on startup
- Reset TCC entries only when the binary actually changed to avoid wiping permissions on self-copy reinstalls

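The self-copy fix in the first bullet above follows a common pattern, sketched here with only the standard library. The function name `install_binary` and the `.tmp` sibling file name are hypothetical; the real implementation may differ in how it detects identical paths and names the temp file.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy `src` to `dest` without ever truncating `dest` in place.
/// Returns Ok(false) when both paths refer to the same file and nothing is copied.
fn install_binary(src: &Path, dest: &Path) -> io::Result<bool> {
    // Compare canonical paths so symlinks and relative paths resolve to one file.
    if dest.exists() && fs::canonicalize(src)? == fs::canonicalize(dest)? {
        return Ok(false);
    }
    // Write a sibling temp file first, then rename it over the destination.
    // A rename within one directory replaces the target atomically on POSIX
    // filesystems, so readers see either the old or the new file, never a
    // partially written one.
    let tmp = dest.with_extension("tmp");
    fs::copy(src, &tmp)?;
    fs::rename(&tmp, dest)?;
    Ok(true)
}
```

The boolean result could also serve as a cheap "something was actually replaced" signal, though deciding whether the binary contents changed (as the TCC reset bullet requires) would need a content or hash comparison that this sketch does not attempt.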
…dates

State file changes are now picked up via notify (kqueue on macOS) instead of polling every 500ms, eliminating the visible delay between hotkey press and icon update.

…ure/single-daemon-openai-stt-api

# Conflicts:
#	Cargo.lock
#	src/transcribe/whisper.rs

Summary
- … /v1/audio/transcriptions, /v1/audio/translations, /healthz)
- … transcribe_with_options
- … RemoteTranscriber client against the new local service

Design notes
- … de, en) when request language is not pinned

Testing
- cargo test

Closes #244