feat: single daemon with local OpenAI-compatible STT service #245

Draft
krystophny wants to merge 43 commits into main from feature/single-daemon-openai-stt-api

krystophny (Collaborator) commented Mar 1, 2026

Summary

  • add in-process HTTP service with OpenAI-compatible endpoints (/v1/audio/transcriptions, /v1/audio/translations, /healthz)
  • run service alongside the daemon hotkey loop in the same process
  • add service config + CLI/env overrides for bind host/port, request timeout, upload limits, and allowed languages
  • add request-level language/prompt overrides in whisper transcriber via transcribe_with_options
  • add end-to-end tests that use existing RemoteTranscriber client against the new local service
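
A client request against the transcription endpoint listed above might be assembled as in this minimal Rust sketch, which uses only the standard library. The multipart field names (file, model, language, prompt) follow the OpenAI audio API. The model value whisper-1, the file name audio.wav, and both helper names are illustrative assumptions and are not taken from this PR.

```rust
/// Append one plain-text multipart field to `body`.
fn push_text_field(body: &mut Vec<u8>, boundary: &str, name: &str, value: &str) {
    let part = format!(
        "--{boundary}\r\nContent-Disposition: form-data; name=\"{name}\"\r\n\r\n{value}\r\n"
    );
    body.extend_from_slice(part.as_bytes());
}

/// Build a multipart/form-data body for POST /v1/audio/transcriptions.
/// `language` and `prompt` are the optional request-level overrides.
fn build_transcription_body(
    boundary: &str,
    wav: &[u8],
    language: Option<&str>,
    prompt: Option<&str>,
) -> Vec<u8> {
    let mut body = Vec::new();
    // Assumption: the local service accepts (or ignores) the OpenAI model name.
    push_text_field(&mut body, boundary, "model", "whisper-1");
    if let Some(lang) = language {
        push_text_field(&mut body, boundary, "language", lang);
    }
    if let Some(text) = prompt {
        push_text_field(&mut body, boundary, "prompt", text);
    }
    // The audio itself goes into the `file` part as raw bytes.
    let file_header = format!(
        "--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\nContent-Type: audio/wav\r\n\r\n"
    );
    body.extend_from_slice(file_header.as_bytes());
    body.extend_from_slice(wav);
    // Closing delimiter ends the multipart body.
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}

/// Build the HTTP/1.1 request head that precedes the body.
fn request_head(host: &str, boundary: &str, content_length: usize) -> String {
    format!(
        "POST /v1/audio/transcriptions HTTP/1.1\r\nHost: {host}\r\nContent-Type: multipart/form-data; boundary={boundary}\r\nContent-Length: {content_length}\r\nConnection: close\r\n\r\n"
    )
}
```

The head and body can then be written to a std::net::TcpStream connected to whichever loopback host and port the service is configured to bind.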

Design notes

  • no auth in voxtype service (loopback-first)
  • service defaults to constrained language set (de, en) when request language is not pinned
  • WAV decode/downmix/resample to 16k mono is handled server-side
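
The server-side downmix and resample step in the last design note can be sketched as follows. This is a minimal illustration that assumes decoded f32 samples in interleaved channel order and uses plain linear interpolation. The PR does not say which resampling method the service uses, so treat the method and all function names here as assumptions.

```rust
/// Average interleaved channels into a single mono channel.
/// Trailing samples that do not form a full frame are dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample mono audio with linear interpolation.
/// A simple stand-in: it applies no anti-aliasing filter when downsampling.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    assert!(from_hz > 0 && to_hz > 0, "sample rates must be non-zero");
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            // Position of output sample i on the input time axis.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

/// Convert decoded audio to the 16 kHz mono layout named in the design notes.
fn to_16k_mono(interleaved: &[f32], channels: usize, sample_rate: u32) -> Vec<f32> {
    let mono = downmix_to_mono(interleaved, channels);
    resample_linear(&mono, sample_rate, 16_000)
}
```

Doing this conversion in the service means clients can upload WAV files at whatever rate and channel count their capture device produces.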

Testing

  • cargo test

Closes #244

peteonrails and others added 30 commits January 31, 2026 13:08
Platform support:
- CGEventTap-based global hotkey detection (FN/Globe key)
- CGEvent text injection with osascript fallback
- pbcopy clipboard integration
- Native menu bar and system notifications
- Hammerspoon integration for advanced users

Build and distribution:
- Universal binary build script (x86_64 + arm64)
- Code signing and notarization scripts
- DMG packaging with drag-to-Applications
- Homebrew formula
- LaunchAgent for auto-start

Fixes:
- CGEvent modifier flags: prevent Caps Lock causing random capitalization
- Metal backend crash: align audio_ctx to multiple of 8
- Whisper hallucination: set no_context=true to prevent phrase repetition

Co-authored-by: Christopher Albert <albert@tugraz.at>
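
The audio_ctx alignment mentioned in the Metal fix above comes down to snapping a value to a multiple of 8. A minimal sketch follows. The commit message does not say whether the value is rounded up or down, so rounding up is an assumption here, and align_up is a hypothetical helper name.

```rust
/// Round `value` up to the nearest multiple of `align`.
/// Values that are already aligned are returned unchanged.
fn align_up(value: u32, align: u32) -> u32 {
    assert!(align > 0, "alignment must be non-zero");
    ((value + align - 1) / align) * align
}
```

For example, align_up(750, 8) yields 752, while align_up(768, 8) stays at 768.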
- Add native SwiftUI setup app (macos/VoxtypeSetup/)
  - Setup wizard: permissions, model download, LaunchAgent
  - Preferences panel for ongoing configuration
  - Calls voxtype CLI for actual operations
- Improve menubar and notification handling
- Enhance macOS setup CLI as fallback
- Update config and error handling for macOS
- CONFIGURATION.md: Restore CLI backend docs (backend="cli", whisper_cli_path)
- TROUBLESHOOTING.md: Restore X11, keyboard layout, and FFI crash sections
- INSTALL.md: Restore Fedora ydotool system service notes
- Update version for macOS release candidate
- Add Christopher Albert, André Silva, goodroot, Chmouel Boudjnah,
  Alexander Bosu-Kellett, ayoahha, and Thinh Vu to authors
- Add Homebrew formula for macOS installation
Stale build artifacts can cause GPU support to silently fail at runtime
even when:
- The build succeeds without errors
- Binary size and checksum differ from previous builds
- Version reports correctly

Added warnings about running cargo clean before building with different
feature sets, and a new "Functional Verification" section explaining how
to verify GPU builds actually detect the GPU at runtime.
The rdev crate (for global hotkeys) and tray-icon crate require X11
and GTK development libraries on Linux. Added:
- libx11-dev, libxi-dev, libxtst-dev (X11 input)
- libgtk-3-dev, libglib2.0-dev, libappindicator3-dev (system tray)

This fixes Docker builds after the macOS merge which added these
dependencies via the tray-icon crate.
- Fix model selection to use static model lists (prevents infinite re-render loops)
- Add real-time download progress bar that monitors file size on disk
- Fix permissions flow to use manual confirmation (Open Settings + Done buttons)
- Fix LaunchAgent detection to check for success message
- Fix CLI command syntax for model downloads and engine switching
- Tighten layouts to prevent button clipping on all screens
- Add proper wizard completion state tracking
- Fix PreferencesView to avoid @StateObject render loops
- Add entitlements for AppleScript automation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
macOS builds now always include Parakeet support via --features parakeet.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Detect quantized model files (.int8.onnx) in addition to standard .onnx
- Skip Whisper model check when using Parakeet engine
- Show helpful message that Whisper model is not required for Parakeet

The setup check was failing to find parakeet-tdt-0.6b-v3-int8 because it
only looked for encoder-model.onnx and decoder_joint-model.onnx, not
the quantized variants with .int8.onnx extension.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
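
The detection rule described in the commit above (accept either the standard or the quantized file name for each model part) can be sketched like this. The two file stems come from the commit message. The function names and the injected exists callback are illustrative assumptions, not the project's actual code.

```rust
use std::path::Path;

/// File stems the commit message lists for a Parakeet model directory.
const REQUIRED_STEMS: [&str; 2] = ["encoder-model", "decoder_joint-model"];

/// Return true when every required stem is present as either
/// `<stem>.onnx` or the quantized `<stem>.int8.onnx`.
/// `exists` is injected so the rule can be checked without a filesystem.
fn has_parakeet_model(exists: impl Fn(&str) -> bool) -> bool {
    REQUIRED_STEMS.iter().all(|stem| {
        exists(&format!("{stem}.onnx")) || exists(&format!("{stem}.int8.onnx"))
    })
}

/// Filesystem-backed variant: look for the files inside `dir`.
fn model_dir_is_complete(dir: &Path) -> bool {
    has_parakeet_model(|name| dir.join(name).is_file())
}
```

Checking each stem independently also accepts a directory that mixes one standard and one quantized file. Whether the real setup check allows that mix is not stated in the commit.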
Cask approach:
- Installs prebuilt DMG to /Applications/Voxtype.app
- Creates CLI symlink in $(brew --prefix)/bin
- Works around Homebrew sandbox restrictions
- Adds caveat for xattr quarantine removal if needed

Formula updates:
- Creates app bundle in Homebrew prefix during post_install
- Symlinks to ~/Applications for permission grants
- Adds service support via brew services
- Updated caveats with permission grant instructions

The Cask is the recommended installation method for most users.
The Formula remains available for building from source.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rebuilt the macOS DMG after the Parakeet model detection fix was
applied. The new DMG contains a binary that correctly detects
quantized .int8.onnx model files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Install LaunchAgent plist during brew install
- Load the agent so daemon starts immediately
- Auto-restart daemon if it crashes (KeepAlive)
- Unload and remove LaunchAgent on uninstall
- Updated caveats to reflect auto-start behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Run xattr -cr on app bundle to remove Gatekeeper quarantine
- Simplify caveats now that xattr is automatic
- Users no longer need manual steps to bypass "damaged app" error

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove /tmp/voxtype during install to prevent stale lock issues
- Clarify first-time setup steps in caveats:
  1. Click "Open Anyway" for unsigned app
  2. Download model
  3. Grant Input Monitoring permission
- Emphasize Input Monitoring is required for hotkey

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add note that macOS support is in beta with unsigned binaries
- Document Homebrew Cask as primary install method
- Add first-time security setup steps (Open Anyway, Input Monitoring)
- Update config path to ~/Library/Application Support/voxtype
- Document Right Option as default hotkey
- Add troubleshooting for common issues
- Simplify and focus on current workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Separate Swift app for menu bar functionality:
- Shows mic icon that changes based on daemon state
- Dropdown menu with recording controls
- Settings submenu (engine, output mode, hotkey mode)
- Open setup, restart daemon, view logs actions
- Reads state from /tmp/voxtype/state
- Communicates with daemon via voxtype CLI

This keeps macOS-specific GUI code out of the main Rust binary.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add "Voxtype" header at top of menu
- Change "Open Setup..." to "Settings..." that launches VoxtypeSetup app
- Search multiple locations for VoxtypeSetup
- Change "Quit Menu Bar" to "Quit Voxtype Menu Bar"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Redesigned VoxtypeSetup from a sequential wizard to a proper
macOS settings app with sidebar navigation:

- General: Engine selection, hotkey config, daemon status
- Models: View installed models, download new ones
- Output: Output mode, typing delay, auto-submit
- Permissions: Check and grant required permissions
- Advanced: Open config file, logs, auto-start toggle

Removed old wizard files (WelcomeView, SetupWizardView, etc.)
in favor of non-sequential, easy-to-navigate settings panels.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Settings app fixes:
- Add ConfigManager for section-aware TOML updates (fixes config corruption)
- All settings views now use ConfigManager.shared instead of broken regex
- Add restart banner when engine/hotkey settings change
- Add new settings sections: Hotkey, Audio, Whisper, Remote Whisper,
  Text Processing, Notifications

Menubar app fixes:
- Detect daemon via PID file when not running via launchd
- Fix VoxtypeCLI binary path detection

Notification improvements:
- Use terminal-notifier for custom icons (bundled in app)
- Remove redundant app icon when engine emoji is shown
- Add engine-specific emoji to notification titles

Homebrew Cask:
- Bundle terminal-notifier in Voxtype.app during install
- Require Parakeet support in all macOS builds

Documentation:
- Add MACOS_ARCHITECTURE.md with component overview
- Add MACOS_TROUBLESHOOTING.md with debugging checklist

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
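
The idea behind the section-aware updates mentioned above is that a key is only rewritten inside its own [section], so a same-named key elsewhere in the file is left alone. The real ConfigManager lives in the Swift settings app and is not shown in this PR. The following is a hypothetical Rust illustration of the technique, limited to plain [section] headers and single-line key = value entries.

```rust
/// Set `key = value` inside `[section]`, leaving same-named keys in other
/// sections untouched. Line-based sketch: no arrays of tables, no
/// multi-line values, and `value` must already be valid TOML (quoted if a string).
fn set_in_section(toml: &str, section: &str, key: &str, value: &str) -> String {
    let header = format!("[{section}]");
    let new_line = format!("{key} = {value}");
    let mut out: Vec<String> = Vec::new();
    let mut in_target = false;
    let mut seen_section = false;
    let mut written = false;

    for line in toml.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            // Leaving the target section without finding the key: append it.
            if in_target && !written {
                out.push(new_line.clone());
                written = true;
            }
            in_target = trimmed == header;
            seen_section |= in_target;
        } else if in_target && !written {
            // Replace the first matching key inside the target section.
            let is_key = trimmed
                .split_once('=')
                .map_or(false, |(k, _)| k.trim() == key);
            if is_key {
                out.push(new_line.clone());
                written = true;
                continue;
            }
        }
        out.push(line.to_string());
    }
    if !written {
        // Key missing at end of file: add the section header first if needed.
        if !seen_section {
            out.push(header);
        }
        out.push(new_line);
    }
    out.join("\n") + "\n"
}
```

A global regex replace on the key name would rewrite every matching key in the file. Tracking the current section header avoids that.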
- Menubar: Compact single-line header with Label (icon + "Voxtype · Status")
- Settings: Combined Whisper local/remote into single view with animated transitions
- Settings: Use ConfigManager for section-aware config updates (fixes corruption)
- Settings: Redesigned Models view with unified list and inline progress
- Notifications: Engine-specific icons via terminal-notifier contentImage
- Added engine icon assets (parakeet.png, whisper.png)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added macOS/Homebrew card to package manager section
- Updated macOS system requirements

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update DMG build script to bundle VoxtypeMenubar and VoxtypeSetup apps
- Include engine notification icons (parakeet.png, whisper.png)
- Auto-run setup and download model during brew install
- Launch menubar app automatically after installation
- Update caveats with cleaner instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- VoxtypeSetup opens to Permissions tab on first launch
- Cask starts daemon and opens Settings for permission granting
- Updated caveats to guide user through permission setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New toggle in General settings to show/hide menubar icon
- Launches VoxtypeMenubar.app when enabled
- Quits menubar app when disabled

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update version to 0.6.0-rc.2
- Document all 7 required Linux binaries: avx2, avx512, vulkan,
  parakeet-avx2, parakeet-avx512, parakeet-cuda, parakeet-rocm
- Update build instructions to include parakeet-cuda (Docker/NVIDIA)
  and parakeet-rocm (local/AMD) builds
- Regenerate Cargo.lock
- Fix dtolnay/rust-action -> dtolnay/rust-toolchain (non-existent action)
- Add chmod 755 in Dockerfiles to fix permission errors on mounted volumes
- Remove redundant chmod from build-linux.yml (Docker handles it now)
- Add missing GTK/X11 dev dependencies to test-packages.yml and AVX-512 builds
- Add protobuf deps for Parakeet AVX-512 build
Creates /Applications/Voxtype.app bundle and adds to Login Items.
This is the recommended way to run voxtype on macOS because:

- App bundles can be granted Accessibility, Input Monitoring, and
  Microphone permissions properly
- Login Items inherit these permissions (launchd services don't)
- Clean auto-start on login without extra wrapper scripts
- Runs both daemon and menu bar icon

Usage:
  voxtype setup app-bundle           # Install
  voxtype setup app-bundle --status  # Check status
  voxtype setup app-bundle --uninstall

The launchd option is kept for users who don't need microphone
(e.g., using remote transcription), but app-bundle is recommended.
krystophny and others added 13 commits February 5, 2026 09:07
- macos.rs wizard now delegates to app_bundle::create_app_bundle()
  instead of duplicating app bundle creation with a different bundle ID
- macos.rs autostart now uses Login Items (via app_bundle) instead of
  launchd, which does not receive Microphone permissions
- launchd.rs install() warns that it lacks mic permissions on macOS
  and recommends app-bundle instead
- run_setup() "Next steps" are now platform-aware: macOS shows
  app-bundle/macos wizard instructions, Linux shows compositor/systemd
- Default hotkey suggestion in wizard changed from rightalt to fn
- Removed install_launchd_with_app_bundle() and duplicate constants
Add FN/Function/Globe key support for macOS
Add macOS app bundle setup with Login Items autostart
…release

Merge main into feature/macos-release and resolve conflicts
…ion handling

- Fix self-copy corruption: skip copy when source == dest, use temp file
  + atomic rename otherwise to prevent truncation when updating from
  within the app bundle itself
- Fix code signing: sign voxtype-bin individually before signing the
  bundle so the Mach-O gets proper code page hashes (was SIGKILL'd)
- Replace wrapper script with direct binary launch: set voxtype-bin as
  CFBundleExecutable and add hidden AppLaunch command that starts daemon
  in background + menubar in foreground. The wrapper script's exec()
  broke macOS Control Center's XPC scene registration, preventing the
  menubar icon from appearing
- Auto-launch after install: run open Voxtype.app at end of setup
- Implement real Accessibility permission check using CGEventTapCreate
  (not AXIsProcessTrusted which caches per-process) with auto-restart:
  daemon polls every 2s and restarts itself when permission is granted
- Prompt for Accessibility via AXIsProcessTrustedWithOptions on startup
- Reset TCC entries only when the binary actually changed to avoid
  wiping permissions on self-copy reinstalls
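
The self-copy fix in the first bullet above (skip when source and destination are the same file, otherwise copy to a temp file and rename) can be sketched with the standard library as follows. The function name and the temp-file suffix are hypothetical. Comparing canonicalized paths is one way to detect the same-file case and is an assumption about the approach, not the PR's actual code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy `src` to `dest` without ever truncating a file onto itself.
/// Returns Ok(false) when both paths resolve to the same file (copy skipped).
fn install_binary(src: &Path, dest: &Path) -> io::Result<bool> {
    // canonicalize resolves symlinks and relative components.
    let src_real = fs::canonicalize(src)?;
    // dest may not exist yet, in which case it cannot be the same file.
    if let Ok(dest_real) = fs::canonicalize(dest) {
        if dest_real == src_real {
            return Ok(false);
        }
    }
    // Write to a sibling temp file so the rename stays on one filesystem.
    let tmp = dest.with_extension("tmp-install");
    fs::copy(&src_real, &tmp)?;
    // On Unix, rename replaces an existing dest in a single step.
    if let Err(err) = fs::rename(&tmp, dest) {
        // Best-effort cleanup; the original dest is still intact.
        let _ = fs::remove_file(&tmp);
        return Err(err);
    }
    Ok(true)
}
```

Because the destination is only ever replaced by a completed temp file, an interrupted copy leaves the old binary in place. Path comparison does not detect hard links to the same inode, which this sketch leaves out of scope.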
…dates

State file changes are now picked up via notify (kqueue on macOS)
instead of polling every 500ms, eliminating the visible delay between
hotkey press and icon update.
…ure/single-daemon-openai-stt-api

# Conflicts:
#	Cargo.lock
#	src/transcribe/whisper.rs
Development

Successfully merging this pull request may close these issues.

Feature: single daemon for hotkey dictation + OpenAI-compatible local STT API

2 participants