Super Voice Assistant

A macOS voice assistant driven by global hotkeys. It transcribes speech to text with offline models (WhisperKit or Parakeet) or the cloud-based Gemini API, captures and transcribes screen recordings with visual context, and reads selected text aloud with Gemini Live. Fast, accurate, and simple.

Demo

Parakeet transcription (fast and accurate):

parakeet.mov

Instant text-to-speech:

tts_compressed.mp4

Visual disambiguation for names:

visual-disambiguation-demo.mov

Features

Voice-to-Text Transcription

Press Command+Option+Z for local offline transcription (WhisperKit or Parakeet)
Press Command+Option+X for cloud transcription with Gemini API
Choose your engine in Settings: WhisperKit models or Parakeet (faster, more accurate)
Automatic text pasting at cursor position
Transcription history with Command+Option+A

Streaming Text-to-Speech

Press Command+Option+S to read selected text aloud using Gemini Live API
Press Command+Option+S again while reading to cancel the operation
Sequential streaming for smooth, natural speech with minimal latency
Smart sentence splitting for optimal speech flow
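
As a rough illustration of what sentence splitting for TTS involves, the sketch below breaks text at sentence-ending punctuation so each sentence can be sent for synthesis on its own. It is written in Python for brevity and is not the app's Swift implementation; the function name `split_sentences` and the splitting rule are assumptions, and this naive rule would also split after abbreviations such as "e.g.".

```python
import re

# Split at whitespace that follows ".", "!" or "?" (a deliberately naive rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences found in text, in order."""
    parts = _SENTENCE_END.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("Hello there. How are you? Fine!"):
        print(sentence)
```

Sending sentences one at a time is one way playback of the first sentence could begin before the rest of the text has been synthesized.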

Screen Recording & Video Transcription

Press Command+Option+C to start/stop screen recording
Automatic video transcription using Gemini 2.5 Flash API with visual context
Better accuracy for programming terms, code, technical jargon, and ambiguous words
Transcribed text automatically pastes at cursor position

Requirements

macOS 14.0 or later
Xcode 15+ or Xcode Command Line Tools (for Swift 5.9+)
Gemini API key (for text-to-speech and video transcription)
ffmpeg (for screen recording functionality)
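
If you want to check these prerequisites before building, something like the following sketch may help. It is a hypothetical helper in Python, not part of the repository; `meets_minimum` and `version_tuple` are illustrative names, and the script only reports what it finds.

```python
import platform
import shutil


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '14.2' into (14, 2, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def meets_minimum(version: str, minimum: str = "14.0") -> bool:
    """True if version is at least minimum."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    mac_version = platform.mac_ver()[0]  # empty string when not on macOS
    if mac_version:
        print("macOS", mac_version, "ok" if meets_minimum(mac_version) else "too old")
    else:
        print("Not running on macOS")
    for tool in ("swift", "ffmpeg"):
        print(tool, "found" if shutil.which(tool) else "missing")
```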

System Permissions Setup

This app requires specific system permissions to function properly:

1. Microphone Access

The app will automatically request microphone permission on first launch. If denied, grant it manually:

Go to System Settings > Privacy & Security > Microphone
Enable access for Super Voice Assistant

2. Accessibility Access (Required for Global Hotkeys & Auto-Paste)

You must manually grant accessibility permissions for the app to:

Monitor global keyboard shortcuts (Command+Option+Z/S/X/A/V/C, Escape)
Automatically paste transcribed text at cursor position

To enable:

Go to System Settings > Privacy & Security > Accessibility
Click the + button to add an application (authenticate with your password or Touch ID when prompted)
Navigate to the app location:
- If running via swift run: Add Terminal or your terminal app (iTerm2, etc.)
- If running the built binary directly: Add the SuperVoiceAssistant executable
Ensure the checkbox next to the app is checked

Important: Without accessibility access, the app cannot detect global hotkeys (Command+Option+Z/X/A/S/C/V, Escape) or paste text automatically.

3. Screen Recording Access (Required for Video Transcription)

The app requires screen recording permission to capture screen content:

Go to System Settings > Privacy & Security > Screen Recording
Enable access for Terminal (if running via swift run) or SuperVoiceAssistant

Installation & Running

# Clone the repository
git clone https://github.com/ykdojo/super-voice-assistant.git
cd super-voice-assistant

# Install ffmpeg (required for screen recording)
brew install ffmpeg

# Set up environment (for TTS and video transcription)
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

# Build the app
swift build

# Run the main app
swift run SuperVoiceAssistant

The app will appear in your menu bar as a waveform icon.

Configuration

Text Replacements

You can configure automatic text replacements for transcriptions by editing config.json in the project root:

{
  "textReplacements": {
    "Cloud Code": "Claude Code",
    "cloud code": "claude code",
    "cloud.md": "CLAUDE.md"
  }
}

This is useful for correcting common speech-to-text misrecognitions, especially for proper nouns, brand names, or technical terms. Replacements are case-sensitive and applied to all transcriptions.
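
To make the case-sensitive behaviour concrete, here is a minimal sketch of how such a replacement table could be applied to a transcription. It is written in Python for illustration and is not the app's Swift code; the function name `apply_replacements`, the literal (non-regex) matching, and the top-to-bottom order of application are assumptions.

```python
import json


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Apply each literal, case-sensitive replacement in the order listed."""
    for wrong, right in replacements.items():
        text = text.replace(wrong, right)
    return text


# Same shape as the config.json example above.
config = json.loads("""
{
  "textReplacements": {
    "Cloud Code": "Claude Code",
    "cloud code": "claude code",
    "cloud.md": "CLAUDE.md"
  }
}
""")

if __name__ == "__main__":
    fixed = apply_replacements("Open cloud.md in Cloud Code", config["textReplacements"])
    print(fixed)  # Open CLAUDE.md in Claude Code
```

Because matching is case-sensitive, "Cloud Code" and "cloud code" need separate entries, which is why the example config lists both.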

Usage

Voice-to-Text Transcription

Local (Command+Option+Z):

Launch the app - it appears in the menu bar
Open Settings to select and download a model (Parakeet or WhisperKit)
Press Command+Option+Z to start recording
Press Command+Option+Z again to stop and transcribe
Text automatically pastes at cursor
Press Escape to cancel

Cloud (Command+Option+X):

Set GEMINI_API_KEY in your .env file
Press Command+Option+X to start/stop recording
Text automatically pastes at cursor
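
If cloud transcription does not work, it can help to confirm that the key is actually present in .env. The sketch below is a hypothetical Python check, not part of the repository; it assumes .env uses plain KEY=VALUE lines with optional quotes and `#` comments, and `parse_env` is an illustrative name.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("GEMINI_API_KEY is set" if env.get("GEMINI_API_KEY") else "GEMINI_API_KEY is missing")
```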

Transcription engines:

Parakeet v2: ~110x realtime, 1.69% WER, English - recommended for speed
Parakeet v3: ~210x realtime, 1.8% WER, 25 languages
WhisperKit: Various model sizes, good accuracy, more language options
Gemini: Cloud-based, best for complex audio, requires internet
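
To put these figures in perspective, the short Python calculation below converts them into processing time and expected errors. It assumes that "Nx realtime" means audio duration divided by processing time, and that WER can be read as the average share of words transcribed incorrectly; the function names are illustrative, and real results will vary with hardware and audio quality.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimated time to transcribe a clip at a given realtime factor."""
    return audio_seconds / realtime_factor


def expected_word_errors(word_count: int, wer_percent: float) -> float:
    """Average number of word errors implied by a WER percentage."""
    return word_count * wer_percent / 100


if __name__ == "__main__":
    # One minute of audio with each Parakeet version.
    print(f"v2: {processing_seconds(60, 110):.2f} s")  # v2: 0.55 s
    print(f"v3: {processing_seconds(60, 210):.2f} s")  # v3: 0.29 s
    # Errors per 1,000 spoken words.
    print(f"v2: {expected_word_errors(1000, 1.69):.1f} words")  # v2: 16.9 words
    print(f"v3: {expected_word_errors(1000, 1.8):.1f} words")   # v3: 18.0 words
```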

Text-to-Speech

Select any text in any application
Press Command+Option+S to read the selected text aloud
Press Command+Option+S again while reading to cancel the operation
The app uses Gemini Live API for natural, streaming speech synthesis
Configure audio devices via Settings for optimal playback

Screen Recording & Video Transcription

Press Command+Option+C to start screen recording
The menu bar shows "🎥 REC" while recording
Press Command+Option+C again to stop recording
The app automatically transcribes the video using Gemini 2.5 Flash
Visual context improves accuracy for code, technical terms, and homophones
Transcribed text pastes at your cursor position
Video file is automatically deleted after successful transcription

Note: Audio recording and screen recording are mutually exclusive - you cannot run both simultaneously.
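
One way to picture this mutual exclusion is a single state shared by both hotkeys, sketched below in Python. This is not the app's Swift code: the class name `RecorderState` is hypothetical, and treating a conflicting hotkey as "ignored" is an assumption about behaviour the README does not specify.

```python
class RecorderState:
    """Tracks which recording mode, if any, is currently active."""

    def __init__(self) -> None:
        self.active: str | None = None  # None, "audio", or "screen"

    def toggle(self, mode: str) -> str:
        """Start mode if idle, stop it if already active, otherwise ignore."""
        if self.active is None:
            self.active = mode
            return "started"
        if self.active == mode:
            self.active = None
            return "stopped"
        return "ignored"  # the other mode is still recording


if __name__ == "__main__":
    state = RecorderState()
    print(state.toggle("audio"))   # started
    print(state.toggle("screen"))  # ignored
    print(state.toggle("audio"))   # stopped
    print(state.toggle("screen"))  # started
```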

When to use video vs audio:

Video: Programming, code review, technical documentation, names, acronyms, specialized terminology
Audio: General speech, quick notes, casual transcription

Keyboard Shortcuts

Command+Option+Z: Start/stop audio recording and transcribe (WhisperKit or Parakeet - offline)
Command+Option+X: Start/stop audio recording and transcribe (Gemini - cloud)
Command+Option+S: Read selected text aloud / Cancel TTS playback
Command+Option+C: Start/stop screen recording and transcribe
Command+Option+A: Show transcription history window
Command+Option+V: Paste last transcription at cursor
Escape: Cancel audio recording (when recording is active)

Available Commands

# Run the main app
swift run SuperVoiceAssistant

# List all available WhisperKit models
swift run ListModels

# Test downloading a model (currently set to distil-whisper_distil-large-v3)
swift run TestDownload

# Validate downloaded models are complete
swift run ValidateModels

# Delete all downloaded models
swift run DeleteModels

# Delete a specific model
swift run DeleteModel <model-name>
# Example: swift run DeleteModel distil-large-v3

# Test transcription with a sample audio file
swift run TestTranscription

# Test live transcription with microphone input
swift run TestLiveTranscription

# Test streaming TTS functionality
swift run TestStreamingTTS

# Test audio collection for TTS
swift run TestAudioCollector

# Test sentence splitting for TTS
swift run TestSentenceSplitter

# Test screen recording (3-second capture)
swift run RecordScreen

# Test video transcription with Gemini API
swift run TranscribeVideo <path-to-video-file>
# Example: swift run TranscribeVideo ~/Desktop/recording.mp4

Project Structure

Sources/ - Main app code
- ModelStateManager.swift - Engine and model selection
- AudioTranscriptionManager.swift - Audio recording and transcription routing
- ScreenRecorder.swift - Screen recording with ffmpeg
SharedSources/ - Shared components
- ParakeetTranscriber.swift - FluidAudio Parakeet wrapper
- GeminiStreamingPlayer.swift - Streaming TTS playback
- GeminiAudioTranscriber.swift - Gemini API transcription
- VideoTranscriber.swift - Gemini API video transcription
tests/ - Test utilities
tools/ - Model management utilities

License

See LICENSE for details.
