Speak to AI is a minimalist, privacy-focused desktop application for offline voice recognition: it transcribes speech locally and types the result directly into any active window (editors, browsers, IDEs, AI assistants).
Written in pure Go, it leverages whisper.cpp for fast, offline transcription. The architecture is built from the ground up without external frameworks, featuring a custom dependency injection factory and a minimal set of dependencies, keeping the codebase lean and maintainable.
▸ Speak to AI runs quietly in the background and integrates into the system tray for convenient management.
▸ It can also be invoked as a CLI tool (see CLI Usage Guide) for scripting purposes.
▸ For integration enthusiasts, a WebSocket server is available at localhost:8080. Enable it in your config with web_server enabled: true (disabled by default).
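Assuming a YAML-style config file (the exact file layout is illustrative; only the `web_server` enabled flag and the localhost:8080 port come from the notes above), enabling the WebSocket server might look like:

```yaml
# Illustrative config fragment — only web_server.enabled is documented above
web_server:
  enabled: true   # serve WebSocket clients on localhost:8080 (disabled by default)
```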
- Offline speech-to-text, privacy-first: all processing happens locally
- Portable: AppImage package
- Cross-platform support for X11 and Wayland
- Linux DEs: native integration with GNOME, KDE, and others
- GPU + CPU support: Vulkan backend for faster transcription (auto-fallback to CPU)
- Voice typing or clipboard mode
- Flexible audio recording: arecord (ALSA) or ffmpeg (PulseAudio/PipeWire), see audio pipeline
- Multi-language support, custom hotkey binding, visual notifications
- Model management: switch between base, small, medium, and large-v3 whisper models via tray or CLI
Intuitive, minimalist UX backed by robust STT infrastructure. It also serves as a foundation for voice-controlled automation:
- Dual API: Unix socket IPC + WebSocket — script locally or integrate remotely
- Interface-driven: 50+ contracts — swap STT engines, add I/O methods, extend hotkey providers
- Daemon + CLI: background hub + stateless commands — perfect for IoT pipelines
- Graceful degradation: provider fallbacks, optional components, no crashes
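The CLI half of the dual API lends itself to simple voice automation: pull a transcript, match phrases, trigger actions. A minimal sketch of such a dispatcher — the `dispatch` function and its phrase table are hypothetical, and the `echo` lines stand in for real actions like the `curl` call in the example below:

```shell
# Hypothetical phrase dispatcher. In practice the transcript would come from:
#   speak-to-ai stop-recording | jq -r '.data.transcript'
dispatch() {
  case "$1" in
    *"lights off"*) echo "action: lights-off" ;;  # e.g. curl -X POST http://hub/lights/off
    *"lights on"*)  echo "action: lights-on" ;;
    *)              echo "action: none" ;;        # unrecognized phrase: do nothing
  esac
}

dispatch "please turn the lights off"   # prints "action: lights-off"
```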
# Voice command → smart home action
transcript=$(speak-to-ai stop-recording | jq -r '.data.transcript')
[[ "$transcript" == *"lights off"* ]] && curl -X POST http://hub/lights/off

Download the latest AppImage from Releases:
# Download the file, then:
chmod +x speak-to-ai-*.AppImage
# Ensure user is in input group for hotkeys to work:
sudo usermod -a -G input $USER
# then logout/login or reboot
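To confirm the group change took effect after re-login, a quick check using standard coreutils (nothing project-specific):

```shell
# Print whether the current session is in the "input" group
if id -nG | grep -qw input; then
  echo "input group: ok"
else
  echo "input group: missing (run usermod above, then log out and back in)"
fi
```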
# Open via GUI or with terminal command:
./speak-to-ai-*.AppImage

Arch Linux AUR:
yay -S speak-to-ai
# Ensure user is in input group:
sudo usermod -a -G input $USER

Fedora COPR:
sudo dnf copr enable ashbuk/speak-to-ai
sudo dnf install speak-to-ai
# Ensure user is in input group:
sudo usermod -a -G input $USER

📋 Desktop Environment Support Guide - help us test different desktop environments!
For system tray integration on GNOME — install the AppIndicator extension ↑
KDE and other DEs have built-in system tray support out of the box
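If the tray icon doesn't show up on GNOME, you can check for an AppIndicator-style extension from a terminal. The `grep` pattern below just looks for a matching name — the extension's exact UUID varies:

```shell
# Look for an AppIndicator-style extension on GNOME (harmless on other DEs)
if command -v gnome-extensions >/dev/null 2>&1; then
  gnome-extensions list | grep -i appindicator \
    || echo "AppIndicator extension not found; install it for tray support"
else
  echo "gnome-extensions CLI not available (probably not running GNOME)"
fi
```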
For automatic typing on GNOME — see setup guide ↑
Other Wayland compositors (KDE, Hyprland, Sway, etc.): wtype works without setup — automatically detected!
X11: Native support with xdotool out of the box
If automatic typing is unavailable, the app falls back to clipboard mode (paste with Ctrl + V)
For issues and bug reports: GitHub Issues
See changes: CHANGELOG.md
| Category | Requirement |
|---|---|
| OS | Linux with glibc 2.35+ |
| Desktop | X11 or Wayland |
| Audio | Working microphone |
| Storage | ~290MB |
| Memory | ~300MB RAM |
| CPU | AVX-capable (Intel/AMD 2011+) |
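The two rows most likely to bite — the AVX and glibc requirements — can be checked from a terminal; a quick Linux-only sketch:

```shell
# Check CPU flags for AVX (required by the whisper.cpp CPU backend)
grep -qw avx /proc/cpuinfo && echo "CPU: AVX ok" || echo "CPU: no AVX"
# glibc version is printed on the first line of ldd's version banner
ldd --version | head -n1
```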
📋 Supported Distributions
| Family | Distributions |
|---|---|
| Ubuntu-based | Ubuntu 22.04+, Linux Mint 21+, Pop!_OS 22.04+, Elementary OS 7+, Zorin OS 17+ |
| Debian-based | Debian 12+ |
| Fedora | Fedora 36+ |
| Rolling release | Arch Linux, Manjaro, EndeavourOS, openSUSE Tumbleweed |
Start onboarding with:
- ARCHITECTURE.md — system architecture and component design
- DEVELOPMENT.md — development workflow and build instructions
- CONTRIBUTING.md — contribution guidelines and how to help improve the project
- docker/README.md — Docker-based development
Technical dive into architecture and engineering challenges: Building Speak-to-AI on Hashnode
- whisper.cpp for the excellent C++ implementation of OpenAI Whisper
- fyne.io/systray for cross-platform system tray support
- ydotool and wtype for Wayland-compatible input automation
- OpenAI for the original Whisper model
✦ MIT LICENSE
If you use this project, please link back to this repo and ⭐ it if it helped you.
- Consider contributing back improvements
Shared with the community of privacy-conscious Linux users
Please consider supporting development