Speak to AI is a minimalist, privacy-focused desktop application for offline voice recognition: it transcribes speech locally and types the result directly into any active window (editors, browsers, IDEs, AI assistants).
Written in pure Go, it leverages whisper.cpp for fast, offline transcription. The architecture is built from the ground up without external frameworks, featuring a custom dependency injection factory and a minimal set of dependencies, keeping the codebase lean and maintainable.
▸ Speak to AI runs quietly in the background and integrates into the system tray for convenient management.
▸ It can also be invoked as a CLI tool (see CLI Usage Guide) for scripting purposes.
▸ For integration enthusiasts, a WebSocket server is available at localhost:8080. Enable it in your config with web_server enabled: true (disabled by default).
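Assuming a YAML-style config file (the exact file layout is illustrative; only the `web_server` enabled flag and the localhost:8080 port come from the notes above), enabling the WebSocket server might look like:

```yaml
# Illustrative config fragment — only web_server.enabled is documented above
web_server:
  enabled: true   # serve WebSocket clients on localhost:8080 (disabled by default)
```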
- Offline speech-to-text, privacy-first: all processing happens locally
- Portable: AppImage package
- Cross-platform support for X11 and Wayland
- Linux DEs: native integration with GNOME, KDE, and others
- GPU + CPU support: Vulkan backend for faster transcription (auto-fallback to CPU)
- Voice typing or clipboard mode
- Flexible audio recording: arecord (ALSA) or ffmpeg (PulseAudio/PipeWire), see audio pipeline
- Multi-language support, custom hotkey binding, visual notifications
- Model management: switch between base, small, medium, and large-v3 whisper models via tray or CLI
Intuitive, minimalist UX backed by robust STT infrastructure. It also serves as a foundation for voice-controlled automation:
- Dual API: Unix socket IPC + WebSocket — script locally or integrate remotely
- Interface-driven: 50+ contracts — swap STT engines, add I/O methods, extend hotkey providers
- Daemon + CLI: background hub + stateless commands — perfect for IoT pipelines
- Graceful degradation: provider fallbacks, optional components, no crashes
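The CLI half of the dual API lends itself to simple voice automation: pull a transcript, match phrases, trigger actions. A minimal sketch of such a dispatcher — the `dispatch` function and its phrase table are hypothetical, and the `echo` lines stand in for real actions like the `curl` call in the example below:

```shell
# Hypothetical phrase dispatcher. In practice the transcript would come from:
#   speak-to-ai stop-recording | jq -r '.data.transcript'
dispatch() {
  case "$1" in
    *"lights off"*) echo "action: lights-off" ;;  # e.g. curl -X POST http://hub/lights/off
    *"lights on"*)  echo "action: lights-on" ;;
    *)              echo "action: none" ;;        # unrecognized phrase: do nothing
  esac
}

dispatch "please turn the lights off"   # prints "action: lights-off"
```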
# Voice command → smart home action
transcript=$(speak-to-ai stop-recording | jq -r '.data.transcript')
[[ "$transcript" == *"lights off"* ]] && curl -X POST http://hub/lights/off

Download the latest AppImage from Releases:
# Download the file, then:
chmod +x speak-to-ai-*.AppImage
# Ensure user is in input group for hotkeys to work:
sudo usermod -a -G input $USER
# then logout/login or reboot
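To confirm the group change took effect after re-login, a quick check using standard coreutils (nothing project-specific):

```shell
# Print whether the current session is in the "input" group
if id -nG | grep -qw input; then
  echo "input group: ok"
else
  echo "input group: missing (run usermod above, then log out and back in)"
fi
```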
# Open via GUI or with terminal command:
./speak-to-ai-*.AppImage

Arch Linux AUR:
yay -S speak-to-ai
# Ensure user is in input group:
sudo usermod -a -G input $USER

Fedora COPR:
sudo dnf copr enable ashbuk/speak-to-ai
sudo dnf install speak-to-ai
# Ensure user is in input group:
sudo usermod -a -G input $USER

📋 Desktop Environment Support Guide - help us test different desktop environments!
For system tray integration on GNOME — install the AppIndicator extension ↑
KDE and other DEs have built-in system tray support out of the box
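If the tray icon doesn't show up on GNOME, you can check for an AppIndicator-style extension from a terminal. The `grep` pattern below just looks for a matching name — the extension's exact UUID varies:

```shell
# Look for an AppIndicator-style extension on GNOME (harmless on other DEs)
if command -v gnome-extensions >/dev/null 2>&1; then
  gnome-extensions list | grep -i appindicator \
    || echo "AppIndicator extension not found; install it for tray support"
else
  echo "gnome-extensions CLI not available (probably not running GNOME)"
fi
```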
For automatic typing on GNOME — see setup guide ↑
Other Wayland compositors (KDE, Hyprland, Sway, etc.): wtype works without setup — automatically detected!
X11: Native support with xdotool out of the box
If automatic typing is unavailable, the app falls back to clipboard mode (paste with Ctrl + V)
For issues and bug reports: GitHub Issues
See changes: CHANGELOG.md
| Category | Requirement |
|---|---|
| OS | Linux with glibc 2.35+ |
| Desktop | X11 or Wayland |
| Audio | Working microphone |
| Storage | ~290MB |
| Memory | ~300MB RAM |
| CPU | AVX-capable (Intel/AMD 2011+) |
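The two rows most likely to bite — the AVX and glibc requirements — can be checked from a terminal; a quick Linux-only sketch:

```shell
# Check CPU flags for AVX (required by the whisper.cpp CPU backend)
grep -qw avx /proc/cpuinfo && echo "CPU: AVX ok" || echo "CPU: no AVX"
# glibc version is printed on the first line of ldd's version banner
ldd --version | head -n1
```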
📋 Supported Distributions
| Family | Distributions |
|---|---|
| Ubuntu-based | Ubuntu 22.04+, Linux Mint 21+, Pop!_OS 22.04+, Elementary OS 7+, Zorin OS 17+ |
| Debian-based | Debian 12+ |
| Fedora | Fedora 36+ |
| Rolling release | Arch Linux, Manjaro, EndeavourOS, openSUSE Tumbleweed |
Start onboarding with:
- ARCHITECTURE.md — system architecture and component design
- DEVELOPMENT.md — development workflow and build instructions
- CONTRIBUTING.md — contribution guidelines and how to help improve the project
- docker/README.md — Docker-based development
Technical dive into architecture and engineering challenges: Building Speak-to-AI on Hashnode
- whisper.cpp for the excellent C++ implementation of OpenAI Whisper
- fyne.io/systray for cross-platform system tray support
- ydotool and wtype for Wayland-compatible input automation
- OpenAI for the original Whisper model
✦ MIT LICENSE
If you use this project, please link back to this repo and ⭐ it if it helped you.
- Consider contributing back improvements
Shared with the community of privacy-conscious Linux users
Please consider supporting development