Speak to AI

🗣️ Native Linux Voice-To-Text App

Speak to AI is a minimalist, privacy-focused desktop application for offline voice recognition directly into any active window (editors, browsers, IDEs, AI assistants).

Written in pure Go, it leverages whisper.cpp for fast, offline transcription. The architecture is built from the ground up without external frameworks, featuring a custom dependency injection factory and a minimal set of dependencies, ensuring a lean and maintainable codebase.

Demo video: speak-to-ai-preview.mp4

Features

▸ Speak to AI runs quietly in the background and integrates into the system tray for convenient management.

▸ It can also be invoked as a CLI tool (see CLI Usage Guide) for scripting purposes.

▸ For integration enthusiasts, a WebSocket server is available at localhost:8080. Enable it in your config with web_server enabled: true (disabled by default).
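As a sketch, the setting mentioned above might look like this in the config file. The key nesting and comments are assumptions inferred from "web_server enabled: true"; check the project's documented config format for the actual layout.

```yaml
# Hypothetical config fragment (nesting assumed)
web_server:
  enabled: true   # disabled by default; exposes a WebSocket server on localhost:8080
```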

  • Offline speech-to-text, privacy-first: all processing happens locally
  • Portable: AppImage package
  • Cross-platform support for X11 and Wayland
  • Linux DEs: native integration with GNOME, KDE, and others
  • GPU + CPU support: Vulkan backend for faster transcription (auto-fallback to CPU)
  • Voice typing or clipboard mode
  • Flexible audio recording: arecord (ALSA) or ffmpeg (PulseAudio/PipeWire), see audio pipeline
  • Multi-language support, custom hotkey binding, visual notifications
  • Model management: switch between base, small, medium, and large-v3 whisper models via tray or CLI

Beyond Minimalism

Intuitive minimalist UX, robust STT infrastructure. A foundation for voice-controlled automation:

  • Dual API: Unix socket IPC + WebSocket — script locally or integrate remotely
  • Interface-driven: 50+ contracts — swap STT engines, add I/O methods, extend hotkey providers
  • Daemon + CLI: background hub + stateless commands — perfect for IoT pipelines
  • Graceful degradation: provider fallbacks, optional components, no crashes
# Voice command → smart home action
transcript=$(speak-to-ai stop-recording | jq -r '.data.transcript')
[[ "$transcript" == *"lights off"* ]] && curl -X POST http://hub/lights/off
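The pipeline above assumes the CLI emits JSON with a `.data.transcript` field (the shape shown in the example; the real response may carry additional fields). The parsing step can be tried standalone against a sample payload:

```shell
# Stand-in for the output of `speak-to-ai stop-recording` — only the
# .data.transcript shape is taken from the example above.
response='{"data":{"transcript":"lights off"}}'

# Same extraction as the smart-home example (requires jq)
transcript=$(echo "$response" | jq -r '.data.transcript')
echo "$transcript"   # prints: lights off
```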

✦ Installation

AppImage

Download the latest AppImage from Releases:

# Download the file, then:
chmod +x speak-to-ai-*.AppImage
# Ensure user is in input group for hotkeys to work:
sudo usermod -a -G input $USER
# then logout/login or reboot
# Open via GUI or with terminal command:
./speak-to-ai-*.AppImage

Arch Linux AUR:

yay -S speak-to-ai
# Ensure user is in input group:
sudo usermod -a -G input $USER

Fedora COPR:

sudo dnf copr enable ashbuk/speak-to-ai
sudo dnf install speak-to-ai
# Ensure user is in input group:
sudo usermod -a -G input $USER

Desktop Environment Compatibility

📋 Desktop Environment Support Guide - help us test different desktop environments!

For system tray integration on GNOME — install the AppIndicator extension

KDE and other DEs have built-in system tray support out of the box

For automatic typing on GNOME — see setup guide

Other Wayland compositors (KDE, Hyprland, Sway, etc.): wtype works without setup — automatically detected!
X11: Native support with xdotool out of the box

If automatic typing doesn't work, the app falls back to clipboard mode (paste with Ctrl + V)

For issues and bug reports: GitHub Issues

See changes: CHANGELOG.md

System Requirements

Category Requirement
OS Linux with glibc 2.35+
Desktop X11 or Wayland
Audio Microphone capability
Storage ~290MB
Memory ~300MB RAM
CPU AVX-capable (Intel/AMD 2011+)
📋 Supported Distributions
Family Distributions
Ubuntu-based Ubuntu 22.04+, Linux Mint 21+, Pop!_OS 22.04+, Elementary OS 7+, Zorin OS 17+
Debian-based Debian 12+
Fedora Fedora 36+
Rolling release Arch Linux, Manjaro, EndeavourOS, openSUSE Tumbleweed

For Developers

Start onboarding with a technical dive into the architecture and engineering challenges: Building Speak-to-AI on Hashnode

✦ Acknowledgments

  • whisper.cpp for the excellent C++ implementation of OpenAI Whisper
  • fyne.io/systray for cross-platform system tray support
  • ydotool and wtype for Wayland-compatible input automation
  • OpenAI for the original Whisper model

✦ MIT LICENSE

If you use this project, please link back to this repo and ⭐ it if it helped you.

  • Consider contributing back improvements

Shared with the community for privacy-conscious Linux users


Sponsor

Please consider supporting development via GitHub Sponsors or PayPal
