This repository contains a collection of bash scripts for privacy-focused, offline voice typing. It leverages whisper or whisper.cpp for speech-to-text and ydotool to type the transcribed text into the active window.
- `voice_typing`: A standalone script that runs the `whisper` CLI for every audio clip. Best for occasional use and low resource consumption.
- `voice_client_local`: A client/server setup where a local `whisper.cpp` server is managed as a user service (`whisper.service`). Faster than `voice_typing` but uses more resources.
- `voice_client`: A client script designed to connect to a remote `whisper.cpp` server.
- `llama_edit`: An optional text-correction utility that uses an LLM (via a `llama.cpp` server) to refine transcribed text (e.g., correcting "um", "uh", "oops, I mean..." or grammar).
| Mode | Script | Backend | Connection Type |
|---|---|---|---|
| Standalone | `voice_typing` | `whisper` CLI | Local process |
| Local Client/Server | `voice_client_local` | `whisper.cpp` | Local `whisper.service` |
| Remote Client/Server | `voice_client` | `whisper.cpp` | Networked server |
All main scripts use a FIFO-based producer-consumer architecture:
- Producer: A background loop records audio using `sox` (`rec`) and writes the filename to a named pipe (FIFO).
- Consumer: A foreground loop reads filenames from the FIFO, processes them (transcription), and then types the result.
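The producer/consumer split can be sketched as follows. This is a minimal illustration of the pattern, not the code from the scripts: `record_clip`, `transcribe_clip`, and `run_pipeline` are hypothetical names, and the first two are stand-ins for the real `rec` and transcription calls so the sketch runs without audio hardware.

```bash
# Hypothetical stand-in for the recording step (the real scripts call rec).
record_clip() {
    : > "$1"
}

# Hypothetical stand-in for the transcription step (whisper or whisper.cpp).
transcribe_clip() {
    printf 'text for %s\n' "$(basename "$1")"
}

run_pipeline() {
    local count=$1 workdir fifo clip i
    workdir=$(mktemp -d)
    fifo="$workdir/clips.fifo"
    mkfifo "$fifo"

    # Producer: runs in the background, records clips, and writes each
    # filename to the FIFO.
    {
        i=1
        while [ "$i" -le "$count" ]; do
            clip="$workdir/clip_$i.wav"
            record_clip "$clip"
            printf '%s\n' "$clip"
            i=$((i + 1))
        done
    } > "$fifo" &

    # Consumer: runs in the foreground, reads filenames from the FIFO,
    # transcribes each clip, and emits the text (the real scripts would
    # hand it to ydotool instead of printing it).
    while IFS= read -r clip; do
        transcribe_clip "$clip"
        rm -f -- "$clip"
    done < "$fifo"

    wait
    rm -rf -- "$workdir"
}
```

Calling `run_pipeline 3` processes three placeholder clips in order. Because only filenames travel through the FIFO, the producer can keep recording while the consumer is still busy with an earlier clip.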
The client/server scripts rely on systemd user services:
- `whisper.service`: Manages the `whisper.cpp` server.
- `llama.service`: Manages the `llama.cpp` server for `llama_edit`.
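Since the client scripts depend on these units, a script could check that a unit is active before recording and start it if needed. The helper below is a sketch under that assumption; `ensure_user_service` is a hypothetical name and is not taken from the repository.

```bash
# Hypothetical helper: start a systemd user unit only if it is not
# already active.
ensure_user_service() {
    local unit=$1
    if ! systemctl --user is-active --quiet "$unit"; then
        systemctl --user start "$unit"
    fi
}
```

For example, `ensure_user_service whisper.service` could run before the first request is sent, and `ensure_user_service llama.service` only when LLM correction is requested.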
Text is injected into the OS using ydotool. This requires the ydotoold daemon to be running and the YDOTOOL_SOCKET environment variable to be correctly set.
# Standalone mode
./voice_typing
# Standalone mode with LLM text correction
./voice_typing -flow
# Local client/server mode
./voice_client_local
# Local client/server mode with LLM text correction
./voice_client_local -flow
# Remote client/server mode
./voice_client
# Remote client/server mode with LLM text correction
./voice_client -flow

./llama_edit "Your uncorrected text here"

- Audio: `sox`, `lame`, `ffmpeg`
- Transcription: `whisper` (OpenAI) or `whisper.cpp`
- Text Injection: `ydotool`
- Data Processing: `jq`, `curl`
If you encounter `failed to connect socket '/tmp/.ydotool_socket': Permission denied`, check the following:
- The user is in the `input` group.
- `ydotoold` is running.
- `export YDOTOOL_SOCKET=/tmp/.ydotool_socket` is in your `.bashrc`.
- You may need `sudo chmod +s $(which ydotool)`.
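Some of these checks can be run from a small preflight function before any text is typed. This is a sketch only: `check_ydotool_socket` and its return codes are hypothetical, and it inspects the socket file rather than group membership.

```bash
# Hypothetical preflight check for the ydotool socket.
# Returns 1 if YDOTOOL_SOCKET is unset, 2 if no socket exists at that
# path, and 3 if the socket exists but is not writable by this user.
check_ydotool_socket() {
    local sock=${YDOTOOL_SOCKET:-}
    if [ -z "$sock" ]; then
        echo "YDOTOOL_SOCKET is not set" >&2
        return 1
    fi
    if [ ! -S "$sock" ]; then
        echo "no socket at $sock (is ydotoold running?)" >&2
        return 2
    fi
    if [ ! -w "$sock" ]; then
        echo "socket $sock is not writable by $(id -un)" >&2
        return 3
    fi
    return 0
}
```

A non-writable socket (return code 3) usually corresponds to the `Permission denied` error quoted above.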
The rec command uses silence detection thresholds. If transcription doesn't start or stops too early:
- Adjust the silence thresholds in the `rec` command (e.g., `silence 1 0.2 4% 1 1.0 2%`).
- Increasing the percentage (e.g., `6%` or `8%`) makes the detector more tolerant of background noise.
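To make the thresholds easier to experiment with, the `rec` call can be wrapped so the two percentages become parameters. The wrapper below is an illustrative sketch; `record_clip` and its argument order are hypothetical, while the `silence` values are the example from above.

```bash
# Hypothetical wrapper around rec with adjustable silence thresholds.
#   $1  output file
#   $2  start threshold in percent (default 4)
#   $3  stop threshold in percent (default 2)
record_clip() {
    local outfile=$1 start_pct=${2:-4} stop_pct=${3:-2}
    # First triple:  begin recording once audio stays above start_pct for 0.2 s.
    # Second triple: stop once audio stays below stop_pct for 1.0 s.
    rec -q "$outfile" silence 1 0.2 "${start_pct}%" 1 1.0 "${stop_pct}%"
}
```

In a noisy room, `record_clip clip.wav 8 6` would apply the higher percentages suggested above to both the start and stop conditions.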
- `voice_client`: By default, it targets `http://127.0.0.1:7777`. If using a remote server, you must edit the script to point to the correct IP/hostname.
- `llama_edit`: Targets `http://127.0.0.1:8087/v1/chat/completions`. Ensure your `llama.service` is configured to listen on this port.
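One way to avoid editing the scripts for each machine is to read the targets from environment variables and fall back to the defaults above. This is a sketch of that idea, not the current behavior of the scripts: `WHISPER_URL`, `LLAMA_URL`, `correct_text`, and the system prompt are all hypothetical. The request body assumes the OpenAI-style chat format that the `/v1/chat/completions` path implies.

```bash
# Hypothetical override variables; the defaults are the addresses above.
WHISPER_URL=${WHISPER_URL:-http://127.0.0.1:7777}
LLAMA_URL=${LLAMA_URL:-http://127.0.0.1:8087/v1/chat/completions}

# Hypothetical helper: send text to the chat completions endpoint and
# print the corrected version. The prompt wording is an assumption.
correct_text() {
    local payload
    # Build the JSON body with jq so quotes in the text are escaped safely.
    payload=$(jq -n --arg text "$1" '{
        messages: [
            {role: "system", content: "Fix grammar and remove filler words. Reply with the corrected text only."},
            {role: "user", content: $text}
        ]
    }')
    curl -sS "$LLAMA_URL" \
        -H 'Content-Type: application/json' \
        -d "$payload" |
        jq -r '.choices[0].message.content'
}
```

With this in place, `WHISPER_URL=http://192.168.1.20:7777 ./voice_client` would select a remote server without modifying the script (the address here is only an example).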
The model sometimes produces artifacts such as "Thanks for watching!". The scripts include logic to filter these out based on string matching or length checks.
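A filter of this kind might look like the sketch below. It is an illustration of string matching plus a length check, not the logic used in the scripts: `filter_transcript`, `MIN_CHARS`, and the `[BLANK_AUDIO]` entry are assumptions.

```bash
# Hypothetical minimum length; shorter transcripts are treated as noise.
MIN_CHARS=2

# Hypothetical filter: print the transcript if it looks genuine, or
# return 1 without output if it should be dropped.
filter_transcript() {
    local text=$1
    # Length check: discard empty or near-empty results.
    [ "${#text}" -ge "$MIN_CHARS" ] || return 1
    # String matching: discard known artifact phrases. The second
    # pattern is an assumed example of a silence marker.
    case $text in
        *"Thanks for watching!"* | *"[BLANK_AUDIO]"*) return 1 ;;
    esac
    printf '%s\n' "$text"
}
```

A consumer loop could then type only what survives, for example `text=$(filter_transcript "$raw") && ydotool type "$text"`.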