OSTT is an interactive terminal-based audio recording and speech-to-text transcription tool. Record audio with real-time waveform visualization, automatically transcribe using multiple AI providers and models, and maintain a browsable history of all your transcriptions. Built with Rust for performance and minimal dependencies, ostt works seamlessly on Linux and macOS.
Tip: Omarchy and Hyprland users can configure ostt to run as a floating popup window to record and transcribe in any app.
- Real-time audio visualization - Frequency spectrum (default) or time-domain waveform, optimized for human voice recording
- Noise gating - Automatic suppression of background noise in spectrum mode
- dBFS-based volume metering (industry standard)
- Configurable reference level for clipping detection
- Audio clipping detection with pause/resume support
- Audio compression for fast API calls
- Multiple transcription providers and models
- Browsable transcription history
- Keyword management for improved accuracy
- Cross-platform support - Linux and macOS
Important: Upgrading from 0.0.5? Version 0.0.7 introduces output flags (-c, -o) that change the default behavior of popup integrations.
- Hyprland users: See Hyprland Upgrade Guide
- macOS users: See macOS Upgrade Guide
Without these updates, popup windows will send transcriptions to stdout instead of the clipboard.
ostt supports multiple AI transcription providers. Bring your own API key and choose from the following models:
OpenAI:
- gpt-4o-transcribe - Latest model with best accuracy
- gpt-4o-mini-transcribe - Faster, lighter model
- whisper-1 - Legacy Whisper model
Deepgram:
- nova-3 - Latest generation, fastest processing
- nova-2 - Previous generation model
DeepInfra:
- deepinfra-whisper-large-v3 - High accuracy Whisper model
- deepinfra-whisper-base - Fast, lightweight model
Groq:
- groq-whisper-large-v3 - High accuracy processing
- groq-whisper-large-v3-turbo - Fastest transcription speed
AssemblyAI:
- assemblyai-universal-3-pro - Best accuracy, latest model
Berget is a Swedish cloud provider guaranteeing that data never leaves Sweden. All models are hosted within Swedish borders.
- berget-whisper-kb-large - KB Whisper Large, developed by the National Library of Sweden. Trained on 50,000+ hours of Swedish speech, reduces WER by 47% compared to OpenAI's whisper-large-v3 on Swedish.
- berget-whisper-nb-large - NB Whisper Large, developed by the National Library of Norway. Trained on 66,000 hours of Norwegian speech, optimized for Norwegian ASR.
- berget-whisper-large-v3 - OpenAI Whisper Large V3, general-purpose multilingual model hosted on Berget infrastructure.
Configure your preferred provider and model using ostt auth.
Linux:
Arch Linux (AUR):
yay -S ostt
Shell Installer (All Distributions):
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/kristoferlund/ostt/releases/latest/download/ostt-installer.sh | sh
macOS:
Homebrew (Recommended):
brew install kristoferlund/ostt/ostt
Shell Installer:
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/kristoferlund/ostt/releases/latest/download/ostt-installer.sh | sh
Dependencies only need to be installed manually if you used the shell installer; yay and brew install them automatically.
macOS:
ffmpeg
Linux:
ffmpeg wl-clipboard # For Wayland
# OR
ffmpeg xclip # For X11
Optional (recommended for better audio playback):
mpv # Recommended for the best audio playback experience with ostt replay
Note on audio playback: For the best experience when replaying recordings with ostt replay, we recommend installing mpv, which ostt uses as the primary audio player when it is available. The fallbacks are vlc, ffplay, and paplay. If none of these are installed, the system default application is used.
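The player selection described in the note above can be sketched as a search of PATH in preference order. This is a hypothetical sketch, not ostt's code: pick_player and on_path are illustrative names, and the availability check (a file with the player's name in some PATH directory) is an assumption.

```rust
use std::env;

/// Players in the preference order given in this README: mpv first, then the fallbacks.
static PLAYERS: [&str; 4] = ["mpv", "vlc", "ffplay", "paplay"];

/// Returns the first candidate for which `is_installed` is true.
/// None means the caller would hand the file to the system default application.
fn pick_player<'a>(candidates: &[&'a str], is_installed: impl Fn(&str) -> bool) -> Option<&'a str> {
    candidates.iter().copied().find(|&name| is_installed(name))
}

/// Hypothetical availability check: does a file named `name` exist in any PATH directory?
fn on_path(name: &str) -> bool {
    env::var_os("PATH")
        .map(|paths| env::split_paths(&paths).any(|dir| dir.join(name).is_file()))
        .unwrap_or(false)
}

/// Combines the two: the best player installed on this machine, if any.
fn detect_player() -> Option<&'static str> {
    pick_player(&PLAYERS, on_path)
}
```

Keeping the preference logic separate from the PATH lookup makes the ordering testable without depending on what happens to be installed.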
After installation, set up authentication and start recording:
Authentication: ostt is a bring-your-own-API-key application. Authenticate once with your preferred provider, then freely switch between available models.
# Configure your transcription provider
ostt auth
# Start recording (press Enter to transcribe, Esc to cancel)
ostt record
# Or just run ostt (defaults to recording)
ostt
The app creates a default configuration file at ~/.config/ostt/ostt.toml on first run.
For the best experience, configure ostt to run as a floating popup window tied to a global hotkey. This allows you to:
- Press a hotkey from any application
- Record your speech in a popup window
- Have it automatically transcribed
- Paste the result directly into your current app
Platform-specific setup instructions:
- Hyprland / Omarchy Setup - Tiling window manager integration (recommended)
- macOS Setup - Hammerspoon-based popup configuration
ostt works on all Linux distributions and macOS without additional setup. Simply use ostt or ostt record from your terminal.
ostt # Record audio with real-time visualization (default)
ostt record # Record audio with real-time visualization
# Output to stdout by default
ostt -c # Record and copy to clipboard (shorthand)
ostt record -c # Record and copy to clipboard (explicit)
ostt -o file # Record and write to file (shorthand)
ostt record -o file # Record and write to file (explicit)
ostt transcribe file # Transcribe a pre-recorded audio file
ostt transcribe f -c # Transcribe and copy to clipboard
ostt transcribe f -o out.txt # Transcribe and write to file
ostt retry [N] # Re-transcribe recording #N (1=most recent)
ostt retry -c # Re-transcribe and copy to clipboard
ostt replay [N] # Play back recording #N
ostt auth # Configure transcription provider and API key
ostt history # Browse transcription history
ostt keywords # Manage keywords for improved accuracy
ostt config # Open configuration file in editor
ostt list-devices # List available audio input devices
ostt logs # View recent application logs
ostt version # Show version information
ostt help # Show all commands
ostt -h # Quick help
ostt --help # Detailed help with examples
Command Aliases: Most commands have short aliases for faster typing: r (record), t (transcribe), a (auth), h (history), k (keywords), c (config), rp (replay).
ostt r -c # Same as: ostt record -c
ostt a # Same as: ostt auth
Transcribe: The transcribe command runs ostt's transcription pipeline on a pre-recorded audio file, without interactive recording. This is useful for non-interactive workflows such as CI pipelines, GitHub Actions, or agentic scripts, where you already have an audio file and want to use ostt's multi-provider transcription.
ostt transcribe recording.ogg # Transcribe to stdout
ostt transcribe voice-memo.mp3 -c # Transcribe and copy to clipboard
ostt transcribe meeting.wav -o transcript.txt # Transcribe and write to file
ostt transcribe audio.ogg | grep keyword # Pipe to other commands
Record Options: Because record is the default command, the -c and -o flags can be used without typing record:
ostt -c # Same as: ostt record -c
ostt -o file.txt # Same as: ostt record -o file.txt
ostt can generate completion scripts for your shell to enable tab completion of commands and options.
Bash:
ostt completions bash > ostt.bash
sudo cp ostt.bash /etc/bash_completion.d/
Zsh:
ostt completions zsh > _ostt
# Copy to your zsh completions directory (location varies by system)
sudo cp _ostt /usr/local/share/zsh/site-functions/
Fish:
ostt completions fish > ostt.fish
cp ostt.fish ~/.config/fish/completions/
PowerShell:
ostt completions powershell > ostt.ps1
# Add to your PowerShell profile
After installing a completion script, restart your shell or source the completion file to enable completions.
ostt uses a TOML configuration file at ~/.config/ostt/ostt.toml.
List available devices:
ostt list-devices
Example output:
Available audio input devices:
ID: 0
Name: default [DEFAULT]
Config: (44100Hz, 2 channels)
ID: 2
Name: USB Microphone
Config: (48000Hz, 1 channels)
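The device key in ostt.toml accepts a device ID, a device name, or "default". The sketch below shows one plausible way to resolve that value against a list like the one printed by ostt list-devices. Device and resolve_device are hypothetical, and trying the value as a numeric ID before matching it as a name is an assumption, not documented ostt behavior.

```rust
/// Hypothetical mirror of one entry from `ostt list-devices`.
#[derive(Debug, PartialEq)]
struct Device {
    id: u32,
    name: String,
    is_default: bool, // shown as [DEFAULT] in the listing
}

/// Resolves the `device` setting: "default", a numeric ID, or an exact device name.
fn resolve_device<'a>(setting: &str, devices: &'a [Device]) -> Option<&'a Device> {
    let setting = setting.trim();
    if setting.eq_ignore_ascii_case("default") {
        return devices.iter().find(|d| d.is_default);
    }
    // Try the value as an ID first, so "2" selects ID 2 rather than a device named "2".
    if let Ok(id) = setting.parse::<u32>() {
        if let Some(d) = devices.iter().find(|d| d.id == id) {
            return Some(d);
        }
    }
    // Fall back to an exact name match, e.g. "USB Microphone".
    devices.iter().find(|d| d.name == setting)
}
```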
Edit ~/.config/ostt/ostt.toml:
[audio]
# Use device by ID, name, or "default"
device = "2" # or "USB Microphone" or "default"
sample_rate = 16000 # 16kHz recommended for speech
peak_volume_threshold = 90 # Warning threshold (0-100%)
reference_level_db = -20 # dBFS reference for 100% meter
output_format = "mp3 -ab 16k -ar 12000" # Compressed audio format
visualization = "spectrum" # "spectrum" (default) or "waveform"
Visualization Types:
- spectrum (default) - Shows the frequency spectrum, with energy distribution across frequencies optimized for the human voice (100-1500 Hz range).
- waveform - Shows the time-domain waveform, with amplitude over time. A classic oscilloscope-style display of the raw audio envelope.
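To illustrate what spectrum mode and its noise gating involve, the sketch below computes DFT magnitudes only for bins inside the 100-1500 Hz voice range and then zeroes bins below a floor. It is a simplified stand-in, not ostt's implementation: it uses a naive DFT instead of an FFT library, and the gate floor is a hypothetical parameter that ostt does not document. For a window of n samples, bin k sits at k * sample_rate / n Hz.

```rust
use std::f32::consts::PI;

/// Magnitude of each DFT bin whose frequency lies in [lo_hz, hi_hz].
/// Returns (frequency_hz, magnitude) pairs. Naive DFT: fine for a sketch, slow for real use.
fn voice_band_spectrum(samples: &[f32], sample_rate: f32, lo_hz: f32, hi_hz: f32) -> Vec<(f32, f32)> {
    let n = samples.len();
    if n == 0 {
        return Vec::new();
    }
    let bin_hz = sample_rate / n as f32; // frequency spacing between bins
    let first = (lo_hz / bin_hz).ceil() as usize;
    let last = ((hi_hz / bin_hz).floor() as usize).min(n / 2); // stay at or below Nyquist
    (first..=last)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in samples.iter().enumerate() {
                let angle = 2.0 * PI * k as f32 * t as f32 / n as f32;
                re += x * angle.cos();
                im -= x * angle.sin();
            }
            // Normalise by n so a full-scale sine on an exact bin reads 0.5.
            (k as f32 * bin_hz, (re * re + im * im).sqrt() / n as f32)
        })
        .collect()
}

/// Simple noise gate: bins quieter than `floor` are treated as background noise and zeroed.
fn noise_gate(bins: &mut [(f32, f32)], floor: f32) {
    for (_, mag) in bins.iter_mut() {
        if *mag < floor {
            *mag = 0.0;
        }
    }
}
```

With sample_rate = 16000 and a 1600-sample (100 ms) window, bins are 10 Hz apart, so the 100-1500 Hz band covers 141 bins.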
Configure your AI provider:
ostt auth
This will:
- Show available providers and models
- Let you select your preferred model
- Prompt for your API key
- Save everything securely
Security Note: API keys are stored separately in ~/.local/share/ostt/credentials with restricted permissions (0600).
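Mode 0600 means only the owning user can read or write the file. A minimal sketch of writing a secret that way with the Rust standard library on Linux or macOS follows. write_secret is a hypothetical helper, and this README does not document the credentials file format, so the contents here are just an opaque string.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Writes `contents` to `path` so that only the owner can read or write it (mode 0600).
fn write_secret(path: &Path, contents: &str) -> io::Result<()> {
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only takes effect when the file is newly created
        .open(path)?;
    // Also tighten a pre-existing file, since `mode` does not change existing permissions.
    fs::set_permissions(path, fs::Permissions::from_mode(0o600))?;
    file.write_all(contents.as_bytes())
}
```

Setting the mode at creation means a new file is never readable by others; the extra set_permissions call covers a file that already existed with looser permissions, and it runs before any new secret is written.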
[audio]
device = "default"
sample_rate = 16000
peak_volume_threshold = 90
reference_level_db = -20
output_format = "mp3 -ab 16k -ar 12000"
visualization = "spectrum" # "spectrum" for frequency display, "waveform" for amplitude display
[providers.deepgram]
punctuate = true
smart_format = false
filler_words = false
detect_language = true # Automatic language detection (default: true)
# detect_language_codes = ["en", "es"] # Restrict to specific languages only
[providers.assemblyai]
format_text = true # Punctuation, casing, and numeral formatting
disfluencies = false # Include filler words (uh, um)
filter_profanity = false # Filter profanity from transcript
language_detection = true # Automatic language detection
For detailed configuration options, see the comments in the config file or run ostt config to edit it.
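Because transcripts go to stdout by default, a program can run ostt transcribe and read the transcript directly, which fits the CI and agentic workflows mentioned earlier. The sketch below builds that invocation with std::process::Command. Output, transcribe_command, and transcribe_to_string are hypothetical helpers, and the sketch assumes ostt is on PATH and has already been set up with ostt auth.

```rust
use std::io;
use std::process::Command;

/// Where ostt should send the transcript (mirrors the default, -c, and -o options).
enum Output<'a> {
    Stdout,
    Clipboard,
    File(&'a str),
}

/// Builds `ostt transcribe <audio> [-c | -o <file>]` without running it.
fn transcribe_command(audio: &str, output: Output) -> Command {
    let mut cmd = Command::new("ostt");
    cmd.arg("transcribe").arg(audio);
    match output {
        Output::Stdout => {}
        Output::Clipboard => {
            cmd.arg("-c");
        }
        Output::File(path) => {
            cmd.args(["-o", path]);
        }
    }
    cmd
}

/// Runs the stdout variant and returns the trimmed transcript, or stderr as the error.
fn transcribe_to_string(audio: &str) -> io::Result<String> {
    let out = transcribe_command(audio, Output::Stdout).output()?;
    if !out.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&out.stderr).into_owned(),
        ));
    }
    Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
}
```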
ostt # Output to stdout (default)
ostt record # Output to stdout (explicit)
ostt -c # Copy to clipboard (shorthand)
ostt record -c # Copy to clipboard (explicit)
ostt -o file # Write to file (shorthand)
ostt record -o file # Write to file (explicit)
Keyboard Controls:
| Key | Action |
|---|---|
| Enter | Stop recording and transcribe |
| Space | Pause/resume recording |
| Esc, q, Ctrl+C | Cancel without saving |
Display Elements:
- Visualization: Real-time audio display (spectrum or waveform, configurable)
- Spectrum mode: Shows frequency distribution across the voice range. Peaks in the visualization align with volume meter peaks
- Waveform mode: Shows amplitude envelope over time
- Vol %: Current volume level
- Peak %: Maximum volume in last 3 seconds
- Red indicator: Clipping warning (appears in both visualization modes)
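The Peak % readout (maximum volume in the last 3 seconds) behaves like a sliding-window maximum. The sketch below models it that way and flags readings at or above a threshold such as peak_volume_threshold = 90. PeakHold is a hypothetical type, and feeding it timestamped percentage readings is an assumption about how the meter is updated.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Tracks the maximum meter reading seen within a trailing time window.
struct PeakHold {
    window: Duration,
    readings: VecDeque<(Duration, f32)>, // (time since recording start, volume %)
}

impl PeakHold {
    fn new(window: Duration) -> Self {
        Self { window, readings: VecDeque::new() }
    }

    /// Records a reading and drops readings older than the window.
    fn push(&mut self, at: Duration, percent: f32) {
        self.readings.push_back((at, percent));
        while let Some(&(t, _)) = self.readings.front() {
            if at.saturating_sub(t) > self.window {
                self.readings.pop_front();
            } else {
                break;
            }
        }
    }

    /// Highest reading still inside the window (0 if there are none).
    fn peak(&self) -> f32 {
        self.readings.iter().map(|&(_, p)| p).fold(0.0, f32::max)
    }

    /// True when the held peak reaches the warning threshold.
    fn over_threshold(&self, threshold: f32) -> bool {
        self.peak() >= threshold
    }
}
```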
Browse your transcription history:
ostt history
Use the arrow keys to navigate, Enter to copy the selected transcription to the clipboard, and Esc to exit.
Manage keywords for improved transcription accuracy:
ostt keywords
Add technical terms, names, or domain-specific vocabulary to help the AI transcribe them more accurately.
~/.config/ostt/
├── ostt.toml # Main configuration
└── hyprland/ # Hyprland integration (if set up)
├── ostt-float.sh
└── alacritty-float.toml
~/.local/share/ostt/
└── credentials # API keys (0600 permissions)
~/.local/state/ostt/
└── ostt.log.* # Daily-rotated logs (kept for 7 days, auto-cleanup on startup)
ostt logs all activity to ~/.local/state/ostt/ostt.log.*, rotating to a new file daily. Logs from the 7 most recent days are kept, and older files are deleted automatically on startup. The default log level is info.
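The retention rule above (keep the 7 most recent daily files, delete older ones on startup) can be sketched as a pure function over file names. This assumes daily files are named ostt.log.YYYY-MM-DD, which is a guess since the README only specifies ostt.log.*; with that naming, string order equals chronological order, so no date parsing is needed. logs_to_delete is a hypothetical helper.

```rust
/// Given the file names found in the log directory, returns the ostt log files
/// outside the newest `keep` days, i.e. the ones that would be deleted on startup.
fn logs_to_delete(file_names: &[&str], keep: usize) -> Vec<String> {
    let mut logs: Vec<&str> = file_names
        .iter()
        .copied()
        .filter(|name| name.starts_with("ostt.log."))
        .collect();
    // ISO dates (YYYY-MM-DD) sort chronologically as plain strings; put the newest first.
    logs.sort_unstable_by(|a, b| b.cmp(a));
    logs.into_iter().skip(keep).map(String::from).collect()
}
```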
View recent logs:
ostt logs
Enable debug logging for detailed troubleshooting:
RUST_LOG=debug ostt record
Available log levels: error, warn, info (default), debug, trace.
# List available devices
ostt list-devices
# Update config with correct device
ostt config
The reference level may be set too high or too low for your audio card. Run ostt, turn your microphone gain all the way up, note the peak dBFS value, and update reference_level_db in your config.
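dBFS expresses level relative to digital full scale: a sample magnitude of 1.0 is 0 dBFS, and 0.5 is about -6 dBFS (20 * log10(0.5)). The sketch below computes the peak dBFS of a block of f32 samples and maps it to a meter percentage in which reference_level_db reads as 100%. peak_dbfs and meter_percent are hypothetical, and the percentage scaling (linear in amplitude relative to the reference) is an assumption; this README does not specify how ostt scales its meter.

```rust
/// Peak level of a block of samples in dBFS (0 dBFS = full scale, i.e. |sample| = 1.0).
/// Silence returns negative infinity.
fn peak_dbfs(samples: &[f32]) -> f32 {
    let peak = samples.iter().fold(0.0f32, |m, &s| m.max(s.abs()));
    20.0 * peak.log10()
}

/// Hypothetical meter mapping: a level equal to `reference_db` reads 100%,
/// and every 20 dB below it divides the reading by 10 (linear in amplitude).
fn meter_percent(level_dbfs: f32, reference_db: f32) -> f32 {
    100.0 * 10f32.powf((level_dbfs - reference_db) / 20.0)
}
```

Under this mapping, with the default reference_level_db = -20, a peak of about -26 dBFS reads as roughly 50%, and setting reference_level_db to the peak you measured makes that peak read as 100%.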
# Verify authentication
ostt auth
# Check logs with debug output
RUST_LOG=debug ostt record
# Test the script directly
bash ~/.local/bin/ostt-float
# Verify Hyprland config loaded
hyprctl reload
For more troubleshooting, run ostt logs or check ~/.local/state/ostt/ostt.log.*.
git clone https://github.com/kristoferlund/ostt.git
cd ostt
# Development build
cargo build
# Release build (optimized)
cargo build --release
# Run directly
cargo run
ostt/
├── src/
│ ├── commands/ # Command handlers
│ ├── config/ # Configuration management
│ ├── recording/ # Audio capture and UI
│ ├── transcription/ # API integrations
│ ├── history/ # History storage and UI
│ └── ui/ # Shared UI components
├── environments/ # Platform-specific integrations
└── Cargo.toml
Contributions are welcome! Please open an issue or submit a pull request.
Contributors:
- Kristofer
- Kristofer Claw
- Pastilhas
- kristofernoaccess
- axo bot
License: MIT