Real-time Italian→English, French, Spanish, German audio translation using Whisper and Argos Translate.
Captures system audio (YouTube, video calls, etc.) and provides instant transcription and translation with interactive keyboard controls.
As a French developer working remotely with Nethesis (an Italian company behind NethServer and NethSecurity), I found myself in daily Italian-language meetings and calls. Not speaking Italian fluently, I needed a practical solution to follow technical discussions in real-time.
I built this tool initially for my own needs — a simple Python script to capture audio streams and translate them live during meetings. It quickly became indispensable for my daily work, allowing me to actively participate in Italian-speaking technical sessions, understand documentation being discussed, and keep up with fast-paced conversations.
After months of refining it for personal use, I realized this could help other developers in similar situations. Whether you're:
- 🤝 Working with Italian teams or clients
- 📚 Learning from Italian technical content (YouTube, conferences, webinars)
- 🌍 Contributing to Italian open-source projects
- 🎓 Studying Italian tech tutorials
...this tool can make your life significantly easier.
It demonstrates that complex problems don't always need complex solutions — sometimes a straightforward Python script is all you need.
Buon lavoro! 🚀
- 🎤 Real-time translation - Italian to English, French, Spanish or German, phrase by phrase as the speaker finishes
- 🎯 5 Whisper models - From tiny (fast) to large-v3 (accurate), switchable on-the-fly with the W key
- ⚡ 3 speed modes - Fast (500ms) / Normal (800ms) / Slow (1200ms) silence thresholds, switchable on-the-fly with the M key
- 🔇 VAD-based chunking - webrtcvad detects speech boundaries in real-time; flushes on silence rather than fixed intervals
- 📊 Session statistics - Duration, segments, word count and dropped chunks at end of session
- ⌨️ Full keyboard control - Pause, save, switch model, switch language, change mode on-the-fly
- 💾 Markdown export - Save timestamped bilingual transcripts with session stats
- 🔧 Zero configuration - Auto-installs dependencies in isolated venv
- ⚡ GPU acceleration - Experimental NVIDIA/CUDA support via `--gpu` (3-5x faster, falls back to CPU on failure)
- 🐧 Linux native - Works with PipeWire/PulseAudio
- 🎛️ Smart audio source selection - Auto-detects active stream, interactive menu when multiple sources are active
- 🇮🇹 Bilingual display - Toggle Italian source text visibility
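Under the hood, "zero configuration" means the script bootstraps its own virtualenv on first launch. A simplified sketch of that pattern (not the script's actual code; the venv path and package list match the install notes below, the function names are illustrative):

```python
import os
import subprocess
import sys
from pathlib import Path

# Venv location and dependencies used by the tool (see install notes)
VENV_DIR = Path.home() / ".local" / "share" / "live-voice-translate" / "venv"
DEPS = ["faster-whisper", "argostranslate", "webrtcvad"]

def ensure_venv() -> Path:
    """Create the virtualenv and install dependencies on first run."""
    python = VENV_DIR / "bin" / "python"
    if not python.exists():
        subprocess.run([sys.executable, "-m", "venv", str(VENV_DIR)], check=True)
        subprocess.run([str(python), "-m", "pip", "install", *DEPS], check=True)
    return python

def relaunch_in_venv() -> None:
    """Re-exec this script under the venv interpreter if not already inside it."""
    if sys.prefix != str(VENV_DIR):
        python = ensure_venv()
        os.execv(str(python), [str(python), *sys.argv])
```

The `os.execv` re-launch is what lets a single `./lvt.py` invocation work whether or not the venv already exists.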
All processing happens entirely on your machine — no audio, transcription, or translation data ever leaves your computer.
- Speech recognition is performed by faster-whisper, a local reimplementation of OpenAI's Whisper that runs fully offline after the initial model download.
- Translation is handled by argostranslate, which uses locally installed language models with no network calls at runtime.
- No cloud API is contacted during use. There is no telemetry, no account, and no data sent to any third party.
- Audio capture reads your system audio stream in memory only; nothing is written to disk unless you explicitly use `--save`.
This tool is safe to use in environments where confidentiality matters (internal meetings, proprietary content, etc.).
- OS: Linux (tested on Fedora 43, Ubuntu 24.04, openSUSE Tumbleweed)
- Python: 3.9+ (3.11 recommended)
- Audio: PulseAudio or PipeWire (modern Linux distributions)
- Packages: `python3-venv`, `python3-devel`/`python311-devel` (needed to compile `webrtcvad`)
Note: Most modern Linux distributions (Fedora 34+, Ubuntu 22.10+, Debian 12+)
use PipeWire as the default audio server. The script works seamlessly with both
PipeWire and legacy PulseAudio systems through the pactl/parec compatibility layer.
Fedora / RHEL / CentOS:

```bash
sudo dnf install python3-venv python3-devel pulseaudio-utils
```

openSUSE:

```bash
sudo zypper install python3-venv python311-devel pulseaudio-utils
```

Ubuntu / Debian:

```bash
sudo apt install python3-venv python3-dev pulseaudio-utils
```
`pulseaudio-utils` provides `pactl` and `parec`, which are required for audio capture under both PulseAudio and PipeWire.
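To picture the capture layer: `parec` can stream raw PCM to stdout, and a Python wrapper reads it in fixed-size chunks. A hedged sketch (the function names are illustrative, not the script's API; 16 kHz mono s16le is the usual input format for speech models):

```python
import subprocess

def build_parec_cmd(source: str) -> list:
    """Command line for capturing raw 16 kHz mono PCM from a monitor source."""
    return [
        "parec",
        "--device=" + source,
        "--format=s16le",   # 16-bit little-endian samples
        "--rate=16000",     # sample rate expected by speech models
        "--channels=1",     # mono
        "--raw",            # raw PCM, no WAV header
    ]

def capture(source: str, chunk_bytes: int = 3200):
    """Yield 100 ms chunks (16000 Hz * 2 bytes * 0.1 s) of raw audio."""
    proc = subprocess.Popen(build_parec_cmd(source), stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read(chunk_bytes)
            if not chunk:
                break
            yield chunk
    finally:
        proc.terminate()
```

Because `parec` speaks the PulseAudio protocol, the same command works unchanged against PipeWire's compatibility layer.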
```bash
# Clone repository
git clone https://github.com/stephdl/live-voice-translate.git
cd live-voice-translate

# Run (first run auto-installs Python dependencies)
./lvt.py
```

If your system Python is older than 3.9 (e.g. openSUSE with Python 3.6), the script auto-detects a compatible version in `/usr/bin` and re-launches itself automatically. If auto-detection fails, launch it manually: `python3.11 ./lvt.py`
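The auto-detection described above boils down to scanning `/usr/bin` for the newest `python3.X` at or above 3.9. A rough sketch (the helper names are illustrative, not the script's actual code):

```python
import os
import re

def pick_python(candidates):
    """Return the newest python3.X basename with X >= 9, or None."""
    best, best_minor = None, 8  # minors <= 8 are too old (need 3.9+)
    for name in candidates:
        m = re.fullmatch(r"python3\.(\d+)", name)
        if m and int(m.group(1)) > best_minor:
            best, best_minor = name, int(m.group(1))
    return best

def find_compatible_python(bindir="/usr/bin"):
    """Scan bindir and return a full path to a compatible interpreter."""
    name = pick_python(os.listdir(bindir))
    return os.path.join(bindir, name) if name else None
```

Note the numeric comparison on the minor version: naive string sorting would rank `python3.9` above `python3.11`.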
The first run creates a virtualenv in `~/.local/share/live-voice-translate/venv` and installs:
- faster-whisper
- argostranslate
- webrtcvad
This takes 2-3 minutes.
```bash
cd live-voice-translate
git pull origin main
```

New Python dependencies (if any) are installed automatically on the next run. If you get unexpected errors after an update, delete the virtualenv to force a clean reinstall:

```bash
rm -rf ~/.local/share/live-voice-translate/venv
./lvt.py
```
```bash
./lvt.py
```

Select a model (1-5), then start playing audio in another window.
```bash
# Medium model (recommended)
./lvt.py medium

# Large model with slow mode (best quality)
./lvt.py large --slow

# Tiny model with fast mode (lowest latency)
./lvt.py tiny --fast

# Save transcript to file
./lvt.py medium --save meeting.md

# Auto-generated filename
./lvt.py medium --save

# Display Italian + English
./lvt.py medium --show-italian

# Translate to French (via it→en→fr double translation)
./lvt.py medium --to fr

# Translate to Spanish
./lvt.py medium --to es

# Translate to German
./lvt.py medium --to de

# Disable Voice Activity Detection (transcribe everything including silence)
./lvt.py medium --no-vad

# Enable GPU acceleration (NVIDIA/CUDA only, experimental)
./lvt.py medium --gpu
```

During execution, press:
| Key | Action |
|---|---|
| P | Pause/Resume translation |
| S | Save transcript now (creates file if needed) |
| M | Change mode (fast → normal → slow → fast) |
| W | Change Whisper model (tiny → base → small → medium → large-v3) |
| L | Change target language (en → fr → es → de → en) |
| I | Toggle Italian display (ON/OFF) |
| Q | Quit gracefully |
| H | Show session config + keyboard shortcuts help |
Note: Shortcuts respond instantly (no need to press Enter).
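For reference, the auto-generated filename produced by the S key follows the pattern `live-translate-YYYYMMDD-HHMMSS.md`; generating it is a one-liner (a sketch, the helper name is illustrative):

```python
from datetime import datetime

def default_transcript_name(now=None):
    """Timestamped default filename used when saving without --save."""
    now = now or datetime.now()
    return now.strftime("live-translate-%Y%m%d-%H%M%S.md")
```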
| Model | Accuracy | Latency | RAM | Use case |
|---|---|---|---|---|
| tiny | 60% | ~1.5s | 1GB | Quick tests, low-end systems |
| base | 85% | ~4s | 1.5GB | Fast casual listening |
| small | 90% | ~5s | 2GB | Good balance |
| medium | 95% | ~8s | 5GB | Recommended for most uses |
| large-v3 | 98% | ~12s | 10GB | Maximum accuracy (high CPU/fan) |
| Mode | Segment size | Latency | Quality |
|---|---|---|---|
| fast | Shorter | Lower | May cut sentences |
| normal | Balanced | Medium | Default, good compromise |
| slow | Longer | Higher | Complete sentences, best context |
Change mode on-the-fly by pressing M during execution.
```
[14:25:30] ▶ Today was a tough day
[14:25:45] ▶ What happened?
```

With `--show-italian`:

```bash
./lvt.py medium --show-italian
```

```
[14:25:30] Oggi è stata una giornata difficile   (green)
[14:25:30] ▶ Today was a tough day
[14:25:45] Cosa è successo?   (green)
[14:25:45] ▶ What happened?
```
Toggle Italian display during execution with I key.
```bash
# Start translator
./lvt.py medium

# In another window/tab, open YouTube
firefox "https://www.youtube.com/watch?v=ITALIAN_VIDEO_ID"

# Translations appear in real-time in the terminal
```

```bash
# Start with save
./lvt.py medium --save meeting.md

# Join video call (Zoom, Teams, Google Meet, Discord, etc.)
# Translations saved to meeting.md

# During call:
# - Press 'p' to pause (e.g., when speaking)
# - Press 'p' again to resume
# - Press 's' to force save
# - Press 'i' to show Italian text
```

```bash
# Test tiny (fastest)
./lvt.py tiny --fast

# Test large (best quality)
./lvt.py large --slow
```

English only:

```
[14:25:30] ▶ Today was a tough day
[14:25:45] ▶ What happened?
```

Bilingual (with `--show-italian` or the I key):

```
[14:25:30] Oggi è stata una giornata difficile
[14:25:30] ▶ Today was a tough day
[14:25:45] Cosa è successo?
[14:25:45] ▶ What happened?
```
```markdown
# Live Voice Translation

**Date:** 2026-04-01 14:25:30
**Model:** medium
**Mode:** normal

---

**[14:25:30]**
🇮🇹 *Oggi è stata una giornata difficile*
🇬🇧 Today was a tough day

---

**[14:25:45]**
🇮🇹 *Cosa è successo?*
🇬🇧 What happened?

---

**End of session:** 2026-04-01 14:57:45
**Duration:** 00:32:15
**Phrases:** 147
**Words:** 1823
```

The tool auto-detects active audio monitor streams. If only one is active, it is selected automatically. If multiple streams are active simultaneously (e.g. a video call and a YouTube video), an interactive menu is displayed:
```
Multiple audio streams detected:
  1) USB Audio
  2) JBL LIVE650BTNC
Select stream (1-2):
```
PipeWire internal loopback sinks are automatically filtered out.
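Detection like this can be built on `pactl list short sources`, whose tab-separated fields are index, name, driver, sample spec, and state. A sketch (the sample output below is fabricated for illustration):

```python
def running_monitors(pactl_output: str) -> list:
    """Names of monitor sources currently in the RUNNING state."""
    names = []
    for line in pactl_output.strip().splitlines():
        fields = line.split("\t")
        if len(fields) >= 5 and ".monitor" in fields[1] and fields[4] == "RUNNING":
            names.append(fields[1])
    return names

# Illustrative `pactl list short sources` output (not real device names)
SAMPLE = (
    "49\talsa_input.usb-mic.mono-fallback\tPipeWire\ts16le 1ch 48000Hz\tSUSPENDED\n"
    "52\talsa_output.usb-audio.analog-stereo.monitor\tPipeWire\ts32le 2ch 48000Hz\tRUNNING\n"
    "57\tbluez_output.JBL.monitor\tPipeWire\ts16le 2ch 48000Hz\tIDLE\n"
)
```

Only sources whose name contains `.monitor` carry the system's playback audio; input microphones are ignored.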
- Audio source: Auto-detects active PulseAudio/PipeWire monitor stream, with interactive selection when multiple are available
- VAD chunking: webrtcvad detects speech/silence boundaries and flushes each utterance when silence exceeds the mode threshold (500/800/1200ms)
- Transcription: Whisper converts Italian audio to text
- Translation: Argos Translate converts Italian → English, then English → target language if needed (fr/es/de)
- Display: Shows timestamped translations in terminal
- Save: Optionally exports to Markdown file
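To make the flush-on-silence behaviour concrete, here is a stdlib-only approximation using an RMS energy gate instead of webrtcvad (the actual script uses webrtcvad; the RMS threshold here is an illustrative guess, but the 500/800/1200 ms mode limits match the modes described above):

```python
import array
import math

SILENCE_MS = {"fast": 500, "normal": 800, "slow": 1200}
FRAME_MS = 100        # duration of each frame fed by the capture thread
RMS_THRESHOLD = 500   # illustrative speech/silence cutoff for s16le audio

def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit mono PCM frame."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def chunker(frames, mode="normal"):
    """Accumulate frames and yield one utterance per silence gap."""
    limit = SILENCE_MS[mode]
    buf, silent_ms, had_speech = [], 0, False
    for frame in frames:
        buf.append(frame)
        if rms(frame) < RMS_THRESHOLD:
            silent_ms += FRAME_MS
        else:
            silent_ms, had_speech = 0, True
        if silent_ms >= limit:
            if had_speech:          # discard buffers that are pure silence
                yield b"".join(buf)
            buf, silent_ms, had_speech = [], 0, False
    if buf and had_speech:
        yield b"".join(buf)
```

The fast/normal/slow trade-off falls out of `limit`: a shorter silence window flushes sooner (lower latency) but risks splitting a sentence across two chunks.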
Architecture:
- Audio capture runs in background thread (non-blocking)
- Keyboard controller uses `select()` for instant response (no extra dependencies)
- Main thread processes the audio queue and checks the keyboard
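The keyboard-controller pattern works roughly like this: the terminal is first put into cbreak mode (`tty.setcbreak`) so keys arrive without Enter, then `select()` polls stdin with a zero timeout on each loop iteration. A hedged sketch (not the script's exact code):

```python
import select
import sys

def poll_key(timeout=0.0, stream=None):
    """Return one pending keypress (lower-cased), or None if none is buffered.

    select() with a zero timeout makes this a non-blocking check, so the
    main loop can interleave keyboard handling with audio processing.
    In a real terminal, tty.setcbreak() must be applied first so single
    keypresses arrive without Enter.
    """
    stream = stream or sys.stdin
    ready, _, _ = select.select([stream], [], [], timeout)
    if not ready:
        return None
    ch = stream.read(1)
    if not ch:                      # '' / b'' on EOF
        return None
    if isinstance(ch, bytes):
        ch = ch.decode()
    return ch.lower()
```

Because the check costs microseconds, calling it once per audio-queue iteration is what makes the shortcuts feel instant.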
```bash
# 1. Check if PulseAudio/PipeWire is running
pactl info
# Should show server info

# 2. List all audio sources
pactl list short sources

# 3. Look for monitor sources (with RUNNING status)
pactl list short sources | grep monitor

# 4. If no monitor source is RUNNING:
#    - Play audio (YouTube, music, etc.)
#    - Run the check again
pactl list short sources | grep -E "monitor.*RUNNING"

# 5. If still no output, restart the audio service
pactl info
# PipeWire output:
#   Server Name: PulseAudio (on PipeWire 0.3.xx)
# PulseAudio output:
#   Server Name: pulseaudio

systemctl --user restart pipewire pipewire-pulse   # For PipeWire
systemctl --user restart pulseaudio                # For PulseAudio
```

Still not working? Check if audio is actually playing:

```bash
# Monitor audio levels in real-time
pavucontrol   # GUI tool - check the "Recording" tab

# Or command-line
pactl subscribe   # Shows audio events
```

```bash
# Install system dependencies (python*-devel is required to compile webrtcvad)
sudo dnf install python3-venv python3-devel        # Fedora/RHEL
sudo zypper install python3-venv python311-devel   # openSUSE (adjust version if needed)
sudo apt install python3-venv python3-dev          # Ubuntu/Debian

# Retry
./lvt.py
```

```bash
# Remove virtualenv
rm -rf ~/.local/share/live-voice-translate/

# Rerun (recreates clean venv)
./lvt.py
```

Keyboard shortcuts require the terminal to be in TTY mode. If you are piping output or running in a non-interactive environment, use `--no-keyboard`:
```bash
./lvt.py medium --no-keyboard
```

Ensure your terminal uses UTF-8 encoding:

```bash
# Check locale
echo $LANG
# Should show: fr_FR.UTF-8, en_US.UTF-8, or similar

# If not UTF-8:
export LANG=fr_FR.UTF-8
export LC_ALL=fr_FR.UTF-8
```

- CPU usage: Use smaller models (tiny/base) on weak hardware
- Latency: Use `--fast` (500ms silence, 6s max chunk) for the lowest delay; may cut mid-sentence
- Accuracy: Use `large --slow` (1200ms silence, 12s max chunk, max beam) for best quality
- Balance: `medium` (default normal mode) is the best CPU/quality tradeoff
- RAM: the medium model needs ~5GB, large needs ~10GB
- GPU (NVIDIA): Use `--gpu` for 3-5x faster transcription; requires CUDA drivers. Falls back to CPU automatically on failure. AMD is not supported (CTranslate2 has no ROCm build).
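The automatic CPU fallback amounts to trying device candidates in order. A generic sketch (the `loader` callable is illustrative; with faster-whisper it would wrap `WhisperModel(model_name, device=...)`):

```python
def load_with_fallback(loader, devices=("cuda", "cpu")):
    """Try each device in order; return (model, device) for the first that loads.

    `loader` is any callable taking a device string, e.g. (hypothetically)
    lambda d: WhisperModel("medium", device=d) with faster-whisper.
    """
    last_error = None
    for device in devices:
        try:
            return loader(device), device
        except Exception as exc:  # e.g. missing CUDA driver or toolkit
            last_error = exc
    raise RuntimeError("no usable device") from last_error
```

Catching the load failure (rather than probing for CUDA up front) keeps the fallback robust against partially installed driver stacks.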
```bash
./lvt.py medium --save "$(date +%Y%m%d)-meeting.md"
```

```bash
# Start without --save
./lvt.py medium

# During execution, press 's'
# Creates: live-translate-YYYYMMDD-HHMMSS.md
```

Works out-of-the-box on Wayland (tested on Fedora 43 + GNOME 49).
Fully compatible with X11 desktop environments.
GNU General Public License v3.0 or later
See LICENSE file for details.
Stéphane de Labrusse
Freelance developer specializing in Linux, containerization, and cybersecurity.
- GitHub: @stephdl
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Test your changes
- Submit a pull request
Already implemented:
- VAD-based real-time chunking (webrtcvad): flushes on silence, phrase by phrase
- Multiple target languages (French, Spanish, German) via double translation
- Smart audio source selection with interactive menu for multiple streams
- GPU acceleration (experimental, NVIDIA/CUDA only, auto-fallback to CPU)
- Session statistics: duration, segments, word count, dropped chunks
- On-the-fly model switching (W key) without restarting
- On-the-fly language switching (L key) without restarting
- Markdown export with session stats and bilingual transcript

Potential future features:
- Multiple translators (DeepL/GPT fallback)
- Bidirectional mode (IT+EN simultaneously)
- Speaker diarization
- Web dashboard
- Export formats (PDF, DOCX, SRT subtitles)
- Support for additional language pairs beyond French, Spanish, and German
- Migrate VAD from webrtcvad (unmaintained since 2018) to a maintained alternative: Silero VAD was investigated but requires PyTorch or has ONNX API instability; RMS energy-based VAD is a viable lightweight option
- Whisper by OpenAI
- faster-whisper by Guillaume Klein
- Argos Translate by Argos Open Technologies
For bugs or feature requests, please open an issue.
Made with ❤️ for the open-source community