live-voice-translate

Real-time audio translation from Italian to English, French, Spanish, or German, using Whisper and Argos Translate.

Captures system audio (YouTube, video calls, etc.) and provides instant transcription and translation with interactive keyboard controls.

🇮🇹 → 🇬🇧 Story behind this tool

As a French developer working remotely with Nethesis (the Italian company behind NethServer and NethSecurity), I found myself in daily Italian-language meetings and calls. Since I do not speak Italian fluently, I needed a practical way to follow technical discussions in real time.

I built this tool initially for my own needs — a simple Python script to capture audio streams and translate them live during meetings. It quickly became indispensable for my daily work, allowing me to actively participate in Italian-speaking technical sessions, understand documentation being discussed, and keep up with fast-paced conversations.

After months of refining it for personal use, I realized this could help other developers in similar situations. Whether you're:

🤝 Working with Italian teams or clients
📚 Learning from Italian technical content (YouTube, conferences, webinars)
🌍 Contributing to Italian open-source projects
🎓 Studying Italian tech tutorials

...this tool can make your life significantly easier.

It demonstrates that complex problems don't always need complex solutions — sometimes a straightforward Python script is all you need.

Buon lavoro! 🚀

Features

🎤 Real-time translation - Italian to English, French, Spanish or German, phrase by phrase as the speaker finishes
🎯 5 Whisper models - From tiny (fast) to large-v3 (accurate), switchable on-the-fly with W
⚡ 3 speed modes - Fast (500ms)/Normal (800ms)/Slow (1200ms) silence thresholds, switchable on-the-fly with M
🔇 VAD-based chunking - webrtcvad detects speech boundaries in real-time; flushes on silence rather than fixed intervals
📊 Session statistics - Duration, segments, word count and dropped chunks at end of session
⌨️ Full keyboard control - Pause, save, switch model, switch language, change mode on-the-fly
💾 Markdown export - Save timestamped bilingual transcripts with session stats
🔧 Zero configuration - Auto-installs dependencies in isolated venv
⚡ GPU acceleration - Experimental NVIDIA/CUDA support via --gpu (3-5x faster, falls back to CPU on failure)
🐧 Linux native - Works with PipeWire/PulseAudio
🎛️ Smart audio source selection - Auto-detects active stream, interactive menu when multiple sources are active
🇮🇹 Bilingual display - Toggle Italian source text visibility

Privacy

All processing happens entirely on your machine — no audio, transcription, or translation data ever leaves your computer.

Speech recognition is performed by faster-whisper, a local Whisper model that runs offline after the initial download.
Translation is handled by argostranslate, which uses locally installed language models with no network calls at runtime.
No cloud API is contacted during use. There is no telemetry, no account, and no data sent to any third party.
Audio capture reads your system audio stream in memory only; nothing is written to disk unless you explicitly use --save.

This tool is safe to use in environments where confidentiality matters (internal meetings, proprietary content, etc.).

Requirements

OS: Linux (tested on Fedora 43, Ubuntu 24.04, openSUSE Tumbleweed)
Python: 3.9+ (3.11 recommended)
Audio: PulseAudio or PipeWire (modern Linux distributions)
Packages: python3-venv, python3-devel / python311-devel (needed to compile webrtcvad)

Note: Most modern Linux distributions (Fedora 34+, Ubuntu 22.10+, Debian 12+) use PipeWire as the default audio server. The script works seamlessly with both PipeWire and legacy PulseAudio systems through the pactl/parec compatibility layer.

Installation

1. System dependencies

Fedora / RHEL / CentOS:

sudo dnf install python3-venv python3-devel pulseaudio-utils

openSUSE:

sudo zypper install python3-venv python311-devel pulseaudio-utils

Ubuntu / Debian:

sudo apt install python3-venv python3-dev pulseaudio-utils

pulseaudio-utils provides pactl and parec, which are required for audio capture under both PulseAudio and PipeWire.

2. Clone and run

# Clone repository
git clone https://github.com/stephdl/live-voice-translate.git
cd live-voice-translate

# Run (first run auto-installs Python dependencies)
./lvt.py

If your system Python is older than 3.9 (e.g. openSUSE with Python 3.6), the script auto-detects a compatible version in /usr/bin and re-launches itself automatically. If auto-detection fails, launch manually:
python3.11 ./lvt.py
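
The interpreter auto-detection described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in lvt.py: the helper names (`parse_version`, `pick_interpreter`, `relaunch_if_needed`) are hypothetical, and choosing the newest match is a guess at the selection rule.

```python
import glob
import os
import re
import sys

MIN_VERSION = (3, 9)  # minimum version stated in Requirements


def parse_version(path):
    """Return (major, minor) for a name like /usr/bin/python3.11, else None."""
    match = re.fullmatch(r"python(\d+)\.(\d+)", os.path.basename(path))
    return (int(match.group(1)), int(match.group(2))) if match else None


def pick_interpreter(candidates, minimum=MIN_VERSION):
    """Pick the newest candidate that meets the minimum version (assumed rule)."""
    versioned = [(parse_version(path), path) for path in candidates]
    usable = [(ver, path) for ver, path in versioned if ver is not None and ver >= minimum]
    return max(usable)[1] if usable else None


def relaunch_if_needed():
    """Re-exec this script under a newer interpreter if the current one is too old."""
    if sys.version_info[:2] >= MIN_VERSION:
        return
    target = pick_interpreter(glob.glob("/usr/bin/python3.*"))
    if target is None:
        sys.exit("No Python >= 3.9 found; launch manually, e.g. python3.11 ./lvt.py")
    # Replace the current process; argv[0] is the script path.
    os.execv(target, [target] + sys.argv)
```

Names such as `python3.11-config` are ignored because the pattern must match the whole file name.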

The first run creates a virtualenv in ~/.local/share/live-voice-translate/venv and installs:

faster-whisper
argostranslate
webrtcvad

This takes 2-3 minutes.
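
For readers curious how such a first-run bootstrap can be done with the standard library alone, here is a minimal sketch. The path and package names come from this README; the function names `pip_install_command` and `ensure_venv` are hypothetical, and the real script may proceed differently.

```python
import subprocess
import venv
from pathlib import Path

VENV_DIR = Path.home() / ".local/share/live-voice-translate/venv"
PACKAGES = ["faster-whisper", "argostranslate", "webrtcvad"]


def pip_install_command(venv_dir, packages):
    """Build the pip command that installs packages with the venv's own Python."""
    return [str(Path(venv_dir) / "bin" / "python"), "-m", "pip", "install", *packages]


def ensure_venv(venv_dir=VENV_DIR, packages=PACKAGES):
    """Create the virtualenv and install dependencies only if it does not exist yet."""
    if (Path(venv_dir) / "bin" / "python").exists():
        return
    venv.EnvBuilder(with_pip=True).create(venv_dir)
    subprocess.run(pip_install_command(venv_dir, packages), check=True)
```

Because the check is based on the presence of the venv's interpreter, deleting the venv directory forces a clean reinstall on the next run.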

3. Updating

cd live-voice-translate
git pull origin main

New Python dependencies (if any) are installed automatically on the next run.

If you get unexpected errors after an update, delete the virtualenv to force a clean reinstall:
rm -rf ~/.local/share/live-voice-translate/venv
./lvt.py

Usage

Interactive menu

./lvt.py

Select a model (1-5), then start playing audio in another window.

Command-line

# Medium model (recommended)
./lvt.py medium

# Large model with slow mode (best quality)
./lvt.py large --slow

# Tiny model with fast mode (lowest latency)
./lvt.py tiny --fast

# Save transcript to file
./lvt.py medium --save meeting.md

# Auto-generated filename
./lvt.py medium --save

# Display Italian + English
./lvt.py medium --show-italian

# Translate to French (via it→en→fr double translation)
./lvt.py medium --to fr

# Translate to Spanish
./lvt.py medium --to es

# Translate to German
./lvt.py medium --to de

# Disable Voice Activity Detection (transcribe everything including silence)
./lvt.py medium --no-vad

# Enable GPU acceleration (NVIDIA/CUDA only, experimental)
./lvt.py medium --gpu
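
The it→en→fr double translation mentioned for `--to fr` can be sketched as a simple pivot through English. This is an illustration, not the tool's code: `translate_from_italian` is a hypothetical name, and the sketch assumes the relevant Argos Translate language packages are already installed. The `translate` parameter exists only so the logic can be exercised without the library.

```python
def translate_from_italian(text, target="en", translate=None):
    """Translate Italian text to `target`, pivoting through English when needed."""
    if translate is None:
        # Lazy import so the sketch loads even where argostranslate is absent.
        from argostranslate.translate import translate
    english = translate(text, "it", "en")
    if target == "en":
        return english
    # Second hop: English to the requested language (fr, es or de).
    return translate(english, "en", target)
```

Each extra hop can compound translation errors, which is one reason English output tends to be the most reliable.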

Keyboard shortcuts

During execution, press:

Key	Action
P	Pause/Resume translation
S	Save transcript now (creates file if needed)
M	Change mode (fast → normal → slow → fast)
W	Change Whisper model (tiny → base → small → medium → large-v3)
L	Change target language (en → fr → es → de → en)
I	Toggle Italian display (ON/OFF)
Q	Quit gracefully
H	Show session config + keyboard shortcuts help

Note: Shortcuts respond instantly (no need to press Enter).
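
Reading single key presses without Enter is usually done by putting the terminal in cbreak mode and polling stdin with `select()`. The following is a minimal sketch of that technique using only the standard library; the names `cbreak_stdin`, `poll_key` and `dispatch` are hypothetical and the actual controller may be organised differently.

```python
import contextlib
import select
import sys
import termios
import tty


@contextlib.contextmanager
def cbreak_stdin():
    """Deliver keys immediately (no line buffering) and restore the terminal on exit."""
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)
        yield
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)


def poll_key(timeout=0.0):
    """Return one pending character from stdin, or None if nothing was typed."""
    ready, _, _ = select.select([sys.stdin], [], [], timeout)
    return sys.stdin.read(1) if ready else None


def dispatch(key, handlers):
    """Run the handler bound to `key` (case-insensitive); report whether one ran."""
    handler = handlers.get(key.lower()) if key else None
    if handler is None:
        return False
    handler()
    return True
```

A main loop would wrap its work in `with cbreak_stdin():` and call `dispatch(poll_key(), handlers)` on each iteration. This only works when stdin is a real terminal.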

Models comparison

Model	Accuracy	Latency	RAM	Use case
tiny	60%	~1.5s	1GB	Quick tests, low-end systems
base	85%	~4s	1.5GB	Fast casual listening
small	90%	~5s	2GB	Good balance
medium	95%	~8s	5GB	Recommended for most uses
large-v3	98%	~12s	10GB	Maximum accuracy (high CPU/fan)

Speed modes

Mode	Segment size	Latency	Quality
fast	Shorter	Lower	May cut sentences
normal	Balanced	Medium	Default, good compromise
slow	Longer	Higher	Complete sentences, best context

Change mode on-the-fly by pressing M during execution.
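
As an illustration of the mode cycle, the sketch below maps each mode to the silence threshold listed in Features and advances in the order fast → normal → slow → fast. The names `SILENCE_MS` and `next_mode` are hypothetical.

```python
# Silence thresholds in milliseconds, as listed in the Features section.
SILENCE_MS = {"fast": 500, "normal": 800, "slow": 1200}


def next_mode(current):
    """Return the mode that follows `current`, wrapping from slow back to fast."""
    order = list(SILENCE_MS)
    return order[(order.index(current) + 1) % len(order)]
```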

Display modes

English only (default)

[14:25:30] ▶ Today was a tough day
[14:25:45] ▶ What happened?

Bilingual (Italian + English)

./lvt.py medium --show-italian

[14:25:30] Oggi è stata una giornata difficile     (green)
[14:25:30] ▶ Today was a tough day

[14:25:45] Cosa è successo?                        (green)
[14:25:45] ▶ What happened?

Toggle the Italian display during execution with the I key.

Examples

Translate YouTube video

# Start translator
./lvt.py medium

# In another window/tab, open YouTube
firefox "https://www.youtube.com/watch?v=ITALIAN_VIDEO_ID"

# Translations appear in real-time in terminal

Translate video call

# Start with save
./lvt.py medium --save meeting.md

# Join video call (Zoom, Teams, Google Meet, Discord, etc.)
# Translations saved to meeting.md

# During call:
# - Press 'p' to pause (e.g., when speaking)
# - Press 'p' again to resume
# - Press 's' to force save
# - Press 'i' to show Italian text

Compare models

# Test tiny (fastest)
./lvt.py tiny --fast

# Test large (best quality)
./lvt.py large --slow

Output format

Terminal output

English only:

[14:25:30] ▶ Today was a tough day
[14:25:45] ▶ What happened?

Bilingual (with --show-italian or i key):

[14:25:30] Oggi è stata una giornata difficile
[14:25:30] ▶ Today was a tough day
[14:25:45] Cosa è successo?
[14:25:45] ▶ What happened?

Markdown file (--save)

# Live Voice Translation

**Date:** 2026-04-01 14:25:30
**Model:** medium
**Mode:** normal

---

**[14:25:30]**

🇮🇹 *Oggi è stata una giornata difficile*

🇬🇧 Today was a tough day

---

**[14:25:45]**

🇮🇹 *Cosa è successo?*

🇬🇧 What happened?

---

**End of session:** 2026-04-01 14:57:45
**Duration:** 00:32:15
**Phrases:** 147
**Words:** 1823
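
A transcript in this layout can be produced with a few string helpers. The sketch below is an approximation: the function names are hypothetical, and counting words over the translated text (rather than the Italian source) is an assumption.

```python
def format_duration(seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_entry(timestamp, italian, english):
    """Render one bilingual segment followed by a horizontal rule."""
    return f"**[{timestamp}]**\n\n🇮🇹 *{italian}*\n\n🇬🇧 {english}\n\n---\n\n"


def format_footer(end, seconds, translations):
    """Render the end-of-session statistics block."""
    words = sum(len(text.split()) for text in translations)  # assumed counting rule
    return (
        f"**End of session:** {end}\n"
        f"**Duration:** {format_duration(seconds)}\n"
        f"**Phrases:** {len(translations)}\n"
        f"**Words:** {words}\n"
    )
```

Appending each entry to the file as it is produced keeps the transcript usable even if the session ends abruptly.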

Audio source selection

The tool auto-detects active audio monitor streams. If only one is active, it is selected automatically. If multiple streams are active simultaneously (e.g. a video call and a YouTube video), an interactive menu is displayed:

  Multiple audio streams detected:

    1) USB Audio
    2) JBL LIVE650BTNC

  Select stream (1-2):

PipeWire internal loopback sinks are automatically filtered out.
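
The selection logic can be approximated by parsing the tab-separated output of `pactl list short sources`. The sketch below is not the tool's code: the function names are hypothetical, it assumes the source name is the second field and the state the last one, and it does not reproduce the loopback filtering.

```python
import subprocess


def running_monitors(pactl_output):
    """Return names of monitor sources whose state is RUNNING."""
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        name, state = fields[1], fields[-1].strip()
        if name.endswith(".monitor") and state == "RUNNING":
            names.append(name)
    return names


def list_running_monitors():
    """Query the audio server through pactl (works on PulseAudio and PipeWire)."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True, text=True, check=True,
    )
    return running_monitors(result.stdout)


def choose_source(sources, ask=input):
    """Pick automatically when there is one source, otherwise prompt the user."""
    if not sources:
        return None
    if len(sources) == 1:
        return sources[0]
    for number, name in enumerate(sources, 1):
        print(f"    {number}) {name}")
    while True:
        answer = ask(f"  Select stream (1-{len(sources)}): ").strip()
        if answer.isdigit() and 1 <= int(answer) <= len(sources):
            return sources[int(answer) - 1]
```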

How it works

Audio source: Auto-detects active PulseAudio/PipeWire monitor stream, with interactive selection when multiple are available
VAD chunking: webrtcvad detects speech/silence boundaries and flushes each utterance when silence exceeds the mode threshold (500/800/1200ms)
Transcription: Whisper converts Italian audio to text
Translation: Argos Translate converts Italian → English, then English → target language if needed (fr/es/de)
Display: Shows timestamped translations in terminal
Save: Optionally exports to Markdown file
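
The VAD chunking step can be sketched as a small state machine that buffers frames and flushes an utterance once trailing silence reaches the mode threshold. This is a simplified illustration with hypothetical names; the speech detector is passed in as a callable so the logic stays independent of webrtcvad, and the maximum chunk length is left out.

```python
class UtteranceChunker:
    """Group fixed-size audio frames into utterances separated by silence."""

    def __init__(self, is_speech, silence_ms, frame_ms=30):
        # With webrtcvad this could be wired as (assumed settings):
        #   vad = webrtcvad.Vad(2)
        #   is_speech = lambda frame: vad.is_speech(frame, 16000)
        self.is_speech = is_speech
        self.silence_frames = -(-silence_ms // frame_ms)  # ceiling division
        self.buffer = []
        self.trailing_silence = 0
        self.has_speech = False

    def feed(self, frame):
        """Add one frame; return the finished utterance as bytes, or None."""
        if self.is_speech(frame):
            self.has_speech = True
            self.trailing_silence = 0
            self.buffer.append(frame)
            return None
        if not self.has_speech:
            return None  # ignore silence before any speech
        self.trailing_silence += 1
        self.buffer.append(frame)
        if self.trailing_silence < self.silence_frames:
            return None
        utterance = b"".join(self.buffer)
        self.buffer, self.trailing_silence, self.has_speech = [], 0, False
        return utterance
```

With `silence_ms=800` and 30 ms frames, an utterance is flushed after 27 consecutive silent frames.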

Architecture:

Audio capture runs in background thread (non-blocking)
Keyboard controller uses select() for instant response (no dependencies)
Main thread processes audio queue and checks keyboard
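
A producer/consumer layout of this kind can be sketched with `threading` and a bounded `queue.Queue`. The code below is an illustration with hypothetical names and sizes; `stream` stands for any binary file-like object, for example the stdout of a `parec` subprocess. Dropping chunks when the queue is full is an assumption suggested by the "dropped chunks" statistic.

```python
import queue
import threading


def capture_loop(stream, chunks, chunk_bytes, stats):
    """Background producer: read raw audio and queue it without blocking capture."""
    while True:
        data = stream.read(chunk_bytes)
        if not data:
            break
        try:
            chunks.put_nowait(data)
        except queue.Full:
            stats["dropped"] += 1  # consumer is behind; drop instead of stalling
    chunks.put(None)  # sentinel marking the end of the stream


def start_capture(stream, chunk_bytes=4096, max_queued=64):
    """Start the capture thread and return the queue, stats dict and thread."""
    chunks = queue.Queue(maxsize=max_queued)
    stats = {"dropped": 0}
    thread = threading.Thread(
        target=capture_loop, args=(stream, chunks, chunk_bytes, stats), daemon=True
    )
    thread.start()
    return chunks, stats, thread


def drain(chunks, handle, timeout=0.1):
    """Main-thread consumer: process chunks until the sentinel arrives."""
    while True:
        try:
            item = chunks.get(timeout=timeout)
        except queue.Empty:
            continue  # a real loop would check the keyboard here
        if item is None:
            return
        handle(item)
```

The short `get` timeout is what lets the main thread stay responsive to key presses while waiting for audio.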

Troubleshooting

No active audio stream detected

# 1. Check if PulseAudio/PipeWire is running
pactl info
# Should show server info

# 2. List all audio sources
pactl list short sources

# 3. Look for monitor sources (with RUNNING status)
pactl list short sources | grep monitor

# 4. If no monitor source is RUNNING:
# - Play audio (YouTube, music, etc.)
# - Run the check again
pactl list short sources | grep -E "monitor.*RUNNING"

# 5. If still no output, restart the audio service.
# First identify which server is running:
pactl info

# Expected output on PipeWire:
# Server Name: PulseAudio (on PipeWire 0.3.xx)

# Expected output on PulseAudio:
# Server Name: pulseaudio

# Then restart the matching service:
systemctl --user restart pipewire pipewire-pulse  # For PipeWire
systemctl --user restart pulseaudio                # For PulseAudio

Still not working?

Check if audio is actually playing:

# Monitor audio levels in real-time
pavucontrol  # GUI tool - check "Recording" tab

# Or command-line
pactl subscribe  # Shows audio events

First run fails

# Install system dependencies (python*-devel is required to compile webrtcvad)
sudo dnf install python3-venv python3-devel      # Fedora/RHEL
sudo zypper install python3-venv python311-devel # openSUSE (adjust version if needed)
sudo apt install python3-venv python3-dev        # Ubuntu/Debian

# Retry
./lvt.py

Delete and reinstall virtualenv

# Remove the tool's data directory (including the virtualenv)
rm -rf ~/.local/share/live-voice-translate/

# Rerun (recreates clean venv)
./lvt.py

Keyboard shortcuts not working

Shortcuts require the terminal to be in TTY mode. If you are piping output or running in a non-interactive environment, use --no-keyboard:

./lvt.py medium --no-keyboard

Italian text not displaying correctly

Ensure your terminal uses UTF-8 encoding:

# Check locale
echo $LANG
# Should show: fr_FR.UTF-8, en_US.UTF-8, or similar

# If not UTF-8:
export LANG=fr_FR.UTF-8
export LC_ALL=fr_FR.UTF-8

Performance tips

CPU usage: Use smaller models (tiny/base) on weak hardware
Latency: Use --fast (500ms silence, 6s max chunk) for lowest delay — may cut mid-sentence
Accuracy: Use large --slow (1200ms silence, 12s max chunk, max beam) for best quality
Balance: medium (default normal mode) is the best CPU/quality tradeoff
RAM: medium model needs ~5GB, large needs ~10GB
GPU (NVIDIA): Use --gpu for 3-5x faster transcription — requires CUDA drivers. Falls back to CPU automatically on failure. AMD is not supported (CTranslate2 has no ROCm build).
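
The GPU fallback behaviour can be sketched as a try/except around model loading. This is a sketch under assumptions, not the tool's implementation: `load_whisper` is a hypothetical name, the `float16` and `int8` compute types are illustrative choices, and the `factory` parameter exists so the logic can be tried without faster-whisper installed.

```python
def load_whisper(model_name, use_gpu=False, factory=None):
    """Load a Whisper model, preferring CUDA when requested and falling back to CPU."""
    if factory is None:
        # Lazy import so the sketch loads without faster-whisper installed.
        from faster_whisper import WhisperModel as factory
    if use_gpu:
        try:
            return factory(model_name, device="cuda", compute_type="float16"), "cuda"
        except Exception as exc:  # missing driver, missing CUDA libraries, etc.
            print(f"GPU unavailable ({exc}); falling back to CPU")
    return factory(model_name, device="cpu", compute_type="int8"), "cpu"
```

Returning the device alongside the model makes it easy to show in the session config which backend is actually in use.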

Advanced usage

Custom save filename

./lvt.py medium --save "$(date +%Y%m%d)-meeting.md"

Create save file during execution

# Start without --save
./lvt.py medium

# During execution, press 's'
# Creates: live-translate-YYYYMMDD-HHMMSS.md
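
The auto-generated name follows a timestamp pattern that `strftime` can produce directly. A minimal sketch, with `default_transcript_name` as a hypothetical helper name:

```python
from datetime import datetime


def default_transcript_name(now=None):
    """Build a name of the form live-translate-YYYYMMDD-HHMMSS.md."""
    now = now or datetime.now()
    return now.strftime("live-translate-%Y%m%d-%H%M%S.md")
```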

Wayland compatibility

Works out-of-the-box on Wayland (tested on Fedora 43 + GNOME 49).

X11 compatibility

Fully compatible with X11 desktop environments.

License

GNU General Public License v3.0 or later

See LICENSE file for details.

Author

Stéphane de Labrusse

Freelance developer specializing in Linux, containerization, and cybersecurity.

GitHub: @stephdl

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Test your changes
Submit a pull request

Roadmap

Potential future features:

Acknowledgments

Whisper by OpenAI
faster-whisper by Guillaume Klein
Argos Translate by Argos Open Technologies

Support

For bugs or feature requests, please open an issue.

Made with ❤️ for the open-source community
