A near-zero-cost speech translation system that runs entirely locally. Supports microphone input (for your own voice) and system audio capture (for meetings).

- Languages: English <-> Japanese
- Stack: FastAPI (backend), Next.js (frontend), Faster-Whisper (STT), NLLB (translation), Piper (TTS)
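The stack above forms a three-stage pipeline: speech is transcribed (Faster-Whisper), the text is translated (NLLB), and the translation is spoken back (Piper). A minimal sketch of that flow, with stub stages standing in for the real model calls (all function names here are illustrative, not the project's actual API):

```python
# Illustrative pipeline shape only: each stage is a placeholder for the
# real model call (Faster-Whisper, NLLB, and Piper respectively).
def transcribe(audio: bytes) -> str:
    """STT stage (Faster-Whisper in the real system)."""
    return "hello"  # stub

def translate(text: str, src: str, tgt: str) -> str:
    """MT stage (NLLB in the real system)."""
    return f"[{src}->{tgt}] {text}"  # stub

def synthesize(text: str) -> bytes:
    """TTS stage (Piper in the real system)."""
    return text.encode("utf-8")  # stub

def pipeline(audio: bytes, src: str = "en", tgt: str = "ja") -> bytes:
    """Chain the three stages: audio in, translated audio out."""
    return synthesize(translate(transcribe(audio), src, tgt))
```

Each stage only needs the previous stage's output, which is why the models can be swapped independently.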
## Prerequisites

- Docker & Docker Compose
- Nvidia GPU (optional; defaults to CPU)
- ~2 GB of disk space for models
## Quick Start

Start the application:

```sh
docker-compose up --build
```

Note: the first run downloads the models (~1-2 GB), so be patient.
Then access the UI at http://localhost:3001.

## Usage
- Mode selection:
  - Microphone: translate your own voice.
  - System Audio: translate other participants in a meeting (Zoom/Teams). The browser will ask you to share a tab or screen; select the meeting window and check "Share system audio".
- Start: click Start.
- Language: toggle between EN <-> JA.
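Under the hood, the EN <-> JA toggle has to be mapped to NLLB's FLORES-200 language codes (`eng_Latn` and `jpn_Jpan`). A minimal sketch of such a mapping; the helper name and UI labels are assumptions, not the project's actual code:

```python
# Hypothetical helper: maps the UI language toggle to NLLB (FLORES-200)
# language codes, which have the form <iso639-3>_<script>.
NLLB_CODES = {
    "EN": "eng_Latn",  # English, Latin script
    "JA": "jpn_Jpan",  # Japanese, Japanese script
}

def translation_pair(source_ui_label: str) -> tuple[str, str]:
    """Return (src_lang, tgt_lang) NLLB codes for the selected direction."""
    if source_ui_label not in NLLB_CODES:
        raise ValueError(f"Unsupported language: {source_ui_label}")
    target = "JA" if source_ui_label == "EN" else "EN"
    return NLLB_CODES[source_ui_label], NLLB_CODES[target]
```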
## Local Development
Backend:

```sh
cd backend
pip install -r requirements.txt
python download_models.py
uvicorn main:app --reload
```

Frontend:

```sh
cd web-app
npm install
npm run dev
```

## Troubleshooting

- No audio: ensure browser permissions are granted.
- Slow performance: CPU inference is used by default. If you have an Nvidia card, uncomment the GPU settings in `backend/config.py` for faster results.
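The CPU/GPU switch in `backend/config.py` might look something like the sketch below. The variable names here are assumptions, but `device` and `compute_type` are real parameters of Faster-Whisper's `WhisperModel`:

```python
# Sketch of the kind of settings backend/config.py toggles. The variable
# names are illustrative; "device" and "compute_type" are real
# faster-whisper WhisperModel parameters.

# Default: CPU inference (slower, works everywhere).
STT_DEVICE = "cpu"
STT_COMPUTE_TYPE = "int8"      # quantized weights keep CPU latency down

# Uncomment for Nvidia GPU inference:
# STT_DEVICE = "cuda"
# STT_COMPUTE_TYPE = "float16"  # half precision is fast on modern GPUs

# The STT model would then be constructed roughly as:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small", device=STT_DEVICE, compute_type=STT_COMPUTE_TYPE)
```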