A production-ready web application for real-time speech translation powered by Soniox AI. Speak naturally and see instant translations with intelligent voice activity detection.
Screenshot: TransLang's interface with multi-language support, voice activity detection, and real-time translation display
- Real-time Translation: Spoken language to target language with sub-500ms latency
- Natural Sentence Display: Complete sentences instead of fragmented pieces
- Speaker Diarization: Multi-speaker support with automatic speaker tracking and labeling
- Voice Activity Detection: Automatic finalization during speech pauses
- Intelligent Buffering: Smart sentence boundary detection for natural reading experience
- Live Updates: Watch translations appear as you speak
- Dual Display: View both translated and original text with speaker labels
- Smart Token Processing: Clean transcripts with deduplication
- Configurable Sensitivity: Adjustable silence detection thresholds
- Modern UI: Large, readable text with auto-scroll and color coding
- Node.js 18 or higher
- Soniox API key (sign up on the Soniox website)
- Modern browser with microphone support (Chrome, Firefox, Edge, Safari)
# Clone repository
git clone <your-repo-url>
cd translang-real_time_speech_to_translated_transcription_app
# Install dependencies
npm install
# Configure environment
cp .env.local.example .env.local
# Edit .env.local and add: SONIOX_SECRET_KEY=your_api_key_here
# Start development server
npm run dev
# Open http://localhost:3000
- Grant microphone permission when prompted by your browser
- Configure Voice Activity Detection (optional):
- Toggle VAD on/off
- Adjust silence threshold: 300ms (fast) to 2000ms (slow)
- Default: 800ms (recommended for natural speech)
- Configure Sentence Mode (optional):
- Toggle Sentence Mode on/off (default: OFF for fastest display)
- Adjust sentence hold time: 300ms (fast) to 900ms (slow)
- Default: 600ms (balances readability with speed)
- Start Translation and speak in German
- View results:
- Green boxes: Final translations (confirmed)
- Blue italic text: Live translations (updating)
- Yellow boxes: Original German text (toggle to show/hide)
- Stop to end session gracefully or Cancel for immediate termination
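The ranges and defaults in the steps above can be enforced in one place before they reach the VAD or the sentence buffer. The following is a minimal sketch, not the app's actual code: the `TranslatorSettings` shape, its field names, and `normalizeSettings` are hypothetical, and the VAD on/off default is an assumption because the steps do not state it.

```typescript
// Hypothetical settings shape; field names are illustrative, not taken from the app's source.
interface TranslatorSettings {
  vadEnabled: boolean;
  silenceThresholdMs: number; // documented range: 300 (fast) to 2000 (slow)
  sentenceModeEnabled: boolean;
  sentenceHoldMs: number; // documented range: 300 (fast) to 900 (slow)
}

const DEFAULT_SETTINGS: TranslatorSettings = {
  vadEnabled: true, // assumption: the README does not state the VAD default
  silenceThresholdMs: 800, // documented default
  sentenceModeEnabled: false, // documented default: OFF for fastest display
  sentenceHoldMs: 600, // documented default
};

// Restrict a value to the inclusive range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Fill in missing fields with defaults and pull out-of-range values back into range.
function normalizeSettings(input: Partial<TranslatorSettings>): TranslatorSettings {
  const merged = { ...DEFAULT_SETTINGS, ...input };
  return {
    ...merged,
    silenceThresholdMs: clamp(merged.silenceThresholdMs, 300, 2000),
    sentenceHoldMs: clamp(merged.sentenceHoldMs, 300, 900),
  };
}
```

Clamping instead of rejecting keeps a slider or a stale saved value from breaking a session; a stricter app might prefer to surface a validation error.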
- Framework: Next.js 14 with App Router
- Language: TypeScript 5
- UI: React 18
- Speech Recognition: Soniox Speech-to-Text API
- Voice Activity Detection: @echogarden/fvad-wasm
- Audio Processing: Web Audio API, MediaStream
The application processes audio through parallel pipelines:
- Audio Capture: Browser MediaStream API with optimized settings (16kHz, mono, noise suppression)
- Translation Stream: Soniox WebSocket connection for real-time speech-to-text and translation
- Voice Activity Detection: Parallel VAD processing for silence detection and auto-finalization
- Sentence Stitching: Optional intelligent buffering to create complete sentences (configurable hold times)
- Token Processing: Custom parser distinguishes between partial and final translation tokens
- UI Rendering: React components with auto-scroll and color-coded display
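To make the token processing step concrete, here is a minimal sketch of separating final from partial tokens. It assumes each streamed message carries tokens with `text`, `is_final`, and `translation_status` fields, that final tokens arrive once, and that partial tokens are re-sent and may change. The `Token` and `TranscriptState` types and `applyTokens` are illustrative only; the app's real definitions live in `types/soniox.ts` and `utils/tokenParser.ts` and may differ.

```typescript
// Assumed token shape; check it against types/soniox.ts before relying on it.
interface Token {
  text: string;
  is_final: boolean;
  translation_status?: "none" | "original" | "translation";
}

// Hypothetical view state: confirmed text accumulates, live text is replaced per message.
interface TranscriptState {
  finalTranslation: string;
  finalOriginal: string;
  liveTranslation: string;
  liveOriginal: string;
}

const EMPTY_STATE: TranscriptState = {
  finalTranslation: "",
  finalOriginal: "",
  liveTranslation: "",
  liveOriginal: "",
};

// Fold one message's tokens into the state without mutating the previous state.
function applyTokens(state: TranscriptState, tokens: Token[]): TranscriptState {
  let finalTranslation = state.finalTranslation;
  let finalOriginal = state.finalOriginal;
  // Partial text is rebuilt from scratch on every message.
  let liveTranslation = "";
  let liveOriginal = "";

  for (const token of tokens) {
    const isTranslation = token.translation_status === "translation";
    if (token.is_final) {
      if (isTranslation) finalTranslation += token.text;
      else finalOriginal += token.text;
    } else {
      if (isTranslation) liveTranslation += token.text;
      else liveOriginal += token.text;
    }
  }

  return { finalTranslation, finalOriginal, liveTranslation, liveOriginal };
}
```

Under these assumptions, the confirmed fields would back the final display and the live fields the updating display, with the original-text fields feeding the optional source-language view.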
| Command | Description |
|---|---|
| npm run dev | Start development server |
| npm run build | Build for production |
| npm run start | Start production server |
| npm run lint | Run ESLint |
Create a .env.local file:
SONIOX_SECRET_KEY=your_soniox_api_key_here
TransLang can be deployed to AWS using Docker, ECR, and ECS Fargate. See the deployment/ directory for complete guides:
- deployment/QUICK-START.md - Deploy in 30 minutes
- deployment/AWS-DEPLOYMENT-GUIDE.md - Comprehensive guide
- deployment/AWS-ARCHITECTURE.md - Infrastructure details
- deployment/MONITORING.md - Operations and monitoring
Estimated AWS cost: $11-33/month
- Latency: 100-500ms end-to-end
- Memory: ~50-80MB including VAD
- CPU: Less than 10% average
- Network: Continuous WebSocket (low bandwidth)
- API keys stored server-side only
- Temporary keys generated for client use
- No sensitive data exposed to browser
- Secure WebSocket connections
- No conversation data persistence
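The first two points can be sketched as a server-side exchange: the long-lived secret is used only to request a short-lived key, and only that key is handed to the browser. This sketch makes several assumptions. The endpoint URL, the `usage_type` and `expires_in_seconds` fields, and the `api_key` response field reflect my understanding of Soniox's temporary-key API and should be checked against the current Soniox documentation. `buildTempKeyRequest`, `issueTempKey`, and `FetchLike` are hypothetical names, not the contents of `app/api/soniox-temp-key/`.

```typescript
// Assumed endpoint for Soniox temporary keys; confirm against the current Soniox docs.
const SONIOX_TEMP_KEY_URL = "https://api.soniox.com/v1/auth/temporary-api-key";

interface TempKeyRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the server-to-Soniox request. The long-lived secret appears only here.
function buildTempKeyRequest(secretKey: string, expiresInSeconds: number = 60): TempKeyRequest {
  if (!secretKey) {
    throw new Error("SONIOX_SECRET_KEY is not set");
  }
  return {
    url: SONIOX_TEMP_KEY_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${secretKey}`,
        "Content-Type": "application/json",
      },
      // Assumed payload fields; verify the names before use.
      body: JSON.stringify({
        usage_type: "transcribe_websocket",
        expires_in_seconds: expiresInSeconds,
      }),
    },
  };
}

// Minimal fetch-like signature so the sketch can be exercised without a network.
type FetchLike = (
  url: string,
  init: TempKeyRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Exchange the secret for a short-lived key and return only that key.
async function issueTempKey(secretKey: string, fetchImpl: FetchLike): Promise<string> {
  const { url, init } = buildTempKeyRequest(secretKey);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Temporary key request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { api_key?: string };
  if (!data.api_key) {
    throw new Error("Response did not include a temporary key");
  }
  return data.api_key;
}
```

In a Next.js route handler, this could be called as `issueTempKey(process.env.SONIOX_SECRET_KEY ?? "", fetch)` and the result returned as JSON, so the secret never reaches the client bundle.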
| Browser | Support |
|---|---|
| Chrome | Recommended |
| Edge | Full support |
| Firefox | Full support |
| Safari | Supported |
Requires WebRTC, MediaStream API, and Web Audio API support.
Configured for optimal speech recognition:
- Sample rate: 16kHz
- Channels: Mono
- Echo cancellation: Enabled
- Noise suppression: Enabled
- Auto gain control: Enabled
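As a rough illustration, the settings above map onto a `getUserMedia` constraints object like the one below. This is a sketch, not the app's code: `buildAudioConstraints` and the `AudioCaptureConstraints` type are hypothetical names. Plain constraint values are treated by browsers as preferences, so the delivered sample rate may differ from the 16 kHz requested, in which case resampling may be needed.

```typescript
// Hypothetical local type mirroring the audio part of MediaStreamConstraints.
interface AudioCaptureConstraints {
  audio: {
    sampleRate: number;
    channelCount: number;
    echoCancellation: boolean;
    noiseSuppression: boolean;
    autoGainControl: boolean;
  };
  video: false;
}

// Build the constraints described above; overrides allow experiments such as disabling AGC.
function buildAudioConstraints(
  overrides: Partial<AudioCaptureConstraints["audio"]> = {},
): AudioCaptureConstraints {
  return {
    audio: {
      sampleRate: 16000, // 16 kHz
      channelCount: 1, // mono
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
      ...overrides,
    },
    video: false,
  };
}

// In the browser, the object would be passed to the standard capture call:
//   const stream = await navigator.mediaDevices.getUserMedia(buildAudioConstraints());
```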
├── app/
│ ├── api/soniox-temp-key/ # Secure API key generation
│ ├── globals.css # Styles and animations
│ ├── layout.tsx # Root layout
│ └── page.tsx # Main page
├── components/
│ ├── TranscriptDisplay.tsx # Translation display
│ ├── TranslatorControls.tsx # Control panel
│ ├── VADSettings.tsx # VAD configuration
│ └── SentenceSettings.tsx # Sentence mode configuration
├── hooks/
│ └── useTranslator.ts # Translation state management
├── utils/
│ ├── tokenParser.ts # Token processing
│ ├── vadManager.ts # VAD wrapper
│ ├── keepaliveManager.ts # Connection keepalive
│ └── sentenceStitcher.ts # Sentence stitching logic
├── types/
│ └── soniox.ts # TypeScript definitions
└── deployment/ # AWS deployment guides
Microphone not working
- Check browser permissions in address bar
- Ensure microphone isn't used by another application
- Verify system microphone settings
No translations appearing
- Verify SONIOX_SECRET_KEY in .env.local
- Restart development server after environment changes
- Check browser console for errors
- Confirm API key is valid and active
High latency
- Check internet connection speed
- Close unnecessary browser tabs
- Disable browser extensions temporarily
VAD not detecting pauses
- Ensure VAD toggle is enabled
- Adjust silence threshold (try 800ms default)
- Speak clearly with natural pauses
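To see why the silence threshold matters here, it helps to picture pause detection as a counter over per-frame VAD decisions. The sketch below is an illustration under assumptions, not the logic in `utils/vadManager.ts`: `SilenceDetector` is a hypothetical class, and it assumes the VAD yields one speech/non-speech decision per fixed-length frame (30 ms is used in the usage note).

```typescript
// Fires a callback once per pause, after thresholdMs of continuous non-speech following speech.
class SilenceDetector {
  private readonly thresholdMs: number;
  private readonly onPause: () => void;
  private silentMs = 0;
  private armed = false; // becomes true once speech has been heard

  constructor(thresholdMs: number, onPause: () => void) {
    this.thresholdMs = thresholdMs;
    this.onPause = onPause;
  }

  // Feed one VAD decision per audio frame, with the frame length in milliseconds.
  pushFrame(isSpeech: boolean, frameMs: number): void {
    if (isSpeech) {
      // Any speech resets the silence counter and re-arms the detector.
      this.silentMs = 0;
      this.armed = true;
      return;
    }
    this.silentMs += frameMs;
    if (this.armed && this.silentMs >= this.thresholdMs) {
      this.armed = false; // fire at most once per pause
      this.onPause();
    }
  }
}
```

With an 800 ms threshold and 30 ms frames, a pause is reported on the 27th consecutive silent frame (810 ms). A pause shorter than the threshold never triggers finalization, which is why lowering the threshold helps when speech contains only brief gaps.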
MIT License - see LICENSE file for details.
- Soniox for speech-to-text and translation API
- @echogarden/fvad-wasm for voice activity detection
- Next.js and TypeScript communities
TransLang - Breaking language barriers in real-time
