- Introduction
- Tech Stack
- Features
- Quick Start
- Architecture
- Configuration
- Project Structure
- Development Guide
Media-UI is a full-featured testing platform for voice-based autonomous agents, providing real-time audio streaming, speech recognition debugging, and comprehensive latency analytics.
Built for:
- ✅ QA & Testing – Validate STT accuracy, TTS quality, and agent responses
- ✅ Performance Analysis – Track latency metrics, silence gaps, and barge-in behavior
- ✅ Debugging – Export full conversation logs, recordings, and metrics
- ✅ Demos & Presentations – Clean chat UI with real-time agent interaction
⚠️ Note: This is a testing/debugging tool, not a production voice application. Focus is on observability and developer experience.
- Next.js 15 – React framework with App Router
- React 19 – Latest features with concurrent rendering
- TypeScript 5 – Full type safety
- Tailwind CSS 4 – Utility-first styling
- Web Audio API – AudioWorklet for microphone capture & TTS playback
- Radix UI – Accessible dialog, tooltip, switch components
- Lucide Icons – Clean, consistent iconography
- WebSocket (ws) – Real-time bidirectional communication
- ConnectRPC – gRPC-web protocol over WebSocket
- Protocol Buffers – Type-safe message serialization
- ts-node – Direct TypeScript execution for server
- AudioWorklet – Low-latency PCM capture (pcm-processor.js)
- 16-bit LINEAR16 @ 16kHz – High-quality audio encoding
- µ-law decoding – TTS playback from backend
- WAV export – Mixed recordings with real-time sync
- Docker – Multi-stage production builds
- PM2 – Process management for Next.js + WebSocket server
- Protocol Buffers – Generated TypeScript types from .proto files
- Microphone capture via AudioWorklet (128-sample quantum)
- Buffered streaming with 40ms intervals
- Automatic AudioContext resume handling
- Device selection support
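The capture path above ultimately converts the worklet's Float32 quanta to 16-bit LINEAR16 before streaming. A minimal sketch of that conversion (illustrative only — the actual pcm-processor.js may differ in detail):

```typescript
// Sketch: convert a Float32 [-1, 1] audio quantum to 16-bit LINEAR16,
// the conversion a PCM AudioWorklet processor typically performs
// before posting samples to the main thread.
export function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp to avoid overflow, then scale to the int16 range.
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

The asymmetric scale (0x8000 vs 0x7fff) keeps both extremes representable without clipping.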
- Interim and final transcription results
- Start-of-input (SOI) and end-of-input (EOI) events
- Barge-in detection and handling
- Live text updates during speech
- Queue-based audio playback
- Interruptible during barge-in
- µ-law and WAV format support
- Chunk-level playback tracking
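Before queueing playback, each µ-law byte from the backend has to be expanded to linear PCM. A sketch of the standard G.711 µ-law expansion (the project's TTSPlayer may implement this differently, e.g. via a lookup table):

```typescript
// Sketch: decode one G.711 µ-law byte to a 16-bit PCM sample, the
// expansion a TTS player needs before handing audio to Web Audio.
export function mulawDecodeSample(byte: number): number {
  const u = ~byte & 0xff;           // µ-law bytes are stored complemented
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // Reconstruct magnitude: ((mantissa << 3) + 132) << exponent, minus bias.
  const sample = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -sample : sample;
}
```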
- Real-time message bubbles (user + agent)
- Millisecond-precision timestamps
- Connection status indicator
- Call duration timer
- Call-level: Start latency, greeting playback time
- Per-dialogue:
  - First interim result latency
  - Customer utterance length
  - Prompt playback time
  - Silence gaps (pre/post agent response)
  - Barge-in latency
  - Audio chunks sent
- Expandable metrics panel with visual indicators
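The per-dialogue metrics above can be pictured as one record per exchange. The field names below are illustrative, not the codebase's actual identifiers:

```typescript
// Illustrative shape for the per-dialogue metrics listed above;
// actual field names in the codebase may differ.
export interface DialogueMetrics {
  firstInterimLatencyMs: number;   // time to first interim ASR result
  utteranceLengthMs: number;       // customer utterance length
  promptPlaybackMs: number;        // agent prompt playback time
  preResponseSilenceMs: number;    // silence gap before agent response
  postResponseSilenceMs: number;   // silence gap after agent response
  bargeInLatencyMs?: number;       // set only when a barge-in occurred
  audioChunksSent: number;
}
```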
- Mixed Recording: Caller + Agent audio synchronized
- Backend Logs: Full conversation with scrubbed audio payloads
- Transcript: HTML export with timestamps
- Kibana Link: Direct link to orchestrator logs
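The mixed-recording export wraps raw PCM in a WAV container. A sketch of the 44-byte header such an exporter has to emit (illustrative, not the project's actual wavRecorder.ts implementation):

```typescript
// Sketch: build a canonical 44-byte PCM WAV header for 16-bit audio.
export function buildWavHeader(dataBytes: number, sampleRate = 16000, channels = 1): Uint8Array {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  const blockAlign = channels * 2;            // 16-bit samples
  writeStr(0, 'RIFF');
  view.setUint32(4, 36 + dataBytes, true);    // RIFF chunk size
  writeStr(8, 'WAVE');
  writeStr(12, 'fmt ');
  view.setUint32(16, 16, true);               // fmt chunk size
  view.setUint16(20, 1, true);                // PCM format tag
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true);               // bits per sample
  writeStr(36, 'data');
  view.setUint32(40, dataBytes, true);        // PCM payload size
  return new Uint8Array(header);
}
```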
- WebSocket reconnection logic
- gRPC stream error recovery
- User-friendly error messages
- Comprehensive client-side logging
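Reconnection logic of this kind is usually driven by a capped exponential backoff. A minimal sketch (the delay values are illustrative, not the project's actual policy):

```typescript
// Sketch: capped exponential backoff for WebSocket reconnect attempts.
export function reconnectDelayMs(attempt: number, baseMs = 500, capMs = 10_000): number {
  // attempt 0 → baseMs, doubling each retry, never above capMs
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```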
- Node.js 22.x (nvm)
- pnpm (enable with corepack enable)
# 1. Install dependencies
pnpm install
# 2. Start WebSocket server (terminal 1)
pnpm dev:server
# Runs on ws://localhost:3001/ws
# 3. Start Next.js frontend (terminal 2)
pnpm dev
# Runs on http://localhost:3000
# Or start both concurrently:
pnpm dev:all

Visit http://localhost:3000 → Configure connection → Start call
# Build image
docker build -t media-ui .
# Run container
docker run -d \
-p 3000:3000 \
-p 3001:3001 \
--name media-ui \
media-ui
# Check logs
docker logs -f media-ui

Services:
- Frontend: http://localhost:3000
- WebSocket: ws://localhost:3001/ws
# Development
pnpm dev # Next.js dev server (port 3000)
pnpm dev:server # WebSocket server (port 3001)
pnpm dev:all # Start both with concurrently
# Production
pnpm build # Build Next.js app
pnpm start # Start production server
# Utilities
pnpm lint # ESLint checks
pnpm typecheck # TypeScript validation

           WebSocket (JSON/Protobuf)                 gRPC (Protobuf)
┌──────────────────────────────────────────┐ ┌──────────────────────────┐
│ │ │ │
│ ▼ ▼ │
┌───┴────────────┐ ┌─────────────────┐ ┌────┴─────────────┐
│ │ │ │ │ │
│ Next.js │◀────────────────────▶│ Node.js │◀────────────▶│ Universal │
│ Frontend │ │ WebSocket │ │ Harness │
│ │ Bidirectional │ Bridge │ Bidirectional│ (Backend) │
│ (Port 3000) │ Streaming │ (Port 3001) │ Streaming │ │
│ │ │ │ │ │
└────────┬───────┘ └────────┬────────┘ └──────────────────┘
│ │
│ ┌─────────────────────────────────────┘
│ │
│ │ • Bearer Token (JWT)
│ │ • Orchestrator Host URL
│ │ • Org ID / Conversation ID
│ │ • Language & Agent Config
│ │
▼ ▼
┌─────────────────┐
│ AudioWorklet │
│ PCM Processor │
├─────────────────┤
│ • 16-bit PCM │
│ • 16 kHz │
│ • 128 samples │
│ • 40ms buffer │
└─────────────────┘
│
▼
┌─────────────────┐
│ Microphone │
│ Hardware │
└─────────────────┘
IDLE
↓ startCall()
CALL_START (greeting)
↓ greeting received + played
AUDIO_STREAMING (duplex)
↓ user speaks → ASR → VA response
↓ loop until endCall()
CALL_END
↓ cleanup
ENDED
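The lifecycle above can be sketched as a simple transition table. State names mirror the diagram; the real CallStateMachine.ts carries far more context (buffers, timers, metrics):

```typescript
// Minimal sketch of the call lifecycle as a linear transition table.
export type CallState = 'IDLE' | 'CALL_START' | 'AUDIO_STREAMING' | 'CALL_END' | 'ENDED';

const transitions: Record<CallState, CallState | null> = {
  IDLE: 'CALL_START',            // startCall()
  CALL_START: 'AUDIO_STREAMING', // greeting received + played
  AUDIO_STREAMING: 'CALL_END',   // endCall()
  CALL_END: 'ENDED',             // cleanup
  ENDED: null,                   // terminal state
};

export function nextState(s: CallState): CallState | null {
  return transitions[s];
}
```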
1. User speaks → AudioWorklet captures PCM
2. UseMicrophone hook → sendAudioChunk()
3. CallStateMachine → buffers 40ms chunks
4. WebSocket → sends to Node.js bridge
5. Bridge → forwards to gRPC backend
6. Backend → ASR (interim/final) + VA response
7. WebSocket ← receives response with TTS audio
8. TTSPlayer → decodes µ-law → plays via Web Audio
9. UI updates with transcript + metrics
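Step 3 above (buffering 128-sample quanta into 40 ms chunks) can be sketched as follows. At 16 kHz, 40 ms is 640 samples, i.e. exactly five worklet quanta; the real buffering in CallStateMachine may differ in detail:

```typescript
// Sketch of step 3: accumulate 128-sample AudioWorklet quanta and
// emit fixed 40 ms chunks (640 samples at 16 kHz).
const SAMPLE_RATE = 16_000;
const CHUNK_MS = 40;
const CHUNK_SAMPLES = (SAMPLE_RATE * CHUNK_MS) / 1000; // 640

export class ChunkBuffer {
  private pending: number[] = [];
  readonly chunks: Int16Array[] = [];

  push(quantum: Int16Array): void {
    this.pending.push(...quantum);
    // Emit a chunk every time 640 samples have accumulated.
    while (this.pending.length >= CHUNK_SAMPLES) {
      this.chunks.push(Int16Array.from(this.pending.splice(0, CHUNK_SAMPLES)));
    }
  }
}
```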
Create .env.local:
# WebSocket URL (auto-detected if not set)
NEXT_PUBLIC_WS_URL=ws://localhost:3001/ws

Configure via UI (stored in localStorage):
| Field | Description | Example |
|---|---|---|
| Host | Orchestrator gRPC endpoint | https://orchestrator.example.com |
| Bearer Token | Authentication JWT | eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9... |
| Language | Speech recognition language | en-US, en-IN, fr-FR |
| OrgId | Organization UUID | 12345678-1234-1234-1234-123456789abc |
| ConversationId | Unique conversation UUID | Auto-generated or manual |
| VirtualAgentId | Agent configuration ID | agent-abc123 |
| WxCC ClusterId | Cluster routing identifier | intgus1 |
| User Agent | Client identifier | web-ui |
| Microphone | Audio input device | Selected from browser enumeration |
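The OrgId and ConversationId fields are UUID-shaped, so a config screen can reject malformed values before a call starts. A generic sketch (the validation in the actual ConfigScreen may differ):

```typescript
// Sketch: validate the UUID-shaped config fields (OrgId, ConversationId)
// before starting a call.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

export function isValidUuid(value: string): boolean {
  return UUID_RE.test(value);
}
```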
media-ui/
├── src/
│ ├── app/
│ │ ├── page.tsx # Main entry (ChatApp wrapper)
│ │ ├── layout.tsx # Root layout with fonts
│ │ └── globals.css # Tailwind directives
│ │
│ ├── components/
│ │ ├── ChatApp.tsx # Top-level config + chat manager
│ │ ├── ChatBotUI.tsx # Main chat interface
│ │ ├── ChatBubble.tsx # Message display component
│ │ ├── ChatControls.tsx # Start/stop/mic buttons
│ │ ├── ChatMetricsPanel.tsx # Metrics sidebar
│ │ ├── ConfigScreen.tsx # Connection configuration form
│ │ ├── ConnectionIndicator.tsx
│ │ ├── LatencyMetricsDisplay.tsx
│ │ ├── TranscriptExporter.tsx
│ │
│ │
│ ├── state/
│ │ ├── CallStateMachine.ts # FSM orchestration
│ │ └── types.ts # CallState enum + types
│ │
│ ├── grpc/
│ │ ├── bridgingClient.ts # WebSocket ↔ gRPC bridge
│ │ ├── generated/ # Protobuf TypeScript files
│ │ │ ├── InsightInfer_pb.ts
│ │ │ ├── InsightInfer_connect.ts
│ │ │ ├── virtualagent_pb.ts
│ │ │
│ │ └── protos/ # .proto source files
│ │
│ ├── lib/
│ │ └── audio/
│ │ ├── TTSPlayer.ts # TTS playback queue
│ │ ├── wavRecorder.ts # WAV export utilities
│ │ ├── recordingBuilder.ts # Mixed audio timeline
│ │ └── recStore.ts # IndexedDB storage
│ │
│ ├── hooks/
│ │ └── UseMicrophone.ts # AudioWorklet integration
│ │
│ ├── server/
│ │ ├── wsServer.ts # WebSocket server (port 3001)
│ │ ├── grpcTransport.ts # gRPC client setup
│ │ ├── enumMapper.ts # Protobuf enum conversions
│ │ ├── PushableStream.ts # Async iterable stream
│ │ ├── utils.ts # Base64 + logging helpers
│ │ └── logger.ts # Structured logging
│ │
│ ├── config/
│ │ └── appProperties.ts # Audio constants
│ │
│ └── scripts/
│ └── generate_protos.sh # Protobuf codegen
│
├── public/
│ └── pcm-processor.js # AudioWorklet processor
│
├── docs/
│ ├── tool.png # UI screenshot
│ ├── Class Diagram.png # Architecture diagram
│ └── media-ui-sequence-diagram.png
│
├── Dockerfile # Multi-stage production build
├── ecosystem.config.js # PM2 configuration
├── next.config.ts # Next.js configuration
├── tsconfig.json # TypeScript config
├── tailwind.config.ts # Tailwind setup
└── package.json # Dependencies + scripts
# Install buf CLI (first time)
brew install bufbuild/buf/buf
# Generate TypeScript files from .proto
cd src/scripts
bash generate_protos.sh
# Or manually:
npx buf generate --path src/grpc/protos

Example: Add "Call Recording Export to S3"
// 1. Update CallStateMachine.ts
public async endCall() {
  const recordings = await this.getRecordings();

  // New: Upload to S3
  if (recordings.mixed) {
    await uploadToS3(recordings.mixed, this.config.conversationId);
  }

  return recordings;
}

// 2. Create upload utility (lib/storage/s3.ts)
export async function uploadToS3(blob: Blob, convId: string) {
  const formData = new FormData();
  formData.append('file', blob, `${convId}.wav`);
  await fetch('/api/upload', {
    method: 'POST',
    body: formData,
  });
}

// 3. Add API route (app/api/upload/route.ts)
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

export async function POST(req: Request) {
  const formData = await req.formData();
  const file = formData.get('file') as File;
  // Upload to S3 with S3Client + PutObjectCommand...
  const s3Url = ''; // placeholder: URL of the uploaded object
  return Response.json({ url: s3Url });
}

# Check server is running
curl http://localhost:3001
# Test WebSocket with wscat
npm install -g wscat
wscat -c ws://localhost:3001/ws
> {"ping":1}
# Check browser console for connection errors

# Verify microphone permissions in browser
# Chrome: Settings → Privacy → Microphone
# Check AudioWorklet loading
# Browser console should show: "Microphone: Loaded PCM processor"
# Test with different sample rate
# Edit src/config/appProperties.ts:
FIXED_SAMPLE_RATE: 8000 # Try 8kHz instead of 16kHz

# Check token expiration
# JWT decode: https://jwt.io
# Verify host URL format
# Must include https:// protocol
# Check backend logs for auth failures

| Issue | Solution |
|---|---|
| "No token provided" | Enter valid bearer token in config screen |
| "AudioContext suspended" | Click anywhere on page to trigger user gesture |
| "WebSocket closed" | Restart ws-server: pnpm dev:server |
| "VA greeting timeout" | Check virtualAgentId is valid in config |
| Choppy audio playback | Reduce network latency or increase buffer size |
| Recording export fails | Check browser IndexedDB quota (clear if full) |
This is a debugging and testing platform. For production voice applications:
- ✓ Implement proper authentication
- ✓ Add rate limiting
- ✓ Secure WebSocket connections (WSS)
- ✓ Add monitoring/alerting
For architecture details and flow diagrams, see the docs/ folder.
