Real-time voice agent testing platform with STT→LLM→TTS debugging, latency analytics, and conversation export

rostwal95/media-ui

Media-UI — Real-Time Voice Agent Testing Platform


Next.js React TypeScript Node.js WebSocket gRPC Tailwind CSS

Debug & Test Voice-Based Autonomous Agents

Real-time STT → LLM → TTS testing with latency analytics, barge-in support, and conversation export

UI Screenshot

🚀 Introduction

Media-UI is a full-featured testing platform for voice-based autonomous agents, providing real-time audio streaming, speech recognition debugging, and comprehensive latency analytics.

Built for:

  • QA & Testing – Validate STT accuracy, TTS quality, and agent responses
  • Performance Analysis – Track latency metrics, silence gaps, and barge-in behavior
  • Debugging – Export full conversation logs, recordings, and metrics
  • Demos & Presentations – Clean chat UI with real-time agent interaction

⚠️ Note: This is a testing/debugging tool, not a production voice application. The focus is on observability and developer experience.

⚙️ Tech Stack

Frontend (Next.js App)

  • Next.js 15 – React framework with App Router
  • React 19 – Latest features with concurrent rendering
  • TypeScript 5 – Full type safety
  • Tailwind CSS 4 – Utility-first styling
  • Web Audio API – AudioWorklet for microphone capture & TTS playback
  • Radix UI – Accessible dialog, tooltip, switch components
  • Lucide Icons – Clean, consistent iconography

Backend (Node.js WebSocket Bridge)

  • WebSocket (ws) – Real-time bidirectional communication
  • ConnectRPC – gRPC client for the backend stream, bridged to the browser over WebSocket
  • Protocol Buffers – Type-safe message serialization
  • ts-node – Direct TypeScript execution for server

Audio Processing

  • AudioWorklet – Low-latency PCM capture (pcm-processor.js)
  • 16-bit LINEAR16 @ 16kHz – High-quality audio encoding
  • µ-law decoding – TTS playback from backend
  • WAV export – Mixed recordings with real-time sync

Infrastructure

  • Docker – Multi-stage production builds
  • PM2 – Process management for Next.js + WebSocket server
  • Protocol Buffers – Generated TypeScript types from .proto files

⚡ Features

🎤 Real-Time Audio Streaming

  • Microphone capture via AudioWorklet (128-sample quantum)
  • Buffered streaming with 40ms intervals
  • Automatic AudioContext resume handling
  • Device selection support

🧠 Speech Recognition

  • Interim and final transcription results
  • Start-of-input (SOI) and end-of-input (EOI) events
  • Barge-in detection and handling
  • Live text updates during speech

🔊 Text-to-Speech Playback

  • Queue-based audio playback
  • Interruptible during barge-in
  • µ-law and WAV format support
  • Chunk-level playback tracking

💬 Chat Interface

  • Real-time message bubbles (user + agent)
  • Millisecond-precision timestamps
  • Connection status indicator
  • Call duration timer

📊 Latency Metrics

  • Call-level: Start latency, greeting playback time
  • Per-dialogue:
    • First interim result latency
    • Customer utterance length
    • Prompt playback time
    • Silence gaps (pre/post agent response)
    • Barge-in latency
    • Audio chunks sent
  • Expandable metrics panel with visual indicators
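
The README does not define how each metric is measured, so the following is only a sketch of how per-dialogue values could be derived from event timestamps. Every field name and every formula here is an assumption.

```typescript
// Hypothetical timestamps (milliseconds) collected during one dialogue turn.
export interface DialogueTimestamps {
  speechStartMs: number; // start-of-input (SOI)
  firstInterimMs: number; // first interim transcription received
  speechEndMs: number; // end-of-input (EOI)
  agentAudioStartMs: number; // first TTS chunk starts playing
  agentAudioEndMs: number; // last TTS chunk finishes playing
}

export interface DialogueMetrics {
  firstInterimLatencyMs: number;
  utteranceLengthMs: number;
  silenceBeforeAgentMs: number;
  promptPlaybackMs: number;
}

// Assumed definitions: each metric is the difference between two timestamps.
export function computeDialogueMetrics(t: DialogueTimestamps): DialogueMetrics {
  return {
    firstInterimLatencyMs: t.firstInterimMs - t.speechStartMs,
    utteranceLengthMs: t.speechEndMs - t.speechStartMs,
    silenceBeforeAgentMs: t.agentAudioStartMs - t.speechEndMs,
    promptPlaybackMs: t.agentAudioEndMs - t.agentAudioStartMs,
  };
}
```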

📤 Export Capabilities

  • Mixed Recording: Caller + Agent audio synchronized
  • Backend Logs: Full conversation with scrubbed audio payloads
  • Transcript: HTML export with timestamps
  • Kibana Link: Direct link to orchestrator logs

🛡️ Error Handling

  • WebSocket reconnection logic
  • gRPC stream error recovery
  • User-friendly error messages
  • Comprehensive client-side logging

🚀 Quick Start

Prerequisites

  • Node.js 22.x (installing via nvm is recommended)
  • pnpm (run corepack enable to activate it)

Local Development

# 1. Install dependencies
pnpm install

# 2. Start WebSocket server (terminal 1)
pnpm dev:server
# Runs on ws://localhost:3001/ws

# 3. Start Next.js frontend (terminal 2)
pnpm dev
# Runs on http://localhost:3000

# Or start both concurrently:
pnpm dev:all

Visit http://localhost:3000 → Configure connection → Start call

Docker Deployment

# Build image
docker build -t media-ui .

# Run container
docker run -d \
  -p 3000:3000 \
  -p 3001:3001 \
  --name media-ui \
  media-ui

# Check logs
docker logs -f media-ui

Services:

  • Next.js frontend – http://localhost:3000
  • WebSocket server – ws://localhost:3001/ws

Available Scripts

# Development
pnpm dev              # Next.js dev server (port 3000)
pnpm dev:server       # WebSocket server (port 3001)
pnpm dev:all          # Start both with concurrently

# Production
pnpm build            # Build Next.js app
pnpm start            # Start production server

# Utilities
pnpm lint             # ESLint checks
pnpm typecheck        # TypeScript validation

🏗️ Architecture

High-Level Flow

                        WebSocket (JSON/Protobuf)           gRPC (Protobuf)
    ┌──────────────────────────────────────────┐    ┌──────────────────────────┐
    │                                          │    │                          │
    │                                          ▼    ▼                          │
┌───┴────────────┐                      ┌─────────────────┐              ┌────┴─────────────┐
│                │                      │                 │              │                  │
│    Next.js     │◀────────────────────▶│    Node.js      │◀────────────▶│    Universal     │
│    Frontend    │                      │  WebSocket      │              │     Harness      │
│                │   Bidirectional      │    Bridge       │ Bidirectional│    (Backend)     │
│  (Port 3000)   │   Streaming          │  (Port 3001)    │  Streaming   │                  │
│                │                      │                 │              │                  │
└────────┬───────┘                      └────────┬────────┘              └──────────────────┘
         │                                       │
         │ ┌─────────────────────────────────────┘
         │ │
         │ │  • Bearer Token (JWT)
         │ │  • Orchestrator Host URL
         │ │  • Org ID / Conversation ID
         │ │  • Language & Agent Config
         │ │
         ▼ ▼
    ┌─────────────────┐
    │  AudioWorklet   │
    │  PCM Processor  │
    ├─────────────────┤
    │  • 16-bit PCM   │
    │  • 16 kHz       │
    │  • 128 samples  │
    │  • 40ms buffer  │
    └─────────────────┘
         │
         ▼
    ┌─────────────────┐
    │   Microphone    │
    │   Hardware      │
    └─────────────────┘

Call State Machine

IDLE
  ↓ startCall()
CALL_START (greeting)
  ↓ greeting received + played
AUDIO_STREAMING (duplex)
  ↓ user speaks → ASR → VA response
  ↓ loop until endCall()
CALL_END
  ↓ cleanup
ENDED
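
The states above can be written as a transition table. This is a sketch, not the contents of CallStateMachine.ts: it uses a string union instead of the CallState enum, and the CALL_START → CALL_END edge (hanging up during the greeting) is an assumption.

```typescript
export type CallState =
  | "IDLE"
  | "CALL_START"
  | "AUDIO_STREAMING"
  | "CALL_END"
  | "ENDED";

// Allowed next states for each state.
const TRANSITIONS: Record<CallState, readonly CallState[]> = {
  IDLE: ["CALL_START"],
  CALL_START: ["AUDIO_STREAMING", "CALL_END"], // CALL_END here is assumed
  AUDIO_STREAMING: ["CALL_END"],
  CALL_END: ["ENDED"],
  ENDED: [],
};

export function canTransition(from: CallState, to: CallState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new state, or throws on an illegal transition.
export function transition(from: CallState, to: CallState): CallState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Rejecting illegal transitions early makes out-of-order events, such as audio arriving after CALL_END, easy to spot in logs.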

Data Flow: Voice Interaction

1. User speaks → AudioWorklet captures PCM
2. UseMicrophone hook → sendAudioChunk()
3. CallStateMachine → buffers 40ms chunks
4. WebSocket → sends to Node.js bridge
5. Bridge → forwards to gRPC backend
6. Backend → ASR (interim/final) + VA response
7. WebSocket ← receives response with TTS audio
8. TTSPlayer → decodes µ-law → plays via Web Audio
9. UI updates with transcript + metrics

⚙️ Configuration

Environment Variables

Create .env.local:

# WebSocket URL (auto-detected if not set)
NEXT_PUBLIC_WS_URL=ws://localhost:3001/ws
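
How the auto-detection works is not documented here. One plausible approach is to derive the URL from the page's own location, as in the sketch below; the function name, parameters, and fallback rule are all assumptions.

```typescript
// Prefer the configured URL; otherwise derive one from the page location.
// pageProtocol and pageHostname would come from window.location in a browser.
export function resolveWsUrl(
  envUrl: string | undefined,
  pageProtocol: string,
  pageHostname: string,
  port = 3001,
): string {
  if (envUrl) return envUrl;
  // Pages served over HTTPS must use secure WebSockets.
  const scheme = pageProtocol === "https:" ? "wss" : "ws";
  return `${scheme}://${pageHostname}:${port}/ws`;
}
```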

Connection Settings

Configure via UI (stored in localStorage):

Field            Description                   Example
Host             Orchestrator gRPC endpoint    https://orchestrator.example.com
Bearer Token     Authentication JWT            eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Language         Speech recognition language   en-US, en-IN, fr-FR
OrgId            Organization UUID             12345678-1234-1234-1234-123456789abc
ConversationId   Unique conversation UUID      Auto-generated or manual
VirtualAgentId   Agent configuration ID        agent-abc123
WxCC ClusterId   Cluster routing identifier    intgus1
User Agent       Client identifier             web-ui
Microphone       Audio input device            Selected from browser enumeration
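
Validating these settings before starting a call can surface the most common mistakes early. The sketch below checks a subset of the fields; the interface, field names, and most messages are hypothetical, and the rules are inferred from the example values above.

```typescript
// Hypothetical subset of the connection settings.
export interface ConnectionConfig {
  host: string;
  bearerToken: string;
  language: string;
  orgId: string;
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const LANGUAGE_RE = /^[a-z]{2}-[A-Z]{2}$/; // e.g. en-US

// Returns a list of human-readable problems; an empty list means valid.
export function validateConfig(config: ConnectionConfig): string[] {
  const errors: string[] = [];
  if (!config.host.startsWith("https://")) {
    errors.push("Host must include the https:// protocol");
  }
  if (config.bearerToken.trim() === "") {
    errors.push("No token provided");
  }
  if (!LANGUAGE_RE.test(config.language)) {
    errors.push("Language must look like en-US");
  }
  if (!UUID_RE.test(config.orgId)) {
    errors.push("OrgId must be a UUID");
  }
  return errors;
}
```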

📁 Project Structure

media-ui/
├── src/
│   ├── app/
│   │   ├── page.tsx              # Main entry (ChatApp wrapper)
│   │   ├── layout.tsx            # Root layout with fonts
│   │   └── globals.css           # Tailwind directives
│   │
│   ├── components/
│   │   ├── ChatApp.tsx           # Top-level config + chat manager
│   │   ├── ChatBotUI.tsx         # Main chat interface
│   │   ├── ChatBubble.tsx        # Message display component
│   │   ├── ChatControls.tsx      # Start/stop/mic buttons
│   │   ├── ChatMetricsPanel.tsx  # Metrics sidebar
│   │   ├── ConfigScreen.tsx      # Connection configuration form
│   │   ├── ConnectionIndicator.tsx
│   │   ├── LatencyMetricsDisplay.tsx
│   │   ├── TranscriptExporter.tsx
│   │
│   │
│   ├── state/
│   │   ├── CallStateMachine.ts   # FSM orchestration
│   │   └── types.ts              # CallState enum + types
│   │
│   ├── grpc/
│   │   ├── bridgingClient.ts     # WebSocket ↔ gRPC bridge
│   │   ├── generated/            # Protobuf TypeScript files
│   │   │   ├── InsightInfer_pb.ts
│   │   │   ├── InsightInfer_connect.ts
│   │   │   ├── virtualagent_pb.ts
│   │   │
│   │   └── protos/               # .proto source files
│   │
│   ├── lib/
│   │   └── audio/
│   │       ├── TTSPlayer.ts      # TTS playback queue
│   │       ├── wavRecorder.ts    # WAV export utilities
│   │       ├── recordingBuilder.ts # Mixed audio timeline
│   │       └── recStore.ts       # IndexedDB storage
│   │
│   ├── hooks/
│   │   └── UseMicrophone.ts      # AudioWorklet integration
│   │
│   ├── server/
│   │   ├── wsServer.ts           # WebSocket server (port 3001)
│   │   ├── grpcTransport.ts      # gRPC client setup
│   │   ├── enumMapper.ts         # Protobuf enum conversions
│   │   ├── PushableStream.ts     # Async iterable stream
│   │   ├── utils.ts              # Base64 + logging helpers
│   │   └── logger.ts             # Structured logging
│   │
│   ├── config/
│   │   └── appProperties.ts      # Audio constants
│   │
│   └── scripts/
│       └── generate_protos.sh    # Protobuf codegen
│
├── public/
│   └── pcm-processor.js          # AudioWorklet processor
│
├── docs/
│   ├── tool.png                  # UI screenshot
│   ├── Class Diagram.png         # Architecture diagram
│   └── media-ui-sequence-diagram.png
│
├── Dockerfile                    # Multi-stage production build
├── ecosystem.config.js           # PM2 configuration
├── next.config.ts                # Next.js configuration
├── tsconfig.json                 # TypeScript config
├── tailwind.config.ts            # Tailwind setup
└── package.json                  # Dependencies + scripts
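
The tree lists server/PushableStream.ts as an async iterable stream. A structure like that lets WebSocket messages, which arrive by callback, be consumed by a gRPC client that expects an async iterable of requests. The following is a sketch of the general technique under that assumption, not the file's actual contents.

```typescript
// Queue-backed async iterable: producers call push(), a consumer uses for await.
export class PushableStream<T> implements AsyncIterable<T> {
  private readonly queue: T[] = [];
  private readonly waiters: Array<(result: IteratorResult<T>) => void> = [];
  private done = false;

  push(value: T): void {
    if (this.done) return; // ignore values after end()
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter({ value, done: false }); // hand off directly to a waiting consumer
    } else {
      this.queue.push(value);
    }
  }

  end(): void {
    this.done = true;
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true });
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        if (this.queue.length > 0) {
          return Promise.resolve({ value: this.queue.shift() as T, done: false });
        }
        if (this.done) {
          return Promise.resolve({ value: undefined, done: true });
        }
        // Nothing buffered yet: park until push() or end() is called.
        return new Promise((resolve) => this.waiters.push(resolve));
      },
    };
  }
}
```

Values pushed before the consumer starts iterating are buffered, so no audio chunk is lost while the gRPC stream is still being opened.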

🛠️ Development Guide

Generating Protobuf Files

# Install buf CLI (first time)
brew install bufbuild/buf/buf

# Generate TypeScript files from .proto
cd src/scripts
bash generate_protos.sh

# Or manually:
npx buf generate --path src/grpc/protos

Adding a New Feature

Example: Add "Call Recording Export to S3"

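// 1. Update CallStateMachine.ts (import uploadToS3 from lib/storage/s3.ts)
public async endCall() {
  const recordings = await this.getRecordings();

  // New: Upload to S3
  if (recordings.mixed) {
    await uploadToS3(recordings.mixed, this.config.conversationId);
  }

  return recordings;
}

// 2. Create upload utility (lib/storage/s3.ts)
export async function uploadToS3(blob: Blob, convId: string): Promise<void> {
  const formData = new FormData();
  formData.append('file', blob, `${convId}.wav`);

  const res = await fetch('/api/upload', {
    method: 'POST',
    body: formData
  });
  if (!res.ok) {
    throw new Error(`Upload failed with status ${res.status}`);
  }
}

// 3. Add API route (app/api/upload/route.ts)
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// Assumption: region and bucket come from environment variables (names are illustrative)
const s3 = new S3Client({ region: process.env.AWS_REGION });
const bucket = process.env.S3_BUCKET;

export async function POST(req: Request) {
  const formData = await req.formData();
  const file = formData.get('file');
  if (!(file instanceof File)) {
    return Response.json({ error: 'Missing file' }, { status: 400 });
  }

  // Upload to S3, using the uploaded file name as the object key
  await s3.send(new PutObjectCommand({
    Bucket: bucket,
    Key: file.name,
    Body: Buffer.from(await file.arrayBuffer()),
    ContentType: 'audio/wav'
  }));

  return Response.json({ url: `s3://${bucket}/${file.name}` });
}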

Debugging Tips

WebSocket connection issues

# Check server is running
curl http://localhost:3001

# Test WebSocket with wscat
npm install -g wscat
wscat -c ws://localhost:3001/ws
> {"ping":1}

# Check browser console for connection errors

Audio not capturing

# Verify microphone permissions in browser
# Chrome: Settings → Privacy → Microphone

# Check AudioWorklet loading
# Browser console should show: "Microphone: Loaded PCM processor"

# Test with different sample rate
# Edit src/config/appProperties.ts:
FIXED_SAMPLE_RATE: 8000  // Try 8kHz instead of 16kHz

gRPC errors

# Check token expiration
# JWT decode: https://jwt.io

# Verify host URL format
# Must include https:// protocol

# Check backend logs for auth failures
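
Token expiry can also be checked locally without pasting the token into a website. The sketch below reads the standard exp claim (seconds since the epoch) from the JWT payload. It does not verify the signature, and the function names are hypothetical.

```typescript
// Read the exp claim from a JWT and return it in milliseconds, if present.
export function jwtExpiryMs(token: string): number | undefined {
  const payloadPart = token.split(".")[1];
  if (!payloadPart) return undefined;
  // JWT segments are base64url; convert to plain base64 and restore padding.
  const base64 = payloadPart.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  try {
    const payload = JSON.parse(atob(padded)) as { exp?: unknown };
    return typeof payload.exp === "number" ? payload.exp * 1000 : undefined;
  } catch {
    return undefined; // not valid base64 or not valid JSON
  }
}

// True only when the token carries an exp claim that is already in the past.
export function isTokenExpired(token: string, nowMs: number = Date.now()): boolean {
  const expiry = jwtExpiryMs(token);
  return expiry !== undefined && nowMs >= expiry;
}
```

Calling isTokenExpired(token) before starting a call can distinguish an expired token from other gRPC authentication failures.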

Common Issues

Issue                      Solution
"No token provided"        Enter valid bearer token in config screen
"AudioContext suspended"   Click anywhere on page to trigger user gesture
"WebSocket closed"         Restart ws-server: pnpm dev:server
"VA greeting timeout"      Check virtualAgentId is valid in config
Choppy audio playback      Reduce network latency or increase buffer size
Recording export fails     Check browser IndexedDB quota (clear if full)


⚠️ Testing Tool Disclaimer

This is a debugging and testing platform. For production voice applications:

  ✓ Implement proper authentication
  ✓ Add rate limiting
  ✓ Secure WebSocket connections (WSS)
  ✓ Add monitoring/alerting


For architecture details and flow diagrams, see the docs/ folder
