A complete example showing how to build ultra-responsive voice AI applications using jambonz, Deepgram Flux, and Anthropic Claude.
This example showcases optimistic response generation - a technique that dramatically reduces perceived latency in voice AI conversations by:
- Starting LLM generation early when Flux detects a probable end-of-turn
- Quarantining responses until the turn is confirmed
- Gracefully handling false positives when users pause mid-sentence
The result: Natural, responsive conversations that feel much faster than traditional wait-for-silence approaches.
Deepgram Flux is an advanced speech recognition model that provides three turn-taking events to enable intelligent response timing:
- `EagerEndOfTurn` - High probability the user finished speaking, but not certain yet
- `EndOfTurn` - Definitive confirmation the user has finished
- `TurnResumed` - User continued speaking after an eager prediction (a false positive)
These events let you start processing before you're 100% certain the user is done, significantly reducing response latency.
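The mapping from these three events to actions can be sketched as a small dispatcher. This is a hypothetical helper, not code from the app; the event names come from Deepgram Flux, while the `actions` callbacks are placeholders for the logic described later in this README.

```javascript
// Map the three Flux turn-taking events to the app's actions.
// The actions object is a placeholder for the real handlers.
function handleFluxEvent(evt, actions) {
  switch (evt.type) {
    case 'EagerEndOfTurn':
      // Probably done speaking - start the LLM early, but hold the output
      return actions.startLlmAndQuarantine(evt.transcript);
    case 'EndOfTurn':
      // Definitely done - safe to release held tokens to TTS
      return actions.releaseQuarantine();
    case 'TurnResumed':
      // False positive - the user kept talking; throw the draft away
      return actions.abortAndDiscard();
    default:
      return undefined;
  }
}
```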
- Node.js v18.19.0 or higher (required for Pino v10)
- A jambonz account with websocket support
- Deepgram API key with Flux access
- Anthropic API key
```bash
npm install
```

Set the following environment variables (or configure them in the jambonz portal):
- `ANTHROPIC_API_KEY` - Your Anthropic API key
- `ANTHROPIC_MODEL` - Claude model to use (e.g., `claude-3-5-sonnet-20241022`)
- `LLM_SYSTEM_PROMPT` - System prompt for Claude
- `EOT_THRESHOLD` - Deepgram Flux end-of-turn confidence threshold (0-1, default: 0.7)
- `EAGER_EOT_THRESHOLD` - Deepgram Flux eager end-of-turn threshold (0-1, default: 0.5)
```bash
npm start
```

The websocket server will listen on port 3000 (or `WS_PORT` if set).
This application implements a "quarantine pattern" to handle the uncertainty of `EagerEndOfTurn`:
```
User speaks → EagerEndOfTurn fires
          ↓
Start LLM stream immediately
          ↓
Hold tokens in "quarantine"
          ↓
   ┌──────┴──────┐
   ↓             ↓
EndOfTurn     TurnResumed
   ↓             ↓
Release tokens   Discard tokens
Stream to TTS    Abort LLM stream
```
The application tracks three states:
- `initial` - No active speech processing
- `eager_eot` - In quarantine mode (holding LLM response tokens)
- `eot` - Confirmed turn end (streaming tokens to TTS)
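Stripped of the jambonz and Anthropic wiring, the quarantine pattern reduces to a small state machine over those three states. The sketch below is a hypothetical illustration (the app's actual implementation lives in `lib/routes/flux-voice-pipeline.js`); `speak` stands in for streaming text to TTS.

```javascript
// Minimal sketch of the quarantine state machine: tokens streamed from
// the LLM are buffered while the turn end is only a prediction.
class QuarantineBuffer {
  constructor() {
    this.state = 'initial';  // initial | eager_eot | eot
    this.buffer = [];
  }

  // EagerEndOfTurn: start quarantining instead of speaking
  onEagerEndOfTurn() {
    this.state = 'eager_eot';
    this.buffer = [];
  }

  // An LLM token arrives: hold it in quarantine, or speak it if confirmed
  addToken(token, speak) {
    if (this.state === 'eager_eot') this.buffer.push(token);
    else if (this.state === 'eot') speak(token);
  }

  // EndOfTurn: release everything held so far, then stream live
  onEndOfTurn(speak) {
    const held = this.buffer.splice(0).join('');
    if (held) speak(held);
    this.state = 'eot';
  }

  // TurnResumed: false positive - discard the quarantined draft
  onTurnResumed() {
    this.buffer = [];
    this.state = 'initial';
  }
}
```

In the real app the `TurnResumed` path also aborts the in-flight LLM request so no further tokens arrive for the discarded draft.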
The main logic is in `lib/routes/flux-voice-pipeline.js`:
- `TurnResumed` handler (lines 122-161) - Aborts the stream, discards quarantined tokens, removes the incorrect transcript
- `EagerEndOfTurn` handler (lines 163-295) - Starts the LLM stream, quarantines response tokens
- `EndOfTurn` handler (lines 297-425) - Either releases quarantined tokens OR starts a new stream
Each handler is fully commented to explain the flow.
```js
session.config({
  recognizer: {
    vendor: 'deepgramflux',
    language: 'en-US',
    deepgramOptions: {
      eotThreshold: 0.7,
      eagerEotThreshold: 0.5
    }
  }
});
```

```js
session.sendTtsTokens(tokens);  // Stream tokens as they arrive
session.flushTtsTokens();       // Signal end of response
```

The application handles user interrupts via the `tts:user_interrupt` event, allowing natural conversation flow where users can interrupt the AI mid-response.
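Interrupt handling can be sketched as below. This assumes the jambonz session is an event emitter that fires `tts:user_interrupt` (as described above) and that the in-flight LLM request is wrapped in an `AbortController`; the `wireInterrupts` helper and its wiring are hypothetical.

```javascript
// When the user barges in, stop generating the now-stale response.
// jambonz stops TTS playback itself; we just abort the LLM stream.
function wireInterrupts(session, llmController) {
  session.on('tts:user_interrupt', () => {
    llmController.abort();
    // ...reset local quarantine/state here as well
  });
}
```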
Adjust these values based on your use case:
- Higher `eagerEotThreshold` (e.g., 0.7) - Fewer false positives, but less latency improvement
- Lower `eagerEotThreshold` (e.g., 0.3) - More aggressive optimization, but more false positives

`eotThreshold` should always be higher than `eagerEotThreshold`.
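Reading the thresholds from the environment variables listed earlier, with their documented defaults and the ordering rule enforced, might look like this (a sketch; `loadThresholds` is a hypothetical helper, not part of the app):

```javascript
// Load tuning thresholds from the environment, falling back to the
// documented defaults (EOT_THRESHOLD=0.7, EAGER_EOT_THRESHOLD=0.5).
function loadThresholds(env = process.env) {
  const eotThreshold = parseFloat(env.EOT_THRESHOLD ?? '0.7');
  const eagerEotThreshold = parseFloat(env.EAGER_EOT_THRESHOLD ?? '0.5');
  if (!(eotThreshold > eagerEotThreshold)) {
    throw new Error('EOT_THRESHOLD must be higher than EAGER_EOT_THRESHOLD');
  }
  return { eotThreshold, eagerEotThreshold };
}
```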
The application logs state transitions and timing:
```
[STATE CHANGE] initial -> eager_eot, transcript: "Hi, I need help with..."
first token after 534ms: "Sure"
quarantining 12 chars (total: 45)
[STATE CHANGE] eager_eot -> eot, releasing 127 quarantined chars: "Sure, I'd be ha"
[STATE CHANGE] eot -> initial
```
This makes it easy to understand the flow and measure performance improvements.
The implementation is straightforward and well-commented:
- `lib/routes/flux-voice-pipeline.js` - The main implementation, with detailed comments explaining the Flux state machine and the quarantine pattern
- `app.json` - Environment variable schema defining the required configuration (API keys, thresholds, prompts)

Start by reading the file header in `flux-voice-pipeline.js` for an overview, then walk through each of the three event handlers to understand the flow.
MIT
This application was created with create-jambonz-ws-app and demonstrates best practices for building responsive voice AI applications with jambonz and Deepgram Flux.