Example of creating a cascaded VoiceAI application using Deepgram Flux

jambonz/flux-voice-pipeline

Deepgram Flux Voice Pipeline

A complete example showing how to build ultra-responsive voice AI applications using jambonz, Deepgram Flux, and Anthropic Claude.

What This Demonstrates

This example showcases optimistic response generation - a technique that dramatically reduces perceived latency in voice AI conversations by:

  1. Starting LLM generation early when Flux detects a probable end-of-turn
  2. Quarantining responses until the turn is confirmed
  3. Gracefully handling false positives when users pause mid-sentence

The result: natural, responsive conversations that feel much faster than traditional wait-for-silence approaches.

What is Deepgram Flux?

Deepgram Flux is an advanced speech recognition model that provides three turn-taking events to enable intelligent response timing:

  • EagerEndOfTurn - High probability the user finished speaking, but not certain yet
  • EndOfTurn - Definitive confirmation the user has finished
  • TurnResumed - User continued speaking after an eager prediction (false positive)

These events let you start processing before you're 100% certain the user is done, significantly reducing response latency.

Prerequisites

  • Node.js v18.19.0 or higher (required for Pino v10)
  • A jambonz account with websocket support
  • Deepgram API key with Flux access
  • Anthropic API key

Installation

npm install

Configuration

Set the following environment variables (or configure them in the jambonz portal):

  • ANTHROPIC_API_KEY - Your Anthropic API key
  • ANTHROPIC_MODEL - Claude model to use (e.g., claude-3-5-sonnet-20241022)
  • LLM_SYSTEM_PROMPT - System prompt for Claude
  • EOT_THRESHOLD - Deepgram Flux end-of-turn confidence threshold (0-1, default: 0.7)
  • EAGER_EOT_THRESHOLD - Deepgram Flux eager end-of-turn threshold (0-1, default: 0.5)

Running the Application

npm start

The websocket server will listen on port 3000 (or WS_PORT if set).

How It Works

The Quarantine Pattern

This application implements a "quarantine pattern" to handle the uncertainty of EagerEndOfTurn:

User speaks → EagerEndOfTurn fires
              ↓
         Start LLM stream immediately
              ↓
         Hold tokens in "quarantine"
              ↓
    ┌─────────┴─────────┐
    ↓                   ↓
EndOfTurn          TurnResumed
    ↓                   ↓
Release tokens     Discard tokens
Stream to TTS      Abort LLM stream
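
The quarantine step in the diagram above can be sketched as a small buffer that either releases or discards what it holds. This is a simplified illustration, not the actual code in flux-voice-pipeline.js; the class name and shape are invented for clarity:

```javascript
// Minimal sketch of the quarantine pattern: tokens streamed from the LLM
// are buffered while the turn is still uncertain, then released on
// EndOfTurn or discarded on TurnResumed.
class QuarantineBuffer {
  constructor() {
    this.tokens = [];
    this.released = false;
  }

  // Called for each streamed LLM token; returns the tokens safe to speak now
  push(token) {
    if (this.released) return [token];   // past quarantine: pass through
    this.tokens.push(token);             // still uncertain: hold it
    return [];
  }

  // EndOfTurn confirmed the eager prediction: flush everything held
  release() {
    this.released = true;
    const held = this.tokens;
    this.tokens = [];
    return held;
  }

  // TurnResumed: the user kept talking, so drop the speculative response
  discard() {
    this.tokens = [];
  }
}
```

On release, the returned tokens would be handed to the streaming TTS calls shown later in this README.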

State Machine

The application tracks three states:

  • initial - No active speech processing
  • eager_eot - In quarantine mode (holding LLM response tokens)
  • eot - Confirmed turn end (streaming tokens to TTS)
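
The transitions between these states can be sketched as a small lookup table. The Flux event names are the ones documented above; `response_complete` is not a Flux event but an illustrative marker for the app finishing TTS and returning to listening:

```javascript
// Sketch of the three-state machine driven by Flux turn events.
// Unknown event/state combinations leave the state unchanged.
const TRANSITIONS = {
  initial:   { EagerEndOfTurn: 'eager_eot', EndOfTurn: 'eot' },
  eager_eot: { EndOfTurn: 'eot', TurnResumed: 'initial' },
  // 'response_complete' is a placeholder for the app finishing its
  // spoken response (not a Flux event)
  eot:       { response_complete: 'initial' }
};

function nextState(state, event) {
  return (TRANSITIONS[state] && TRANSITIONS[state][event]) || state;
}
```

Note that `initial` can go straight to `eot`: as described under Code Structure, the EndOfTurn handler either releases quarantined tokens or starts a new stream when no eager prediction fired first.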

Code Structure

The main logic is in lib/routes/flux-voice-pipeline.js:

  • TurnResumed Handler (lines 122-161) - Aborts stream, discards quarantined tokens, removes incorrect transcript
  • EagerEndOfTurn Handler (lines 163-295) - Starts LLM stream, quarantines response tokens
  • EndOfTurn Handler (lines 297-425) - Either releases quarantined tokens OR starts new stream

Each handler is fully commented to explain the flow.

Key jambonz Features Used

Deepgram Flux Configuration

session.config({
  recognizer: {
    vendor: 'deepgramflux',
    language: 'en-US',
    deepgramOptions: {
      eotThreshold: 0.7,
      eagerEotThreshold: 0.5
    }
  }
})

Streaming TTS

session.sendTtsTokens(tokens)  // Stream tokens as they arrive
session.flushTtsTokens()       // Signal end of response
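
A hedged sketch of how streamed LLM tokens might be forwarded with these two calls once a turn is confirmed. Only `sendTtsTokens` and `flushTtsTokens` come from the section above; the helper function and the shape of the token stream are illustrative:

```javascript
// Illustrative only: forward an async iterable of LLM tokens to TTS
// using the two jambonz calls shown above.
async function streamToTts(session, llmTokens) {
  for await (const token of llmTokens) {
    session.sendTtsTokens(token);  // speak each token as it arrives
  }
  session.flushTtsTokens();        // signal that the response is complete
}
```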

Barge-In Detection

The application handles user interrupts via the tts:user_interrupt event, allowing natural conversation flow where users can interrupt the AI mid-response.
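
One common way to implement this cancellation is Node's built-in AbortController. The `tts:user_interrupt` event name comes from the paragraph above; the wrapper itself is a sketch, not the app's actual code:

```javascript
// Sketch: cancel an in-flight LLM response when the user barges in.
// The AbortSignal would be passed to the streaming LLM request so that
// aborting it stops token generation immediately.
function makeInterruptibleResponse() {
  const controller = new AbortController();
  return {
    signal: controller.signal,  // hand this to the LLM streaming call
    // wired to the jambonz 'tts:user_interrupt' event in the real app
    onUserInterrupt() {
      controller.abort();       // stop generating and speaking
    }
  };
}
```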

Tuning the Thresholds

Adjust these values based on your use case:

  • Higher eagerEotThreshold (e.g., 0.7) - Fewer false positives, but less latency improvement
  • Lower eagerEotThreshold (e.g., 0.3) - More aggressive optimization, but more false positives
  • eotThreshold should always be higher than eagerEotThreshold
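
Reading the thresholds from the environment variables listed under Configuration, with the documented defaults and the ordering constraint above enforced, might look like this (the helper function is illustrative):

```javascript
// Sketch: load Flux thresholds from the environment with the documented
// defaults (0.7 / 0.5), enforcing eagerEotThreshold < eotThreshold.
function loadThresholds(env = process.env) {
  const eotThreshold = Number(env.EOT_THRESHOLD ?? 0.7);
  const eagerEotThreshold = Number(env.EAGER_EOT_THRESHOLD ?? 0.5);
  if (eagerEotThreshold >= eotThreshold) {
    throw new Error('EAGER_EOT_THRESHOLD must be lower than EOT_THRESHOLD');
  }
  return { eotThreshold, eagerEotThreshold };
}
```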

Monitoring

The application logs state transitions and timing:

[STATE CHANGE] initial -> eager_eot, transcript: "Hi, I need help with..."
first token after 534ms: "Sure"
quarantining 12 chars (total: 45)
[STATE CHANGE] eager_eot -> eot, releasing 127 quarantined chars: "Sure, I'd be ha"
[STATE CHANGE] eot -> initial

This makes it easy to understand the flow and measure performance improvements.
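
Timing lines like `first token after 534ms` can be produced with a simple timer started at EagerEndOfTurn. The log format is copied from the example above; the timer wiring is illustrative:

```javascript
// Sketch: measure time from the eager end-of-turn prediction to the
// first LLM token, producing the "first token after Nms" log line.
// The injectable clock makes the helper easy to test.
function makeLatencyTimer(now = () => Date.now()) {
  let startedAt = null;
  return {
    start() { startedAt = now(); },  // call at EagerEndOfTurn
    firstToken(token) {              // call on the first streamed LLM token
      if (startedAt === null) return null;
      const ms = now() - startedAt;
      console.log(`first token after ${ms}ms: "${token}"`);
      return ms;
    }
  };
}
```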

Understanding the Code

The implementation is straightforward and well-commented:

  • lib/routes/flux-voice-pipeline.js - The main implementation with detailed comments explaining the Flux state machine and quarantine pattern
  • app.json - Environment variable schema defining required configuration (API keys, thresholds, prompts)

Start by reading the file header in flux-voice-pipeline.js for an overview, then walk through each of the three event handlers to understand the flow.

Learn More

License

MIT

About

This application was created with create-jambonz-ws-app and demonstrates best practices for building responsive voice AI applications with jambonz and Deepgram Flux.
