A complete example showing how to build ultra-responsive voice AI applications using jambonz, Deepgram Flux, and Anthropic Claude.
This example showcases optimistic response generation - a technique that dramatically reduces perceived latency in voice AI conversations by:
- Starting LLM generation early when Flux detects a probable end-of-turn
- Quarantining responses until the turn is confirmed
- Gracefully handling false positives when users pause mid-sentence
The result: Natural, responsive conversations that feel much faster than traditional wait-for-silence approaches.
Deepgram Flux is an advanced speech recognition model that provides three turn-taking events to enable intelligent response timing:
- `EagerEndOfTurn` - High probability the user finished speaking, but not certain yet
- `EndOfTurn` - Definitive confirmation the user has finished
- `TurnResumed` - User continued speaking after an eager prediction (a false positive)
These events let you start processing before you're 100% certain the user is done, significantly reducing response latency.
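The mapping from these three events to actions can be sketched as a small dispatcher. This is a hypothetical helper, not code from the app; the event names come from Deepgram Flux, while the `actions` callbacks are placeholders for the logic described later in this README.

```javascript
// Map the three Flux turn-taking events to the app's actions.
// The actions object is a placeholder for the real handlers.
function handleFluxEvent(evt, actions) {
  switch (evt.type) {
    case 'EagerEndOfTurn':
      // Probably done speaking - start the LLM early, but hold the output
      return actions.startLlmAndQuarantine(evt.transcript);
    case 'EndOfTurn':
      // Definitely done - safe to release held tokens to TTS
      return actions.releaseQuarantine();
    case 'TurnResumed':
      // False positive - the user kept talking; throw the draft away
      return actions.abortAndDiscard();
    default:
      return undefined;
  }
}
```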
- Node.js v18.19.0 or higher (required for Pino v10)
- A jambonz account with websocket support
- Deepgram API key with Flux access
- Anthropic API key
```bash
npm install
```

Set the following environment variables (or configure them in the jambonz portal):
- `ANTHROPIC_API_KEY` - Your Anthropic API key
- `ANTHROPIC_MODEL` - Claude model to use (e.g., `claude-3-5-sonnet-20241022`)
- `LLM_SYSTEM_PROMPT` - System prompt for Claude
- `EOT_THRESHOLD` - Deepgram Flux end-of-turn confidence threshold (0-1, default: 0.7)
- `EAGER_EOT_THRESHOLD` - Deepgram Flux eager end-of-turn threshold (0-1, default: 0.5)
```bash
npm start
```

The websocket server will listen on port 3000 (or `WS_PORT` if set).
This application implements a "quarantine pattern" to handle the uncertainty of `EagerEndOfTurn`:
```
User speaks → EagerEndOfTurn fires
          ↓
Start LLM stream immediately
          ↓
Hold tokens in "quarantine"
          ↓
   ┌──────┴──────┐
   ↓             ↓
EndOfTurn     TurnResumed
   ↓             ↓
Release tokens   Discard tokens
Stream to TTS    Abort LLM stream
```
The application tracks three states:
- `initial` - No active speech processing
- `eager_eot` - In quarantine mode (holding LLM response tokens)
- `eot` - Confirmed turn end (streaming tokens to TTS)
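Stripped of the jambonz and Anthropic wiring, the quarantine pattern reduces to a small state machine over those three states. The sketch below is a hypothetical illustration (the app's actual implementation lives in `lib/routes/flux-voice-pipeline.js`); `speak` stands in for streaming text to TTS.

```javascript
// Minimal sketch of the quarantine state machine: tokens streamed from
// the LLM are buffered while the turn end is only a prediction.
class QuarantineBuffer {
  constructor() {
    this.state = 'initial';  // initial | eager_eot | eot
    this.buffer = [];
  }

  // EagerEndOfTurn: start quarantining instead of speaking
  onEagerEndOfTurn() {
    this.state = 'eager_eot';
    this.buffer = [];
  }

  // An LLM token arrives: hold it in quarantine, or speak it if confirmed
  addToken(token, speak) {
    if (this.state === 'eager_eot') this.buffer.push(token);
    else if (this.state === 'eot') speak(token);
  }

  // EndOfTurn: release everything held so far, then stream live
  onEndOfTurn(speak) {
    const held = this.buffer.splice(0).join('');
    if (held) speak(held);
    this.state = 'eot';
  }

  // TurnResumed: false positive - discard the quarantined draft
  onTurnResumed() {
    this.buffer = [];
    this.state = 'initial';
  }
}
```

In the real app the `TurnResumed` path also aborts the in-flight LLM request so no further tokens arrive for the discarded draft.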
The main logic is in `lib/routes/flux-voice-pipeline.js`:
- `TurnResumed` handler (lines 122-161) - Aborts the stream, discards quarantined tokens, removes the incorrect transcript
- `EagerEndOfTurn` handler (lines 163-295) - Starts the LLM stream, quarantines response tokens
- `EndOfTurn` handler (lines 297-425) - Either releases quarantined tokens OR starts a new stream
Each handler is fully commented to explain the flow.
```js
session.config({
  recognizer: {
    vendor: 'deepgramflux',
    language: 'en-US',
    deepgramOptions: {
      eotThreshold: 0.7,
      eagerEotThreshold: 0.5
    }
  }
});
```

```js
session.sendTtsTokens(tokens);  // Stream tokens as they arrive
session.flushTtsTokens();       // Signal end of response
```

The application handles user interrupts via the `tts:user_interrupt` event, allowing natural conversation flow where users can interrupt the AI mid-response.
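Interrupt handling can be sketched as below. This assumes the jambonz session is an event emitter that fires `tts:user_interrupt` (as described above) and that the in-flight LLM request is wrapped in an `AbortController`; the `wireInterrupts` helper and its wiring are hypothetical.

```javascript
// When the user barges in, stop generating the now-stale response.
// jambonz stops TTS playback itself; we just abort the LLM stream.
function wireInterrupts(session, llmController) {
  session.on('tts:user_interrupt', () => {
    llmController.abort();
    // ...reset local quarantine/state here as well
  });
}
```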
Adjust these values based on your use case:
- Higher `eagerEotThreshold` (e.g., 0.7) - Fewer false positives, but less latency improvement
- Lower `eagerEotThreshold` (e.g., 0.3) - More aggressive optimization, but more false positives

`eotThreshold` should always be higher than `eagerEotThreshold`.
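Reading the thresholds from the environment variables listed earlier, with their documented defaults and the ordering rule enforced, might look like this (a sketch; `loadThresholds` is a hypothetical helper, not part of the app):

```javascript
// Load tuning thresholds from the environment, falling back to the
// documented defaults (EOT_THRESHOLD=0.7, EAGER_EOT_THRESHOLD=0.5).
function loadThresholds(env = process.env) {
  const eotThreshold = parseFloat(env.EOT_THRESHOLD ?? '0.7');
  const eagerEotThreshold = parseFloat(env.EAGER_EOT_THRESHOLD ?? '0.5');
  if (!(eotThreshold > eagerEotThreshold)) {
    throw new Error('EOT_THRESHOLD must be higher than EAGER_EOT_THRESHOLD');
  }
  return { eotThreshold, eagerEotThreshold };
}
```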
The application logs state transitions and timing:
```
[STATE CHANGE] initial -> eager_eot, transcript: "Hi, I need help with..."
first token after 534ms: "Sure"
quarantining 12 chars (total: 45)
[STATE CHANGE] eager_eot -> eot, releasing 127 quarantined chars: "Sure, I'd be ha"
[STATE CHANGE] eot -> initial
```
This makes it easy to understand the flow and measure performance improvements.
The implementation is straightforward and well-commented:
- `lib/routes/flux-voice-pipeline.js` - The main implementation, with detailed comments explaining the Flux state machine and the quarantine pattern
- `app.json` - Environment variable schema defining the required configuration (API keys, thresholds, prompts)

Start by reading the file header in `flux-voice-pipeline.js` for an overview, then walk through each of the three event handlers to understand the flow.
MIT
This application was created with create-jambonz-ws-app and demonstrates best practices for building responsive voice AI applications with jambonz and Deepgram Flux.