| 1 | +# Intelligent Interruption Handling - Backchanneling Detection |
| 2 | + |
| 3 | +## 🎯 Overview |
| 4 | + |
| 5 | +This implementation adds **context-aware backchanneling detection** to the LiveKit Agents framework. The agent intelligently distinguishes between passive acknowledgments ("yeah", "ok", "hmm") and active interruptions ("wait", "stop", "no"). |
| 6 | + |
| 7 | +### The Problem Solved |
| 8 | +Previously, the agent stopped speaking whenever it detected any user voice activity, even when the user was only saying "yeah" to show they were listening. This made the conversation feel unnatural. |
| 9 | + |
| 10 | +### The Solution |
| 11 | +**Background Speech Processing**: The agent processes user speech in the background while continuing to speak. Only when STT confirms an interrupt word does the agent stop. |
| 12 | + |
| 13 | +- User says "yeah" while agent speaks → Agent continues seamlessly |
| 14 | +- User says "wait" while agent speaks → Agent stops and listens |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## ✅ Features Implemented |
| 19 | + |
| 20 | +| Feature | Description | |
| 21 | +|---------|-------------| |
| 22 | +| **Configurable Word Lists** | Easily customizable backchanneling and interrupt word sets | |
| 23 | +| **Multi-language Support** | Built-in words for English, Hindi, Spanish, French | |
| 24 | +| **State-Based Filtering** | Backchanneling is ignored only while the agent is speaking | |
| 25 | +| **Semantic Interruption** | Mixed inputs like "yeah but wait" correctly trigger interruption | |
| 26 | +| **Seamless Speech** | Agent continues without pause during backchanneling | |
| 27 | + |
| 28 | +--- |
| 29 | + |
| 30 | +## 📁 Files Changed |
| 31 | + |
| 32 | +``` |
| 33 | +livekit-agents/livekit/agents/voice/ |
| 34 | +├── backchanneling_config.py # NEW - Configuration and word lists |
| 35 | +├── agent_activity.py # MODIFIED - Integration hooks |
| 36 | +├── agent_session.py # MODIFIED - Config parameter |
| 37 | +└── __init__.py # MODIFIED - Exports |
| 38 | +``` |
| 39 | + |
| 40 | +--- |
| 41 | + |
| 42 | +## 🚀 Quick Start |
| 43 | + |
| 44 | +### Basic Usage (Default Configuration) |
| 45 | + |
| 46 | +```python |
| 47 | +from livekit.agents.voice import AgentSession |
| 48 | + |
| 49 | +# Backchanneling detection is enabled by default |
| 50 | +session = AgentSession( |
| 51 | +    stt="deepgram/nova-3", |
| 52 | +    llm="openai/gpt-4o-mini", |
| 53 | +    tts="cartesia/sonic-2", |
| 54 | +) |
| 55 | +``` |
| 56 | + |
| 57 | +### Custom Configuration |
| 58 | + |
| 59 | +```python |
| 60 | +from livekit.agents.voice import AgentSession |
| 61 | +from livekit.agents.voice.backchanneling_config import create_config |
| 62 | + |
| 63 | +# Add custom words |
| 64 | +custom_config = create_config( |
| 65 | +    backchanneling_words={"roger", "copy that", "understood"}, |
| 66 | +    interrupt_words={"cancel", "abort", "emergency"}, |
| 67 | +) |
| 68 | + |
| 69 | +session = AgentSession( |
| 70 | +    stt="deepgram/nova-3", |
| 71 | +    llm="openai/gpt-4o-mini", |
| 72 | +    tts="cartesia/sonic-2", |
| 73 | +    backchanneling_config=custom_config, |
| 74 | +) |
| 75 | +``` |
| 76 | + |
| 77 | +### Disable Backchanneling Detection |
| 78 | + |
| 79 | +```python |
| 80 | +from livekit.agents.voice.backchanneling_config import create_config |
| 81 | + |
| 82 | +config = create_config(enabled=False) |
| 83 | + |
| 84 | +session = AgentSession( |
| 85 | +    ..., |
| 86 | +    backchanneling_config=config, |
| 87 | +) |
| 88 | +``` |
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +## 🔧 Configuration Options |
| 93 | + |
| 94 | +### BackchannelingConfig Parameters |
| 95 | + |
| 96 | +| Parameter | Type | Default | Description | |
| 97 | +|-----------|------|---------|-------------| |
| 98 | +| `enabled` | bool | `True` | Enable/disable backchanneling detection | |
| 99 | +| `backchanneling_words` | FrozenSet[str] | See below | Words to ignore when agent is speaking | |
| 100 | +| `interrupt_words` | FrozenSet[str] | See below | Words that always trigger interruption | |
| 101 | + |
| 102 | +### Default Word Lists |
| 103 | + |
| 104 | +**Backchanneling Words (Ignored when agent speaks):** |
| 105 | +``` |
| 106 | +English: yeah, yes, yep, ok, okay, alright, hmm, mhm, uh-huh, right, |
| 107 | +         sure, got it, gotcha, cool, awesome, go on, continue... |
| 108 | + |
| 109 | +Hindi: theek, theek hai, haan, bilkul, accha, ji, haanji... |
| 110 | + |
| 111 | +Spanish: sí, vale, claro, bueno, ajá... |
| 112 | + |
| 113 | +French: oui, ouais, d'accord, bon... |
| 114 | +``` |
| 115 | + |
| 116 | +**Interrupt Words (Always stop agent):** |
| 117 | +``` |
| 118 | +English: wait, stop, hold, pause, no, nope, hey, listen, actually, but, |
| 119 | +         what, why, how, wrong, sorry, repeat... |
| 120 | + |
| 121 | +Hindi: ruk, ruko, nahi, mat, suno... |
| 122 | + |
| 123 | +Spanish: espera, para, no, perdón... |
| 124 | + |
| 125 | +French: attends, non, arrête, pardon... |
| 126 | +``` |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +## 🧪 Test Scenarios |
| 131 | + |
| 132 | +The implementation handles all required test scenarios: |
| 133 | + |
| 134 | +### Scenario 1: Long Explanation ✅ |
| 135 | +- **Context:** Agent reading a long paragraph |
| 136 | +- **User says:** "Okay... yeah... uh-huh" |
| 137 | +- **Result:** Agent continues speaking without any interruption |
| 138 | + |
| 139 | +### Scenario 2: Passive Affirmation ✅ |
| 140 | +- **Context:** Agent asks "Are you ready?" and goes silent |
| 141 | +- **User says:** "Yeah" |
| 142 | +- **Result:** Agent processes "Yeah" as a valid answer and proceeds |
| 143 | + |
| 144 | +### Scenario 3: The Correction ✅ |
| 145 | +- **Context:** Agent counting "One, two, three..." |
| 146 | +- **User says:** "No stop" |
| 147 | +- **Result:** Agent stops (after STT processing) |
| 148 | + |
| 149 | +### Scenario 4: Mixed Input ✅ |
| 150 | +- **Context:** Agent is speaking |
| 151 | +- **User says:** "Yeah okay but wait" |
| 152 | +- **Result:** Agent stops (detects "but" and "wait" as interrupt words) |
| 153 | + |
| 154 | +--- |
| 155 | + |
| 156 | +## 🏗️ How It Works |
| 157 | + |
| 158 | +``` |
| 159 | +┌─────────────────────────────────────────────────────────────────┐ |
| 160 | +│                           EVENT FLOW                            │ |
| 161 | +└─────────────────────────────────────────────────────────────────┘ |
| 162 | + |
| 163 | +User speaks while agent is talking |
| 164 | +           │ |
| 165 | +           ▼ |
| 166 | +  ┌─────────────────┐ |
| 167 | +  │   VAD Detects   │ |
| 168 | +  │ Voice Activity  │ |
| 169 | +  └────────┬────────┘ |
| 170 | +           │ |
| 171 | +           ▼ |
| 172 | +  ┌─────────────────┐ |
| 173 | +  │ Agent Continues │ ◄── No pause! Speech keeps going |
| 174 | +  │    Speaking     │ |
| 175 | +  └────────┬────────┘ |
| 176 | +           │ |
| 177 | +           ▼ (STT processes in background) |
| 178 | +  ┌─────────────────┐ |
| 179 | +  │   STT Returns   │ |
| 180 | +  │   Transcript    │ |
| 181 | +  └────────┬────────┘ |
| 182 | +           │ |
| 183 | +           ▼ |
| 184 | +  ┌─────────────────┐ |
| 185 | +  │  Check: Is it   │ |
| 186 | +  │ backchanneling? │ |
| 187 | +  └────────┬────────┘ |
| 188 | +           │ |
| 189 | +     ┌─────┴─────┐ |
| 190 | +     │           │ |
| 191 | +     ▼           ▼ |
| 192 | +┌─────────┐ ┌─────────┐ |
| 193 | +│ "yeah"  │ │ "wait"  │ |
| 194 | +│ IGNORE  │ │  STOP   │ |
| 195 | +│continue │ │  agent  │ |
| 196 | +└─────────┘ └─────────┘ |
| 197 | +``` |
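
The flow above can be sketched as a small standalone classifier. This is a minimal illustration, not the code in `agent_activity.py`: the names `classify`, `tokenize`, and `all_backchannel`, the abbreviated word sets, the tokenization regex, and the rule that speech matching neither list counts as an interruption are all assumptions made for the example.

```python
import re
from typing import FrozenSet, List

# Abbreviated, hypothetical word sets; the real defaults live in backchanneling_config.py.
BACKCHANNEL: FrozenSet[str] = frozenset(
    {"yeah", "yes", "ok", "okay", "hmm", "uh-huh", "got it", "go on"}
)
INTERRUPT: FrozenSet[str] = frozenset({"wait", "stop", "no", "but", "actually"})

# Unicode letters, optionally joined by an apostrophe or hyphen ("uh-huh", "d'accord").
_WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def tokenize(transcript: str) -> List[str]:
    """Lowercase the transcript and drop punctuation."""
    return _WORD.findall(transcript.lower())


def all_backchannel(words: List[str], backchannel: FrozenSet[str]) -> bool:
    """True if every word is covered by a backchannel entry (greedy, longest phrase first)."""
    longest = max((len(entry.split()) for entry in backchannel), default=1)
    i = 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            if " ".join(words[i : i + n]) in backchannel:
                i += n
                break
        else:
            return False  # words[i] does not start any backchannel entry
    return True


def classify(
    transcript: str,
    agent_speaking: bool,
    backchannel: FrozenSet[str] = BACKCHANNEL,
    interrupt: FrozenSet[str] = INTERRUPT,
) -> str:
    """Return "ignore", "interrupt", or "respond" for a finished STT transcript."""
    words = tokenize(transcript)
    if not words:
        return "ignore"  # nothing was transcribed
    if not agent_speaking:
        return "respond"  # agent is silent: treat the input as a normal user turn
    if any(word in interrupt for word in words):
        return "interrupt"  # interrupt words win, even in mixed input
    if all_backchannel(words, backchannel):
        return "ignore"  # pure backchanneling: keep speaking
    return "interrupt"  # assumption: any other speech interrupts the agent
```

Interrupt words are checked before backchanneling words, which is why a mixed input such as "Yeah okay but wait" stops the agent even though it begins with an acknowledgment.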
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +## 📝 API Reference |
| 202 | + |
| 203 | +### BackchannelingConfig Factory |
| 204 | + |
| 205 | +```python |
| 206 | +from livekit.agents.voice.backchanneling_config import ( |
| 207 | +    create_config, |
| 208 | +    create_english_only_config, |
| 209 | +    DEFAULT_BACKCHANNELING_CONFIG, |
| 210 | +    DEFAULT_BACKCHANNELING_WORDS, |
| 211 | +    DEFAULT_INTERRUPT_WORDS, |
| 212 | +) |
| 213 | + |
| 214 | +# Create with all options |
| 215 | +config = create_config( |
| 216 | +    enabled=True, |
| 217 | +    backchanneling_words={"custom", "words"},  # Adds to defaults |
| 218 | +    interrupt_words={"custom", "interrupts"},  # Adds to defaults |
| 219 | +    extend_defaults=True,  # Set False to replace |
| 220 | +) |
| 221 | + |
| 222 | +# English-only config (no Hindi/Spanish/French) |
| 223 | +english_config = create_english_only_config() |
| 224 | +``` |
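
To make the `extend_defaults` behavior concrete, here is a minimal sketch of how custom words might be combined with the defaults. The helper `merge_words` is hypothetical and is not part of `backchanneling_config.py`; lowercasing and trimming the custom entries is also an assumption.

```python
from typing import FrozenSet, Iterable, Optional


def merge_words(
    defaults: FrozenSet[str],
    custom: Optional[Iterable[str]],
    extend_defaults: bool = True,
) -> FrozenSet[str]:
    """Combine custom words with the defaults, or replace the defaults entirely."""
    if custom is None:
        return defaults  # nothing to add: keep the defaults unchanged
    # Assumption: entries are matched case-insensitively, so normalize them here.
    normalized = frozenset(word.strip().lower() for word in custom if word.strip())
    return (defaults | normalized) if extend_defaults else normalized


defaults = frozenset({"yeah", "ok"})
extended = merge_words(defaults, {"Roger", "copy that"})
replaced = merge_words(defaults, {"Roger"}, extend_defaults=False)
```

With `extend_defaults=True` the result is the union of both sets; with `extend_defaults=False` only the custom words remain.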
| 225 | + |
| 226 | +--- |
| 227 | + |
| 228 | +## 🔍 Troubleshooting |
| 229 | + |
| 230 | +### Agent still stops on "yeah" |
| 231 | +- Check that `backchanneling_config.enabled` is `True` |
| 232 | +- Verify the word is in the `backchanneling_words` set |
| 233 | +- Check that STT transcribes the word correctly (it may return a different spelling than the one in the set) |
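
When working through the checks above, it can help to see how each transcribed word lines up with the configured sets. The helper below is a hypothetical debugging aid, not part of the framework; it assumes single-word matching on lowercased tokens, so multi-word entries such as "got it" are not recognized by it.

```python
import re
from typing import Iterable, List, Tuple


def label_tokens(
    transcript: str,
    backchanneling_words: Iterable[str],
    interrupt_words: Iterable[str],
) -> List[Tuple[str, str]]:
    """Label each transcribed word as "interrupt", "backchannel", or "unknown"."""
    backchannel = {word.lower() for word in backchanneling_words}
    interrupt = {word.lower() for word in interrupt_words}
    # Unicode letters, optionally joined by an apostrophe or hyphen.
    tokens = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", transcript.lower())
    labels = []
    for token in tokens:
        if token in interrupt:
            label = "interrupt"
        elif token in backchannel:
            label = "backchannel"
        else:
            label = "unknown"  # likely an STT variant or a word missing from both sets
        labels.append((token, label))
    return labels


report = label_tokens("Yah, okay.", {"yeah", "okay"}, {"wait"})
```

A word labeled "unknown" points either to an STT spelling variant or to a gap in the word lists; adding it to `backchanneling_words` is one possible fix.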
| 234 | + |
| 235 | +### Agent doesn't stop on "wait" |
| 236 | +- Verify the word is in the `interrupt_words` set |
| 237 | +- STT processing takes roughly 1 to 1.5 s, so a short delay before the agent stops is expected |
| 238 | + |
| 239 | +### Want faster interrupt response? |
| 240 | +- The delay is due to STT processing time |
| 241 | +- Consider using a faster STT provider |
| 242 | +- Faster STT may be less accurate, so this is a tradeoff between accuracy and speed |
| 243 | + |
| 244 | +--- |
| 245 | + |
| 246 | +## 📄 License |
| 247 | + |
| 248 | +This implementation follows the same license as the LiveKit Agents framework. |