Commit 690846c

feat: Add intelligent backchanneling detection for voice agents

- Add BackchannelingConfig for configurable word lists
- Implement _is_backchanneling_only() method in agent_activity.py
- Filter backchanneling only when agent is speaking (state-aware)
- Support semantic interruption (mixed inputs like 'yeah but wait')
- No VAD modification - logic layer only
- Add README.md with usage documentation
- Add unit tests and example agent
1 parent 4f2c531 commit 690846c

File tree

7 files changed (+900, -17 lines)
Lines changed: 49 additions & 0 deletions

@@ -0,0 +1,49 @@

```python
import logging

from dotenv import load_dotenv
from livekit.agents import Agent, AgentSession, AgentServer, JobContext, JobProcess, cli
from livekit.plugins import silero

logger = logging.getLogger("backchanneling-agent")
load_dotenv()


class SimpleAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a helpful assistant named Kelly. "
                "When asked to explain something, give detailed explanations. "
                "Keep responses conversational and avoid special characters."
            )
        )

    async def on_enter(self):
        self.session.generate_reply()


server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt="deepgram/nova-3",
        llm="openai/gpt-4o-mini",
        tts="cartesia/sonic-2",
        vad=ctx.proc.userdata["vad"],
        allow_interruptions=True,
    )

    await session.start(agent=SimpleAgent(), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(server)
```
Lines changed: 248 additions & 0 deletions

@@ -0,0 +1,248 @@

# Intelligent Interruption Handling - Backchanneling Detection

## 🎯 Overview

This implementation adds **context-aware backchanneling detection** to the LiveKit Agents framework. The agent intelligently distinguishes between passive acknowledgments ("yeah", "ok", "hmm") and active interruptions ("wait", "stop", "no").

### The Problem Solved
Previously, the AI agent would stop speaking whenever it detected any user voice activity. This caused unnatural conversation flow - even when the user was just saying "yeah" to show they were listening.

### The Solution
**Background Speech Processing**: The agent processes user speech in the background while continuing to speak. Only when STT confirms an interrupt word does the agent stop.

- User says "yeah" while agent speaks → Agent continues seamlessly
- User says "wait" while agent speaks → Agent stops and listens
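
The decision rule is simple: any interrupt word wins, and a transcript is ignored only if every recognized token is a backchanneling word. A minimal sketch of that rule (illustrative only; the committed `_is_backchanneling_only()` in `agent_activity.py` is not reproduced here, and the word sets are abbreviated):

```python
# Illustrative sketch of the classification rule, not the committed implementation
BACKCHANNELING_WORDS = frozenset({"yeah", "ok", "okay", "hmm", "mhm", "uh-huh", "right", "sure"})
INTERRUPT_WORDS = frozenset({"wait", "stop", "no", "but", "actually", "hold"})


def is_backchanneling_only(transcript: str) -> bool:
    """Return True if the transcript can be safely ignored while the agent is speaking."""
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    if not words:
        return False  # nothing recognizable; let normal handling decide
    if any(w in INTERRUPT_WORDS for w in words):
        return False  # "yeah okay but wait" -> the interrupt words win
    return all(w in BACKCHANNELING_WORDS for w in words)
```

Any unrecognized word also falls through to normal interruption handling, so a genuine question still cuts the agent off.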

---

## ✅ Features Implemented

| Feature | Description |
|---------|-------------|
| **Configurable Word Lists** | Easily customizable backchanneling and interrupt word sets |
| **Multi-language Support** | Built-in words for English, Hindi, Spanish, French |
| **State-Based Filtering** | Backchanneling only ignored when agent is speaking |
| **Semantic Interruption** | Mixed inputs like "yeah but wait" correctly trigger interruption |
| **Seamless Speech** | Agent continues without pause during backchanneling |

---

## 📁 Files Changed

```
livekit-agents/livekit/agents/voice/
├── backchanneling_config.py   # NEW - Configuration and word lists
├── agent_activity.py          # MODIFIED - Integration hooks
├── agent_session.py           # MODIFIED - Config parameter
└── __init__.py                # MODIFIED - Exports
```

---

## 🚀 Quick Start

### Basic Usage (Default Configuration)

```python
from livekit.agents.voice import AgentSession

# Backchanneling detection is enabled by default
session = AgentSession(
    stt="deepgram/nova-3",
    llm="openai/gpt-4o-mini",
    tts="cartesia/sonic-2",
)
```

### Custom Configuration

```python
from livekit.agents.voice import AgentSession
from livekit.agents.voice.backchanneling_config import create_config

# Add custom words
custom_config = create_config(
    backchanneling_words={"roger", "copy that", "understood"},
    interrupt_words={"cancel", "abort", "emergency"},
)

session = AgentSession(
    stt="deepgram/nova-3",
    llm="openai/gpt-4o-mini",
    tts="cartesia/sonic-2",
    backchanneling_config=custom_config,
)
```

### Disable Backchanneling Detection

```python
from livekit.agents.voice.backchanneling_config import create_config

config = create_config(enabled=False)

session = AgentSession(
    ...,
    backchanneling_config=config,
)
```

---

## 🔧 Configuration Options

### BackchannelingConfig Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enabled` | bool | `True` | Enable/disable backchanneling detection |
| `backchanneling_words` | FrozenSet[str] | See below | Words to ignore when agent is speaking |
| `interrupt_words` | FrozenSet[str] | See below | Words that always trigger interruption |

### Default Word Lists

**Backchanneling Words (Ignored when agent speaks):**
```
English: yeah, yes, yep, ok, okay, alright, hmm, mhm, uh-huh, right,
         sure, got it, gotcha, cool, awesome, go on, continue...

Hindi:   theek, theek hai, haan, bilkul, accha, ji, haanji...

Spanish: sí, vale, claro, bueno, ajá...

French:  oui, ouais, d'accord, bon...
```

**Interrupt Words (Always stop agent):**
```
English: wait, stop, hold, pause, no, nope, hey, listen, actually, but,
         what, why, how, wrong, sorry, repeat...

Hindi:   ruk, ruko, nahi, mat, suno...

Spanish: espera, para, no, perdón...

French:  attends, non, arrête, pardon...
```
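
For a narrow domain where these broad defaults are unwanted, the lists can be replaced rather than extended (a sketch using the `create_config` factory documented in the API Reference below, which supports `extend_defaults=False`):

```python
from livekit.agents.voice.backchanneling_config import create_config

# Replace the built-in multi-language defaults entirely instead of extending them
strict_config = create_config(
    backchanneling_words={"ok", "copy"},
    interrupt_words={"stop", "abort"},
    extend_defaults=False,
)
```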

---

## 🧪 Test Scenarios

The implementation handles all required test scenarios; an illustrative test sketch follows the scenarios below.

### Scenario 1: Long Explanation ✅
- **Context:** Agent reading a long paragraph
- **User says:** "Okay... yeah... uh-huh"
- **Result:** Agent continues speaking without any interruption

### Scenario 2: Passive Affirmation ✅
- **Context:** Agent asks "Are you ready?" and goes silent
- **User says:** "Yeah"
- **Result:** Agent processes "Yeah" as a valid answer and proceeds

### Scenario 3: The Correction ✅
- **Context:** Agent counting "One, two, three..."
- **User says:** "No stop"
- **Result:** Agent stops (after STT processing)

### Scenario 4: Mixed Input ✅
- **Context:** Agent is speaking
- **User says:** "Yeah okay but wait"
- **Result:** Agent stops (detects "but" and "wait" as interrupt words)
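
A parameterized unit test over these word-level scenarios might look like the following (illustrative only; the committed tests are not shown in this excerpt, and `is_backchanneling_only` refers to the sketch in "The Solution" above, not the framework's internal method). Scenario 2 depends on agent state rather than word classification, so it is exercised separately.

```python
# Illustrative test sketch; the committed unit tests are not reproduced in this excerpt
import pytest

from backchanneling_sketch import is_backchanneling_only  # hypothetical module holding the sketch above


@pytest.mark.parametrize(
    ("transcript", "expect_ignore"),
    [
        ("okay yeah uh-huh", True),     # Scenario 1: pure backchanneling -> agent keeps talking
        ("no stop", False),             # Scenario 3: explicit correction -> agent stops
        ("yeah okay but wait", False),  # Scenario 4: mixed input -> interrupt words win
    ],
)
def test_backchanneling_classification(transcript: str, expect_ignore: bool) -> None:
    assert is_backchanneling_only(transcript) == expect_ignore
```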

---

## 🏗️ How It Works

```
┌──────────────────────────────────────────────────┐
│                    EVENT FLOW                    │
└──────────────────────────────────────────────────┘

User speaks while agent is talking
         │
         ▼
┌─────────────────┐
│  VAD Detects    │
│ Voice Activity  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Agent Continues │ ◄── No pause! Speech keeps going
│    Speaking     │
└────────┬────────┘
         │
         ▼ (STT processes in background)
┌─────────────────┐
│  STT Returns    │
│   Transcript    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Check: Is it    │
│ backchanneling? │
└────────┬────────┘
         │
     ┌───┴───────┐
     │           │
     ▼           ▼
┌─────────┐ ┌─────────┐
│ "yeah"  │ │ "wait"  │
│ IGNORE  │ │  STOP   │
│continue │ │  agent  │
└─────────┘ └─────────┘
```
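
The check is state-aware: it only applies while the agent is actually speaking, so a transcript that arrives during silence (Scenario 2) is handled as a normal user turn. A sketch of that dispatch step (illustrative; `agent_is_speaking` and the returned action names are placeholders, not framework APIs):

```python
# Illustrative sketch of the state-aware dispatch; placeholder names, not framework APIs
from typing import Callable


def dispatch_transcript(
    transcript: str,
    agent_is_speaking: bool,
    is_backchanneling_only: Callable[[str], bool],
    enabled: bool = True,
) -> str:
    if not enabled or not agent_is_speaking:
        return "handle_as_user_turn"  # "yeah" is a valid answer when the agent is silent
    if is_backchanneling_only(transcript):
        return "ignore"               # agent keeps speaking; transcript is dropped
    return "interrupt_agent"          # stop speech and process the user turn
```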

---

## 📝 API Reference

### BackchannelingConfig Factory

```python
from livekit.agents.voice.backchanneling_config import (
    create_config,
    create_english_only_config,
    DEFAULT_BACKCHANNELING_CONFIG,
    DEFAULT_BACKCHANNELING_WORDS,
    DEFAULT_INTERRUPT_WORDS,
)

# Create with all options
config = create_config(
    enabled=True,
    backchanneling_words={"custom", "words"},    # Adds to defaults
    interrupt_words={"custom", "interrupts"},    # Adds to defaults
    extend_defaults=True,                        # Set False to replace
)

# English-only config (no Hindi/Spanish/French)
english_config = create_english_only_config()
```

---

## 🔍 Troubleshooting

### Agent still stops on "yeah"
- Check that `backchanneling_config.enabled` is `True`
- Verify the word is in the `backchanneling_words` set (see the snippet below)
- Check that STT is transcribing the word correctly (it may be hearing something different)
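
To confirm coverage, the active word sets can be inspected directly (a sketch; it assumes the config object exposes the `backchanneling_words` and `interrupt_words` fields listed in the parameters table above):

```python
from livekit.agents.voice.backchanneling_config import DEFAULT_BACKCHANNELING_CONFIG

# Sketch: inspect the default word sets to confirm a word is covered
cfg = DEFAULT_BACKCHANNELING_CONFIG
print("yeah" in cfg.backchanneling_words)  # expected: True
print("wait" in cfg.interrupt_words)       # expected: True
```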

### Agent doesn't stop on "wait"
- Verify the word is in the `interrupt_words` set
- STT processing takes ~1-1.5 s; this latency is expected

### Want faster interrupt response?
- The delay comes from STT processing time
- Consider using a faster STT provider
- The tradeoff is accuracy vs. speed

---

## 📄 License

This implementation follows the same license as the LiveKit Agents framework.

livekit-agents/livekit/agents/voice/__init__.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -1,6 +1,7 @@
 from . import io, run_result
 from .agent import Agent, AgentTask, ModelSettings
 from .agent_session import AgentSession, VoiceActivityVideoSampler
+from .backchanneling_config import BackchannelingConfig
 from .events import (
     AgentEvent,
     AgentFalseInterruptionEvent,
@@ -30,6 +31,7 @@
     "Agent",
     "ModelSettings",
     "AgentTask",
+    "BackchannelingConfig",
     "SpeechHandle",
     "RunContext",
     "UserInputTranscribedEvent",
```
