This example demonstrates TEN Framework's speaker diarization capabilities using Speechmatics ASR in a conversational game called Who Likes What, where the agent figures out “who said what” across multiple voices.
- Real-time speaker identification: Automatically detects and labels different speakers (S1, S2, S3, etc.)
- Configurable sensitivity: Adjust how aggressively the system detects new speakers
- Multi-speaker conversations: Supports up to 100 speakers (configurable) and powers the Who Likes What game loop
- Visual speaker labels: Speaker information is displayed in the transcript UI so the agent can call players by name
- Speechmatics API Key: Get one from Speechmatics
- OpenAI API Key: For the LLM responses
- ElevenLabs API Key: For text-to-speech
- Agora credentials: For real-time audio streaming
Add to your .env file:
# Speechmatics (required for diarization)
SPEECHMATICS_API_KEY=your_speechmatics_api_key_here
# OpenAI (for LLM)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o
# ElevenLabs (for TTS)
ELEVENLABS_TTS_KEY=your_elevenlabs_api_key_here
# Agora (for RTC)
AGORA_APP_ID=your_agora_app_id_here
AGORA_APP_CERTIFICATE=your_agora_certificate_herecd agents/examples/speaker-diarization
task installThis command will:
- Install required dependencies
- Configure the agent for speaker diarization
- Set up the graph with Speechmatics ASR
cd agents/examples/speaker-diarization
task runThe agent will start with speaker diarization enabled.
- Access the application:
- Frontend: http://localhost:3000
- API Server: http://localhost:8080
- TMAN Designer: http://localhost:49483
You can customize diarization settings in property.json:
{
"params": {
"key": "${env:SPEECHMATICS_API_KEY}",
"language": "en",
"sample_rate": 16000,
"diarization": "speaker",
"speaker_sensitivity": 0.5,
"max_speakers": 10,
"prefer_current_speaker": false
}
}| Parameter | Type | Default | Description |
|---|---|---|---|
diarization |
string | "none" |
Diarization mode: "none", "speaker", "channel", or "channel_and_speaker" |
max_speakers |
int | 50 |
Maximum number of speakers (2-100) |
speaker_sensitivity |
float | 0.5 |
Range 0-1. Higher values detect more unique speakers ( |
prefer_current_speaker |
bool | false |
Reduce false speaker switches between similar voices ( |
Note: The current implementation uses speechmatics-python==3.0.2, which has limited diarization configuration support. Only max_speakers is functional. speaker_sensitivity and prefer_current_speaker are available in newer Speechmatics API versions.
- Audio Input: User speaks through the microphone
- Speechmatics ASR: Transcribes audio AND identifies speakers
- Speaker Labels: Each transcription includes speaker labels like
[S1],[S2] - LLM Context: Speaker information is passed to the LLM
- Response: The agent responds, acknowledging different speakers
Elliot: "Hello, this is Elliot."
Transcript: "[Elliot] Hello, this is Elliot."
Musk: "This is Elon."
Transcript: "[Musk] This is Elon."
Agent: "Elliot's voice is locked in. Waiting for Taytay to give me a quick hello so I can lock in their voice."
- Verify
SPEECHMATICS_API_KEYis set correctly - Check that
diarizationis set to"speaker"in property.json - Ensure multiple people are speaking (single speaker might always be labeled S1)
- Note:
prefer_current_speakerandspeaker_sensitivityare not supported in the current version - Consider adjusting
max_speakersto limit the number of detected speakers
- Increase
max_speakersif you expect more than the default number of speakers
The playground UI automatically displays speaker labels in the transcript. To further customize the display, you can modify the main_python extension's _on_asr_result method in extension.py.
Note: The following commands need to be executed outside of any Docker container.
# Run at project root
cd ai_agents
docker build -f agents/examples/speaker-diarization/Dockerfile -t speaker-diarization-app .# Use local .env (optional)
docker run --rm -it \
--env-file .env \
-p 8080:8080 \
-p 3000:3000 \
speaker-diarization-app- Frontend: http://localhost:3000
- API Server: http://localhost:8080
- Speechmatics Diarization Docs
- TEN Framework Documentation
- Voice Assistant Example for the base architecture
Apache License 2.0