This repository showcases the integration between Agent Voice Response (AVR) and Ultravox's real-time speech-to-speech API. The application streams the caller's audio to Ultravox's language model and returns context-aware spoken responses in real time.
- Dual Call Types: Support for both agent-specific calls and generic calls
- Real-time Streaming: WebSocket-based audio streaming with buffering
- External Voice Support: Integration with ElevenLabs, Cartesia, LMNT, and generic voice providers
- Configurable Audio Settings: Customizable sample rates and buffer sizes
- Tool Integration: Support for custom tools and VAD settings
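To give a feel for what the sample rate and buffer size settings mean in practice, here is a small sketch that converts a buffer duration into a byte count. It assumes 16-bit mono PCM, which this README does not state; `bufferSizeBytes` is a hypothetical helper, not part of the project.

```javascript
// Assumption: audio is 16-bit (2-byte) PCM. The README does not specify the sample format.
const BYTES_PER_SAMPLE = 2;

// Hypothetical helper: bytes needed to hold `bufferMs` milliseconds of audio.
function bufferSizeBytes(sampleRateHz, bufferMs, channels = 1) {
  const samples = Math.round((sampleRateHz * bufferMs) / 1000);
  return samples * BYTES_PER_SAMPLE * channels;
}

// 8000 Hz and 60 ms are the documented defaults for
// ULTRAVOX_SAMPLE_RATE and ULTRAVOX_CLIENT_BUFFER_SIZE_MS.
console.log(bufferSizeBytes(8000, 60)); // 960
```

Under that assumption, the default settings correspond to 480 samples, or 960 bytes, per buffer.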
Copy .env.example to .env and configure the following variables:
- ULTRAVOX_API_KEY: Your Ultravox API key
- ULTRAVOX_CALL_TYPE: Set to 'agent' or 'generic'
- PORT: Server port (default: 6031)
- ULTRAVOX_SAMPLE_RATE: Audio sample rate (default: 8000)
- ULTRAVOX_CLIENT_BUFFER_SIZE_MS: Client buffer size in milliseconds (default: 60)
- ULTRAVOX_AGENT_ID: Your Ultravox agent ID (required)
- ULTRAVOX_SYSTEM_PROMPT: System prompt for the AI (default: "You are a helpful AI assistant.")
- ULTRAVOX_TEMPERATURE: AI temperature setting (default: 0)
- ULTRAVOX_MODEL: AI model to use (default: "fixie-ai/ultravox")
- ULTRAVOX_VOICE: Voice to use (default: "alloy")
- ULTRAVOX_RECORDING_ENABLED: Enable call recording (default: false)
- ULTRAVOX_JOIN_TIMEOUT: Join timeout (default: "30s")
- ULTRAVOX_MAX_DURATION: Maximum call duration (default: "3600s")
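The variables and defaults listed above could be read into a single object along these lines. This is only a sketch of one way to do it: `loadConfig`, `numberOr`, and the property names are hypothetical and may not match how the service actually reads its environment.

```javascript
// Hypothetical helper: parse a numeric env value, falling back when unset or invalid.
function numberOr(value, fallback) {
  if (value === undefined || value === '') return fallback;
  const n = Number(value);
  return Number.isNaN(n) ? fallback : n;
}

// Hypothetical loader applying the defaults documented in this README.
function loadConfig(env = process.env) {
  return {
    apiKey: env.ULTRAVOX_API_KEY,
    callType: env.ULTRAVOX_CALL_TYPE,
    port: numberOr(env.PORT, 6031),
    sampleRate: numberOr(env.ULTRAVOX_SAMPLE_RATE, 8000),
    clientBufferSizeMs: numberOr(env.ULTRAVOX_CLIENT_BUFFER_SIZE_MS, 60),
    agentId: env.ULTRAVOX_AGENT_ID,
    systemPrompt: env.ULTRAVOX_SYSTEM_PROMPT || 'You are a helpful AI assistant.',
    temperature: numberOr(env.ULTRAVOX_TEMPERATURE, 0),
    model: env.ULTRAVOX_MODEL || 'fixie-ai/ultravox',
    voice: env.ULTRAVOX_VOICE || 'alloy',
    recordingEnabled: env.ULTRAVOX_RECORDING_ENABLED === 'true',
    joinTimeout: env.ULTRAVOX_JOIN_TIMEOUT || '30s',
    maxDuration: env.ULTRAVOX_MAX_DURATION || '3600s',
  };
}
```

Passing the environment as an argument keeps the function easy to exercise with a plain object, for example `loadConfig({ PORT: '7000' })`.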
Set ULTRAVOX_EXTERNAL_VOICE_PROVIDER to one of: elevenlabs, cartesia, lmnt, or generic
ElevenLabs:
- ULTRAVOX_ELEVENLABS_VOICE_ID: Voice ID
- ULTRAVOX_ELEVENLABS_MODEL: Model (default: "eleven_monolingual_v1")
- ULTRAVOX_ELEVENLABS_SPEED: Speed (default: 1.0)
- ULTRAVOX_ELEVENLABS_USE_SPEAKER_BOOST: Use speaker boost (default: true)
Cartesia:
- ULTRAVOX_CARTESIA_VOICE_ID: Voice ID
- ULTRAVOX_CARTESIA_MODEL: Model (default: "cartesia-1")
- ULTRAVOX_CARTESIA_SPEED: Speed (default: 1.0)
LMNT:
- ULTRAVOX_LMNT_VOICE_ID: Voice ID
- ULTRAVOX_LMNT_MODEL: Model (default: "lmnt-1")
- ULTRAVOX_LMNT_SPEED: Speed (default: 1.0)
- ULTRAVOX_LMNT_CONVERSATIONAL: Conversational mode (default: true)
Generic:
- ULTRAVOX_GENERIC_VOICE_URL: Voice service URL
- ULTRAVOX_GENERIC_VOICE_HEADERS: HTTP headers (JSON string)
- ULTRAVOX_GENERIC_VOICE_BODY: Request body (JSON string)
- ULTRAVOX_GENERIC_VOICE_SAMPLE_RATE: Sample rate (default: 24000)
- ULTRAVOX_GENERIC_VOICE_WPM: Words per minute (default: 150)
- ULTRAVOX_GENERIC_VOICE_MIME_TYPE: MIME type (default: "audio/wav")
- ULTRAVOX_GENERIC_VOICE_AUDIO_FIELD: Audio field path (default: "audio")
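As a rough illustration of how the provider selection might work, the sketch below picks one settings builder based on ULTRAVOX_EXTERNAL_VOICE_PROVIDER and applies the defaults listed above. The function name `externalVoiceFromEnv` and the shape of the returned object are assumptions made for this example; they are not the Ultravox API payload format.

```javascript
// Hypothetical per-provider builders using the defaults documented in this README.
const PROVIDERS = {
  elevenlabs: (env) => ({
    voiceId: env.ULTRAVOX_ELEVENLABS_VOICE_ID,
    model: env.ULTRAVOX_ELEVENLABS_MODEL || 'eleven_monolingual_v1',
    speed: Number(env.ULTRAVOX_ELEVENLABS_SPEED || 1.0),
    useSpeakerBoost: (env.ULTRAVOX_ELEVENLABS_USE_SPEAKER_BOOST || 'true') === 'true',
  }),
  cartesia: (env) => ({
    voiceId: env.ULTRAVOX_CARTESIA_VOICE_ID,
    model: env.ULTRAVOX_CARTESIA_MODEL || 'cartesia-1',
    speed: Number(env.ULTRAVOX_CARTESIA_SPEED || 1.0),
  }),
  lmnt: (env) => ({
    voiceId: env.ULTRAVOX_LMNT_VOICE_ID,
    model: env.ULTRAVOX_LMNT_MODEL || 'lmnt-1',
    speed: Number(env.ULTRAVOX_LMNT_SPEED || 1.0),
    conversational: (env.ULTRAVOX_LMNT_CONVERSATIONAL || 'true') === 'true',
  }),
  generic: (env) => ({
    url: env.ULTRAVOX_GENERIC_VOICE_URL,
    headers: JSON.parse(env.ULTRAVOX_GENERIC_VOICE_HEADERS || '{}'),
    body: JSON.parse(env.ULTRAVOX_GENERIC_VOICE_BODY || '{}'),
    sampleRate: Number(env.ULTRAVOX_GENERIC_VOICE_SAMPLE_RATE || 24000),
    wpm: Number(env.ULTRAVOX_GENERIC_VOICE_WPM || 150),
    mimeType: env.ULTRAVOX_GENERIC_VOICE_MIME_TYPE || 'audio/wav',
    audioField: env.ULTRAVOX_GENERIC_VOICE_AUDIO_FIELD || 'audio',
  }),
};

// Returns null when no external provider is configured (illustrative output shape).
function externalVoiceFromEnv(env = process.env) {
  const provider = env.ULTRAVOX_EXTERNAL_VOICE_PROVIDER;
  if (!provider) return null;
  if (!Object.prototype.hasOwnProperty.call(PROVIDERS, provider)) {
    throw new Error(`Unsupported ULTRAVOX_EXTERNAL_VOICE_PROVIDER: ${provider}`);
  }
  return { provider, settings: PROVIDERS[provider](env) };
}
```

A lookup table keeps each provider's defaults in one place and makes an unsupported provider name fail loudly rather than silently falling back.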
- ULTRAVOX_SELECTED_TOOLS: JSON string of tools to use
- ULTRAVOX_VAD_SETTINGS: JSON string of VAD settings
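Because both of these variables hold JSON encoded as a string, a malformed value is an easy mistake to make. The sketch below shows one way to parse them with a clear error message; `parseJsonEnv` is a hypothetical helper, and the example values are illustrative placeholders rather than a documented schema.

```javascript
// Hypothetical helper: parse a JSON-valued environment variable.
// Returns undefined when the variable is unset or blank.
function parseJsonEnv(name, env = process.env) {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return undefined;
  try {
    return JSON.parse(raw);
  } catch (err) {
    throw new Error(`${name} must be a valid JSON string: ${err.message}`);
  }
}

// Illustrative values only; consult the Ultravox documentation for the real fields.
const exampleEnv = {
  ULTRAVOX_SELECTED_TOOLS: '[{"toolName":"hangUp"}]',
  ULTRAVOX_VAD_SETTINGS: '{"turnEndpointDelay":"0.5s"}',
};

const tools = parseJsonEnv('ULTRAVOX_SELECTED_TOOLS', exampleEnv);
const vad = parseJsonEnv('ULTRAVOX_VAD_SETTINGS', exampleEnv);
console.log(tools.length, Object.keys(vad));
```

In a .env file, remember to quote the value so that the JSON braces and quotes survive parsing.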
npm install
npm start
Send a POST request to /speech-to-speech-stream with:
- Headers:
  - x-uuid: Unique identifier for the call
  - Content-Type: audio/wav (or appropriate audio format)
- Body: Raw audio data stream
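For callers written in Node.js, the request described above might look roughly like the following. This is a minimal sketch using only Node's built-in `http`, `fs`, and `crypto` modules; `buildRequestOptions` and `streamAudioFile` are hypothetical names, and the host and port assume a local server on the default port.

```javascript
const http = require('http');
const fs = require('fs');
const { randomUUID } = require('crypto');

// Hypothetical helper: request options for the /speech-to-speech-stream endpoint.
function buildRequestOptions({ host = 'localhost', port = 6031, uuid = randomUUID() } = {}) {
  return {
    host,
    port,
    path: '/speech-to-speech-stream',
    method: 'POST',
    headers: { 'x-uuid': uuid, 'Content-Type': 'audio/wav' },
  };
}

// Hypothetical helper: stream a local audio file as the request body and
// hand each chunk of the response to `onChunk`.
function streamAudioFile(filePath, onChunk, options = {}) {
  const req = http.request(buildRequestOptions(options), (res) => {
    res.on('data', onChunk);
  });
  req.on('error', (err) => console.error('Request failed:', err.message));

  const file = fs.createReadStream(filePath);
  file.on('error', (err) => {
    console.error('Could not read audio file:', err.message);
    req.destroy();
  });
  file.pipe(req); // pipe() ends the request once the file has been sent
  return req;
}
```

For example, `streamAudioFile('audio.wav', (chunk) => console.log(chunk.length), { uuid: 'call-123' })` would send the file and log the size of each response chunk.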
# .env configuration
ULTRAVOX_CALL_TYPE=agent
ULTRAVOX_AGENT_ID=your_agent_id
ULTRAVOX_API_KEY=your_api_key
# Make request
curl -X POST http://localhost:6031/speech-to-speech-stream \
-H "x-uuid: call-123" \
-H "Content-Type: audio/wav" \
--data-binary @audio.wav
# .env configuration
ULTRAVOX_CALL_TYPE=generic
ULTRAVOX_API_KEY=your_api_key
ULTRAVOX_SYSTEM_PROMPT=You are a customer service representative.
ULTRAVOX_EXTERNAL_VOICE_PROVIDER=elevenlabs
ULTRAVOX_ELEVENLABS_VOICE_ID=your_voice_id
# Make request
curl -X POST http://localhost:6031/speech-to-speech-stream \
-H "x-uuid: call-456" \
-H "Content-Type: audio/wav" \
--data-binary @audio.wav
The service returns a stream of audio data from Ultravox. The response includes:
- Real-time audio chunks from the AI
- JSON control messages for call state management
- Transcript information
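The three kinds of content listed above need to be told apart somewhere in the pipeline. The sketch below shows one possible routing step under loudly stated assumptions: binary frames carry audio, text frames carry JSON, and transcript messages have a `type` field equal to `"transcript"`. None of this is confirmed by this README, and `classifyMessage` is a hypothetical helper.

```javascript
// Hypothetical classifier for incoming messages.
// Assumptions: binary = audio, text = JSON, transcripts have type "transcript".
function classifyMessage(data, isBinary) {
  if (isBinary) return { kind: 'audio', payload: data };

  let message;
  try {
    message = JSON.parse(data.toString());
  } catch {
    return { kind: 'unknown', payload: data };
  }

  if (message && message.type === 'transcript') {
    return { kind: 'transcript', payload: message };
  }
  return { kind: 'control', payload: message };
}

console.log(classifyMessage(Buffer.from([0, 1, 2, 3]), true).kind);
console.log(classifyMessage('{"type":"transcript","text":"hello"}', false).kind);
```

Returning a tagged object rather than acting inside the classifier keeps the routing decision separate from playback, logging, or call-state handling.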
The service handles various error scenarios:
- Missing required environment variables
- Invalid API responses
- WebSocket connection failures
- Audio processing errors
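The first of these cases, missing required environment variables, can be caught at startup. The following is a minimal sketch, assuming that ULTRAVOX_API_KEY is always required and that ULTRAVOX_AGENT_ID is required only for agent calls; `checkEnv` is a hypothetical helper and the service's real validation may differ.

```javascript
// Hypothetical startup check. Returns a list of human-readable problems
// (an empty list means the environment looks usable).
function checkEnv(env = process.env) {
  const problems = [];
  if (!env.ULTRAVOX_API_KEY) {
    problems.push('ULTRAVOX_API_KEY is not set');
  }
  const callType = env.ULTRAVOX_CALL_TYPE;
  if (callType !== 'agent' && callType !== 'generic') {
    problems.push("ULTRAVOX_CALL_TYPE must be 'agent' or 'generic'");
  }
  // Assumption: the agent ID is only needed when placing agent calls.
  if (callType === 'agent' && !env.ULTRAVOX_AGENT_ID) {
    problems.push('ULTRAVOX_AGENT_ID is required for agent calls');
  }
  return problems;
}

const problems = checkEnv({ ULTRAVOX_CALL_TYPE: 'agent', ULTRAVOX_API_KEY: 'your_api_key' });
console.log(problems); // [ 'ULTRAVOX_AGENT_ID is required for agent calls' ]
```

Collecting every problem before reporting lets an operator fix the whole .env file in one pass instead of restarting once per missing variable.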
# Build the image
docker build -t avr-sts-ultravox .
# Run with environment file
docker run --env-file .env -p 6031:6031 avr-sts-ultravox
- GitHub: https://github.com/agentvoiceresponse - Report issues, contribute code.
- Discord: https://discord.gg/DFTU69Hg74 - Join the community discussion.
- Docker Hub: https://hub.docker.com/u/agentvoiceresponse - Find Docker images.
- Wiki: https://wiki.agentvoiceresponse.com/en/home - Project documentation and guides.
AVR is free and open-source. If you find it valuable, consider supporting its development.
MIT License - see the LICENSE file for details.