This repository showcases the integration between Agent Voice Response and Ultravox's Real-time Speech-to-Speech API. The application leverages Ultravox's powerful language model to process audio input from users, providing intelligent, context-aware responses in real-time audio format.
To set up and run this project, you will need:
- Node.js and npm installed
- An Ultravox API key with access to the real-time API
- WebSocket support in your environment
git clone https://github.com/agentvoiceresponse/avr-sts-ultravox.git
cd avr-sts-ultravox
npm install

Create a .env file in the root of the project to store your API keys and configuration. Add the following variables:
ULTRAVOX_API_KEY=your_ultravox_api_key
ULTRAVOX_AGENT_ID=your_ultravox_agent_id
PORT=6031

Replace your_ultravox_api_key with your actual Ultravox API key and your_ultravox_agent_id with your Ultravox Agent ID.
Start the application by running the following command:
node index.js

The server will start on the port defined in the PORT environment variable (default: 6031).
The Agent Voice Response system integrates with Ultravox's Real-time Speech-to-Speech API to provide intelligent audio-based responses to user queries. The server receives audio input from users, forwards it to Ultravox's API, and then returns the model's response as audio in real-time using WebSocket communication.
- Express.js Server: Handles incoming audio streams from clients
- WebSocket Communication: Manages real-time communication with Ultravox's API
- Audio Processing: Handles audio format conversion between 8kHz and 48kHz
- Real-time Streaming: Processes and streams audio data in real-time
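To make the hand-off to Ultravox more concrete, here is a minimal sketch of how the request that opens a call might be assembled from the API key and agent ID. `buildCreateCallRequest` is a hypothetical helper. The endpoint path, the `X-API-Key` header, and the `serverWebSocket` fields are assumptions about the Ultravox REST API and should be checked against the Ultravox documentation before use.

```javascript
// Sketch only: buildCreateCallRequest is a hypothetical helper, not repository code.
// Assumptions to verify against the Ultravox docs: the endpoint path, the
// X-API-Key header name, and the medium.serverWebSocket field names.
function buildCreateCallRequest(apiKey, agentId, sampleRate = 48000) {
  return {
    url: `https://api.ultravox.ai/api/agents/${encodeURIComponent(agentId)}/calls`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": apiKey,
      },
      // Ask for raw audio over a server-side WebSocket at the given sample rate.
      body: JSON.stringify({
        medium: {
          serverWebSocket: {
            inputSampleRate: sampleRate,
            outputSampleRate: sampleRate,
          },
        },
      }),
    },
  };
}
```

The returned `url` and `options` could be passed to `fetch(url, options)`. Under the same assumptions, the response would carry a WebSocket URL that the server then connects to for streaming audio.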
The application includes two main audio processing functions:
- Upsampling (8kHz to 48kHz):
  - Converts client audio from 8kHz to 48kHz using linear interpolation
  - Required for Ultravox's API, which expects 48kHz input
- Downsampling (48kHz to 8kHz):
  - Converts Ultravox's 48kHz output back to 8kHz
  - Ensures compatibility with client audio systems (Asterisk AudioSocket module)
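The two conversions can be sketched as follows. This is an illustration, not the repository's actual code: the function names are hypothetical, the audio is assumed to be 16-bit signed mono PCM held in an `Int16Array`, and the downsampling method (averaging each group of six samples) is an assumption, since only the upsampling method is stated above.

```javascript
// Sketch only: names and the averaging strategy are illustrative assumptions.
const RATIO = 6; // 48000 Hz / 8000 Hz

/** Upsample 16-bit PCM from 8kHz to 48kHz using linear interpolation. */
function upsample8kTo48k(input) {
  const output = new Int16Array(input.length * RATIO);
  for (let i = 0; i < input.length; i++) {
    const current = input[i];
    // At the end of the chunk there is no next sample, so hold the last value.
    const next = i + 1 < input.length ? input[i + 1] : current;
    for (let j = 0; j < RATIO; j++) {
      // Step linearly from the current sample toward the next one.
      output[i * RATIO + j] = Math.round(current + ((next - current) * j) / RATIO);
    }
  }
  return output;
}

/** Downsample 16-bit PCM from 48kHz to 8kHz by averaging groups of six samples. */
function downsample48kTo8k(input) {
  const length = Math.floor(input.length / RATIO);
  const output = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    let sum = 0;
    for (let j = 0; j < RATIO; j++) {
      sum += input[i * RATIO + j];
    }
    output[i] = Math.round(sum / RATIO);
  }
  return output;
}
```

Averaging acts as a crude low-pass filter before decimation. A production resampler would likely use a proper filter, and would also carry the last sample across chunk boundaries instead of holding it.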
The server exposes an endpoint that accepts an audio stream and returns a streamed audio response generated by Ultravox.
You can customize the application behavior using the following environment variables:
- ULTRAVOX_API_KEY: Your Ultravox API key (required)
- ULTRAVOX_AGENT_ID: Your Ultravox Agent ID (required)
- PORT: The port on which the server will listen (default: 6031)
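As an illustration of how these variables might be read and validated at startup, here is a minimal sketch. `loadConfig` is a hypothetical helper and not part of the repository. It assumes the variables are already present in the environment (for example, loaded from the .env file).

```javascript
// Sketch only: loadConfig is a hypothetical helper, not repository code.
function loadConfig(env = process.env) {
  // Both Ultravox variables are required, so fail fast if either is missing.
  const missing = ["ULTRAVOX_API_KEY", "ULTRAVOX_AGENT_ID"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // PORT is optional and falls back to the documented default of 6031.
  const port = Number.parseInt(env.PORT ?? "6031", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    apiKey: env.ULTRAVOX_API_KEY,
    agentId: env.ULTRAVOX_AGENT_ID,
    port,
  };
}
```

Calling `loadConfig()` once at startup would surface a missing key immediately, before the first call reaches Ultravox.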
The application includes comprehensive error handling for:
- WebSocket connection issues
- Audio processing errors
- Ultravox API errors
- Stream processing errors
All errors are logged to the console and appropriate error messages are returned to the client.
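The log-and-report pattern described above could look roughly like this. `forwardErrors` and the shape of the error message are hypothetical. The sketch only assumes that each source (a WebSocket, a stream) exposes an `on("error", ...)` method in the style of a Node.js `EventEmitter`.

```javascript
// Sketch only: forwardErrors and the error message shape are hypothetical.
function forwardErrors(source, label, sendToClient) {
  source.on("error", (err) => {
    // Log to the console with a label so the failing component is identifiable.
    console.error(`[${label}]`, err.message);
    // Send the client a short message instead of the full error object.
    sendToClient({ type: "error", source: label, message: err.message });
  });
}
```

For example, `forwardErrors(ultravoxSocket, "ultravox", (msg) => res.write(JSON.stringify(msg)))` would attach it to a WebSocket, where `ultravoxSocket` and `res` are placeholder names for the upstream connection and the client response.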
- GitHub: https://github.com/agentvoiceresponse - Report issues, contribute code.
- Discord: https://discord.gg/DFTU69Hg74 - Join the community discussion.
- Docker Hub: https://hub.docker.com/u/agentvoiceresponse - Find Docker images.
- Wiki: https://wiki.agentvoiceresponse.com/en/home - Project documentation and guides.
AVR is free and open-source. If you find it valuable, consider supporting its development.
MIT License - see the LICENSE file for details.