operativeit/avr-sts-ultravox

 
 

Agent Voice Response - Ultravox Speech-to-Speech Integration

This repository showcases the integration between Agent Voice Response and Ultravox's Real-time Speech-to-Speech API. The application sends user audio to Ultravox's model and returns intelligent, context-aware responses as audio in real time.

Prerequisites

To set up and run this project, you will need:

  1. Node.js and npm installed
  2. An Ultravox API key with access to the real-time API
  3. WebSocket support in your environment

Setup

1. Clone the Repository

git clone https://github.com/agentvoiceresponse/avr-sts-ultravox.git
cd avr-sts-ultravox

2. Install Dependencies

npm install

3. Configure Environment Variables

Create a .env file in the root of the project to store your API keys and configuration. You will need to add the following variables:

ULTRAVOX_API_KEY=your_ultravox_api_key
ULTRAVOX_AGENT_ID=your_ultravox_agent_id
PORT=6031

Replace your_ultravox_api_key and your_ultravox_agent_id with your actual Ultravox API key and agent ID.
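As a minimal sketch of how these variables might be read and validated at startup: the `loadConfig` helper below is hypothetical and is not taken from the repository's index.js. It assumes 6031, the port shown in the example .env file, as the fallback.

```javascript
// Hypothetical helper (not from the repository): reads the variables from an
// env-like object (process.env by default) and fails fast if one is missing.
function loadConfig(env = process.env) {
  const required = ['ULTRAVOX_API_KEY', 'ULTRAVOX_AGENT_ID'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  // Assumed fallback: 6031, the port used in the example .env file.
  const port = Number.parseInt(env.PORT ?? '6031', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }

  return {
    apiKey: env.ULTRAVOX_API_KEY,
    agentId: env.ULTRAVOX_AGENT_ID,
    port,
  };
}
```

Note that Node.js does not read a .env file on its own. The values have to reach process.env some other way, for example through a loader such as the dotenv package or, on Node.js 20.6 and later, the `--env-file=.env` flag. Which mechanism this project uses is not stated here.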

4. Run the Application

Start the application by running the following command:

node index.js

The server will start on the port defined in the PORT environment variable (default: 6031).

How It Works

The Agent Voice Response system integrates with Ultravox's Real-time Speech-to-Speech API to provide intelligent audio-based responses to user queries. The server receives audio input from users, forwards it to Ultravox's API over a WebSocket connection, and streams the model's audio response back in real time.

Key Components

  • Express.js Server: Handles incoming audio streams from clients
  • WebSocket Communication: Manages real-time communication with Ultravox's API
  • Audio Processing: Handles audio format conversion between 8kHz and 48kHz
  • Real-time Streaming: Processes and streams audio data in real-time

Audio Processing

The application includes two main audio processing functions:

  1. Upsampling (8kHz to 48kHz):

    • Converts client audio from 8kHz to 48kHz using linear interpolation
    • Required because Ultravox's API expects 48kHz input
  2. Downsampling (48kHz to 8kHz):

    • Converts Ultravox's 48kHz output back to 8kHz
    • Ensures compatibility with client audio systems (Asterisk AudioSocket Module)
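The two conversions can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual code: samples are taken to be 16-bit signed PCM held in an `Int16Array`, the function names are hypothetical, and the downsampler averages each group of six samples, which is one simple choice among several.

```javascript
// 48000 / 8000: each 8kHz sample corresponds to six 48kHz samples.
const FACTOR = 6;

// Upsample by linear interpolation: each input sample is followed by
// FACTOR - 1 points interpolated toward the next input sample.
function upsample8kTo48k(input) {
  const output = new Int16Array(input.length * FACTOR);
  for (let i = 0; i < input.length; i++) {
    const current = input[i];
    // The last sample has no successor, so it is held flat.
    const next = i + 1 < input.length ? input[i + 1] : current;
    for (let j = 0; j < FACTOR; j++) {
      output[i * FACTOR + j] = Math.round(current + ((next - current) * j) / FACTOR);
    }
  }
  return output;
}

// Downsample by averaging each group of FACTOR samples (an assumed method).
// Averaging acts as a crude low-pass filter; trailing samples that do not
// fill a whole group are dropped.
function downsample48kTo8k(input) {
  const length = Math.floor(input.length / FACTOR);
  const output = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    let sum = 0;
    for (let j = 0; j < FACTOR; j++) {
      sum += input[i * FACTOR + j];
    }
    output[i] = Math.round(sum / FACTOR);
  }
  return output;
}
```

For example, `upsample8kTo48k(Int16Array.from([0, 600]))` yields twelve samples: a ramp from 0 to 500 in steps of 100, followed by six samples held at 600.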

API Endpoints

POST /speech-to-speech-stream

This endpoint accepts an audio stream and returns a streamed audio response generated by Ultravox.
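A client for this endpoint might look roughly like the sketch below, which uses only Node's built-in `http` module. The body format is an assumption (raw audio bytes sent with chunked transfer encoding), as are the `application/octet-stream` content type, the default host and port, and the helper names `buildRequestOptions` and `streamAudio`. None of these are confirmed by this document.

```javascript
const http = require('http');

// Builds request options for the streaming endpoint.
function buildRequestOptions(host, port) {
  return {
    host,
    port,
    path: '/speech-to-speech-stream',
    method: 'POST',
    headers: {
      // Assumed content type for raw audio bytes.
      'Content-Type': 'application/octet-stream',
      // Chunked encoding lets audio be sent as it is captured.
      'Transfer-Encoding': 'chunked',
    },
  };
}

// Pipes a readable audio source into the request and passes each chunk of
// the streamed response to onAudio. Resolves when the response ends.
function streamAudio(audioSource, onAudio, { host = 'localhost', port = 6031 } = {}) {
  return new Promise((resolve, reject) => {
    const req = http.request(buildRequestOptions(host, port), (res) => {
      res.on('data', onAudio);
      res.on('end', resolve);
      res.on('error', reject);
    });
    req.on('error', reject);
    audioSource.on('error', reject);
    // pipe() ends the request when the audio source ends.
    audioSource.pipe(req);
  });
}
```

A caller could pass any readable stream, such as `fs.createReadStream('input.raw')`, as `audioSource` and write the received chunks to a file or an audio device.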

Customizing the Application

Environment Variables

You can customize the application behavior using the following environment variables:

  • ULTRAVOX_API_KEY: Your Ultravox API key (required)
  • ULTRAVOX_AGENT_ID: Your Ultravox Agent ID (required)
  • PORT: The port on which the server will listen (default: 6031)

Error Handling

The application includes comprehensive error handling for:

  • WebSocket connection issues
  • Audio processing errors
  • Ultravox API errors
  • Stream processing errors

All errors are logged to the console and appropriate error messages are returned to the client.
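One way the log-and-report pattern could look is sketched below. The `reportStreamError` helper, the 500 status code, and the JSON message shape are all hypothetical and not taken from the repository. The `res` argument is assumed to behave like Node's `http.ServerResponse`, which Express responses extend.

```javascript
// Hypothetical helper: logs the error and reports it to the HTTP client.
// `context` names the failing stage, e.g. 'WebSocket connection'.
function reportStreamError(err, res, context, log = console.error) {
  log(`[${context}] ${err.message}`);

  if (res.headersSent) {
    // Audio has already started streaming, so the status code can no
    // longer be changed; the only option left is to close the stream.
    res.end();
    return;
  }

  res.statusCode = 500;
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ message: `${context} failed` }));
}
```

A handler would call it from each failure path, for instance `ws.on('error', (err) => reportStreamError(err, res, 'WebSocket connection'))`, where `ws` stands for whatever WebSocket client object the server holds.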

Support & Community

Support AVR

AVR is free and open-source. If you find it valuable, consider supporting its development:

Support us on Ko-fi

License

MIT License - see the LICENSE file for details.
