Nemotron Voice Agent

Nemotron Voice Agent is an end-to-end voice agent blueprint built with NVIDIA Nemotron state-of-the-art open models, which are served as NVIDIA NIM microservices for acceleration and scaling. It guides developers through building a cascaded pipeline that integrates Nemotron ASR, LLM, and TTS while handling the complexities of streaming, interruptible conversations. By building on NIM microservices, this developer example helps you deploy high-performance voice AI solutions faster.


Key Components

The following are the key components in this blueprint:

Architecture Diagram

Requirements

Check the following requirements before you begin.

Hardware Requirements

This blueprint requires 2 NVIDIA GPUs (Ampere, Hopper, Ada, or later).
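
A quick pre-flight check can confirm the GPU count before you deploy. The following is a minimal sketch, not part of the blueprint: the function names and the REQUIRED_GPUS variable are illustrative, and it only counts the GPUs reported by nvidia-smi without verifying their architecture.

```shell
#!/bin/sh
# Pre-flight sketch: confirm the host exposes at least two NVIDIA GPUs.
# REQUIRED_GPUS and both function names are illustrative assumptions.
REQUIRED_GPUS=2

# Count non-empty lines on stdin (nvidia-smi prints one CSV line per GPU).
count_gpus() {
    grep -c . || true
}

# Succeed when the GPU count passed as $1 meets the requirement.
has_enough_gpus() {
    [ "$1" -ge "$REQUIRED_GPUS" ]
}

# Query the driver only when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    found=$(nvidia-smi --query-gpu=name --format=csv,noheader | count_gpus)
    if has_enough_gpus "$found"; then
        echo "OK: found $found GPU(s)"
    else
        echo "Found $found GPU(s); this blueprint requires $REQUIRED_GPUS" >&2
    fi
fi
```

To check the GPU generation as well, compare the names printed by `nvidia-smi --query-gpu=name --format=csv,noheader` against the supported architectures listed above.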

Software Requirements


Quick Start

Start the application following these steps.

  1. Clone the repository, navigate to its root directory, initialize the submodules, and copy the example environment file config/env.example to .env in the root directory.

    git clone git@github.com:NVIDIA-AI-Blueprints/nemotron-voice-agent.git
    cd nemotron-voice-agent
    git submodule update --init
    cp config/env.example .env
  2. Set your NVIDIA API key as an environment variable:

    export NVIDIA_API_KEY=<your-nvidia-api-key>
  3. Log in to the NVIDIA NGC Docker registry.

    export NGC_API_KEY=<your-nvidia-api-key>
    docker login nvcr.io
  4. Deploy the application.

    docker compose up -d

    Note: Deployment may take 30-60 minutes on first run.

  5. Enable microphone access in Chrome before opening the app in the browser. Go to chrome://flags/, enable "Insecure origins treated as secure", add http://<machine-ip>:9000 to the list, and restart Chrome.

    Note: If you skip this step, the UI may show the error "Cannot read properties of undefined (reading 'getUserMedia')".

    The UI might also get stuck or fail to access the microphone if you connect remotely (for example, through a public IP or a cloud deployment) and no TURN server is configured. To access the application from remote locations or to deploy on cloud platforms, configure a TURN server; see Optional: Deploy TURN Server for Remote Access.

  6. Access the application at http://<machine-ip>:9000/

    Tip: For the best experience, we recommend using a headset (preferably wired) instead of your laptop's built-in microphone.

For detailed setup instructions and troubleshooting, see the Getting Started guide.
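
The steps above can be wrapped in a few small helpers. This is a sketch under stated assumptions, not something the blueprint ships: the function names missing_vars, app_origin, and ngc_login are hypothetical. The login helper relies on the NGC convention of using the literal username $oauthtoken with the API key as the password.

```shell
#!/bin/sh
# Sketch of helpers around the Quick Start steps. All function names are
# illustrative assumptions, not part of the repository.

# Print the name of every listed variable that is unset or empty.
missing_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Build the origin to add to Chrome's "Insecure origins treated as secure" list.
app_origin() {
    echo "http://$1:9000"
}

# Log in to nvcr.io without an interactive prompt. NGC expects the literal
# username $oauthtoken (single-quoted so the shell does not expand it).
ngc_login() {
    printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

For example, `missing_vars NVIDIA_API_KEY NGC_API_KEY` prints nothing when both keys are exported, so you can run it before `ngc_login` and `docker compose up -d`. During the long first deployment, `docker compose ps` and `docker compose logs -f` show container status and live logs.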


Agent Skills

This repository includes AI agent skills for deployment assistance. Install them for your coding agent with:

npx skills add .

Documentation

| Type | Guide | Description |
|------|-------|-------------|
| Tutorial | Getting Started | Full deployment guide with prerequisites, GPU setup, and step-by-step instructions |
| How-to | Configuration Guide | Configuration guide on the .env file depending on various use cases |
| How-to | Enable Multilingual Voice Agent | Enable multi-language conversations with automatic language detection |
| How-to | Jetson Thor Deployment | Edge deployment guide for NVIDIA Jetson Thor platform |
| How-to | Tune Pipeline Performance | Reduce latency with speculative speech and other performance settings |
| Explanation | Best Practices | Production deployment, latency optimization, and UX design guidelines |
| Reference | NVIDIA Pipecat | Overview of Pipecat services and processors for voice AI pipelines |
| Reference | Evaluation and Performance | Accuracy benchmarking and latency/perf tests |

License

This NVIDIA AI BLUEPRINT is licensed under the BSD 2-Clause License. See LICENSE for details. This project may download and install additional third-party open source software and containers. Review the license terms of these projects in third_party_oss_license.txt before use.
