Nemotron Voice Agent provides a comprehensive, end-to-end voice agent blueprint built with state-of-the-art NVIDIA Nemotron open models, delivered as NVIDIA NIM microservices for acceleration and scaling. It guides developers through the creation of a cascaded pipeline that integrates Nemotron ASR, LLM, and TTS while addressing the complexities of streaming, interruptible conversations. By leveraging NVIDIA NIM microservices, this blueprint helps developers accelerate the deployment of high-performance voice AI solutions.
The following are the key components in this blueprint:
- NVIDIA Nemotron Speech ASR & TTS: High-performance streaming speech recognition and multilingual text-to-speech synthesis.
- NVIDIA Nemotron LLMs: State-of-the-art large language models engineered for real-time conversational use cases.
- Pipeline Orchestration: Built on top of the Pipecat framework with WebRTC transport, enabling low-latency real-time voice communication and speculative speech processing capabilities.
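The cascaded flow above can be sketched in plain Python. Note that this is an illustrative stub, not the actual Pipecat API: the `Frame`, `StubASR`/`StubLLM`/`StubTTS`, and `CascadedPipeline` names are hypothetical and exist only to show how a turn flows through ASR, LLM, and TTS stages, with a barge-in flag that lets an interruption drop the in-flight response.

```python
import asyncio
from dataclasses import dataclass

# Illustrative sketch of a cascaded ASR -> LLM -> TTS pipeline with
# interruption handling. These stub classes are hypothetical and do NOT
# reflect the real Pipecat API; they only demonstrate the data flow.

@dataclass
class Frame:
    kind: str       # "audio" or "text"
    payload: str

class StubASR:
    async def process(self, frame: Frame) -> Frame:
        # A real ASR stage would emit streaming partial transcripts.
        return Frame("text", f"transcript({frame.payload})")

class StubLLM:
    async def process(self, frame: Frame) -> Frame:
        return Frame("text", f"reply({frame.payload})")

class StubTTS:
    async def process(self, frame: Frame) -> Frame:
        return Frame("audio", f"speech({frame.payload})")

class CascadedPipeline:
    def __init__(self, stages):
        self.stages = stages
        self.interrupted = False   # set by a VAD/barge-in detector

    async def run(self, frame: Frame):
        for stage in self.stages:
            if self.interrupted:   # barge-in: abandon the in-flight turn
                self.interrupted = False
                return None
            frame = await stage.process(frame)
        return frame

async def main():
    pipeline = CascadedPipeline([StubASR(), StubLLM(), StubTTS()])
    out = await pipeline.run(Frame("audio", "hello"))
    print(out.payload)  # speech(reply(transcript(hello)))

asyncio.run(main())
```

In the real blueprint, Pipecat's frame processors play these roles and the WebRTC transport feeds audio frames in and out at both ends.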
Check the following requirements before you begin.
This blueprint requires 2 NVIDIA GPUs (Ampere, Hopper, Ada, or later):
- GPU 0: Runs the NVIDIA Nemotron Speech ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) models.
  - Total VRAM required for ASR and TTS models: 48 GB
- GPU 1: Runs the NVIDIA LLM NIM.
  - Nemotron 3 Nano 30B A3B: 48 GB VRAM
  - Llama 3.3 Nemotron Super 49B v1.5: 80 GB VRAM
- NVIDIA NGC: Valid credentials for NVIDIA NGC. See the NGC Getting Started Guide.
- NVIDIA API Key: Required for NVIDIA NIM models and NGC container images. Get yours at build.nvidia.com.
- Docker: With NVIDIA GPU support installed.
- NVIDIA NIM: Required for running NVIDIA NIM models. See the NVIDIA NIM Getting Started Guide.
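To sanity-check the GPU requirement above, you can inspect the output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`. The helper below is a hypothetical convenience, not part of the blueprint; the sample output string is made up for illustration.

```python
# Hypothetical helper: parse nvidia-smi CSV output (name, memory.total)
# and report which GPUs meet a required VRAM budget in GiB.

def gpus_meeting_vram(nvidia_smi_csv: str, required_gib: int) -> list[str]:
    """Return the names of GPUs whose total memory is >= required_gib."""
    ok = []
    for line in nvidia_smi_csv.strip().splitlines():
        name, mem = (part.strip() for part in line.split(","))
        mib = int(mem.split()[0])          # e.g. "81920 MiB" -> 81920
        if mib / 1024 >= required_gib:
            ok.append(name)
    return ok

# Made-up sample from a hypothetical 2-GPU machine:
sample = """NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A10, 24576 MiB"""

print(gpus_meeting_vram(sample, 48))   # only the 80 GB GPU meets 48 GiB
```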
Follow these steps to start the application.

- Clone the repository, navigate to the root directory, initialize the submodules, and copy the example environment file `config/env.example` to `.env` in the root directory:

  ```bash
  git clone git@github.com:NVIDIA-AI-Blueprints/nemotron-voice-agent.git
  cd nemotron-voice-agent
  git submodule update --init
  cp config/env.example .env
  ```
- Set your NVIDIA API key as an environment variable:

  ```bash
  export NVIDIA_API_KEY=<your-nvidia-api-key>
  ```
- Log in to the NVIDIA NGC Docker registry:

  ```bash
  export NGC_API_KEY=<your-nvidia-api-key>
  docker login nvcr.io
  ```
- Deploy the application:

  ```bash
  docker compose up -d
  ```

  Note: Deployment may take 30-60 minutes on the first run.
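Because the first deployment can take a while, a small readiness poll can tell you when the UI starts answering. This is a hypothetical helper, not part of the blueprint; the URL and timeout values are assumptions you should adjust to your deployment.

```python
# Hypothetical readiness check: poll an HTTP endpoint until it responds,
# useful while `docker compose up -d` is still pulling and starting models.
import time
import urllib.error
import urllib.request

def wait_for_http(url: str, timeout_s: float = 10.0, interval_s: float = 1.0) -> bool:
    """Return True once `url` answers any HTTP status, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            return True            # server answered, even with an error code
        except (urllib.error.URLError, OSError):
            time.sleep(interval_s)  # not up yet; retry until the deadline
    return False

if __name__ == "__main__":
    # Example (assumed URL): wait up to an hour for the UI to come up.
    # wait_for_http("http://<machine-ip>:9000/", timeout_s=3600)
    pass
```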
- Enable microphone access in Chrome before opening the app in the browser. Go to chrome://flags/, enable "Insecure origins treated as secure", add http://<machine-ip>:9000 to the list, and restart Chrome.

  Note: If this step is skipped, the UI may show a "Cannot read properties of undefined (reading 'getUserMedia')" error. The UI might also get stuck or fail to access the microphone if you connect remotely (e.g., via a public IP or a cloud instance) and a TURN server is not configured. If you need to access the application from remote locations or deploy on cloud platforms, configure a TURN server; see Optional: Deploy TURN Server for Remote Access.
- Access the application at http://<machine-ip>:9000/

  Tip: For the best experience, use a headset (preferably wired) instead of your laptop's built-in microphone.
For detailed setup instructions and troubleshooting, see the Getting Started Guide.
This repository includes AI agent skills for deployment assistance. Install them for your coding agent with:
```bash
npx skills add .
```

| Type | Guide | Description |
|---|---|---|
| Tutorial | Getting Started | Full deployment guide with prerequisites, GPU setup, and step-by-step instructions |
| How-to | Configuration Guide | How to configure the .env file for various use cases |
| How-to | Enable Multilingual Voice Agent | Enable multi-language conversations with automatic language detection |
| How-to | Jetson Thor Deployment | Edge deployment guide for NVIDIA Jetson Thor platform |
| How-to | Tune Pipeline Performance | Reduce latency with speculative speech and other performance settings |
| Explanation | Best Practices | Production deployment, latency optimization, and UX design guidelines |
| Reference | NVIDIA Pipecat | Overview of Pipecat services and processors for voice AI pipelines |
| Reference | Evaluation and Performance | Accuracy benchmarking and latency/perf tests |
This NVIDIA AI Blueprint is licensed under the BSD 2-Clause License. See LICENSE for details. This project may download and install additional third-party open source software and containers. Review the license terms of those projects in third_party_oss_license.txt before use.
