Nemotron Voice Agent provides a comprehensive, end-to-end voice agent blueprint built with state-of-the-art NVIDIA Nemotron open models, delivered as NVIDIA NIM microservices for acceleration and scaling. It guides developers through the creation of a cascaded pipeline that integrates Nemotron ASR, LLM, and TTS while addressing the complexities of streaming, interruptible conversations. By leveraging NVIDIA NIM microservices, this blueprint helps developers accelerate the deployment of high-performance voice AI solutions.
The following are the key components in this blueprint:
- NVIDIA Nemotron Speech ASR & TTS: High-performance streaming speech recognition and multilingual text-to-speech synthesis.
- NVIDIA Nemotron LLMs: State-of-the-art large language models engineered for real-time conversational use cases.
- Pipeline Orchestration: Built on top of the Pipecat framework with WebRTC transport, enabling low-latency real-time voice communication and speculative speech processing capabilities.
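The cascaded flow above can be sketched in plain Python. Note that this is an illustrative stub, not the actual Pipecat API: the `Frame`, `StubASR`/`StubLLM`/`StubTTS`, and `CascadedPipeline` names are hypothetical and exist only to show how a turn flows through ASR, LLM, and TTS stages, with a barge-in flag that lets an interruption drop the in-flight response.

```python
import asyncio
from dataclasses import dataclass

# Illustrative sketch of a cascaded ASR -> LLM -> TTS pipeline with
# interruption handling. These stub classes are hypothetical and do NOT
# reflect the real Pipecat API; they only demonstrate the data flow.

@dataclass
class Frame:
    kind: str       # "audio" or "text"
    payload: str

class StubASR:
    async def process(self, frame: Frame) -> Frame:
        # A real ASR stage would emit streaming partial transcripts.
        return Frame("text", f"transcript({frame.payload})")

class StubLLM:
    async def process(self, frame: Frame) -> Frame:
        return Frame("text", f"reply({frame.payload})")

class StubTTS:
    async def process(self, frame: Frame) -> Frame:
        return Frame("audio", f"speech({frame.payload})")

class CascadedPipeline:
    def __init__(self, stages):
        self.stages = stages
        self.interrupted = False   # set by a VAD/barge-in detector

    async def run(self, frame: Frame):
        for stage in self.stages:
            if self.interrupted:   # barge-in: abandon the in-flight turn
                self.interrupted = False
                return None
            frame = await stage.process(frame)
        return frame

async def main():
    pipeline = CascadedPipeline([StubASR(), StubLLM(), StubTTS()])
    out = await pipeline.run(Frame("audio", "hello"))
    print(out.payload)  # speech(reply(transcript(hello)))

asyncio.run(main())
```

In the real blueprint, Pipecat's frame processors play these roles and the WebRTC transport feeds audio frames in and out at both ends.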
Check the following requirements before you begin.
This blueprint requires 2 NVIDIA GPUs (Ampere, Hopper, Ada, or later):
- GPU 0: Runs the NVIDIA Nemotron Speech ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) models.
  - Total VRAM required for ASR and TTS models: 48 GB
- GPU 1: Runs the NVIDIA LLM NIM.
  - Nemotron 3 Nano 30B A3B: 48 GB VRAM
  - Llama 3.3 Nemotron Super 49B v1.5: 80 GB VRAM
- NVIDIA NGC: Valid credentials for NVIDIA NGC. See the NGC Getting Started Guide.
- NVIDIA API Key: Required for NVIDIA NIM models and NGC container images. Get yours at build.nvidia.com.
- Docker: With NVIDIA GPU support installed.
- NVIDIA NIM: Required for running NVIDIA NIM models. See the NVIDIA NIM Getting Started Guide.
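To sanity-check the GPU requirement above, you can inspect the output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`. The helper below is a hypothetical convenience, not part of the blueprint; the sample output string is made up for illustration.

```python
# Hypothetical helper: parse nvidia-smi CSV output (name, memory.total)
# and report which GPUs meet a required VRAM budget in GiB.

def gpus_meeting_vram(nvidia_smi_csv: str, required_gib: int) -> list[str]:
    """Return the names of GPUs whose total memory is >= required_gib."""
    ok = []
    for line in nvidia_smi_csv.strip().splitlines():
        name, mem = (part.strip() for part in line.split(","))
        mib = int(mem.split()[0])          # e.g. "81920 MiB" -> 81920
        if mib / 1024 >= required_gib:
            ok.append(name)
    return ok

# Made-up sample from a hypothetical 2-GPU machine:
sample = """NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A10, 24576 MiB"""

print(gpus_meeting_vram(sample, 48))   # only the 80 GB GPU meets 48 GiB
```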
Follow these steps to start the application.

- Clone the repository, navigate to the root directory, initialize the submodules, and copy the example environment file `config/env.example` to `.env` in the root directory:

  ```bash
  git clone git@github.com:NVIDIA-AI-Blueprints/nemotron-voice-agent.git
  cd nemotron-voice-agent
  git submodule update --init
  cp config/env.example .env
  ```
- Set your NVIDIA API key as an environment variable:

  ```bash
  export NVIDIA_API_KEY=<your-nvidia-api-key>
  ```
- Log in to the NVIDIA NGC Docker registry:

  ```bash
  export NGC_API_KEY=<your-nvidia-api-key>
  docker login nvcr.io
  ```
- Deploy the application:

  ```bash
  docker compose up -d
  ```

  Note: Deployment may take 30-60 minutes on the first run.
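Because the first deployment can take a while, a small readiness poll can tell you when the UI starts answering. This is a hypothetical helper, not part of the blueprint; the URL and timeout values are assumptions you should adjust to your deployment.

```python
# Hypothetical readiness check: poll an HTTP endpoint until it responds,
# useful while `docker compose up -d` is still pulling and starting models.
import time
import urllib.error
import urllib.request

def wait_for_http(url: str, timeout_s: float = 10.0, interval_s: float = 1.0) -> bool:
    """Return True once `url` answers any HTTP status, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            return True            # server answered, even with an error code
        except (urllib.error.URLError, OSError):
            time.sleep(interval_s)  # not up yet; retry until the deadline
    return False

if __name__ == "__main__":
    # Example (assumed URL): wait up to an hour for the UI to come up.
    # wait_for_http("http://<machine-ip>:9000/", timeout_s=3600)
    pass
```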
- Enable microphone access in Chrome before opening the app in the browser. Go to chrome://flags/, enable "Insecure origins treated as secure", add http://<machine-ip>:9000 to the list, and restart Chrome.

  Note: If this step is skipped, the UI may show a "Cannot read properties of undefined (reading 'getUserMedia')" error. The UI might also get stuck or fail to access the microphone if you connect remotely (e.g., via a public IP or a cloud instance) and a TURN server is not configured. If you need to access the application from remote locations or deploy on cloud platforms, configure a TURN server; see Optional: Deploy TURN Server for Remote Access.
- Access the application at http://<machine-ip>:9000/

  Tip: For the best experience, use a headset (preferably wired) instead of your laptop's built-in microphone.
For detailed setup instructions and troubleshooting, see the Getting Started Guide.
This repository includes AI agent skills for deployment assistance. Install them for your coding agent with:
```bash
npx skills add .
```

| Type | Guide | Description |
|---|---|---|
| Tutorial | Getting Started | Full deployment guide with prerequisites, GPU setup, and step-by-step instructions |
| How-to | Configuration Guide | How to configure the .env file for various use cases |
| How-to | Enable Multilingual Voice Agent | Enable multi-language conversations with automatic language detection |
| How-to | Jetson Thor Deployment | Edge deployment guide for NVIDIA Jetson Thor platform |
| How-to | Tune Pipeline Performance | Reduce latency with speculative speech and other performance settings |
| Explanation | Best Practices | Production deployment, latency optimization, and UX design guidelines |
| Reference | NVIDIA Pipecat | Overview of Pipecat services and processors for voice AI pipelines |
| Reference | Evaluation and Performance | Accuracy benchmarking and latency/perf tests |
This NVIDIA AI Blueprint is licensed under the BSD 2-Clause License. See LICENSE for details. This project may download and install additional third-party open source software and containers. Review the license terms of those projects in third_party_oss_license.txt before use.
