Releases: NVIDIA-AI-Blueprints/nemotron-voice-agent
Releases · NVIDIA-AI-Blueprints/nemotron-voice-agent
v1.0.0
Nemotron Voice Agent v1.0.0 (3 March 2026)
Initial release of Nemotron Voice Agent — an end-to-end voice agent blueprint powered by NVIDIA Nemotron ASR, LLM, and TTS, designed for scalable, production-ready deployments.
Added
- End-to-end voice agent pipeline with NVIDIA Nemotron ASR, LLM, and TTS, supporting streaming audio and mid-conversation interruptions
- Built on the open source Pipecat-ai and nvidia-pipecat frameworks
- NVIDIA Nemotron Speech models:
- Parakeet CTC 1.1B (English ASR)
- Parakeet 1.1B RNNT (Multilingual ASR)
- Magpie TTS Multilingual
- NVIDIA Nemotron LLMs via NVIDIA NIM:
- WebRTC transport for real-time, low-latency voice communication with a custom frontend UI
- Docker Compose deployment with optional TURN server support for remote access
- Multilingual support with automatic language detection and seamless mid-conversation language switching
- Jetson Thor edge deployment support
- Pipeline customizations using environment variables and config files
- ASR, LLM, TTS model change
- Speculative speech processing enable/disable
- Conversation history thresholds
- output audio buffering
- Open telemetry tracing and monitoring support
- Documentation:
- Getting started guide covering prerequisites, GPU configuration, and step-by-step setup
- Configuration guide for pipeline customizations
- Jetson Thor deployment guide for edge use cases
- Best practices guide covering production deployment, latency optimization, and conversational UX
- AI agent deployment skill for Cursor and Claude Code to streamline deployment on workstations and Jetson Thor
Known Issues
- ASR transcription can occasionally be inaccurate, though the LLM generally compensates by inferring meaning from context.
- The context aggregator limits chat history to 20 turns by default. Older turns are dropped when this limit is reached, rather than summarized.