Skip to content

Releases: NVIDIA-AI-Blueprints/nemotron-voice-agent

v1.0.0

03 Mar 16:39
bfcfc90

Choose a tag to compare

Nemotron Voice Agent v1.0.0 (3 March 2026)

Initial release of Nemotron Voice Agent — an end-to-end voice agent blueprint powered by NVIDIA Nemotron ASR, LLM, and TTS, designed for scalable, production-ready deployments.

Added

  • End-to-end voice agent pipeline with NVIDIA Nemotron ASR, LLM, and TTS, supporting streaming audio and mid-conversation interruptions
  • Built on the open source Pipecat-ai and nvidia-pipecat frameworks
  • NVIDIA Nemotron Speech models:
  • NVIDIA Nemotron LLMs via NVIDIA NIM:
  • WebRTC transport for real-time, low-latency voice communication with a custom frontend UI
  • Docker Compose deployment with optional TURN server support for remote access
  • Multilingual support with automatic language detection and seamless mid-conversation language switching
  • Jetson Thor edge deployment support
  • Pipeline customizations using environment variables and config files
    • ASR, LLM, TTS model change
    • Speculative speech processing enable/disable
    • Conversation history thresholds
    • output audio buffering
  • Open telemetry tracing and monitoring support
  • Documentation:
  • AI agent deployment skill for Cursor and Claude Code to streamline deployment on workstations and Jetson Thor

Known Issues

  • ASR transcription can occasionally be inaccurate, though the LLM generally compensates by inferring meaning from context.
  • The context aggregator limits chat history to 20 turns by default. Older turns are dropped when this limit is reached, rather than summarized.