Nemotron-Nano2-VL Notebooks

A collection of notebooks demonstrating the capabilities of NVIDIA Nemotron Nano 2 VL, a 12B-parameter model that unifies visual and textual understanding for advanced multimodal agentic workflows.

Overview

These notebooks show how to use NVIDIA Nemotron Nano 2 VL to build applications that can see, read, and reason across diverse media. The model can extract, understand, and act on information from text, images, and videos, making it a powerful tool for next-generation AI agents.

Models

  • VLM (NIM): nvidia/nemotron-nano-2-vl (Available soon on NVIDIA AI Endpoints)
  • VLM (Hugging Face): nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-FP8 (link)
  • VLM (Hugging Face): nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 (link)
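
The sketch below loads the BF16 Hugging Face checkpoint listed above. It assumes the repository ships transformers-compatible remote code usable through the standard Auto* classes; check the model card for the exact recommended classes, processor, and chat template.

```python
# Hedged sketch: assumes the checkpoint supports the standard transformers
# remote-code pattern; the exact Auto* classes are documented on the model card.
import torch
from transformers import AutoModel, AutoTokenizer

repo = "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights
    device_map="auto",           # place weights on the available GPU(s)
).eval()
```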

Key Features

  • Agentic Multimodal Reasoning: Unifies visual and textual understanding to extract, reason, and act on information.
  • Versatile Inputs: Natively handles text prompts, image URLs, and video URLs in a single request.
  • Controllable Reasoning: Use the /think system prompt to enable detailed reasoning steps and /no_think for direct answers; both modes are shown in the request sketch after this list.
  • Multi-Image Understanding: Capable of reasoning across multiple images, such as different pages of a PDF, to answer complex questions.
  • Advanced Video Analysis: Performs dense captioning and summarization of video content.
  • Efficient Video Sampling (EVS): Automatically prunes redundant video frames to enable efficient long-context reasoning.
  • Hybrid Mamba-Transformer Architecture: Delivers high accuracy with superior throughput and lower latency.
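
The request sketch below illustrates Versatile Inputs and Controllable Reasoning against the hosted endpoint. The base URL, model ID, and message payload shape are assumptions based on NVIDIA AI Endpoints' OpenAI-compatible API; adjust them once nvidia/nemotron-nano-2-vl is live, or point the client at your own NIM deployment.

```python
# Hedged sketch: endpoint URL, model ID, and payload shape are assumptions
# based on NVIDIA's OpenAI-compatible API; adjust to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NVIDIA AI Endpoints URL
    api_key="YOUR_NVIDIA_API_KEY",                   # see Requirements below
)

response = client.chat.completions.create(
    model="nvidia/nemotron-nano-2-vl",  # assumed model ID (see Models above)
    messages=[
        # "/no_think" asks for a direct answer; swap in "/think" for detailed reasoning steps.
        {"role": "system", "content": "/no_think"},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the key figures in this chart."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        },
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```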

Requirements

  • NVIDIA API key (get one here)
  • GPU recommended for local deployment (e.g., a single H100)