Nemotron-Nano2-VL Notebooks

A collection of notebooks demonstrating the capabilities of NVIDIA Nemotron Nano 2 VL, a 12B-parameter model that unifies visual and textual understanding for advanced multimodal agentic workflows.

Overview

These notebooks show how to use NVIDIA Nemotron Nano 2 VL to build applications that can see, read, and reason across diverse media. The model can extract, understand, and act on information from text, images, and videos, making it a powerful tool for next-generation AI agents.

Models

  • VLM (NIM): nvidia/nemotron-nano-2-vl (Available soon on NVIDIA AI Endpoints)
  • VLM (Hugging Face): nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-FP8 (link)
  • VLM (Hugging Face): nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 (link)
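
The sketch below loads the BF16 Hugging Face checkpoint listed above. It assumes the repository ships transformers-compatible remote code usable through the standard Auto* classes; check the model card for the exact recommended classes, processor, and chat template.

```python
# Hedged sketch: assumes the checkpoint supports the standard transformers
# remote-code pattern; the exact Auto* classes are documented on the model card.
import torch
from transformers import AutoModel, AutoTokenizer

repo = "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights
    device_map="auto",           # place weights on the available GPU(s)
).eval()
```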

Key Features

  • Agentic Multimodal Reasoning: Unifies visual and textual understanding to extract, reason, and act on information.
  • Versatile Inputs: Natively handles text prompts, image URLs, and video URLs in a single request.
  • Controllable Reasoning: Use the /think system prompt to enable detailed reasoning steps and /no_think for direct answers; both modes are shown in the request sketch after this list.
  • Multi-Image Understanding: Capable of reasoning across multiple images, such as different pages of a PDF, to answer complex questions.
  • Advanced Video Analysis: Performs dense captioning and summarization of video content.
  • Efficient Video Sampling (EVS): Automatically prunes redundant video frames to enable efficient long-context reasoning.
  • Hybrid Mamba-Transformer Architecture: Delivers high accuracy with superior throughput and lower latency.
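
The request sketch below illustrates Versatile Inputs and Controllable Reasoning against the hosted endpoint. The base URL, model ID, and message payload shape are assumptions based on NVIDIA AI Endpoints' OpenAI-compatible API; adjust them once nvidia/nemotron-nano-2-vl is live, or point the client at your own NIM deployment.

```python
# Hedged sketch: endpoint URL, model ID, and payload shape are assumptions
# based on NVIDIA's OpenAI-compatible API; adjust to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NVIDIA AI Endpoints URL
    api_key="YOUR_NVIDIA_API_KEY",                   # see Requirements below
)

response = client.chat.completions.create(
    model="nvidia/nemotron-nano-2-vl",  # assumed model ID (see Models above)
    messages=[
        # "/no_think" asks for a direct answer; swap in "/think" for detailed reasoning steps.
        {"role": "system", "content": "/no_think"},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the key figures in this chart."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        },
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```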

Requirements

  • NVIDIA API key (get one here)
  • GPU recommended for local deployment (e.g., a single H100)