NVIDIA-AI-Blueprints/video-search-and-summarization

NVIDIA AI Blueprint: Video Search and Summarization

Overview

This repository powers the build experience, showcasing a video search and summarization agent built with NVIDIA NIM microservices.

Insightful, accurate, and interactive video analytics AI agents enable a range of industries to make better decisions faster. These AI agents are given tasks through natural language and can perform complex operations like video summarization and visual question-answering, unlocking entirely new application possibilities. The NVIDIA AI Blueprint makes it easy to get started building and customizing video analytics AI agents for video search and summarization, all powered by generative AI, vision language models (VLMs) like Cosmos Nemotron VLMs, large language models (LLMs) like Llama Nemotron LLMs, and NVIDIA NIM.

Use Case / Problem Description

The NVIDIA AI Blueprint for Video Search and Summarization addresses the challenge of deploying visual agents capable of interacting with large volumes of video data, both stored and streamed. It can be used to create vision AI agents for a multitude of use cases, such as monitoring smart spaces, warehouse automation, and SOP validation. In these settings, quick and accurate video analysis can lead to better decision-making and enhanced operational efficiency.

Agent Workflows

We provide multiple reference Agent Workflows which demonstrate how the individual components can be leveraged by an agent:

| Workflow | Description |
| --- | --- |
| Q&A and Report Generation (Quickstart) | Video retrieval, VLM-based Q&A, and report generation on short video clips |
| Alert Verification | Real-time processing of videos using perception (object detection, tracking) and behavior analytics to generate alerts, which are subsequently verified with a VLM to reduce false positives |
| Real-Time Alerts | Continuous processing of video streams through a VLM for anomaly detection |
| Video Search | Natural language search across video archives using video embeddings (alpha) |
| Long Video Summarization | Analysis and summarization of extended video recordings through chunking and aggregation of dense captions |
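The chunk-then-aggregate idea behind Long Video Summarization can be illustrated with a minimal sketch. This is not the blueprint's implementation: `split_into_chunks`, `summarize_long_video`, and the two placeholder callables are hypothetical names, and a real deployment would replace the placeholders with calls to a VLM (for dense captions) and an LLM (for aggregation).

```python
from typing import Callable, List, Tuple


def split_into_chunks(duration_s: float, chunk_s: float) -> List[Tuple[float, float]]:
    """Return consecutive (start, end) windows covering [0, duration_s]."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    duration_s = float(duration_s)
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # last chunk may be shorter
        chunks.append((start, end))
        start = end
    return chunks


def summarize_long_video(
    duration_s: float,
    chunk_s: float,
    caption_chunk: Callable[[float, float], str],
    aggregate: Callable[[List[str]], str],
) -> str:
    """Caption each chunk independently, then aggregate the captions."""
    captions = [caption_chunk(s, e) for s, e in split_into_chunks(duration_s, chunk_s)]
    return aggregate(captions)


# Hypothetical stand-ins for the VLM captioner and the LLM aggregator.
def placeholder_caption(start: float, end: float) -> str:
    return f"[{start:.0f}s-{end:.0f}s] placeholder caption"


def placeholder_aggregate(captions: List[str]) -> str:
    return " | ".join(captions)


if __name__ == "__main__":
    print(summarize_long_video(25, 10, placeholder_caption, placeholder_aggregate))
```

Passing the captioner and aggregator as callables keeps the chunking logic independent of whichever models are deployed.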

Software Components

  1. NIM microservices: Here are models used in this blueprint:

  2. Real-time video intelligence: The Real-Time Video Intelligence layer extracts rich visual features, semantic embeddings, and contextual understanding from video data in real time, publishing results to a message broker for downstream analytics and agentic workflows. It provides three core microservices for processing video streams.

  3. Downstream analytics: The Downstream Analytics layer processes and enriches the metadata streams generated by real-time video intelligence microservices, transforming raw detections into actionable insights and verified alerts.

  4. Agent and offline processing: The top-level agent leverages the Model Context Protocol (MCP) to access video analytics data, incident records, and vision processing capabilities through a unified tool interface. It integrates multiple vision-based tools including video understanding with Vision Language Models (VLMs), semantic video search using embeddings, long video summarization for extended footage analysis, and video snapshot/clip retrieval.
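As a rough illustration of the semantic video search mentioned above, the sketch below ranks clips by cosine similarity between a query embedding and per-clip embeddings. It is an assumption-laden toy: the in-memory `index` dictionary, the clip IDs, and the function names are hypothetical, and the blueprint's actual embedding model and storage are not shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_clips(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (clip_id, score) pairs, highest similarity first."""
    scored = [(clip_id, cosine_similarity(query_vec, vec)) for clip_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; real video embeddings have many more dimensions.
    index = {"clip_a": [1.0, 0.0], "clip_b": [0.0, 1.0], "clip_c": [1.0, 1.0]}
    for clip_id, score in search_clips([1.0, 0.0], index, top_k=2):
        print(clip_id, round(score, 3))
```

A linear scan like this is only practical for small archives; larger ones typically rely on a vector database or an approximate nearest-neighbor index.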

Target Audience

This blueprint is designed for ease of setup while offering extensive configuration options, and using it requires technical expertise. It is intended for:

  1. Video Analysts and IT Engineers: Professionals focused on analyzing video data and ensuring efficient processing and summarization. The blueprint offers 1-click deployment steps, easy-to-manage configurations, and plug-and-play models, making it accessible for early developers.

  2. GenAI Developers / Machine Learning Engineers: Experts who need to customize the blueprint for specific use cases. This includes modifying the pipelines for unique datasets and fine-tuning LLMs as needed. For advanced users, the blueprint provides detailed configuration options and custom deployment possibilities, enabling extensive customization and optimization.

Repository Structure Overview

| Directory | Description |
| --- | --- |
| agent/ | Video search and summarization agent (Python). Contains src/vss_agents/ (tools, agents, APIs, embeddings, evaluators, video analytics), tests/, stubs/, docker/, and 3rdparty/. See agent/README.md. |
| deployments/ | Deployment configs and Docker Compose: NIM model configs (nim/), developer workflows (developer-workflow/: dev-profile-base, dev-profile-search, dev-profile-alerts, dev-profile-lvs), foundational services, LVS, RTVI, VLM-as-verifier, VST, and root compose.yml. |
| scripts/ | Deployment and patch scripts, including the Brev launchable notebook (deploy_vss_launchable.ipynb) and dev-profile / patch scripts. |
| ui/ | Frontend monorepo (Next.js, Turbo): apps/ (nemo-agent-toolkit-ui, nv-metropolis-bp-vss-ui) and shared packages/. See ui/README.md. |

Documentation

For detailed instructions and additional information about this blueprint, please refer to the official documentation.

Prerequisites

Obtain API Key

Hardware Requirements

Platform requirements vary depending on the configuration and deployment topology used for VSS and its dependencies, such as the VLM and LLM. For a list of validated GPU topologies and the configuration to use with each, see the GPU requirements.

Quickstart Guide

Launchable Deployment

Ideal for: Quickly getting started with your own videos without worrying about hardware and software requirements.

Follow the steps in the documentation and the notebook in the scripts directory (deploy_vss_launchable.ipynb) to complete all prerequisites and deploy the blueprint using a Brev Launchable on a 2x RTX PRO 6000 SE AWS instance.

Docker Compose Deployment

Ideal for: Deploying a VSS agent on your own hardware or bare metal cloud instance.

System Requirements

  • OS:
    • x86 hosts: Ubuntu 22.04 or Ubuntu 24.04
    • DGX-SPARK: DGX OS 7.4.0
    • IGX-THOR: Jetson Linux BSP (Rel 38.5)
    • AGX-THOR: Jetson Linux BSP (Rel 38.4)
  • NVIDIA Driver:
    • 580.105.08 (x86 hosts with Ubuntu 24.04)
    • 580.65.06 (x86 hosts with Ubuntu 22.04)
    • 580.95.05 (DGX-SPARK)
    • 580.00 (IGX-THOR and AGX-THOR)
  • NVIDIA Container Toolkit: 1.17.8+
  • Docker: 27.2.0+
  • Docker Compose: v2.29.0+
  • NGC CLI: 4.10.0+

Refer to the linked Prerequisites section for installation details.
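The minimum tool versions listed above can be checked with a small helper before deploying. The sketch below is illustrative only and not part of the blueprint: `parse_version` and `meets_minimum` are hypothetical names, and the version strings are assumed to come from running each tool's own version command and passing the output in.

```python
import re
from typing import Dict, Tuple

# Minimums copied from the System Requirements list above.
MINIMUM_VERSIONS: Dict[str, str] = {
    "NVIDIA Container Toolkit": "1.17.8",
    "Docker": "27.2.0",
    "Docker Compose": "2.29.0",
    "NGC CLI": "4.10.0",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted number (e.g. '27.2.0') from text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, component by component (so 27.10 > 27.2)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a_padded = a + (0,) * (width - len(a))  # treat missing components as 0
    b_padded = b + (0,) * (width - len(b))
    return a_padded >= b_padded


if __name__ == "__main__":
    # Example strings; replace them with the output of the real version commands.
    reported = {"Docker": "Docker version 27.2.0, build abc", "Docker Compose": "v2.29.0"}
    for tool, text in reported.items():
        status = "ok" if meets_minimum(text, MINIMUM_VERSIONS[tool]) else "too old"
        print(f"{tool}: {status}")
```

Comparing integer tuples rather than raw strings avoids the lexicographic trap where "27.10.1" would sort below "27.2.0".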

Contributing

This project is currently in early access and not accepting contributions. Once made generally available, this project will accept contributions.

License

Refer to LICENSE

About

Blueprint for ingesting massive volumes of live or archived videos and extracting insights for summarization and interactive Q&A
