This repository powers the build experience, showcasing a video search and summarization agent built with NVIDIA NIM microservices.
Insightful, accurate, and interactive video analytics AI agents enable a range of industries to make better decisions faster. These AI agents are given tasks through natural language and can perform complex operations like video summarization and visual question-answering, unlocking entirely new application possibilities. The NVIDIA AI Blueprint makes it easy to get started building and customizing video analytics AI agents for video search and summarization — all powered by generative AI, vision language models (VLMs) like Cosmos Nemotron VLMs, large language models (LLMs) like Llama Nemotron LLMs, NVIDIA NeMo Retriever, and NVIDIA NIM.
NIM microservices: The blueprint uses the VLM, LLM, and NVIDIA NeMo Retriever models described in the overview above.
Ingestion Pipeline:
The ingestion pipeline decodes the video segments (chunks) generated by the stream handler, selects frames, and uses a vision-language model (VLM) together with a caption prompt to generate a detailed caption for each chunk. These dense captions are then indexed into vector and graph databases for use in the Context-Aware Retrieval-Augmented Generation (CA-RAG) workflow.
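A minimal sketch of this flow is shown below. The `Chunk` structure, the `vlm`, `vector_db`, and `graph_db` clients, and the default caption prompt are hypothetical placeholders for illustration only, not the blueprint's actual interfaces:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    start_s: float       # chunk start time in the source video (seconds)
    end_s: float         # chunk end time (seconds)
    frames: List[bytes]  # frames selected from the decoded chunk

def ingest(video_chunks: List[Chunk], vlm, vector_db, graph_db,
           caption_prompt: str = "Describe the events in these frames in detail."):
    """Caption each chunk with a VLM and index the captions for CA-RAG.

    `vlm`, `vector_db`, and `graph_db` are hypothetical client objects;
    the real blueprint wires these steps together through its microservices.
    """
    for chunk in video_chunks:
        # Dense caption generated from the sampled frames plus the caption prompt.
        caption = vlm.generate(frames=chunk.frames, prompt=caption_prompt)

        metadata = {"start_s": chunk.start_s, "end_s": chunk.end_s}

        # Index into the vector database for semantic retrieval ...
        vector_db.add(text=caption, metadata=metadata)
        # ... and into the graph database for temporal / multi-hop reasoning.
        graph_db.add_event(caption=caption, **metadata)
```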
Context Manager:
The context manager efficiently incorporates tools, a vision-language model (VLM) and a large language model (LLM), using them as required. Its key functions include a summary generator, an answer generator, and an alert handler, which are used for summary generation, Q&A, and alert management. In addition, the context manager maintains its working context by making efficient use of both short-term memory, such as chat history, and long-term memory resources, such as vector and graph databases, as needed.
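The following is a minimal sketch of how such a context manager might route requests. The class, attribute, and method names are assumptions for illustration, not the blueprint's actual components:

```python
class ContextManager:
    """Routes summarization, Q&A, and alert requests to the right tool,
    drawing on short-term memory (chat history) and long-term memory
    (vector and graph databases) as needed. All attributes here are
    hypothetical stand-ins for the blueprint's internal components."""

    def __init__(self, vlm, llm, vector_db, graph_db):
        self.vlm = vlm
        self.llm = llm
        self.vector_db = vector_db
        self.graph_db = graph_db
        self.chat_history = []  # short-term memory

    def summarize(self, captions):
        # Summary generator: condense per-chunk captions with the LLM.
        return self.llm.generate(f"Summarize these video captions:\n{captions}")

    def answer(self, question):
        # Answer generator: pull long-term context, then answer with chat history.
        context = self.vector_db.search(question)
        self.chat_history.append({"role": "user", "content": question})
        reply = self.llm.generate(f"Context: {context}\nQuestion: {question}")
        self.chat_history.append({"role": "assistant", "content": reply})
        return reply

    def check_alerts(self, caption, alert_rules):
        # Alert handler: ask the LLM whether a caption matches any alert rule.
        verdict = self.llm.generate(
            f"Rules: {alert_rules}\nEvent: {caption}\n"
            "Does this event trigger any rule? Answer yes or no."
        )
        return "yes" in verdict.lower()
```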
CA-RAG module:
The Context-Aware Retrieval-Augmented Generation (CA-RAG) module leverages both Vector RAG and Graph-RAG as the primary sources for video understanding. During the Q&A workflow, the CA-RAG module extracts relevant context from the vector database and graph database to enhance temporal reasoning, anomaly detection, multi-hop reasoning, and scalability, thereby offering deeper contextual understanding and efficient management of extensive video data.
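A minimal sketch of the Q&A retrieval step follows, assuming hypothetical `vector_db`, `graph_db`, and `llm` clients; the real CA-RAG module performs the equivalent steps through its own configuration and services:

```python
def answer_question(question: str, vector_db, graph_db, llm, top_k: int = 5) -> str:
    """Combine Vector RAG and Graph-RAG context before calling the LLM.

    The retrieval calls below are placeholders, not the CA-RAG module's
    actual API.
    """
    # Vector RAG: captions semantically similar to the question.
    vector_hits = vector_db.search(question, top_k=top_k)

    # Graph-RAG: entities and temporal relations relevant to the question,
    # useful for multi-hop and temporal reasoning.
    graph_hits = graph_db.query(question)

    context = "\n".join([*vector_hits, *graph_hits])
    prompt = (
        "Answer the question using only the video context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)
```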
- Apply for the Early Access Program for the Video Search and Summarization NVIDIA AI Blueprint.
- Log in to the NGC Portal with the same account you used to apply for early access.
- Follow the steps here to obtain an NGC API Key.
This API key (NVIDIA_API_KEY) will be used to pull the blueprint container and the other models used as part of the blueprint.
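The key is typically provided as an environment variable before running the deployment steps. A minimal sketch of checking for it from Python (the variable name comes from above; everything else is illustrative):

```python
import os

# NVIDIA_API_KEY must be set so the deployment steps can authenticate
# against NGC to pull the blueprint container and model artifacts.
api_key = os.environ.get("NVIDIA_API_KEY")
if not api_key:
    raise RuntimeError("Set NVIDIA_API_KEY to your NGC API key before deploying.")
```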
The following NVIDIA GPUs are supported:
- 8 x H100 (80 GB)
- 4 x H100 (80 GB) (requires a Helm chart override)
- 8 x A100 (80 GB)
- 8 x L40S (48 GB)
500+ GB system memory
The following NVIDIA GPUs are supported if remote endpoints are being used:
- All models remote: A6000, L40S, A100 (40 GB)
- Local VLM: A100 (80 GB), H100, H200 (use remote deployment)
Follow the notebook in the deploy directory to complete all prerequisites and deploy the blueprint using a Brev Launchable on an 8 x L40S Crusoe instance.
- deploy/1_Deploy_VSS_docker_Crusoe.ipynb: This notebook is tailored specifically for the Crusoe CSP, which uses ephemeral storage.
There are also three Docker Compose deployment options, covering local and remote deployments.
- Ubuntu 22.04
- NVIDIA driver 535.161.08 (Recommended minimum version)
- CUDA 12.2+ (CUDA driver installed with NVIDIA driver)
- Docker Compose v2.32.4
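Once a Docker Compose deployment is up, you can exercise the blueprint through its HTTP API. The host, port, endpoint path, and payload below are purely illustrative assumptions; refer to the blueprint documentation for the actual API:

```python
import requests

# Illustrative only: the URL, endpoint path, and request/response shapes
# are hypothetical placeholders, not the blueprint's documented API.
VSS_URL = "http://localhost:8000"

def summarize_video(video_id: str) -> str:
    response = requests.post(
        f"{VSS_URL}/summarize",
        json={"video_id": video_id},
        timeout=600,
    )
    response.raise_for_status()
    return response.json().get("summary", "")
```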
Once approved for Early Access, the Members page will contain links to 'Download helm chart from NGC' and 'Documentation'. Follow the guide to deploy the blueprint with the Helm chart.
- Ubuntu 22.04
- NVIDIA driver 535.161.08 (Recommended minimum version)
- CUDA 12.2+ (CUDA driver installed with NVIDIA driver)
- Kubernetes v1.31.2
- NVIDIA GPU Operator v23.9
- Helm v3.x
| CVE | Description |
|---|---|
| CVE-2024-11393 | Impacts the transformers v4.47.0 Python package via Hugging Face Transformers MaskFormer model deserialization, allowing remote attackers to execute arbitrary code. User interaction is required to exploit this vulnerability: the target must visit a malicious page or open a malicious file. This does not affect VSS since the MaskFormer model is not used in VSS. |
| CVE-2024-11392 | Impacts the transformers v4.47.0 Python package via Hugging Face Transformers MobileViTV2 model deserialization, allowing remote attackers to execute arbitrary code. User interaction is required to exploit this vulnerability: the target must visit a malicious page or open a malicious file. This does not affect VSS since the MobileViTV2 model is not used in VSS. |
| CVE-2024-11394 | Impacts the transformers v4.47.0 Python package via Hugging Face Transformers Trax model deserialization, allowing remote attackers to execute arbitrary code. User interaction is required to exploit this vulnerability: the target must visit a malicious page or open a malicious file. This does not affect VSS since the Trax model is not used in VSS. |
The code in this repository is licensed under the Apache License, Version 2.0.
The software and materials through the Early Access program are governed by the NVIDIA Software and Model Evaluation License Agreement.