Feature Request: Video-Level Semantic Search and Multi-Video Chat Support #69

@kiranshivaraju

Description

Problem Statement

Currently, VSS supports chat/Q&A functionality only on a per-video basis. Users must know the specific video ID beforehand to query
it. There's no way to:

  1. Search across all videos semantically (e.g., "find all videos with accidents")
  2. Chat with multiple videos simultaneously (e.g., "analyze patterns across these 5 accident videos")

This limits VSS use cases where users have large video libraries and need to:

  • Discover relevant videos based on content, not just metadata
  • Ask questions that span multiple videos
  • Analyze patterns/trends across video collections

Proposed Solution

Add video-level semantic search capabilities to VSS:

  1. Video-Level Indexing
  • After video summarization completes, index the video summary into a dedicated Milvus collection
  • Store embeddings of video summaries (not just frame-level embeddings)
  • Include metadata: tags, custom fields, timestamps
  2. Cross-Video Search API
  • New endpoint: POST /videos/search
  • Input: Natural language query (e.g., "show me forklift accidents")
  • Output: Ranked list of relevant video IDs with similarity scores
  • Support filtering by tags, custom metadata, and time ranges
  3. Multi-Video Chat Enhancement
  • Enhance the existing POST /chat/completions to properly aggregate context from multiple videos
  • The current implementation accepts id: [list], but it is unclear whether it truly aggregates multi-video context (an illustrative request shape follows this list)
  • Ensure RAG retrieves relevant information across all specified videos
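
For reference, a multi-video chat request against the existing endpoint would presumably look like the sketch below. Only the id list is confirmed by the current behavior described above; the OpenAI-style messages field and everything else is an assumption about the existing API shape.

```json
{
  "id": ["<vss_video_id_1>", "<vss_video_id_2>", "<vss_video_id_3>"],
  "messages": [
    {"role": "user", "content": "Analyze patterns across these accident videos"}
  ]
}
```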

Technical Architecture

Proposed Components

  1. Video-Level Milvus Collection
    Collection: "video_summaries"
    Schema:
    - vss_video_id: VARCHAR (unique identifier)
    - summary_embedding: FLOAT_VECTOR (embedded summary text)
    - summary_text: VARCHAR (full summary for retrieval)
    - tags: JSON (user-defined tags)
    - custom_metadata: JSON (arbitrary fields)
    - created_at: INT64 (timestamp)

  2. Video Indexing Hook

  • Location: via_stream_handler.py after video summary generation
  • Action: Embed summary → Insert into video-level collection
  • Error handling: Non-blocking (don't fail summarization if indexing fails)
  3. New API Endpoint
    POST /videos/search
    Request:
    {
      "query": "show me accident videos",
      "filters": {"tags": ["accident"], "metadata": {...}},
      "top_k": 10,
      "threshold": 0.7
    }

    Response:
    {
      "results": [
        {"vss_video_id": "...", "score": 0.89, "summary": "...", ...}
      ],
      "total_found": 15
    }

  4. Combined Search + Chat Endpoint (Optional)
    POST /videos/search-and-chat
    Request:
    {
      "search_query": "accident videos",
      "chat_question": "What caused these accidents?",
      "top_k": 5
    }

    Response:
    {
      "search_results": [...],
      "chat_response": {...}
    }


Implementation Details

Integration Points

File 1: src/vss-engine/src/via_stream_handler.py

  • Add video indexing after video summary generation (around line 1350)
  • Call new VideoLevelIndexer service
  • Configuration flag: enable_video_search (default: true)
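
A minimal sketch of the non-blocking hook, assuming a hypothetical VideoLevelIndexer service in the new video_search module (the class and method names are placeholders, not existing VSS code):

```python
import logging
import os

logger = logging.getLogger(__name__)

def _index_video_summary(video_id: str, summary_text: str, metadata: dict) -> None:
    """Best-effort video-level indexing; must never fail the summarization request."""
    if os.getenv("ENABLE_VIDEO_SEARCH", "true").lower() != "true":
        return
    try:
        # Hypothetical service from the proposed src/video_search module (File 4).
        from video_search.video_indexer import VideoLevelIndexer

        VideoLevelIndexer.get_instance().index_summary(
            vss_video_id=video_id,
            summary_text=summary_text,
            tags=metadata.get("tags", []),
            custom_metadata=metadata,
        )
    except Exception:
        # Non-blocking by design: log and continue.
        logger.exception("Video-level indexing failed for %s", video_id)
```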

File 2: src/vss-engine/src/via_server.py

  • Add /videos/search endpoint (around line 1900)
  • Add /videos/search-and-chat endpoint
  • Verify /chat/completions properly handles multiple video IDs
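
Assuming via_server.py exposes a FastAPI application (as its existing REST endpoints suggest), the search route could look roughly like this; VideoSearchQuery/VideoSearchResponse are the models proposed under File 3 and VideoLevelIndexer is the hypothetical service from File 4:

```python
from fastapi import APIRouter, HTTPException

from vss_api_models import VideoSearchQuery, VideoSearchResponse  # proposed in File 3
from video_search.video_indexer import VideoLevelIndexer          # proposed in File 4

router = APIRouter()

@router.post("/videos/search", response_model=VideoSearchResponse)
async def search_videos(query: VideoSearchQuery) -> VideoSearchResponse:
    """Semantic search over video-level summary embeddings."""
    try:
        hits = VideoLevelIndexer.get_instance().search(
            query_text=query.query,
            filters=query.filters,
            top_k=query.top_k,
            threshold=query.threshold,
        )
    except Exception as exc:
        raise HTTPException(status_code=500, detail=str(exc)) from exc
    return VideoSearchResponse(results=hits, total_found=len(hits))
```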

File 3: src/vss-engine/src/vss_api_models.py

  • Add VideoSearchQuery model
  • Add VideoSearchResponse model
  • Add SearchAndChatQuery model
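
A first pass at the models, mirroring the request/response JSON shown earlier (defaults such as top_k=10 mirror VIDEO_SEARCH_TOP_K_DEFAULT below):

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field

class VideoSearchQuery(BaseModel):
    query: str = Field(..., description="Natural-language search query")
    filters: Optional[Dict[str, Any]] = Field(None, description="Tag/metadata filters")
    top_k: int = Field(10, ge=1, le=100, description="Maximum number of videos to return")
    threshold: float = Field(0.7, ge=0.0, le=1.0, description="Minimum similarity score")

class VideoSearchResult(BaseModel):
    vss_video_id: str
    score: float
    summary: str
    tags: List[str] = []
    custom_metadata: Dict[str, Any] = {}

class VideoSearchResponse(BaseModel):
    results: List[VideoSearchResult]
    total_found: int

class SearchAndChatQuery(BaseModel):
    search_query: str
    chat_question: str
    top_k: int = 5
```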

File 4: src/vss-engine/src/video_search/ (NEW MODULE)

  • video_indexer.py: Milvus collection management
  • embedding_service.py: Embed video summaries (reuse existing models)
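
A rough sketch of the collection-management side of video_indexer.py using pymilvus, following the schema proposed under Technical Architecture. The embedding dimension, index parameters, and Milvus address are placeholders; inserts and searches would then go through the standard collection.insert() / collection.search() calls, with embeddings produced by the existing embedding model:

```python
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections, utility,
)

COLLECTION_NAME = "video_summaries"
EMBEDDING_DIM = 1024  # placeholder; must match the embedding model's output size

def get_or_create_collection() -> Collection:
    """Create (or load) the video-level collection with the proposed schema."""
    connections.connect(host="localhost", port="19530")  # assumed Milvus address
    if utility.has_collection(COLLECTION_NAME):
        return Collection(COLLECTION_NAME)

    fields = [
        FieldSchema("vss_video_id", DataType.VARCHAR, max_length=128, is_primary=True),
        FieldSchema("summary_embedding", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
        FieldSchema("summary_text", DataType.VARCHAR, max_length=65535),
        FieldSchema("tags", DataType.JSON),
        FieldSchema("custom_metadata", DataType.JSON),
        FieldSchema("created_at", DataType.INT64),
    ]
    collection = Collection(COLLECTION_NAME, CollectionSchema(fields))
    collection.create_index(
        "summary_embedding",
        {"index_type": "IVF_FLAT", "metric_type": "COSINE", "params": {"nlist": 128}},
    )
    return collection
```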

Configuration

New environment variables

ENABLE_VIDEO_SEARCH=true
VIDEO_SEARCH_COLLECTION=video_summaries
VIDEO_EMBEDDING_MODEL=nvidia/nv-embedqa-e5-v5
VIDEO_SEARCH_TOP_K_DEFAULT=10
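
The new module could read these with defaults along these lines:

```python
import os

ENABLE_VIDEO_SEARCH = os.getenv("ENABLE_VIDEO_SEARCH", "true").lower() == "true"
VIDEO_SEARCH_COLLECTION = os.getenv("VIDEO_SEARCH_COLLECTION", "video_summaries")
VIDEO_EMBEDDING_MODEL = os.getenv("VIDEO_EMBEDDING_MODEL", "nvidia/nv-embedqa-e5-v5")
VIDEO_SEARCH_TOP_K_DEFAULT = int(os.getenv("VIDEO_SEARCH_TOP_K_DEFAULT", "10"))
```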


Use Cases

Use Case 1: Safety Monitoring

Scenario: Warehouse with 1000+ hours of footage
Query: "Show me all forklift near-miss incidents"
Flow:

  1. Search: Returns 15 relevant videos with scores
  2. User reviews top results
  3. Multi-chat: "What are common factors in these incidents?"

Use Case 2: Compliance Audit

Scenario: Review PPE compliance across facilities
Query: "Find videos where workers aren't wearing helmets"
Flow:

  1. Search with metadata filter: {"facility": "warehouse_A"}
  2. Returns tagged videos
  3. Multi-chat: "Generate compliance report for these videos"

Use Case 3: Pattern Analysis

Scenario: Traffic analysis across multiple intersections
Query: "Analyze traffic patterns at high-congestion locations"
Flow:

  1. Search: Find videos tagged "high_traffic"
  2. Multi-chat over top 10 videos
  3. LLM aggregates insights across all videos

Benefits

  1. Improved Discoverability: Find videos by content, not just file names
  2. Multi-Video Intelligence: Ask questions spanning video collections
  3. Scalability: Works with large video libraries (1000s of videos)
  4. Backward Compatible: Optional feature, doesn't break existing APIs
  5. Reuses Infrastructure: Leverages existing Milvus/embedding setup
