Feature Request: Video-Level Semantic Search and Multi-Video Chat Support #69

@kiranshivaraju

Description

Problem Statement

Currently, VSS supports chat/Q&A functionality only on a per-video basis. Users must know the specific video ID beforehand to query
it. There's no way to:

  1. Search across all videos semantically (e.g., "find all videos with accidents")
  2. Chat with multiple videos simultaneously (e.g., "analyze patterns across these 5 accident videos")

This limits VSS use cases where users have large video libraries and need to:

  • Discover relevant videos based on content, not just metadata
  • Ask questions that span multiple videos
  • Analyze patterns/trends across video collections

Proposed Solution

Add video-level semantic search capabilities to VSS:

  1. Video-Level Indexing
  • After video summarization completes, index the video summary into a dedicated Milvus collection
  • Store embeddings of video summaries (not just frame-level embeddings)
  • Include metadata: tags, custom fields, timestamps
  2. Cross-Video Search API
  • New endpoint: POST /videos/search
  • Input: Natural language query (e.g., "show me forklift accidents")
  • Output: Ranked list of relevant video IDs with similarity scores
  • Support filtering by tags, custom metadata, and time ranges
  3. Multi-Video Chat Enhancement
  • Enhance the existing POST /chat/completions to properly aggregate context from multiple videos
  • The current implementation accepts id: [list], but it is unclear whether it truly aggregates multi-video context (an illustrative request shape follows this list)
  • Ensure RAG retrieves relevant information across all specified videos
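
For reference, a multi-video chat request against the existing endpoint would presumably look like the sketch below. Only the id list is confirmed by the current behavior described above; the OpenAI-style messages field and everything else is an assumption about the existing API shape.

```json
{
  "id": ["<vss_video_id_1>", "<vss_video_id_2>", "<vss_video_id_3>"],
  "messages": [
    {"role": "user", "content": "Analyze patterns across these accident videos"}
  ]
}
```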

Technical Architecture

Proposed Components

  1. Video-Level Milvus Collection
    Collection: "video_summaries"
    Schema:
    - vss_video_id: VARCHAR (unique identifier)
    - summary_embedding: FLOAT_VECTOR (embedded summary text)
    - summary_text: VARCHAR (full summary for retrieval)
    - tags: JSON (user-defined tags)
    - custom_metadata: JSON (arbitrary fields)
    - created_at: INT64 (timestamp)

  2. Video Indexing Hook

  • Location: via_stream_handler.py after video summary generation
  • Action: Embed summary → Insert into video-level collection
  • Error handling: Non-blocking (don't fail summarization if indexing fails)
  3. New API Endpoint
    POST /videos/search
    Request:
    {
      "query": "show me accident videos",
      "filters": {"tags": ["accident"], "metadata": {...}},
      "top_k": 10,
      "threshold": 0.7
    }

    Response:
    {
      "results": [
        {"vss_video_id": "...", "score": 0.89, "summary": "...", ...}
      ],
      "total_found": 15
    }

  4. Combined Search + Chat Endpoint (Optional)
    POST /videos/search-and-chat
    Request:
    {
      "search_query": "accident videos",
      "chat_question": "What caused these accidents?",
      "top_k": 5
    }

    Response:
    {
      "search_results": [...],
      "chat_response": {...}
    }


Implementation Details

Integration Points

File 1: src/vss-engine/src/via_stream_handler.py

  • Add video indexing after video summary generation (around line 1350)
  • Call new VideoLevelIndexer service
  • Configuration flag: enable_video_search (default: true)
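
A minimal sketch of the non-blocking hook, assuming a hypothetical VideoLevelIndexer service in the new video_search module (the class and method names are placeholders, not existing VSS code):

```python
import logging
import os

logger = logging.getLogger(__name__)

def _index_video_summary(video_id: str, summary_text: str, metadata: dict) -> None:
    """Best-effort video-level indexing; must never fail the summarization request."""
    if os.getenv("ENABLE_VIDEO_SEARCH", "true").lower() != "true":
        return
    try:
        # Hypothetical service from the proposed src/video_search module (File 4).
        from video_search.video_indexer import VideoLevelIndexer

        VideoLevelIndexer.get_instance().index_summary(
            vss_video_id=video_id,
            summary_text=summary_text,
            tags=metadata.get("tags", []),
            custom_metadata=metadata,
        )
    except Exception:
        # Non-blocking by design: log and continue.
        logger.exception("Video-level indexing failed for %s", video_id)
```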

File 2: src/vss-engine/src/via_server.py

  • Add /videos/search endpoint (around line 1900)
  • Add /videos/search-and-chat endpoint
  • Verify /chat/completions properly handles multiple video IDs
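
Assuming via_server.py exposes a FastAPI application (as its existing REST endpoints suggest), the search route could look roughly like this; VideoSearchQuery/VideoSearchResponse are the models proposed under File 3 and VideoLevelIndexer is the hypothetical service from File 4:

```python
from fastapi import APIRouter, HTTPException

from vss_api_models import VideoSearchQuery, VideoSearchResponse  # proposed in File 3
from video_search.video_indexer import VideoLevelIndexer          # proposed in File 4

router = APIRouter()

@router.post("/videos/search", response_model=VideoSearchResponse)
async def search_videos(query: VideoSearchQuery) -> VideoSearchResponse:
    """Semantic search over video-level summary embeddings."""
    try:
        hits = VideoLevelIndexer.get_instance().search(
            query_text=query.query,
            filters=query.filters,
            top_k=query.top_k,
            threshold=query.threshold,
        )
    except Exception as exc:
        raise HTTPException(status_code=500, detail=str(exc)) from exc
    return VideoSearchResponse(results=hits, total_found=len(hits))
```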

File 3: src/vss-engine/src/vss_api_models.py

  • Add VideoSearchQuery model
  • Add VideoSearchResponse model
  • Add SearchAndChatQuery model
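
A first pass at the models, mirroring the request/response JSON shown earlier (defaults such as top_k=10 mirror VIDEO_SEARCH_TOP_K_DEFAULT below):

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field

class VideoSearchQuery(BaseModel):
    query: str = Field(..., description="Natural-language search query")
    filters: Optional[Dict[str, Any]] = Field(None, description="Tag/metadata filters")
    top_k: int = Field(10, ge=1, le=100, description="Maximum number of videos to return")
    threshold: float = Field(0.7, ge=0.0, le=1.0, description="Minimum similarity score")

class VideoSearchResult(BaseModel):
    vss_video_id: str
    score: float
    summary: str
    tags: List[str] = []
    custom_metadata: Dict[str, Any] = {}

class VideoSearchResponse(BaseModel):
    results: List[VideoSearchResult]
    total_found: int

class SearchAndChatQuery(BaseModel):
    search_query: str
    chat_question: str
    top_k: int = 5
```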

File 4: src/vss-engine/src/video_search/ (NEW MODULE)

  • video_indexer.py: Milvus collection management
  • embedding_service.py: Embed video summaries (reuse existing models)
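
A rough sketch of the collection-management side of video_indexer.py using pymilvus, following the schema proposed under Technical Architecture. The embedding dimension, index parameters, and Milvus address are placeholders; inserts and searches would then go through the standard collection.insert() / collection.search() calls, with embeddings produced by the existing embedding model:

```python
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections, utility,
)

COLLECTION_NAME = "video_summaries"
EMBEDDING_DIM = 1024  # placeholder; must match the embedding model's output size

def get_or_create_collection() -> Collection:
    """Create (or load) the video-level collection with the proposed schema."""
    connections.connect(host="localhost", port="19530")  # assumed Milvus address
    if utility.has_collection(COLLECTION_NAME):
        return Collection(COLLECTION_NAME)

    fields = [
        FieldSchema("vss_video_id", DataType.VARCHAR, max_length=128, is_primary=True),
        FieldSchema("summary_embedding", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
        FieldSchema("summary_text", DataType.VARCHAR, max_length=65535),
        FieldSchema("tags", DataType.JSON),
        FieldSchema("custom_metadata", DataType.JSON),
        FieldSchema("created_at", DataType.INT64),
    ]
    collection = Collection(COLLECTION_NAME, CollectionSchema(fields))
    collection.create_index(
        "summary_embedding",
        {"index_type": "IVF_FLAT", "metric_type": "COSINE", "params": {"nlist": 128}},
    )
    return collection
```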

Configuration

New environment variables

ENABLE_VIDEO_SEARCH=true
VIDEO_SEARCH_COLLECTION=video_summaries
VIDEO_EMBEDDING_MODEL=nvidia/nv-embedqa-e5-v5
VIDEO_SEARCH_TOP_K_DEFAULT=10
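
The new module could read these with defaults along these lines:

```python
import os

ENABLE_VIDEO_SEARCH = os.getenv("ENABLE_VIDEO_SEARCH", "true").lower() == "true"
VIDEO_SEARCH_COLLECTION = os.getenv("VIDEO_SEARCH_COLLECTION", "video_summaries")
VIDEO_EMBEDDING_MODEL = os.getenv("VIDEO_EMBEDDING_MODEL", "nvidia/nv-embedqa-e5-v5")
VIDEO_SEARCH_TOP_K_DEFAULT = int(os.getenv("VIDEO_SEARCH_TOP_K_DEFAULT", "10"))
```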


Use Cases

Use Case 1: Safety Monitoring

Scenario: Warehouse with 1000+ hours of footage
Query: "Show me all forklift near-miss incidents"
Flow:

  1. Search: Returns 15 relevant videos with scores
  2. User reviews top results
  3. Multi-chat: "What are common factors in these incidents?"

Use Case 2: Compliance Audit

Scenario: Review PPE compliance across facilities
Query: "Find videos where workers aren't wearing helmets"
Flow:

  1. Search with metadata filter: {"facility": "warehouse_A"}
  2. Returns tagged videos
  3. Multi-chat: "Generate compliance report for these videos"

Use Case 3: Pattern Analysis

Scenario: Traffic analysis across multiple intersections
Query: "Analyze traffic patterns at high-congestion locations"
Flow:

  1. Search: Find videos tagged "high_traffic"
  2. Multi-chat over top 10 videos
  3. LLM aggregates insights across all videos

Benefits

  1. Improved Discoverability: Find videos by content, not just file names
  2. Multi-Video Intelligence: Ask questions spanning video collections
  3. Scalability: Works with large video libraries (1000s of videos)
  4. Backward Compatible: Optional feature, doesn't break existing APIs
  5. Reuses Infrastructure: Leverages existing Milvus/embedding setup
