Description
Problem Statement
Currently, VSS supports chat/Q&A functionality only on a per-video basis. Users must know the specific video ID beforehand to query it. There's no way to:
- Search across all videos semantically (e.g., "find all videos with accidents")
- Chat with multiple videos simultaneously (e.g., "analyze patterns across these 5 accident videos")
This limits VSS use cases where users have large video libraries and need to:
- Discover relevant videos based on content, not just metadata
- Ask questions that span multiple videos
- Analyze patterns/trends across video collections
Proposed Solution
Add video-level semantic search capabilities to VSS:
- Video-Level Indexing
- After video summarization completes, index the video summary into a dedicated Milvus collection
- Store embeddings of video summaries (not just frame-level embeddings)
- Include metadata: tags, custom fields, timestamps
- Cross-Video Search API
- New endpoint: POST /videos/search
- Input: Natural language query (e.g., "show me forklift accidents")
- Output: Ranked list of relevant video IDs with similarity scores
- Support filtering by tags, custom metadata, time ranges
- Multi-Video Chat Enhancement
- Enhance existing POST /chat/completions to properly aggregate context from multiple videos
- The current implementation accepts id: [list], but it is unclear whether it truly aggregates context across the listed videos
- Ensure RAG retrieves relevant information across all specified videos
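One way to make sure every specified video contributes to the chat context is to interleave the retrieved chunks per video instead of taking a single global top-N. The sketch below is only an illustration of that idea; `aggregate_context`, the `(score, text)` chunk shape, and `max_chunks` are hypothetical names and do not describe how VSS currently retrieves context.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple


def aggregate_context(
    chunks_by_video: Dict[str, List[Tuple[float, str]]],
    max_chunks: int = 6,
) -> List[str]:
    """Merge per-video retrieved chunks round-robin so each video is represented.

    chunks_by_video maps a video ID to (score, text) pairs (assumed shape).
    """
    video_ids = list(chunks_by_video)
    # Rank each video's chunks by score, best first.
    ranked = [
        sorted(chunks_by_video[vid], key=lambda c: c[0], reverse=True)
        for vid in video_ids
    ]
    merged: List[str] = []
    # Round 1 takes every video's best chunk, round 2 the second best, and so on.
    for round_chunks in zip_longest(*ranked):
        for vid, chunk in zip(video_ids, round_chunks):
            if chunk is not None and len(merged) < max_chunks:
                # Prefix with the video ID so the LLM can attribute statements.
                merged.append(f"[{vid}] {chunk[1]}")
    return merged
```

With a global top-N, a single video with many high-scoring chunks could crowd out the others; the round-robin order trades some relevance for coverage.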
Technical Architecture
Proposed Components
- Video-Level Milvus Collection
Collection: "video_summaries"
Schema:
- vss_video_id: VARCHAR (unique identifier)
- summary_embedding: FLOAT_VECTOR (embedded summary text)
- summary_text: VARCHAR (full summary for retrieval)
- tags: JSON (user-defined tags)
- custom_metadata: JSON (arbitrary fields)
- created_at: INT64 (timestamp)
- Video Indexing Hook
- Location: via_stream_handler.py after video summary generation
- Action: Embed summary → Insert into video-level collection
- Error handling: Non-blocking (don't fail summarization if indexing fails)
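The hook described above could look roughly like the following. This is a minimal sketch under stated assumptions: `build_record` and `index_summary_non_blocking` are hypothetical helpers, and `embed` and `insert` are injected callables standing in for the embedding model and the Milvus insert, so no real client API is assumed.

```python
import logging
import time
from typing import Any, Callable, Dict, Sequence

logger = logging.getLogger("video_search")


def build_record(
    video_id: str,
    summary: str,
    embedding: Sequence[float],
    tags=None,
    custom_metadata=None,
) -> Dict[str, Any]:
    """Assemble one row using the field names from the proposed schema."""
    return {
        "vss_video_id": video_id,
        "summary_embedding": list(embedding),
        "summary_text": summary,
        "tags": list(tags or []),
        "custom_metadata": dict(custom_metadata or {}),
        "created_at": int(time.time()),  # Unix seconds (assumed unit)
    }


def index_summary_non_blocking(
    video_id: str,
    summary: str,
    embed: Callable[[str], Sequence[float]],
    insert: Callable[[Dict[str, Any]], None],
    **fields: Any,
) -> bool:
    """Embed and insert a summary; on any failure, log and return False."""
    try:
        record = build_record(video_id, summary, embed(summary), **fields)
        insert(record)
        return True
    except Exception:
        # Indexing is best-effort: summarization must not fail because of it.
        logger.exception("Video-level indexing failed for %s", video_id)
        return False
```

Returning a boolean rather than raising lets the caller in the summarization path record the failure (for a later retry, say) without changing its own control flow.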
- New API Endpoint
POST /videos/search
Request:
{
"query": "show me accident videos",
"filters": {"tags": ["accident"], "metadata": {...}},
"top_k": 10,
"threshold": 0.7
}
Response:
{
"results": [
{"vss_video_id": "...", "score": 0.89, "summary": "...", ...}
],
"total_found": 15
}
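To make the intended semantics of `top_k`, `threshold`, and `total_found` concrete, here is an in-memory sketch of the ranking step. It assumes cosine similarity as the score, treats `filters.tags` as "all listed tags must be present", and counts `total_found` before truncating to `top_k`; none of these choices are fixed by the proposal, and in practice the vector search would be delegated to Milvus rather than done in Python.

```python
import math
from typing import Any, Dict, Iterable, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_videos(
    query_embedding: Sequence[float],
    records: Iterable[Dict[str, Any]],
    top_k: int = 10,
    threshold: float = 0.7,
    tags: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Filter by tags, score by cosine similarity, and return ranked results."""
    wanted = set(tags or [])
    hits = []
    for rec in records:
        # Assumed filter semantics: every requested tag must be on the video.
        if wanted and not wanted.issubset(rec.get("tags", [])):
            continue
        score = cosine(query_embedding, rec["summary_embedding"])
        if score >= threshold:
            hits.append({
                "vss_video_id": rec["vss_video_id"],
                "score": round(score, 4),
                "summary": rec["summary_text"],
            })
    hits.sort(key=lambda h: h["score"], reverse=True)
    # total_found counts all matches above the threshold, not just the page.
    return {"results": hits[:top_k], "total_found": len(hits)}
```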
- Combined Search + Chat Endpoint (Optional)
POST /videos/search-and-chat
Request:
{
"search_query": "accident videos",
"chat_question": "What caused these accidents?",
"top_k": 5
}
Response:
{
"search_results": [...],
"chat_response": {...}
}
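The combined endpoint is essentially a composition of the two existing steps. The sketch below shows that flow with injected `search` and `chat` callables (both hypothetical stand-ins for the search endpoint logic and the existing chat path); returning `None` for `chat_response` when nothing matches is an assumption, not part of the proposal.

```python
from typing import Any, Callable, Dict, List


def search_and_chat(
    search_query: str,
    chat_question: str,
    search: Callable[[str, int], Dict[str, Any]],
    chat: Callable[[List[str], str], Any],
    top_k: int = 5,
) -> Dict[str, Any]:
    """Run a video search, then ask one question over the matching videos."""
    found = search(search_query, top_k)
    video_ids = [hit["vss_video_id"] for hit in found["results"]]
    # Skip the chat call entirely when the search returned no videos.
    chat_response = chat(video_ids, chat_question) if video_ids else None
    return {"search_results": found["results"], "chat_response": chat_response}
```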
Implementation Details
Integration Points
File 1: src/vss-engine/src/via_stream_handler.py
- Add video indexing after video summary generation (around line 1350)
- Call new VideoLevelIndexer service
- Configuration flag: enable_video_search (default: true)
File 2: src/vss-engine/src/via_server.py
- Add /videos/search endpoint (around line 1900)
- Add /videos/search-and-chat endpoint
- Verify /chat/completions properly handles multiple video IDs
File 3: src/vss-engine/src/vss_api_models.py
- Add VideoSearchQuery model
- Add VideoSearchResponse model
- Add SearchAndChatQuery model
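The three models could be shaped as follows. Plain dataclasses are used here only to keep the sketch dependency-free; the real models would follow whatever conventions vss_api_models.py already uses. The `VideoSearchResult` helper class, the validation rules, and the `threshold` default of 0.7 (taken from the example request) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class VideoSearchQuery:
    query: str
    filters: Dict[str, Any] = field(default_factory=dict)
    top_k: int = 10
    threshold: float = 0.7  # assumed default, mirrors the example request

    def __post_init__(self) -> None:
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        # Assumes scores are normalized to [0, 1].
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")


@dataclass
class VideoSearchResult:
    vss_video_id: str
    score: float
    summary: str


@dataclass
class VideoSearchResponse:
    results: List[VideoSearchResult]
    total_found: int


@dataclass
class SearchAndChatQuery:
    search_query: str
    chat_question: str
    top_k: int = 5
```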
File 4: src/vss-engine/src/video_search/ (NEW MODULE)
- video_indexer.py: Milvus collection management
- embedding_service.py: Embed video summaries (reuse existing models)
Configuration
New environment variables
ENABLE_VIDEO_SEARCH=true
VIDEO_SEARCH_COLLECTION=video_summaries
VIDEO_EMBEDDING_MODEL=nvidia/nv-embedqa-e5-v5
VIDEO_SEARCH_TOP_K_DEFAULT=10
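Reading these variables at startup might look like the sketch below. `VideoSearchConfig` and `load_video_search_config` are hypothetical names, the fallback values simply repeat the ones listed above, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VideoSearchConfig:
    enabled: bool
    collection: str
    embedding_model: str
    top_k_default: int


def load_video_search_config(env: Optional[Mapping[str, str]] = None) -> VideoSearchConfig:
    """Build the config from environment variables, falling back to the listed values."""
    env = os.environ if env is None else env
    enabled_raw = env.get("ENABLE_VIDEO_SEARCH", "true").strip().lower()
    return VideoSearchConfig(
        enabled=enabled_raw in {"1", "true", "yes"},
        collection=env.get("VIDEO_SEARCH_COLLECTION", "video_summaries"),
        embedding_model=env.get("VIDEO_EMBEDDING_MODEL", "nvidia/nv-embedqa-e5-v5"),
        top_k_default=int(env.get("VIDEO_SEARCH_TOP_K_DEFAULT", "10")),
    )
```

Accepting an explicit mapping keeps the loader easy to test without mutating the process environment.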
Use Cases
Use Case 1: Safety Monitoring
Scenario: Warehouse with 1000+ hours of footage
Query: "Show me all forklift near-miss incidents"
Flow:
- Search: Returns 15 relevant videos with scores
- User reviews top results
- Multi-chat: "What are common factors in these incidents?"
Use Case 2: Compliance Audit
Scenario: Review PPE compliance across facilities
Query: "Find videos where workers aren't wearing helmets"
Flow:
- Search with metadata filter: {"facility": "warehouse_A"}
- Returns tagged videos
- Multi-chat: "Generate compliance report for these videos"
Use Case 3: Pattern Analysis
Scenario: Traffic analysis across multiple intersections
Query: "Analyze traffic patterns at high-congestion locations"
Flow:
- Search: Find videos tagged "high_traffic"
- Multi-chat over top 10 videos
- LLM aggregates insights across all videos
Benefits
- Improved Discoverability: Find videos by content, not just file names
- Multi-Video Intelligence: Ask questions spanning video collections
- Scalability: Works with large video libraries (1000s of videos)
- Backward Compatible: Optional feature, doesn't break existing APIs
- Reuses Infrastructure: Leverages existing Milvus/embedding setup