Migration Guide: ViT to Multi-Modal CLIP System

This guide covers migrating from the ViT-only embedding system to the multi-modal, CLIP-based system.

Overview

Old System: 768-dim ViT embeddings, single collection.
New System: 512-dim CLIP visual and 768-dim text embeddings, multi-modal collection with named vectors and OCR extraction.

Prerequisites

Python: pip install -r requirements.txt
System: FFmpeg (required for videos), CUDA (optional, for GPU acceleration).
Data: Access to media files and Qdrant.
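
A quick preflight check can confirm the system tools are on PATH before starting. This is a minimal sketch: the helper name find_tools is hypothetical, and treating the presence of nvidia-smi as a hint that a CUDA-capable GPU is available is an assumption, not something the migration tooling is known to do.

```python
import shutil


def find_tools(tools=("ffmpeg", "nvidia-smi")):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    Assumption: nvidia-smi is used only as a rough hint that a CUDA GPU
    is present; CUDA itself is optional for this migration.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'not found'}")
```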

Migration Strategy

Phase 1: Parallel Running

Set USE_MULTIMODAL_EMBEDDINGS=true in the Node.js environment so that new uploads use the multi-modal pipeline.
Keep the old collection (media_embeddings) for existing data.
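
The flag-based routing during parallel running can be sketched as follows. The actual switch lives in the Node.js service, whose code is not shown in this guide, so this Python version is illustrative only: the function names are hypothetical, and the assumption is that an unset flag behaves like "false".

```python
import os

LEGACY_COLLECTION = "media_embeddings"
MULTIMODAL_COLLECTION = "media_embeddings_v2"


def multimodal_enabled(env=None):
    """Return True only when USE_MULTIMODAL_EMBEDDINGS is set to 'true'.

    Assumption: a missing or unrecognized value is treated as disabled.
    """
    env = os.environ if env is None else env
    return env.get("USE_MULTIMODAL_EMBEDDINGS", "false").strip().lower() == "true"


def collection_for_new_upload(env=None):
    """Pick the collection that a new upload should be written to."""
    return MULTIMODAL_COLLECTION if multimodal_enabled(env) else LEGACY_COLLECTION
```

Passing a plain dict as env makes the behavior easy to check without touching the real environment.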

Phase 2: Data Migration

Use the migration script to regenerate embeddings for old posts:

python scripts/migrate_embeddings.py --media-path /path/to/media

Phase 3: Gradual Rollout

Week 1: 10% of traffic.
Week 2: 50% of traffic.
Week 3: 100% of traffic.
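
One common way to implement a percentage rollout like this is deterministic hash bucketing, so that a given user stays on the same side of the split as the percentage grows. The guide does not say how traffic is actually split; the sketch below is one possible approach, and both function names and the choice of a user ID as the bucketing key are assumptions.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_system(user_id: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the current rollout percentage."""
    return rollout_bucket(user_id) < rollout_percent
```

Because the bucket depends only on the ID, any user served by the new system at 10% is still served by it at 50% and 100%.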

Configuration

Node.js .env:
- USE_MULTIMODAL_EMBEDDINGS=true
- EMBEDDING_SERVER_URL=http://localhost:8000
Qdrant: New collection media_embeddings_v2 uses named vectors visual (512) and text (768).
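
For reference, the named-vector layout above corresponds to a request body like the one below for Qdrant's create-collection REST call (PUT /collections/media_embeddings_v2). This sketch only builds and prints the JSON and does not contact Qdrant. The Cosine distance metric is an assumption, since the guide does not state which metric the collection uses.

```python
import json


def named_vectors_payload(distance="Cosine"):
    """Build the create-collection body for the two named vectors.

    Assumption: both vectors use the same distance metric (Cosine here).
    """
    return {
        "vectors": {
            "visual": {"size": 512, "distance": distance},
            "text": {"size": 768, "distance": distance},
        }
    }


if __name__ == "__main__":
    print(json.dumps(named_vectors_payload(), indent=2))
```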

API & Ranking

Endpoint: POST /extract-multimodal (replaces legacy extractor).
Ranking signals and their weights: Visual (30%), Text (25%), Engagement (20%), Recency (15%), Diversity (10%).
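
The weights can be read as a weighted sum over per-signal scores. The sketch below assumes each signal is already normalized to the range 0 to 1 and that the signals are combined linearly; the guide gives only the percentages, so the combination rule and the function name are assumptions.

```python
# Weights taken from the percentages listed above.
WEIGHTS = {
    "visual": 0.30,
    "text": 0.25,
    "engagement": 0.20,
    "recency": 0.15,
    "diversity": 0.10,
}


def ranking_score(signals):
    """Combine per-signal scores (assumed to be in [0, 1]) into one score."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
```

For example, a post with a perfect visual match and zero on every other signal would score 0.30 under these assumptions.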

Troubleshooting & Rollback

Issue: OCR fails -> Check the image quality and the EasyOCR model.
Rollback: Set USE_MULTIMODAL_EMBEDDINGS=false. The system will automatically revert to the legacy collection (media_embeddings).