Migration Guide: ViT to Multi-Modal CLIP System

This guide describes how to migrate from the ViT-only embedding system to the multi-modal, CLIP-based system.

Overview

  • Old System: 768-dim ViT embeddings, single collection.
  • New System: 512-dim CLIP visual embeddings and 768-dim text embeddings, stored as named vectors in a multi-modal collection, plus OCR text extraction.

Prerequisites

  1. Python dependencies: pip install -r requirements.txt
  2. System: FFmpeg (for video processing); CUDA (optional, for GPU acceleration).
  3. Data: read access to the original media files and access to the Qdrant instance.
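Before migrating, it can help to confirm the system prerequisites on the machine that will run the embedding server. The following is a minimal preflight sketch; the function name `preflight` is hypothetical, and it assumes (not stated in this guide) that the CLIP models run on PyTorch, so CUDA is probed only when `torch` is installed.

```python
import importlib.util
import shutil


def preflight() -> dict:
    """Report whether the system prerequisites are available (hypothetical helper)."""
    # FFmpeg is needed for video processing; shutil.which searches PATH.
    report = {"ffmpeg": shutil.which("ffmpeg") is not None, "cuda": False}
    # CUDA is optional. Assumption: the embedding server uses PyTorch,
    # so we only probe CUDA if torch is importable.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing `cuda` entry is not an error; the models simply run on CPU more slowly.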

Migration Strategy

Phase 1: Parallel Running

  1. Enable multi-modal embeddings for new uploads by setting USE_MULTIMODAL_EMBEDDINGS=true in the Node.js backend.
  2. Keep the old collection (media_embeddings) serving existing data.
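The parallel-running phase boils down to a routing decision driven by the flag. The real switch lives in the Node.js backend; the sketch below expresses the same idea in Python for illustration only. The helper names are hypothetical, and treating any value other than `true` (case-insensitive) as "off", as well as searching both collections while the flag is on, are assumptions.

```python
import os

LEGACY_COLLECTION = "media_embeddings"
MULTIMODAL_COLLECTION = "media_embeddings_v2"


def use_multimodal(env=None) -> bool:
    """Parse USE_MULTIMODAL_EMBEDDINGS; anything other than 'true' means off (assumption)."""
    env = os.environ if env is None else env
    return env.get("USE_MULTIMODAL_EMBEDDINGS", "false").strip().lower() == "true"


def write_collection(env=None) -> str:
    """New uploads go to the v2 collection only when the flag is on."""
    return MULTIMODAL_COLLECTION if use_multimodal(env) else LEGACY_COLLECTION


def read_collections(env=None) -> list:
    """During parallel running, search v2 for new posts and the legacy collection for old ones."""
    if use_multimodal(env):
        return [MULTIMODAL_COLLECTION, LEGACY_COLLECTION]
    return [LEGACY_COLLECTION]
```

Because results from the two collections come from different embedding spaces, their similarity scores are not directly comparable; merging them needs care until the migration is complete.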

Phase 2: Data Migration

Run the migration script to regenerate embeddings for existing posts with the new models:

python scripts/migrate_embeddings.py --media-path /path/to/media
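The old 768-dim ViT vectors cannot be reused in the 512-dim visual slot, so every existing post has to be re-embedded from its original media. The sketch below shows the general shape of such a batched migration loop; it is not the actual contents of scripts/migrate_embeddings.py. The `Post` and `Point` types, the injected `embed_visual`, `embed_text`, and `upsert` callables, and the use of a caption string as the text source are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Post:
    post_id: str
    media_path: str
    caption: str  # assumption: the text vector is built from a caption string


@dataclass
class Point:
    # Note: Qdrant point IDs must be unsigned integers or UUIDs,
    # so real post IDs may need mapping before upsert.
    id: str
    vectors: dict  # named vectors: {"visual": [...], "text": [...]}
    payload: dict


def batched(items: Iterable[Post], size: int) -> Iterator[List[Post]]:
    """Yield lists of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    batch: List[Post] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate(
    posts: Iterable[Post],
    embed_visual: Callable[[str], List[float]],
    embed_text: Callable[[str], List[float]],
    upsert: Callable[[str, List[Point]], None],
    batch_size: int = 32,
) -> int:
    """Re-embed posts and write them to the v2 collection; returns the count migrated."""
    migrated = 0
    for batch in batched(posts, batch_size):
        points = [
            Point(
                id=p.post_id,
                vectors={"visual": embed_visual(p.media_path), "text": embed_text(p.caption)},
                payload={"media_path": p.media_path},
            )
            for p in batch
        ]
        upsert("media_embeddings_v2", points)
        migrated += len(points)
    return migrated
```

Injecting the embedding and upsert functions keeps the loop testable without a GPU or a running Qdrant instance.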

Phase 3: Gradual Rollout

  1. Week 1: route 10% of traffic to the new system.
  2. Week 2: route 50% of traffic.
  3. Week 3: route 100% of traffic.
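A percentage rollout is easier to reason about when each user is assigned deterministically, so that users in the 10% cohort stay in the new system at 50% and 100%. One common approach, sketched here as an assumption (the guide does not say how traffic is split), hashes the user ID into a stable bucket from 0 to 99; the helper names are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id: str, percent: int) -> bool:
    """True if this user should be served by the multi-modal system."""
    return rollout_bucket(user_id) < percent
```

Raising `percent` from 10 to 50 to 100 only ever adds users, so nobody flips back and forth between ranking systems during the rollout.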

Configuration

  • Node.js .env:
    • USE_MULTIMODAL_EMBEDDINGS=true
    • EMBEDDING_SERVER_URL=http://localhost:8000
  • Qdrant: the new collection media_embeddings_v2 uses the named vectors visual (512-dim) and text (768-dim).
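For reference, a collection with these named vectors can be created through Qdrant's REST API with `PUT /collections/media_embeddings_v2`. The sketch below builds that request body and adds a dimension check that catches the most likely migration mistake (writing a 768-dim ViT vector into the 512-dim visual slot). The cosine distance and the rule that every point carries both vectors are assumptions, and the helper names are hypothetical.

```python
import json

# Named vector sizes from the configuration above.
VECTOR_DIMS = {"visual": 512, "text": 768}


def create_collection_body(distance: str = "Cosine") -> dict:
    """Request body for PUT /collections/media_embeddings_v2 (distance is an assumption)."""
    return {
        "vectors": {
            name: {"size": size, "distance": distance}
            for name, size in VECTOR_DIMS.items()
        }
    }


def check_vectors(vectors: dict) -> None:
    """Raise ValueError if a point's named vectors don't match this schema."""
    for name, size in VECTOR_DIMS.items():
        if name not in vectors:
            # Treating both vectors as required is an assumption about this system's data.
            raise ValueError(f"missing named vector {name!r}")
        if len(vectors[name]) != size:
            raise ValueError(f"{name!r} has {len(vectors[name])} dims, expected {size}")
    extra = set(vectors) - set(VECTOR_DIMS)
    if extra:
        raise ValueError(f"unknown named vectors: {sorted(extra)}")


if __name__ == "__main__":
    print(json.dumps(create_collection_body(), indent=2))
```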

API & Ranking

  • Endpoint: POST /extract-multimodal (replaces the legacy extractor).
  • Ranking signals and weights: Visual (30%), Text (25%), Engagement (20%), Recency (15%), Diversity (10%).
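The weights above sum to 100%, which suggests a weighted linear score. The sketch below assumes exactly that, with each signal already normalized to [0, 1]; the guide does not specify how signals are normalized, so the exponential recency decay and its 48-hour half-life are hypothetical. Diversity is treated as a precomputed per-item score, which simplifies what is usually a list-level property.

```python
WEIGHTS = {
    "visual": 0.30,
    "text": 0.25,
    "engagement": 0.20,
    "recency": 0.15,
    "diversity": 0.10,
}


def recency_score(age_hours: float, half_life_hours: float = 48.0) -> float:
    """Exponential decay: 1.0 for a brand-new post, 0.5 after one half-life (hypothetical)."""
    return 0.5 ** (age_hours / half_life_hours)


def rank_score(signals: dict) -> float:
    """Weighted sum of signals, each assumed pre-normalized to [0, 1]."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        # Missing signals count as 0; out-of-range values are clamped to [0, 1].
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        score += weight * value
    return score
```

For example, a fresh post with perfect visual and text similarity but no engagement or diversity credit would score 0.30 + 0.25 + 0.15 = 0.70.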

Troubleshooting & Rollback

  • OCR fails: check the input image quality and confirm that the EasyOCR model is installed and loads correctly.
  • Rollback: set USE_MULTIMODAL_EMBEDDINGS=false. The system then reverts to the legacy collection (media_embeddings).