Guide for migrating from the ViT-only embedding system to the multi-modal CLIP-based system.
- Old System: 768-dim ViT embeddings, single collection.
- New System: 512-dim CLIP visual embeddings and 768-dim text embeddings, stored as named vectors in a multi-modal collection, plus OCR text extraction.
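The dimension change is why old vectors have to be regenerated rather than copied: a 768-dim ViT vector does not fit the 512-dim `visual` slot. Below is a minimal Python shape check for points bound for the new collection. It assumes each point's vectors arrive as a dict keyed by vector name, and `validate_multimodal_vectors` is a hypothetical helper, not part of the project.

```python
from typing import Dict, Sequence

# Named-vector sizes for the new multi-modal collection, as listed in this guide.
NEW_VECTOR_SIZES: Dict[str, int] = {"visual": 512, "text": 768}
# The legacy collection holds a single 768-dim ViT vector per item.
LEGACY_VECTOR_SIZE = 768


def validate_multimodal_vectors(vectors: Dict[str, Sequence[float]]) -> None:
    """Raise ValueError if a point's named vectors don't match the new schema.

    This sketch allows a point to carry only a subset of the names
    (e.g. no text vector); tighten it if both are always produced.
    """
    if not vectors:
        raise ValueError("point has no vectors")
    unknown = set(vectors) - set(NEW_VECTOR_SIZES)
    if unknown:
        raise ValueError(f"unknown vector names: {sorted(unknown)}")
    for name, vec in vectors.items():
        expected = NEW_VECTOR_SIZES[name]
        if len(vec) != expected:
            raise ValueError(f"{name!r} has {len(vec)} dims, expected {expected}")
```

A size check cannot catch every mix-up. A legacy 768-dim ViT vector would still pass as `text`, so the migration code should make explicit which model produced each vector.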
- Python: `pip install -r requirements.txt`
- System: FFmpeg (for videos), CUDA (optional, for GPU).
- Data: Access to media files and Qdrant.
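Before running anything, it can help to confirm that the prerequisites are visible. The minimal Python sketch below assumes the embedding stack reaches CUDA through PyTorch and that EasyOCR (named in the troubleshooting notes) is installed through `requirements.txt`. `preflight` is a hypothetical helper, not a script shipped with the project.

```python
import importlib.util
import shutil
from typing import Dict


def _cuda_available() -> bool:
    """True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also runs where torch is absent
    return bool(torch.cuda.is_available())


def preflight() -> Dict[str, bool]:
    """Report which prerequisites from the guide are visible on this machine."""
    return {
        # FFmpeg is only needed for videos; look for the binary on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Assumed OCR dependency; find_spec checks presence without importing it.
        "easyocr": importlib.util.find_spec("easyocr") is not None,
        # CUDA is optional; False suggests CPU-only inference.
        "cuda": _cuda_available(),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:8} {'ok' if ok else 'missing'}")
```

A missing `ffmpeg` only matters if there are video posts to migrate.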
- Enable multi-modal embeddings for new uploads in Node.js: set `USE_MULTIMODAL_EMBEDDINGS=true`.
- Keep the old collection (`media_embeddings`) for existing data.
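The flag decides where new uploads are written, and existing data stays where it is. The real switch lives in the Node.js service. The Python sketch below only illustrates the decision, and it assumes the flag is read from the environment and that any value other than a recognized truthy string means off. The function names are hypothetical.

```python
import os
from typing import Mapping, Optional

LEGACY_COLLECTION = "media_embeddings"
MULTIMODAL_COLLECTION = "media_embeddings_v2"

# Strings accepted as "on"; everything else, including a missing value, is "off".
_TRUTHY = {"1", "true", "yes", "on"}


def multimodal_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret USE_MULTIMODAL_EMBEDDINGS from the given mapping or os.environ."""
    env = os.environ if env is None else env
    return env.get("USE_MULTIMODAL_EMBEDDINGS", "").strip().lower() in _TRUTHY


def collection_for_new_upload(env: Optional[Mapping[str, str]] = None) -> str:
    """New uploads go to the v2 collection only while the flag is on."""
    return MULTIMODAL_COLLECTION if multimodal_enabled(env) else LEGACY_COLLECTION
```

If unknown values count as off, a typo or a missing variable falls back to the legacy collection. That matches the rollback behavior described below.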
Use the migration script to regenerate embeddings for old posts:
`python scripts/migrate_embeddings.py --media-path /path/to/media`
Then shift traffic to the new system gradually:
- Week 1: 10% traffic.
- Week 2: 50% traffic.
- Week 3: 100% traffic.
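The guide does not say how the 10% / 50% / 100% split is made. One common approach, sketched here as an assumption, hashes a stable identifier such as a user id into 100 buckets. A user's bucket never changes, so everyone admitted in week 1 stays in as the percentage grows. `rollout_bucket` and `in_rollout` are hypothetical names.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map an id to a stable bucket in [0, 100) using SHA-256 (not Python's hash())."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id: str, percent: int) -> bool:
    """True if this id falls inside the first `percent` buckets."""
    return rollout_bucket(user_id) < percent
```

Week 1 would call `in_rollout(user_id, 10)`, week 2 `in_rollout(user_id, 50)`, and week 3 `in_rollout(user_id, 100)`, which admits everyone. SHA-256 is used instead of the built-in `hash()` because string hashing in Python is randomized per process, which would reshuffle users on every restart.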
- Node.js `.env`:
  - `USE_MULTIMODAL_EMBEDDINGS=true`
  - `EMBEDDING_SERVER_URL=http://localhost:8000`
- Qdrant: The new collection `media_embeddings_v2` uses the named vectors `visual` (512-dim) and `text` (768-dim).
- Endpoint: `POST /extract-multimodal` (replaces the legacy extractor).
- Signals: Visual (30%), Text (25%), Engagement (20%), Recency (15%), Diversity (10%).
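The guide lists the signal weights but not how they are combined. The sketch below assumes a linear blend of per-signal scores that are already normalized to [0, 1], so the final score is a weighted sum. Because the weights add up to 1.0, the result also stays in [0, 1]. The signal key names and `combined_score` are hypothetical, and treating diversity as a per-item score is a simplification.

```python
from typing import Dict

# Weights from the guide; they sum to 1.0.
SIGNAL_WEIGHTS: Dict[str, float] = {
    "visual": 0.30,
    "text": 0.25,
    "engagement": 0.20,
    "recency": 0.15,
    "diversity": 0.10,
}


def combined_score(signals: Dict[str, float]) -> float:
    """Weighted sum of per-signal scores, each assumed normalized to [0, 1]."""
    missing = set(SIGNAL_WEIGHTS) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(weight * signals[name] for name, weight in SIGNAL_WEIGHTS.items())
```

For example, scores of 0.8 (visual), 0.6 (text), 0.5 (engagement), 1.0 (recency) and 0.2 (diversity) give 0.24 + 0.15 + 0.10 + 0.15 + 0.02 = 0.66.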
- Issue: OCR fails. Check the image quality and the EasyOCR model.
- Rollback: Set `USE_MULTIMODAL_EMBEDDINGS=false`; the system automatically reverts to the legacy collection (`media_embeddings`).