- `data_preprocessing.py` - Query transformation implementation
- `three_vector_embedding.py` - Three-vector architecture implementation
- `user_embeddings.py` - User embeddings in latent space implementation
- `two_tower_model.py` - Two-tower model implementation
- `advanced_search.py` - Advanced search implementation
- `improved_recommender.py` - Main recommendation system implementation
- `reccs/` - Main package directory with models, utils, and tests
  - `models/` - Neural network models for recommendation
  - `utils/` - Utility functions for recommendation system
  - `tests/` - Test cases for recommendation components
  - `data/` - Data processing utilities
These files are not part of the core recommendation system and are used for snapshot management and debugging:
- `upload_snapshots.py` - Uploads snapshots to Qdrant
- `extract_snapshot_data.py` - Extracts data from Qdrant snapshots
- `fix_snapshot_config.py` - Fixes snapshot configuration issues
- `fix_snapshot_config_backup.py` - Backup of snapshot configuration fix
- `restore_snapshots.py` - Restores collections from snapshots
- `check_collections.py` - Verifies collection status
- `examine_element_payload.py` - Debug tool for examining element data
- `examine_user_payload.py` - Debug tool for examining user data
- `examine_users.py` - Debug tool for user data analysis
- `test_qdrant.py` - Simple Qdrant connection test
This recommendation system implements five major improvements that address common challenges in content recommendation:
**Query Transformation**
- Challenge: Basic search terms can return inappropriate or irrelevant content
- Solution: Dual-approach system combining rule-based replacement and neural transformation
- Advantage: Preserves search intent while ensuring appropriate results through intelligent query rewriting

**Three-Vector Architecture**
- Challenge: Single CLIP embeddings don't capture all relevant content signals
- Solution: Combined representation using image, text, and metadata vectors with adaptive weighting
- Advantage: Richer content representation that captures multiple aspects of items

**User Embeddings in Latent Space**
- Challenge: Users and content exist in separate representation spaces
- Solution: Neural transformation of user interactions into CLIP-compatible embeddings
- Advantage: Enables direct similarity computation between users and content

**Two-Tower Model**
- Challenge: Independent learning of user and item representations
- Solution: Contrastive learning approach with separate user and item towers
- Advantage: Joint optimization of user and item representations for better matching

**Advanced Search**
- Challenge: Basic vector similarity search has limited capabilities
- Solution: Hybrid search combining vector similarity, text relevance, and diversity reranking
- Advantage: More comprehensive search that balances multiple relevance signals
This project enhances a recommendation system with five major improvements that address key challenges in content recommendation:
- Query Transformation
- Three-Vector Architecture
- User Embeddings in Latent Space
- Two-Tower Model
- Advanced Search
The system is designed to provide more relevant, diverse, and appropriate recommendations while placing users and content in the same latent space for direct similarity computation.
Problem: Basic search terms can produce unwanted content (e.g., "girl" returning sensual images).
Solution: A dual-approach system that transforms problematic queries:
- Rule-based replacement: Dictionary of problematic terms with safer alternatives
- Neural transformation: Sequence-to-sequence model that rewrites queries while preserving core intent
```python
transformed_query = query_transformer.transform_query("girl with a camera")
# Output: "woman with a camera"
```
Advantages over alternatives:
- More flexible than pure rule-based systems
- More efficient than post-filtering
- Better preserves search intent than query blocking
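The rule-based half of the dual approach can be sketched as a dictionary lookup over query tokens. The `REPLACEMENTS` map and `transform_query` function below are illustrative stand-ins, not the project's actual `QueryTransformer` API or term dictionary:

```python
import re

# Hypothetical term map; the real system maintains its own dictionary
# of problematic terms and safer alternatives.
REPLACEMENTS = {"girl": "woman", "sexy": "stylish"}

def transform_query(query: str) -> str:
    """Replace problematic terms with safer alternatives, token by token,
    leaving the rest of the query (and its intent) untouched."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        return REPLACEMENTS.get(word.lower(), word)
    return re.sub(r"\b\w+\b", repl, query)

transform_query("girl with a camera")  # -> "woman with a camera"
```

The neural half would handle queries the dictionary misses; the rule-based pass is kept because it is cheap and deterministic for known terms.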
Problem: CLIP embeddings alone don't capture all relevant signals, and metadata is unused.
Solution: A unified representation system with three vector types:
- Image Vector: CLIP embedding of the visual content
- Text Description Vector: Generated from LLM captioning
- Metadata Vector: Structured attributes encoded separately
```python
combined_vector = three_vector_encoder(
    image=image_tensor,
    text_description={"text_ids": text_ids, "attention_mask": mask},
    metadata={"categorical_category": category_id, "text_tags": tags}
)
```
Dynamic weighting: Neural attention mechanism that adaptively weights vectors based on query context.
Advantages over alternatives:
- Richer representation than single-vector approaches
- More adaptable than fixed-weight combinations
- Preserves original signal fidelity better than early fusion
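The adaptive weighting can be sketched as query-conditioned softmax attention over the three component vectors. `combine_vectors` and its signature are hypothetical simplifications of the neural attention mechanism described above:

```python
import numpy as np

def combine_vectors(image_vec, text_vec, meta_vec, query_vec):
    """Weight each component vector by its softmax-normalized similarity
    to the query, then return the weighted, re-normalized sum."""
    components = np.stack([image_vec, text_vec, meta_vec])  # (3, d)
    scores = components @ query_vec                         # attention scores
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                       # softmax over 3
    combined = weights @ components                         # (d,)
    return combined / np.linalg.norm(combined)
```

Because the weights depend on the query, a visually oriented query leans on the image vector while a tag-like query leans on metadata, which is the "adaptive" part; a fixed-weight combination cannot do this.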
Problem: Users are not represented in the same space as content, making direct similarity computation impossible.
Solution: Neural transformation of user interaction history into CLIP-compatible embeddings:
- Sequential modeling: Transformer that encodes interaction sequences
- CLIP projection: Maps user representations to content space
- Adaptive interest mapping: Captures multiple facets of user preferences
```python
user_embedding = user_embedding_model(
    item_embeddings=history_sequence,
    attention_mask=sequence_mask
)
# Result: user embedding in the same space as content
```
Advantages over alternatives:
- More expressive than matrix factorization
- Directly comparable to content embeddings
- Self-updating as users interact with content
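As a rough sketch, a user embedding can be produced by masked mean-pooling the interaction history and projecting the result into the content space. This simplifies the transformer-based model to its pooling-and-projection core; the function and its arguments are illustrative:

```python
import numpy as np

def user_embedding(item_embeddings, attention_mask, projection):
    """Pool a user's interaction history (ignoring padded positions via the
    mask), project into the content space, and L2-normalize so the result
    is directly comparable to content embeddings."""
    mask = attention_mask[:, None].astype(float)          # (T, 1)
    pooled = (item_embeddings * mask).sum(axis=0) / mask.sum()
    emb = projection @ pooled                             # map to CLIP-like space
    return emb / np.linalg.norm(emb)
```

In the real model, the pooling step is a transformer over the sequence, so order and recency of interactions can influence the result; mean-pooling is the simplest stand-in.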
Problem: No joint learning of user and item representations.
Solution: Contrastive learning approach with:
- User tower: Processes interaction histories
- Item tower: Processes content features
- Contrastive objective: Brings positive user-item pairs closer together
```python
loss = two_tower_model.compute_contrastive_loss(
    user_embeddings=user_emb,
    positive_items=pos_items,
    negative_items=neg_items
)
```
Advantages over alternatives:
- More efficient inference than cross-attention
- Better quality than independent encoders
- Enables fast retrieval at scale
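The contrastive objective can be illustrated with an in-batch InfoNCE loss, where each user's positive item sits on the diagonal of the user-item similarity matrix and the other rows serve as negatives. This is a standard formulation for two-tower training, not necessarily the exact loss used here:

```python
import numpy as np

def contrastive_loss(user_emb, item_emb, temperature=0.07):
    """In-batch InfoNCE: row i of user_emb is paired with row i of item_emb;
    every other item in the batch acts as a negative."""
    u = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    v = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    logits = (u @ v.T) / temperature                 # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # positives on the diagonal
```

Minimizing this pulls each user toward their interacted items and pushes them away from the rest of the batch, which is what makes the two towers' spaces align.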
Problem: Limited search capabilities that miss relevant content.
Solution: Hybrid search combining:
- Vector similarity: For semantic understanding
- Text relevance: BM25 scoring for lexical matching
- Diversity reranking: Promotes variety in results
- Content steering: Adjustable preference vectors
```python
results = advanced_search.search(
    query="mountain landscape",
    preference="artistic",
    preference_weight=0.7,
    use_diversity=True
)
```
Advantages over alternatives:
- Balances semantic and lexical matching
- Produces more diverse result sets
- Allows steering without retraining
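The hybrid scoring plus diversity reranking can be sketched as a weighted blend of vector similarity and BM25 scores, followed by maximal marginal relevance (MMR). `hybrid_rank` and its parameters are illustrative assumptions, not the `AdvancedSearch` API:

```python
import numpy as np

def hybrid_rank(query_vec, doc_vecs, bm25_scores, alpha=0.7,
                lambda_div=0.5, k=3):
    """Blend semantic and lexical relevance, then greedily rerank with MMR
    so near-duplicate results are demoted. Assumes unit-norm vectors."""
    sim = doc_vecs @ query_vec                       # vector similarity
    base = alpha * sim + (1 - alpha) * bm25_scores   # hybrid relevance
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def mmr(i):
            # Penalize similarity to anything already selected.
            redundancy = max((doc_vecs[i] @ doc_vecs[j] for j in selected),
                             default=0.0)
            return lambda_div * base[i] - (1 - lambda_div) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_div` close to 1 this degenerates to plain relevance ranking; lowering it trades a little relevance for variety, which is the dial the diversity reranker exposes.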
The combined vector generation process:
1. Query → QueryTransformer → Transformed Query
2. Transformed Query → ThreeVectorEncoder → Query Vector
3. Query Vector + Filters → AdvancedSearch → Results
The system implements incremental vector updates:
1. New Content → ThreeVectorEncoder → Content Vector
2. Content Vector → VectorStorage.add() → Indexed Vector
3. User Interactions → UserEmbeddingModel → Updated User Vector
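The incremental-update flow above can be sketched with a toy in-memory store and an exponential-moving-average user update. Both the class and the function are illustrative stand-ins for the real `VectorStorage` and `UserEmbeddingModel` update paths:

```python
import numpy as np

class VectorStorage:
    """Toy in-memory stand-in for the vector index."""
    def __init__(self):
        self.vectors = {}

    def add(self, item_id, vec):
        # Store unit-norm vectors so cosine similarity is a dot product.
        self.vectors[item_id] = vec / np.linalg.norm(vec)

def update_user_vector(user_vec, item_vec, lr=0.1):
    """Nudge the user embedding toward a newly interacted item
    (exponential moving average), keeping the result unit-norm."""
    updated = (1 - lr) * user_vec + lr * item_vec
    return updated / np.linalg.norm(updated)
```

The key property this preserves is that new content and updated users stay in the same normalized space, so no batch re-indexing is needed after each interaction.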
The recommendation system is designed to be self-running with minimal tuning:
- Automatic query transformation based on problematic term detection
- Learned vector weights adapted to query context
- Continuous user embedding updates based on interactions
- Diversity parameters self-adjusting based on result analysis
```
torch>=1.9.0
transformers>=4.11.0
qdrant-client>=0.11.0
nltk>=3.6.0
numpy>=1.20.0
```
```python
# Initialize components
query_transformer = QueryTransformer()
three_vector_encoder = ThreeVectorEncoder()
user_embedding_model = UserEmbeddingModel()
advanced_search = AdvancedSearch(qdrant_client, "content_collection")

# Transform query
safe_query = query_transformer.transform_query("girl with camera")

# Generate query vector
query_vector = three_vector_encoder.encode_query(
    query_text={"text_ids": tokenize(safe_query)}
)

# Get user embedding
user_vector = user_embedding_model.compute_user_embedding(user_history)

# Search with combined signals
results = advanced_search.search(
    query=safe_query,
    query_vector=query_vector,
    user_vector=user_vector,
    limit=20
)
```
- Vector quantization reduces storage requirements
- Batched processing for neural transformations improves throughput
- Caching frequently accessed embeddings reduces latency
- Partitioned vector indices enable efficient filtering
- Pre-computed cluster centroids for faster initial retrieval
- Multi-stage ranking with lightweight first-pass filters
- Cross-modal alignment fine-tuning
- Federated user representation learning