Semantica is a Laravel package that adds semantic search to your applications using vector embeddings. It lets developers build content search for blogs, e-commerce platforms, knowledge bases, and other content-heavy applications that matches on meaning rather than on keywords alone.
🚀 Key Features
Semantic Search Engine
- Perform intelligent searches based on meaning and context, not just exact keywords
- Configurable similarity thresholds (default 0.7) for fine-tuning result relevance
- Support for multiple similarity metrics: cosine similarity (default), Euclidean distance, and dot product
- Result limiting (max 100) with performance optimisations for large datasets
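The three metrics and the threshold filter can be illustrated with a minimal sketch. It is written in Python for brevity and is not the package's PHP implementation; the function names and the `search` signature are hypothetical, while the 0.7 default and the cap of 100 results come from the list above.

```python
import math

MAX_RESULTS = 100  # result cap named in the feature list


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query_vec, candidates, threshold=0.7, limit=10):
    """Rank (id, vector) pairs by cosine similarity to the query.

    Hypothetical helper: drops anything below the threshold and never
    returns more than MAX_RESULTS entries.
    """
    limit = max(1, min(limit, MAX_RESULTS))
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in candidates]
    kept = [(cid, score) for cid, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

Cosine similarity ignores vector length, which is why it is a common default for text embeddings; the dot product gives the same ranking only when the vectors are normalised.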
Multi-Provider AI Support
- OpenAI: High-quality embeddings via text-embedding-3-small model (1536 dimensions)
- Google Gemini: Cost-effective embeddings via Generative AI API (768 dimensions)
- Ollama: Run embedding models locally for privacy and cost control (768 dimensions with nomic-embed-text)
- Extensible provider interface for adding custom embedding services
- Automatic provider validation and error handling
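As a rough picture of what an extensible provider contract with validation might look like, here is a sketch in Python. The `EmbeddingProvider` protocol, `FakeProvider`, and `validate_provider` are all hypothetical names chosen for illustration; the package's actual PHP interface may be shaped differently.

```python
from __future__ import annotations

from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Hypothetical contract: a declared dimension plus a batch embed call."""

    dimensions: int

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class FakeProvider:
    """Deterministic stand-in that makes no network calls (for tests only)."""

    def __init__(self, dimensions: int = 8) -> None:
        self.dimensions = dimensions

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dimensions
            # Fold the UTF-8 bytes into a fixed-size vector; not a real embedding.
            for i, byte in enumerate(text.encode("utf-8")):
                vec[i % self.dimensions] += byte / 255.0
            vectors.append(vec)
        return vectors


def validate_provider(provider: EmbeddingProvider) -> None:
    """Reject a provider whose output length differs from its declared dimension."""
    (probe,) = provider.embed(["probe"])
    if len(probe) != provider.dimensions:
        raise ValueError(
            f"provider returned {len(probe)} dimensions, expected {provider.dimensions}"
        )
```

Checking the dimension up front matters because vectors from different models (1536 versus 768 dimensions in the list above) cannot be compared with each other.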
Eloquent Integration
- HasEmbeddings trait that generates embeddings automatically when a model is saved or updated and removes them when the model is deleted
- Polymorphic embedding relationships supporting any Eloquent model
- Configurable embedding fields per model (defaults to 'content')
- Customisable embedding text concatenation via getEmbeddingFields() method
Performance & Scalability
- Built-in caching layer with configurable TTL (default 1 hour)
- Batch processing for efficient bulk embedding generation (configurable batch size: 100)
- Database indexing on embedding relationships and timestamps
- Memory-safe processing with limits on embedding count (10,000) to prevent abuse
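The caching and batching ideas can be sketched together. This is an illustrative Python version, not the package's code: `TTLCache`, `chunked`, and `embed_all` are hypothetical names, and keying the cache by a hash of the text is an assumption. The one-hour TTL and batch size of 100 mirror the defaults listed above.

```python
import hashlib
import time
from itertools import islice


def chunked(items, size=100):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    @staticmethod
    def key(text):
        # Assumption: hash the text so keys have a fixed length.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text):
        key = self.key(text)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def put(self, text, value):
        self._store[self.key(text)] = (self.clock() + self.ttl, value)


def embed_all(texts, embed_batch, cache, batch_size=100):
    """Embed only uncached, de-duplicated texts, one batch per provider call."""
    missing = [t for t in dict.fromkeys(texts) if cache.get(t) is None]
    for batch in chunked(missing, batch_size):
        for text, vector in zip(batch, embed_batch(batch)):
            cache.put(text, vector)
    return [cache.get(t) for t in texts]
```

Here `embed_batch` stands for whatever callable sends one batch to the configured provider and returns one vector per input text.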
Developer Experience
- Laravel facade (Semantica::search()) for easy integration
- Comprehensive Artisan commands:
  - php artisan semantica:index ModelClass - Generate embeddings for existing records
  - php artisan semantica:reindex ModelClass - Rebuild embeddings with updated provider/model
  - php artisan semantica:reindex --all - Reindex all embedded models
- Progress bars and chunked processing for long-running operations
🔧 Configuration Options
Provider Configuration
- Environment-based provider selection (SEMANTICA_PROVIDER)
- API key management via environment variables
- Model and dimension configuration per provider
Security & Safety
- Auto-embedding disabled by default - requires explicit SEMANTICA_AUTO_EMBED=true to enable
- Input sanitisation: HTML tag removal, whitespace normalisation, 8 KB text length limit
- Embedding vectors hidden from JSON responses for security
- API key validation and secure storage (never logged)
- HTTPS enforcement for all external API calls
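The input sanitisation steps listed above (tag removal, whitespace normalisation, length limit) could look roughly like the following Python sketch. It is an illustration only: the `sanitise` name is hypothetical, the regex-based tag stripping is deliberately crude, and treating the 8 KB limit as a limit on UTF-8 bytes is an assumption.

```python
import re

MAX_BYTES = 8 * 1024  # 8 KB limit from the list above, assumed to mean bytes

_TAG_RE = re.compile(r"<[^>]*>")
_WS_RE = re.compile(r"\s+")


def sanitise(text, max_bytes=MAX_BYTES):
    """Strip tags, collapse whitespace, and truncate to `max_bytes` of UTF-8."""
    # Replace tags with a space so adjacent words do not fuse together.
    no_tags = _TAG_RE.sub(" ", text)
    collapsed = _WS_RE.sub(" ", no_tags).strip()
    # Cut at the byte limit, then drop any partial multi-byte character
    # left dangling at the end of the slice.
    truncated = collapsed.encode("utf-8")[:max_bytes]
    return truncated.decode("utf-8", errors="ignore")
```

Truncating on bytes rather than characters keeps the payload size predictable for text with many multi-byte characters.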
Operational Controls
- Similarity threshold clamping (0.0-1.0 range)
- Configurable caching (enabled by default)
- Batch size limits for performance
- Model class validation to prevent injection attacks
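Two of these controls are easy to picture in code. The Python sketch below is illustrative and not taken from the package: `clamp_threshold` shows the 0.0-1.0 clamping, and `resolve_model` shows one possible way to validate a model class name, using an explicit allow-list (an assumed technique; the package may validate differently). `Model`, `Article`, and `REGISTERED_MODELS` are hypothetical.

```python
def clamp_threshold(value):
    """Force a similarity threshold into the 0.0-1.0 range."""
    return max(0.0, min(1.0, float(value)))


class Model:
    """Stand-in for a base model class (hypothetical)."""


class Article(Model):
    """Example searchable model (hypothetical)."""


# Assumption: only explicitly registered classes may be indexed or searched.
REGISTERED_MODELS = {"Article": Article}


def resolve_model(name):
    """Map a user-supplied name to a known model class, or fail loudly."""
    cls = REGISTERED_MODELS.get(name)
    if cls is None or not issubclass(cls, Model):
        raise ValueError(f"Unknown or unsupported model class: {name!r}")
    return cls
```

Resolving names through a fixed mapping means a user-supplied string is never used to load an arbitrary class.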
🛡️ Security Features
Data Protection
- Text content sanitisation before sending to external APIs
- Embedding vectors stored as JSON arrays with metadata tracking
- Automatic cleanup of embeddings when models are deleted
- No sensitive data exposure in logs or responses
Input Validation
- Empty query prevention with descriptive error messages
- Model class existence and Eloquent validation
- Search result capping and similarity score validation
- Timeout protection (30s for cloud APIs, 60s for local Ollama)
Network Security
- HTTP client with retry logic and timeouts
- API failure logging without exposing sensitive response bodies
- Rate limiting relies on the limits enforced by the external provider APIs
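Retry logic with timeouts is commonly built as a small wrapper around the request call. The Python sketch below shows one such wrapper under stated assumptions: three attempts and exponential backoff are illustrative choices that the list above does not specify, and `TransientError` and `call_with_retry` are hypothetical names. The 30 s and 60 s timeouts come from the Input Validation list.

```python
import time

CLOUD_TIMEOUT_S = 30  # cloud APIs
LOCAL_TIMEOUT_S = 60  # local Ollama


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx)."""


def call_with_retry(request, *, timeout, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `request(timeout)` and retry transient failures with backoff.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request(timeout)
        except TransientError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays.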
📊 Database Schema
Polymorphic embeddings table with:
- Morphs relationship (embeddable_type, embeddable_id)
- Embedding vector stored in a JSON column
- Metadata tracking (provider, model, timestamp)
- Indexes on the polymorphic relationship columns and timestamps
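To visualise a table of this shape, the following Python sketch builds an equivalent in an in-memory SQLite database. It is not the package's migration: only `embeddable_type` and `embeddable_id` are named in the description, so the table name, the other column names, and the index names are assumptions, as is the example model name `App\Models\Post`.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE embeddings (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    embeddable_type TEXT    NOT NULL,  -- owning model class
    embeddable_id   INTEGER NOT NULL,  -- owning model primary key
    embedding       TEXT    NOT NULL,  -- vector serialised as a JSON array
    provider        TEXT    NOT NULL,  -- assumed metadata column
    model           TEXT    NOT NULL,  -- assumed metadata column
    created_at      TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at      TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX embeddings_embeddable_idx ON embeddings (embeddable_type, embeddable_id);
CREATE INDEX embeddings_created_at_idx ON embeddings (created_at);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store(conn, model_type, model_id, vector, provider, model):
    """Insert one embedding row for the given owning record."""
    conn.execute(
        "INSERT INTO embeddings (embeddable_type, embeddable_id, embedding, provider, model) "
        "VALUES (?, ?, ?, ?, ?)",
        (model_type, model_id, json.dumps(vector), provider, model),
    )


def load(conn, model_type, model_id):
    """Return the stored vector for a record, or None if there is none."""
    row = conn.execute(
        "SELECT embedding FROM embeddings WHERE embeddable_type = ? AND embeddable_id = ?",
        (model_type, model_id),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

The composite index on the two morph columns is what makes looking up the embedding for a given record cheap.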
🧪 Quality Assurance
Testing & Analysis
- Comprehensive Pest test suite covering embedding generation and search
- PHPStan static analysis with strict level 8 configuration
- Automated code quality tools (Rector for refactoring)
- Larastan integration for Laravel-specific analysis
Code Quality Standards
- Full type declarations and PHPStan compliance
- PSR-4 autoloading and composer package structure
- MIT license with clear usage guidelines
🎯 Use Cases
- Content Management: Enhanced blog/article search with semantic understanding
- E-commerce: Product recommendation based on descriptions and reviews
- Knowledge Bases: Intelligent FAQ and documentation search
- Customer Support: Ticket categorisation and similar case finding
- Research Platforms: Academic paper and document similarity search
🔄 Future Roadmap
- Vector database integrations (Pinecone, Weaviate)
- Additional similarity metrics and search algorithms
- Embedding model fine-tuning capabilities
- Multi-language embedding support
- Advanced filtering and faceted search