v1.0.0

@dgtlss dgtlss released this 24 Dec 15:20

Semantica is a powerful Laravel package that brings semantic search capabilities to your applications using vector embeddings. It enables developers to implement advanced content search functionality in blogs, e-commerce platforms, knowledge bases, and other content-heavy applications by understanding semantic meaning rather than just keyword matching.

🚀 Key Features

Semantic Search Engine

  • Perform intelligent searches based on meaning and context, not just exact keywords
  • Configurable similarity thresholds (default 0.7) for fine-tuning result relevance
  • Support for multiple similarity metrics: cosine similarity (default), Euclidean distance, and dot product
  • Result limiting (max 100) with performance optimisations for large datasets
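
To make the three metrics and the threshold concrete, here is a minimal, language-neutral sketch in Python. It is not the package's PHP implementation: the function names (`cosine_similarity`, `euclidean_distance`, `search`) are hypothetical, and only the 0.7 default threshold and the 100-result cap come from the notes above. The sketch ranks by cosine similarity only, since these notes do not say how the threshold is applied to the other metrics.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between a and b (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query_vec, candidates, threshold=0.7, limit=100):
    """Rank (id, vector) pairs by cosine similarity to query_vec.

    Assumption: results scoring below `threshold` are dropped and the
    remainder is truncated to `limit` entries, best match first.
    """
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in candidates]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]
```

Cosine similarity ignores vector length, which is why it is a common default for text embeddings; the dot product gives the same ranking only when the vectors are normalised to unit length.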

Multi-Provider AI Support

  • OpenAI: High-quality embeddings via the text-embedding-3-small model (1536 dimensions)
  • Google Gemini: Cost-effective embeddings via the Generative AI API (768 dimensions)
  • Ollama: Run embedding models locally for privacy and cost control (768 dimensions with nomic-embed-text)
  • Extensible provider interface for adding custom embedding services
  • Automatic provider validation and error handling

Eloquent Integration

  • HasEmbeddings trait for automatic embedding generation on model save/update and embedding cleanup on delete
  • Polymorphic embedding relationships supporting any Eloquent model
  • Configurable embedding fields per model (defaults to 'content')
  • Customisable embedding text concatenation via getEmbeddingFields() method

Performance & Scalability

  • Built-in caching layer with configurable TTL (default 1 hour)
  • Batch processing for efficient bulk embedding generation (configurable batch size: 100)
  • Database indexing on embedding relationships and timestamps
  • Memory-safe processing with limits on embedding count (10,000) to prevent abuse
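
The caching and batching ideas above can be sketched roughly as follows. This is an illustration in Python, not the package's code: `TTLCache`, `chunked`, `embed_all` and the `embed_batch` callable are hypothetical names, and caching one embedding per input text is an assumption, since the notes do not say what exactly is cached. The 1-hour TTL and the batch size of 100 are the defaults quoted above.

```python
import time


def chunked(items, size=100):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class TTLCache:
    """In-memory cache whose entries expire `ttl_seconds` after being stored."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def embed_all(texts, embed_batch, cache, batch_size=100):
    """Return one vector per text, calling the provider only for cache misses.

    `embed_batch` is a hypothetical callable: list of texts -> list of vectors.
    """
    # dict.fromkeys removes duplicate texts while keeping their order
    missing = [t for t in dict.fromkeys(texts) if cache.get(t) is None]
    for batch in chunked(missing, batch_size):
        for text, vector in zip(batch, embed_batch(batch)):
            cache.put(text, vector)
    return [cache.get(t) for t in texts]
```

Sending texts in batches keeps the number of provider requests low, and the cache means repeated inputs within the TTL cost no further API calls.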

Developer Experience

  • Laravel facade (Semantica::search()) for easy integration
  • Comprehensive Artisan commands:
    • php artisan semantica:index ModelClass - Generate embeddings for existing records
    • php artisan semantica:reindex ModelClass - Rebuild embeddings with updated provider/model
    • php artisan semantica:reindex --all - Reindex all embedded models
  • Progress bars and chunked processing for long-running operations

🔧 Configuration Options

Provider Configuration

  • Environment-based provider selection (SEMANTICA_PROVIDER)
  • API key management via environment variables
  • Model and dimension configuration per provider

Security & Safety

  • Auto-embedding disabled by default - requires explicit SEMANTICA_AUTO_EMBED=true to enable
  • Input sanitisation: HTML tag removal, whitespace normalisation, 8KB text length limit
  • Embedding vectors hidden from JSON responses for security
  • API key validation and secure storage (never logged)
  • HTTPS enforcement for all external API calls
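
As a rough illustration of the input sanitisation step, the sketch below strips tags, collapses whitespace and enforces the 8KB limit. It is written in Python for clarity and is not the package's implementation: `sanitise` is a hypothetical name, the regex-based tag removal is deliberately naive, and treating the limit as 8192 UTF-8 bytes (rather than characters) is an assumption.

```python
import re

MAX_BYTES = 8 * 1024  # the 8KB limit quoted in the notes

_TAG_RE = re.compile(r"<[^>]*>")  # naive: anything between < and >
_WS_RE = re.compile(r"\s+")


def sanitise(text, max_bytes=MAX_BYTES):
    """Remove HTML tags, normalise whitespace and cap the UTF-8 byte length."""
    # Replace tags with a space so adjacent words do not run together
    no_tags = _TAG_RE.sub(" ", text)
    collapsed = _WS_RE.sub(" ", no_tags).strip()
    truncated = collapsed.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a multi-byte character cut in half by the limit
    return truncated.decode("utf-8", errors="ignore")
```

Truncating on bytes and then decoding leniently avoids sending a broken character to the embedding API when the cut falls inside a multi-byte sequence.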

Operational Controls

  • Similarity threshold clamping (0.0-1.0 range)
  • Configurable caching (enabled by default)
  • Batch size limits for performance
  • Model class validation to prevent injection attacks

🛡️ Security Features

Data Protection

  • Text content sanitisation before sending to external APIs
  • Embedding vectors stored as JSON arrays with metadata tracking
  • Automatic cleanup of embeddings when models are deleted
  • No sensitive data exposure in logs or responses

Input Validation

  • Empty query prevention with descriptive error messages
  • Model class existence and Eloquent validation
  • Search result capping and similarity score validation
  • Timeout protection (30s for cloud APIs, 60s for local Ollama)
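
The query checks listed above, together with the threshold clamping and result cap mentioned earlier, might look something like this. The sketch is in Python and purely illustrative: `validate_search` and `clamp_threshold` are hypothetical names, the error message wording is invented, and the default `limit` of 10 is an assumption. The 0.0-1.0 range, the 0.7 default and the cap of 100 come from these notes.

```python
import math

MAX_RESULTS = 100  # result cap quoted in the notes


def clamp_threshold(value, default=0.7):
    """Force a similarity threshold into the 0.0-1.0 range."""
    value = float(value)
    if math.isnan(value):
        return default  # assumption: fall back to the default on NaN
    return max(0.0, min(1.0, value))


def validate_search(query, threshold=0.7, limit=10):
    """Return a cleaned (query, threshold, limit) tuple or raise ValueError."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("Search query must be a non-empty string.")
    limit = max(1, min(int(limit), MAX_RESULTS))
    return query.strip(), clamp_threshold(threshold), limit
```

Clamping out-of-range values instead of rejecting them keeps a misconfigured threshold from breaking search, while an empty query fails fast with a descriptive error.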

Network Security

  • HTTP client with retry logic and timeouts
  • API failure logging without exposing sensitive response bodies
  • Rate limiting delegated to the external providers' own API limits
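
A retry wrapper with per-provider timeouts could be sketched as below. This is a generic Python illustration rather than the package's HTTP client: the provider keys, the `call_with_retries` name, the three attempts and the exponential backoff are all assumptions. Only the 30-second (cloud) and 60-second (Ollama) timeouts come from the notes above.

```python
import time

# Timeouts in seconds; the provider keys are hypothetical identifiers
TIMEOUTS = {"openai": 30, "gemini": 30, "ollama": 60}


def call_with_retries(request, provider, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call request(timeout), retrying on timeouts and connection errors.

    `request` is a hypothetical callable that performs one HTTP call and
    raises TimeoutError or ConnectionError on failure.
    """
    timeout = TIMEOUTS[provider]
    for attempt in range(1, attempts + 1):
        try:
            return request(timeout)
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Backing off between attempts gives a briefly overloaded provider time to recover, and the fixed attempt count keeps a failing call from hanging indefinitely.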

📊 Database Schema

Polymorphic embeddings table with:

  • Morphs relationship (embeddable_type, embeddable_id)
  • JSON embedding storage with proper indexing
  • Metadata tracking (provider, model, timestamp)
  • Optimised indexes for search performance

🧪 Quality Assurance

Testing & Analysis

  • Comprehensive Pest test suite covering embedding generation and search
  • PHPStan static analysis with strict level 8 configuration
  • Automated code quality tools (Rector for refactoring)
  • Larastan integration for Laravel-specific analysis

Code Quality Standards

  • Full type declarations and PHPStan compliance
  • PSR-4 autoloading and composer package structure
  • MIT license with clear usage guidelines

🎯 Use Cases

  • Content Management: Enhanced blog/article search with semantic understanding
  • E-commerce: Product recommendation based on descriptions and reviews
  • Knowledge Bases: Intelligent FAQ and documentation search
  • Customer Support: Ticket categorisation and similar case finding
  • Research Platforms: Academic paper and document similarity search

🔄 Future Roadmap

  • Vector database integrations (Pinecone, Weaviate)
  • Additional similarity metrics and search algorithms
  • Embedding model fine-tuning capabilities
  • Multi-language embedding support
  • Advanced filtering and faceted search