@rmolinamir
Contributor

Summary

Centralized Document Embedding Architecture

Overview

This PR refactors the embedding system around a single document-based model. The separate embedding schemas for journals and pages are replaced by one centralized DocumentEmbedding schema.

Key Changes

Architecture:

  • Unified Schema: Replaced separate JournalEmbedding and PageEmbedding schemas with a single DocumentEmbedding schema
  • Document: All content (journals, pages) now flows through the unified Document entity for embedding
  • tRPC: Split API into apiRouter (user-facing) and embedderRouter (internal embedding operations)
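To illustrate the unified schema described above, here is a minimal sketch of how journals and pages might both be normalized into one Document shape before producing DocumentEmbedding rows. Only the names Document and DocumentEmbedding come from this PR; every field name, the JournalEntry and Page input shapes, and the helper functions are assumptions made for illustration and may not match the actual schema.

```typescript
// Hypothetical shapes: field names are assumptions, not the PR's actual schema.
type DocumentType = "journal" | "page";

interface Document {
  id: string;
  type: DocumentType;
  userId: string;
  content: string;
}

interface DocumentEmbedding {
  documentId: string;
  documentType: DocumentType;
  chunkIndex: number;
  chunkText: string;
  vector: number[];
}

// Hypothetical source records for the two content types named in the PR.
interface JournalEntry {
  id: string;
  userId: string;
  body: string;
}

interface Page {
  id: string;
  userId: string;
  title: string;
  text: string;
}

// Each content type gets a small adapter into the shared Document shape.
function journalToDocument(journal: JournalEntry): Document {
  return { id: journal.id, type: "journal", userId: journal.userId, content: journal.body };
}

function pageToDocument(page: Page): Document {
  return {
    id: page.id,
    type: "page",
    userId: page.userId,
    content: `${page.title}\n\n${page.text}`,
  };
}

// One code path builds embedding rows regardless of where the document came from.
// `embed` is injected so the sketch does not assume any particular embedding provider.
function toEmbeddingRows(
  doc: Document,
  chunks: string[],
  embed: (text: string) => number[],
): DocumentEmbedding[] {
  return chunks.map((chunkText, chunkIndex) => ({
    documentId: doc.id,
    documentType: doc.type,
    chunkIndex,
    chunkText,
    vector: embed(chunkText),
  }));
}
```

Under these assumptions, supporting a new content type would only require another adapter like journalToDocument; the embedding path itself stays unchanged.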

File Changes Summary

  • Removed: Old webhook handlers, separate embedding schemas
  • Added: New unified embedding system, webhook processor, embedder router
  • Refactored: API structure, search tools, and type definitions

Benefits

  1. Simplified Architecture: Single embedding table reduces complexity and maintenance overhead
  2. Better Performance: Unified schema enables more efficient queries and better indexing
  3. Enhanced Reliability: Webhook-driven processing with retry logic and failure handling
  4. Improved AI: Richer metadata and semantic chunking enhance search quality
  5. Future-Proof: Document-centric approach supports easier extension for new content types
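The retry and failure handling mentioned in benefit 3 could be modeled roughly as follows. This is a sketch under stated assumptions, not the webhook processor added in this PR: the EmbeddingJob shape, the attempt limit, the base delay, and the exponential backoff policy are all hypothetical.

```typescript
// Hypothetical job record and retry policy: none of these names or values come from the PR.
type JobStatus = "pending" | "succeeded" | "failed";

interface EmbeddingJob {
  documentId: string;
  attempts: number;
  status: JobStatus;
  // Delay before the next attempt, or null when no further attempt is scheduled.
  nextDelayMs: number | null;
}

const MAX_ATTEMPTS = 3; // assumed limit
const BASE_DELAY_MS = 1000; // assumed base delay

// Pure state transition: given the outcome of one processing attempt,
// return the updated job without mutating the input.
function recordAttempt(job: EmbeddingJob, ok: boolean): EmbeddingJob {
  const attempts = job.attempts + 1;
  if (ok) {
    return { ...job, attempts, status: "succeeded", nextDelayMs: null };
  }
  if (attempts >= MAX_ATTEMPTS) {
    // Retries are exhausted, so mark the job as failed instead of rescheduling it.
    return { ...job, attempts, status: "failed", nextDelayMs: null };
  }
  // Exponential backoff: the delay doubles after each failed attempt.
  return {
    ...job,
    attempts,
    status: "pending",
    nextDelayMs: BASE_DELAY_MS * 2 ** (attempts - 1),
  };
}
```

Keeping the transition pure makes the policy easy to unit test; a real processor would still need to persist the job and schedule the next attempt.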

@rmolinamir rmolinamir requested a review from jorgebaralt August 25, 2025 23:54
@rmolinamir rmolinamir self-assigned this Aug 25, 2025
@rmolinamir
Contributor Author

Closes #40.

@rmolinamir rmolinamir merged commit 5f5a83c into main Aug 26, 2025
3 checks passed
@rmolinamir rmolinamir deleted the feature/document-embedding-worker branch August 26, 2025 00:28
2 participants