Phase5/dev#72
Conversation
…mprovements This includes: - Switch to Qdrant Cloud managed service - Optimization to fetch content directly from Qdrant payloads (reducing DB roundtrips) - Implementation of Qdrant Query API (Prefetch + Fusion) for server-side RRF - Added SPLADE memory/storage warnings for Cloud Free Tier - Updated DB schema changes for dev phase (force-reset approach)
- Implemented AES-256-GCM encryption for keys - Added Neural Sparse Embeddings (HybridEmbedder) - Integrated Qdrant Cloud for vector storage - Implemented Outbox Pattern for reliable sync - Migrated search to use Qdrant Hybrid Search
- 5A: AES-256-GCM encryption for OAuth tokens - 5B: Hybrid embedder (dense + sparse via fastembed) - 5C: Qdrant integration with collection setup - 5D: Outbox pattern for sync queue - 5E: Search migration to Qdrant hybrid - 5F: Per-user OAuth with encrypted tokens - 5G: Qdrant provider indicator in UI - 5H: Sync status (PENDING/SYNCED/FAILED) in chunks - 5I: Settings dashboard with system status - 5J: Analytics enhancements with sync metrics - 5K: Google Drive Folder Picker integration - 5L: Sync frequency dropdown presets
- feat(backend): implement Promise.all batch parallel DB insertion in callback-route for 10-20x speedup - refactor(phase5): cleanup obsolete pgvector/hybrid search code (schema, services, api) - refactor(phase5): consolidate DriveConfig to DriveFolder model - refactor(frontend): remove obsolete 'Balance' slider and alpha parameter - docs: update roadmap, architecture, and api docs for Phase 5 completion
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
|||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
||||||||||||||||||||||||||||
PR Type
Enhancement, Tests, Documentation
Description
Phase 5 Production Infrastructure Implementation
Qdrant Vector Database Integration: Added
QdrantServicefor hybrid search with dense + sparse vectors using RRF fusion, replacing pgvector-based searchHybrid Embeddings: Implemented
HybridEmbedderin AI Worker generating both dense (384-dim BGE-small) and sparse (BM25) vectors with new/embed/queryendpointVector Sync Infrastructure: Created
QdrantSyncProcessorwith outbox pattern, batch processing, and exponential backoff to sync chunks from PostgreSQL to QdrantPer-User OAuth for Google Drive: Added
UserDriveServicewith AES-256-GCM encrypted token storage, Google OAuth routes, and Drive Picker integration for folder selectionEncryption Service: Implemented
EncryptionServicefor authenticated encryption of sensitive data (OAuth refresh tokens)Schema Updates: Renamed
DriveConfigtoDriveFolder, addedChunkSyncStatusenum (PENDING/SYNCED/FAILED), added vector fields to chunks, newDriveOAuthmodelSystem Health Dashboard: Added
/api/systemendpoint andSettingsPagecomponent showing Qdrant status, AI Worker health, encryption config, and OAuth setupAnalytics Enhancement: Added sync queue statistics (PENDING/SYNCED/FAILED counts) to analytics overview
Frontend Components: New
GoogleOAuthSection,SyncFrequencySelect,AddFolderModalwith Drive Picker, and sync status badges on chunk cardsTest Coverage: Added Qdrant service integration tests, hybrid embedder tests, encryption service tests; skipped search/query tests pending Qdrant availability
Documentation: Updated architecture, API contracts, data flow, and roadmap; added detailed Phase 5 implementation plan and Phase 6 SaaS roadmap
Removed Legacy Code: Deleted old
HybridSearchService,Embedder, and hybrid search initialization codeDiagram Walkthrough
File Walkthrough
11 files
search-route.test.ts
Skip search tests, add Qdrant unavailability testsapps/backend/tests/integration/routes/search-route.test.ts
describe.skip()due to Phase 5Qdrant requirement
filtering, and modes
SEARCH_UNAVAILABLE) behavior
query-flow.test.ts
Skip E2E query tests, add Qdrant unavailability testapps/backend/tests/e2e/query-flow.test.ts
describe.skip()due to Qdrantrequirement
metadata)
configured
qdrant.service.test.ts
Add Qdrant service integration testsapps/backend/tests/integration/qdrant.service.test.ts
instance
deletion
encryption.service.test.ts
Add encryption service unit testsapps/backend/tests/unit/encryption.service.test.ts
tampering
multi-format-flow.test.ts
Remove query tests from multi-format E2E flowsapps/backend/tests/e2e/multi-format-flow.test.ts
Completed"
assertions
sync-service-relink.test.ts
Update sync service tests for UserDriveServiceapps/backend/tests/unit/services/sync-service-relink.test.ts
getDriveServicetoUserDriveService.create()driveConfigIdreferences todriveFolderIdUserDriveService.create()patternpdf-upload-flow.test.ts
Remove query test from PDF upload E2E flowapps/backend/tests/e2e/pdf-upload-flow.test.ts
"Upload → Queue → Callback → Chunks"
database.ts
Update test database cleanup for DriveFolderapps/backend/tests/helpers/database.ts
driveFolder.deleteMany()instead ofdriveConfig.deleteMany()test_hybrid_embedder.py
Add comprehensive tests for hybrid embedderapps/ai-worker/tests/test_hybrid_embedder.py
HybridEmbedderfunctionalityprocessing
test_existing_formats.py
Update regression tests for hybrid embedderapps/ai-worker/tests/regression/test_existing_formats.py
HybridEmbedderinstead ofEmbedderHybridVectorobjects
test_main.py
Update main.py tests for hybrid embedder mockingapps/ai-worker/tests/test_main.py
HybridEmbedderat module level inmain.pyembed()toembed_dense_only()dense-only method
35 files
sync-service.ts
Migrate to UserDriveService with OAuth supportapps/backend/src/services/sync-service.ts
DriveServicewithUserDriveServicefor per-user OAuth supportdriveConfigtodriveFolderthroughout (schema rename)UserDriveServiceviagetDrive()methodUserDriveService.create()qdrant.service.ts
Add Qdrant Cloud vector database serviceapps/backend/src/services/qdrant.service.ts
search
collection management
user-drive-service.ts
Add per-user OAuth Google Drive serviceapps/backend/src/services/user-drive-service.ts
5F)
google.route.ts
Add Google OAuth routes with encrypted token storageapps/backend/src/routes/oauth/google.route.ts
→ disconnect
qdrant-sync.processor.ts
Add Qdrant sync queue processor with batchingapps/backend/src/queue/qdrant-sync.processor.ts
(Phase 5D)
backoff
config-routes.ts
Rename driveConfig to driveFolder, use UserDriveServiceapps/backend/src/routes/drive/config-routes.ts
driveConfigtodriveFolderthroughout (schema alignment)UserDriveService.create()instead ofgetDriveService()folderNameparameter from Google Picker to avoid extraAPI call
driveFolder:deletedcallback-route.ts
Add batch chunk insertion and Qdrant sync job enqueueapps/backend/src/routes/internal/callback-route.ts
inserts)
legacy embeddings
sync_status,dense_vector,sparse_indices,sparse_valuesinchunks table
qdrant-hybrid-search.ts
Add Qdrant hybrid search service with AI Worker integrationapps/backend/src/services/qdrant-hybrid-search.ts
search-route.ts
Replace pgvector search with Qdrant-only implementationapps/backend/src/routes/query/search-route.ts
EmbeddingClientandHybridSearchServicedependenciesalphaparameteruseDrivePicker.ts
Add Google Drive Picker React hookapps/frontend/src/hooks/useDrivePicker.ts
encryption.service.ts
Add AES-256-GCM encryption serviceapps/backend/src/services/encryption.service.ts
endpoints.ts
Update API types for Qdrant and sync queue supportapps/frontend/src/api/endpoints.ts
alphaparameter fromSearchParamsandSearchResponseSearchResponse.modeto includeqdrant_hybridoptionproviderfield to indicate Qdrant vs pgvector backenddriveApi.createConfigto accept optionalfolderNameparametersyncQueuestats toAnalyticsOverviewinterfacesyncStatusfield toChunkListIteminterfacehealth-route.ts
Add system status endpoint for Settings dashboardapps/backend/src/routes/health-route.ts
/api/systemendpoint for Settings dashboard (Phase 5I)config
overview-route.ts
Add sync queue statistics to analytics overviewapps/backend/src/routes/analytics/overview-route.ts
syncQueueobject in analytics responseworker-init.ts
Initialize Qdrant sync worker and queueapps/backend/src/queue/worker-init.ts
qdrantQueueandqdrantWorkerif Qdrant configuredgetQdrantSyncQueue()function for job enqueueingapp.ts
Register OAuth routes, remove hybrid search initapps/backend/src/app.ts
oauthRoutes(app)initializeHybridSearch)callback-validator.ts
Add hybrid vector schema support to callback validatorapps/backend/src/validators/callback-validator.ts
SparseVectorSchemafor sparse vector indices/values (Phase 5B)HybridVectorSchemacombining dense + sparse vectorsProcessingResultSchemato support both legacyembeddingandnew
vectorformatssync-routes.ts
Rename driveConfig to driveFolder in sync routesapps/backend/src/routes/drive/sync-routes.ts
driveConfigtodriveFolderin two route handlersdriveFoldermodelindex.ts
Update service exports for Qdrant and encryptionapps/backend/src/services/index.ts
EncryptionServiceandgetEncryptionServiceexportsQdrantService,QdrantHybridSearchServiceexports with typedefinitions
HybridSearchServiceandinitializeHybridSearchexportsoauth.ts
OAuth API client for Google Drive integrationapps/frontend/src/api/oauth.ts
OAuthStatusinterface andoauthApiobject with methodsOAuth flow initiation
chunks-route.ts
Add syncStatus field to chunk responsesapps/backend/src/routes/chunks/chunks-route.ts
syncStatusfield to chunk selection in database querysyncStatusin response payload mappinghybrid_embedder.py
Implement hybrid embedder with dense and sparse vectorsapps/ai-worker/src/hybrid_embedder.py
HybridEmbedderclass implementing dense + sparse vector generationfastembedfor both BGE-small (384d dense) and BM25 (sparse)embeddings
counting
SparseVectorandHybridVectordataclasses for structuredoutput
pipeline.py
Integrate hybrid embedder into processing pipelineapps/ai-worker/src/pipeline.py
Embedderimport withHybridEmbedderembed()returningHybridVectorobjectsseparately
main.py
Add hybrid embedding endpoints to AI worker APIapps/ai-worker/src/main.py
HybridEmbedderimport at module level/embedendpoint to useembed_dense_only()for backwardcompatibility
/embed/queryendpoint for hybrid embeddings with dense + sparsevectors
HybridEmbedResponsewith both vector types for Qdrant searchmodels.py
Add hybrid embedding response modelsapps/ai-worker/src/models.py
SparseVectorModel,HybridVectorModel,HybridEmbedRequest,HybridEmbedResponseformats
SettingsPage.tsx
Add system settings and health dashboardapps/frontend/src/components/settings/SettingsPage.tsx
OAuth configuration
GoogleOAuthSection.tsx
Add Google OAuth connection UI componentapps/frontend/src/components/drive/GoogleOAuthSection.tsx
AddFolderModal.tsx
Integrate Google Drive Picker and sync frequency dropdownapps/frontend/src/components/drive/AddFolderModal.tsx
useDrivePickerhook for native folder selectionSyncFrequencySelectdropdown componenthandling
results-list.tsx
Add Qdrant provider badge to search resultsapps/frontend/src/components/query/results-list.tsx
providerprop to display Qdrant badge when applicablesearch-form.tsx
Remove manual alpha tuning from search formapps/frontend/src/components/query/search-form.tsx
providerinstead ofalphaSyncFrequencySelect.tsx
New sync frequency dropdown componentapps/frontend/src/components/drive/SyncFrequencySelect.tsx
manual-only option
frequency purpose
AnalyticsPage.tsx
Add Qdrant sync queue analytics displayapps/frontend/src/components/analytics/AnalyticsPage.tsx
dashboard
PENDING,SYNCED, andFAILEDcounts
indication
App.tsx
Integrate dedicated settings page componentapps/frontend/src/App.tsx
SettingsPagecomponent from settings moduleSettingsPagecomponent
placeholder text
DriveSyncTab.tsx
Add Google OAuth section and folder name supportapps/frontend/src/components/drive/DriveSyncTab.tsx
GoogleOAuthSectioncomponenthandleCreatefunction to accept and passfolderNameparameterGoogleOAuthSectioncomponent at top of drive sync tabChunkCard.tsx
Add sync status badge to chunk cardsapps/frontend/src/components/chunks/ChunkCard.tsx
SYNCED,PENDING, andFAILEDstatessyncStatusis present2 files
event-bus.ts
Rename drive config event to drive folderapps/backend/src/services/event-bus.ts
driveConfig:deletedtodriveFolder:deletedcron.ts
Update cron jobs to use DriveFolder modelapps/backend/src/jobs/cron.ts
driveConfig.findMany()todriveFolder.findMany()3 files
vite-env.d.ts
Add Vite environment variable type definitionsapps/frontend/src/vite-env.d.ts
VITE_API_URL,VITE_GOOGLE_PICKER_API_KEY, andVITE_GOOGLE_CLIENT_IDschema.prisma
Add Qdrant sync status and OAuth encryption to schemaapps/backend/prisma/schema.prisma
DriveConfigmodel toDriveFolderwith updated field mappingsChunkSyncStatusenum (PENDING, SYNCED, FAILED) for Qdrant synctracking
Chunk:syncStatus,denseVector,sparseIndices,sparseValuesDriveOAuthmodel for encrypted OAuth token storage withAES-256-GCM fields
.env.example
Add frontend environment configuration templateapps/frontend/.env.example
VITE_API_URLfor backend API endpointconfiguration
7 files
roadmap-phase5.md
Refactor Phase 5 roadmap to production infrastructuredocs/roadmap-phase5.md
Infrastructure
AES-256-GCM encryption
roadmap-phase6.md
Add Phase 6 multi-tenant SaaS roadmapdocs/roadmap-phase6.md
detailed-plan-phase5.md
Add detailed Phase 5 implementation plandocs/detailed-plan-phase5.md
integration, outbox pattern
criteria
roadmap.md
Update roadmap with Phase 5 and Phase 6docs/roadmap.md
roadmap
architecture.md
Update architecture documentation for Phase 5docs/architecture.md
sparse)
api.md
Update API contracts for Phase 5 changesdocs/api.md
syncStatusfield to Chunk interfaceDriveConfigtoDriveFolderin API contractsDriveOAuthinterface for encrypted token storagedata-flow.md
Update data flow documentation for Qdrant searchdocs/data-flow.md
3 files
pnpm-lock.yaml
Add Qdrant client library dependenciespnpm-lock.yaml
@qdrant/js-client-rest@1.16.2dependency for Qdrant integration@qdrant/openapi-typescript-fetch@1.2.6as transitive dependencyundici@6.22.0for HTTP client supportpackage.json
Add Qdrant JavaScript client dependencyapps/backend/package.json
@qdrant/js-client-restdependency version^1.16.2requirements.txt
Add fastembed dependency for hybrid embeddingsapps/ai-worker/requirements.txt
fastembed>=0.4.0dependency for Phase 5 hybrid embeddings1 files
document-list.tsx
Update document list filter parameter namingapps/frontend/src/components/documents/document-list.tsx
driveConfigIdtodriveFolderId8 files