Skip to content

feat(rag): enhance search observability and simplify ingestion errorhandling#222

Open
sqhyz55 wants to merge 5 commits into
xorbitsai:mainfrom
sqhyz55:feat/rag-observability-async-optimization
Open

feat(rag): enhance search observability and simplify ingestion errorhandling#222
sqhyz55 wants to merge 5 commits into
xorbitsai:mainfrom
sqhyz55:feat/rag-observability-async-optimization

Conversation

@sqhyz55
Copy link
Copy Markdown
Collaborator

@sqhyz55 sqhyz55 commented Mar 26, 2026

Search Pipeline (document_search.py):

  • Add detailed timing logs for all search types (sparse/dense/hybrid)
  • Track and log encode/search/rerank latencies with user_id context
  • Add comprehensive search completion logging with performance metrics

Vector Storage (vector_manager.py):

  • Add diagnostic logging to detect user_id filter mismatches
  • Help debug permission/exclusion issues when chunks exist but user can't see them

Knowledge Base Tool (document_search.py):

  • Reduce verbose INFO logs to DEBUG level for collection iteration
  • Improve log performance by reducing noise in multi-collection searches

Web API (kb.py):

  • Introduce shared _ingest_executor thread pool (max_workers=10) for ingestion tasks
  • Use contextvars.copy_context() for proper context propagation in executors
  • Simplify delete_collection_api response structure
  • Simplify rename_collection_api with streamlined physical file handling
  • Add UploadedFile cleanup in delete_document_api

Copy link
Copy Markdown
Collaborator

@rogercloud rogercloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I found a few issues that should be fixed before merge.

Also, GitHub currently reports this PR as CONFLICTING. Please rebase on the latest main and resolve the merge conflicts before requesting another review.

Comment thread src/xagent/web/api/kb.py Outdated
Comment thread src/xagent/web/api/kb.py Outdated
Comment thread frontend/src/components/kb/knowledge-base-creation-dialog.tsx Outdated
@sqhyz55 sqhyz55 force-pushed the feat/rag-observability-async-optimization branch from d62438c to 127e8f6 Compare May 11, 2026 03:22
Copy link
Copy Markdown
Collaborator

@rogercloud rogercloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One follow-up issue from the rename re-review.

Comment thread src/xagent/web/api/kb.py Outdated
sqhyz55 added 5 commits May 19, 2026 16:39
…handling

**Search Pipeline (document_search.py)**:
- Add detailed timing logs for all search types (sparse/dense/hybrid)
- Track and log encode/search/rerank latencies with user_id context
- Add comprehensive search completion logging with performance metrics

**Vector Storage (vector_manager.py)**:
- Add diagnostic logging to detect user_id filter mismatches
- Help debug permission/exclusion issues when chunks exist but user can't see them

**Knowledge Base Tool (document_search.py)**:
- Reduce verbose INFO logs to DEBUG level for collection iteration
- Improve log performance by reducing noise in multi-collection searches

**Web API (kb.py)**:
- Introduce shared _ingest_executor thread pool (max_workers=10) for ingestion tasks
- Use contextvars.copy_context() for proper context propagation in executors
- Simplify delete_collection_api response structure
- Simplify rename_collection_api with streamlined physical file handling
- Add UploadedFile cleanup in delete_document_api
**Critical Fixes**:
- rename_collection_api: Fix DB↔File consistency by committing DB changes
  BEFORE physical move, with rollback on physical move failure
- delete_document_api: Prevent premature deletion of shared files by checking
  if other documents reference the same physical file before deletion
- delete_collection_api: Fix Pydantic model access (use attributes instead of
  .get() method) and restore physical cleanup status in warnings

**Bug Fixes**:
- Fix import error in rename_collection_api (sanitize_path_component is in
  config.py, not string_utils)
- Update tests to handle 409 Conflict when target directory already exists

**Test Updates**:
- test_kb_delete_returns_physical_cleanup_status: Check warnings for physical
  cleanup info (new API format)
- test_kb_rename_physical_directory_rename: Handle 409 Conflict when target
  exists from previous test run
- test_kb_rename_physical_rename_failure_aborts_operation: Handle 409
  Conflict and update error message assertion

These fixes address the critical issues identified in the RAG review:
1. DB↔File consistency in rename operations
2. Shared file deletion safety in delete_document_api
3. Proper API response format for physical cleanup status
…rations

Backend changes:
- Add shared thread pool _ingest_executor to prevent global thread pool exhaustion
- Implement graceful shutdown with timeout mechanism (30s) for ingestion executor
- Fix rename_collection_api cross-user data pollution by adding user_id filter
- Improve rename_collection_api transaction safety with pre-validation
- Simplify delete_document_api: remove complex UploadedFile and physical file handling
- Physical file cleanup is now handled at collection level only

Frontend changes:
- Extend IngestionResult interface with embedding_count, vector_count, warnings, failed_step
- Add toast warning when embedding_count is 0 but chunks_count > 0
- Improve error handling for both document and web ingestion with partial status warnings
…agnostics

- Extend VectorIndexStore.rename_collection_data with user_id and is_admin;
  LanceDB implementation builds the same filter as delete_collection_data
  before table.update so non-admins only rename their rows and admins rename
  all rows for the old collection key.
- Pass caller identity from rename_collection_api.
- When read_chunks_for_embedding finds zero rows under user scope, log at
  ERROR under DEBUG if admin-unscoped count shows hidden rows (user_id mismatch
  or legacy NULL ownership).
- Update lancedb store unit test for the new rename signature.
…licts

Reject non-admin renames when multiple users share a collection name, require
admin target_user_id for scoped renames, and align metadata/config updates with
per-user vector renames to prevent cross-tenant control-plane corruption.
@sqhyz55 sqhyz55 force-pushed the feat/rag-observability-async-optimization branch from 127e8f6 to c17779c Compare May 19, 2026 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants