Fix/connector page stack depth limit#5417
Greptile Summary
This PR addresses PostgreSQL stack depth limit exceptions that occur when processing large numbers of connector-credential pairs by implementing batching and parallelization in the document counting functionality. The core change modifies get_document_counts_for_cc_pairs in backend/onyx/db/document.py to process CC pairs in batches of 1000 instead of generating massive IN clauses that exceed PostgreSQL's stack depth limits.
The implementation adds three key components:
- Batched processing: The main function now splits CC pairs into chunks of 1000 and processes them sequentially
- Parallel worker function: _get_document_counts_for_cc_pairs_batch handles individual batches with their own database sessions
- Fully parallel variant: get_document_counts_for_cc_pairs_batched_parallel processes all batches concurrently using the existing thread pool utilities
The consumer in backend/onyx/server/documents/connector.py is updated to use the new parallel implementation, removing the db_session parameter since the new function manages its own database connections internally. This change maintains the same API contract and return format while providing scalability for deployments with thousands of connector-credential pairs.
The solution leverages existing patterns in the codebase for concurrent processing and follows the principle of preserving backward compatibility while fixing critical scalability issues.
Confidence score: 4/5
- This PR addresses a well-defined PostgreSQL limitation with a proven batching strategy that should resolve stack depth issues
- Score reflects solid implementation using existing concurrency patterns, though the hardcoded batch size of 1000 could benefit from being made configurable
- Pay attention to the new parallel database session management in _get_document_counts_for_cc_pairs_batch to ensure proper connection handling
2 files reviewed, no comments
Description
Added batching and parallelization to get_document_counts_for_cc_pairs to fix Postgres stack depth issues when querying large numbers of connector-credential pairs.
Linear issue -> https://linear.app/danswer/issue/DAN-2517/postgres-stack-depth-limit-exception
How Has This Been Tested?
Generated 19k cc-pairs and verified that the stack depth error no longer occurs.
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.
Summary by cubic
Batch and parallelize document count queries for connector-credential pairs to prevent Postgres stack depth errors at scale. Resolves Linear DAN-2517 and improves connector page load time.