Fix/connector page stack depth limit #5417

Merged
Subash-Mohan merged 2 commits into main from
fix/connector-page-stack-depth-limit
Sep 22, 2025

@Subash-Mohan Subash-Mohan commented Sep 15, 2025

Description

Added batching and parallelization to get_document_counts_for_cc_pairs to fix Postgres stack depth issues when querying large numbers of connector-credential pairs.
Linear issue: https://linear.app/danswer/issue/DAN-2517/postgres-stack-depth-limit-exception

How Has This Been Tested?

Generated ~19k cc-pairs and verified that the stack depth exception no longer occurs.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Batch and parallelize document count queries for connector-credential pairs to prevent Postgres stack depth errors at scale. Resolves Linear DAN-2517 and improves connector page load time.

  • Bug Fixes
    • Split IN-clause queries into batches of 1000 to avoid stack depth issues.
    • Added get_document_counts_for_cc_pairs_batched_parallel with per-batch DB sessions and parallel execution.
    • Switched get_connector_indexing_status to use the new batched parallel function.
    • Verified with ~19k cc-pairs; no stack depth exceptions.
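
The batching step described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper name `batched` and the sample `cc_pairs` data are hypothetical, and only the batch size of 1000 comes from the summary.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")

BATCH_SIZE = 1000  # batch size quoted in the summary above


def batched(items: Sequence[T], size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


# Hypothetical (connector_id, credential_id) pairs standing in for real cc-pairs.
cc_pairs = [(i, i % 7) for i in range(19_000)]

# Each batch would back one bounded IN-clause query instead of one huge query.
batches = list(batched(cc_pairs))
print(len(batches), len(batches[0]), len(batches[-1]))  # 19 1000 1000
```

With roughly 19k pairs this yields 19 queries of at most 1000 pairs each, so the size of any single query stays bounded regardless of how many cc-pairs exist.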

@Subash-Mohan Subash-Mohan requested a review from a team as a code owner September 15, 2025 06:44

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR addresses PostgreSQL stack depth limit exceptions that occur when processing large numbers of connector-credential pairs by implementing batching and parallelization in the document counting functionality. The core change modifies get_document_counts_for_cc_pairs in backend/onyx/db/document.py to process CC pairs in batches of 1000 instead of generating massive IN clauses that exceed PostgreSQL's stack depth limits.

The implementation adds three key components:

  1. Batched processing: The main function now splits CC pairs into chunks of 1000 and processes them sequentially
  2. Parallel worker function: _get_document_counts_for_cc_pairs_batch handles individual batches with their own database sessions
  3. Fully parallel variant: get_document_counts_for_cc_pairs_batched_parallel processes all batches concurrently using the existing thread pool utilities
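
A minimal sketch of how the three components above might fit together, under loud assumptions: the standard library's `ThreadPoolExecutor` stands in for the codebase's own thread pool utilities, an in-memory dict (`FAKE_COUNTS`) stands in for the database, and the function names `count_batch` and `count_all_parallel` are hypothetical rather than the PR's actual identifiers.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 1000  # batch size quoted in the review summary

# Hypothetical stand-in for the documents table:
# (connector_id, credential_id) -> document count.
FAKE_COUNTS = {(i, 0): i % 5 for i in range(2_500)}


def count_batch(batch):
    """Return (connector_id, credential_id, count) rows for one batch.

    In the real code each batch would open and close its own DB session
    here; this sketch only reads from the in-memory stand-in.
    """
    return [
        (conn_id, cred_id, FAKE_COUNTS.get((conn_id, cred_id), 0))
        for conn_id, cred_id in batch
    ]


def count_all_parallel(cc_pairs, batch_size=BATCH_SIZE, max_workers=4):
    """Split cc_pairs into batches, count each batch in a worker thread."""
    batches = [
        cc_pairs[start : start + batch_size]
        for start in range(0, len(cc_pairs), batch_size)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input batches.
        per_batch = list(pool.map(count_batch, batches))
    # Flatten so callers see the same shape as an unbatched query would give.
    return [row for rows in per_batch for row in rows]


results = count_all_parallel(list(FAKE_COUNTS))
print(len(results))  # 2500
```

Giving each worker its own session matters because a database session is generally not safe to share across threads, which is presumably why the parallel variant no longer takes a `db_session` argument.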

The consumer in backend/onyx/server/documents/connector.py is updated to use the new parallel implementation, removing the db_session parameter since the new function manages its own database connections internally. This change maintains the same API contract and return format while providing scalability for deployments with thousands of connector-credential pairs.

The solution leverages existing patterns in the codebase for concurrent processing and follows the principle of preserving backward compatibility while fixing critical scalability issues.

Confidence score: 4/5

  • This PR addresses a well-defined PostgreSQL limitation with a proven batching strategy that should resolve stack depth issues
  • Score reflects solid implementation using existing concurrency patterns, though the hardcoded batch size of 1000 could benefit from configuration
  • Pay attention to the new parallel database session management in _get_document_counts_for_cc_pairs_batch to ensure proper connection handling
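
The configurability point above could look something like the sketch below. This is not part of the PR: the environment variable name `CC_PAIR_COUNT_BATCH_SIZE` and the function `resolve_batch_size` are made up for illustration, and only the default of 1000 comes from the review.

```python
import os

DEFAULT_BATCH_SIZE = 1000  # the hardcoded value mentioned in the review

# Hypothetical environment variable name, not defined anywhere in the PR.
BATCH_SIZE_ENV_VAR = "CC_PAIR_COUNT_BATCH_SIZE"


def resolve_batch_size(env=None):
    """Read the batch size from the environment, falling back to the default.

    Missing, non-integer, or non-positive values all fall back to
    DEFAULT_BATCH_SIZE so a bad setting cannot disable batching.
    """
    env = os.environ if env is None else env
    raw = env.get(BATCH_SIZE_ENV_VAR)
    if raw is None:
        return DEFAULT_BATCH_SIZE
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_BATCH_SIZE
    return value if value > 0 else DEFAULT_BATCH_SIZE


print(resolve_batch_size({}))  # 1000
print(resolve_batch_size({BATCH_SIZE_ENV_VAR: "250"}))  # 250
```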

2 files reviewed, no comments

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 2 files

@Subash-Mohan Subash-Mohan force-pushed the fix/connector-page-stack-depth-limit branch from e95b450 to 94e25e4 Compare September 22, 2025 11:40
@Subash-Mohan Subash-Mohan merged commit 26e7bba into main Sep 22, 2025
53 of 55 checks passed
@Subash-Mohan Subash-Mohan deleted the fix/connector-page-stack-depth-limit branch September 22, 2025 13:53

@waseembahralaseel-cell waseembahralaseel-cell left a comment

Hey

@waseembahralaseel-cell waseembahralaseel-cell left a comment

Ban

razvanMiu pushed a commit to eea/danswer that referenced this pull request Oct 16, 2025