task executor issues #12006

Merged
KevinHuSh merged 4 commits into infiniflow:main from concertdictate:debug/task-executor-issues
Dec 18, 2025
Conversation

@concertdictate
Contributor

@concertdictate concertdictate commented Dec 17, 2025

What problem does this PR solve?

Fixes #8706 - InfinityException: TOO_MANY_CONNECTIONS when running multiple task executor workers

Problem Description

When running RAGFlow with 8-16 task executor workers, most workers fail to start properly. Checking logs revealed that workers were stuck/hanging during Infinity connection initialization - only 1-2 workers would successfully register in Redis while the rest remained blocked.

Root Cause

The Infinity SDK ConnectionPool pre-allocates all connections in __init__. With the default max_size=32 and multiple workers (e.g., 16), this creates 16×32=512 connections immediately on startup, exceeding Infinity's default 128 connection limit. Workers hang while waiting for connections that can never be established.

Changes

  1. Prevent Infinity connection storm (rag/utils/infinity_conn.py, rag/svr/task_executor.py)

    • Reduced ConnectionPool max_size from 32 to 4 (sufficient since operations are synchronous)
    • Added staggered startup delay (2s per worker) to spread connection initialization
  2. Handle None children_delimiter (rag/app/naive.py)

    • Use or "" to handle explicitly set None values from parser config
  3. MinerU parser robustness (deepdoc/parser/mineru_parser.py)

    • Use .get() for optional output fields that may be missing
    • Fix DISCARDED block handling: change pass to continue to skip discarded blocks entirely
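Change 2's `or ""` guard deserves a note: `dict.get(key, default)` only falls back to the default when the key is *missing*, not when it is present with an explicit `None`. A minimal sketch of the pattern (`split_children` is a hypothetical helper for illustration, not RAGFlow's actual function):

```python
def split_children(text, parser_config):
    # cfg.get("children_delimiter", "") still returns None when the key is
    # present but explicitly set to None -- the default only covers a
    # *missing* key. `or ""` normalizes both cases to an empty string.
    delimiter = (parser_config or {}).get("children_delimiter") or ""
    return text.split(delimiter) if delimiter else [text]

# Explicit None no longer crashes str.split():
split_children("a|b", {"children_delimiter": None})  # -> ["a|b"]
split_children("a|b", {"children_delimiter": "|"})   # -> ["a", "b"]
```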

Why max_size=4 is sufficient

| Workers | Pool Size | Total Connections | Infinity Limit |
|---------|-----------|-------------------|----------------|
| 16      | 32        | 512               | 128 ❌         |
| 16      | 4         | 64                | 128 ✅         |
| 32      | 4         | 128               | 128 ✅         |
  • All RAGFlow operations are synchronous: get_conn() → operation → release_conn()
  • No parallel docStoreConn operations in the codebase
  • Maximum 1-2 concurrent connections needed per worker; 4 provides safety margin
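The table above is just the product of worker count and per-worker pool size, since the SDK pre-allocates the full pool on startup. A one-line sanity check (the 128 limit is Infinity's default as described in this PR):

```python
INFINITY_CONN_LIMIT = 128  # Infinity's default server-side connection limit

def total_preallocated(workers: int, pool_max_size: int) -> int:
    # The Infinity SDK ConnectionPool opens max_size connections in
    # __init__, so every worker claims its full pool at startup.
    return workers * pool_max_size

total_preallocated(16, 32)  # -> 512, exceeds 128: workers hang
total_preallocated(16, 4)   # -> 64, within the limit
total_preallocated(32, 4)   # -> 128, exactly at the limit
```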

MinerU DISCARDED block bug

When MinerU returns blocks with type: "discarded" (headers, footers, watermarks, page numbers, artifacts), the previous code used pass which left the section variable undefined, causing:

  • UnboundLocalError if DISCARDED is the first block
  • Duplicate content if DISCARDED follows another block (stale value from previous iteration)

Root cause confirmed via MinerU source code:

From [`mineru/utils/enum_class.py`](https://github.com/opendatalab/MinerU/blob/main/mineru/utils/enum_class.py#L14):

```python
class BlockType:
    DISCARDED = 'discarded'
    # VLM 2.5+ also has: HEADER, FOOTER, PAGE_NUMBER, ASIDE_TEXT, PAGE_FOOTNOTE
```

Per the [MinerU documentation](https://opendatalab.github.io/MinerU/reference/output_files/), discarded blocks contain content that should be filtered out for clean text extraction.

Fix: Changed pass to continue to skip discarded blocks entirely.

Testing

  • Verified all 16 workers now register successfully in Redis
  • All workers heartbeating correctly
  • Document parsing works as expected
  • MinerU parsing with DISCARDED blocks no longer crashes

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

user210 added 3 commits December 17, 2025 14:22
Use 'or ""' instead of default param to handle explicit None values
…simultaneously

- Add staggered startup delay (2.0s × worker_num) to spread connection attempts
- Reduce ConnectionPool max_size from 32 to 4 to stay within Infinity connection limit
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. ♾️infinity Pull requests that‘s involved with infinity(DB) 🐞 bug Something isn't working, pull request that fix bug. labels Dec 17, 2025
@concertdictate concertdictate changed the title Debug/task executor issues task executor issues Dec 17, 2025
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Dec 17, 2025
When MinerU returns blocks with type=DISCARDED, the match/case falls through
with 'pass' but still tries to use the 'section' variable which may be unbound
(if DISCARDED is the first block) or stale from a previous iteration.

Change 'pass' to 'continue' to skip the entire block processing for discarded
content, preventing both UnboundLocalError and duplicate section entries.
@concertdictate concertdictate force-pushed the debug/task-executor-issues branch from 8017f1e to fcfc53c Compare December 17, 2025 16:42
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Dec 17, 2025
@KevinHuSh KevinHuSh added the ci Continue Integration label Dec 18, 2025
@KevinHuSh KevinHuSh marked this pull request as draft December 18, 2025 01:38
@KevinHuSh KevinHuSh marked this pull request as ready for review December 18, 2025 01:38
Member

@yuzhichang yuzhichang left a comment

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 18, 2025
@KevinHuSh KevinHuSh merged commit 4dd8cdc into infiniflow:main Dec 18, 2025
2 checks passed
@concertdictate concertdictate deleted the debug/task-executor-issues branch December 18, 2025 07:25
clifftseng pushed a commit to clifftseng/ragflow that referenced this pull request Feb 9, 2026

Labels

🐞 bug Something isn't working, pull request that fix bug. ci Continue Integration ♾️infinity Pull requests that‘s involved with infinity(DB) lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Bug]: InfinityException: (<ErrorCode.TOO_MANY_CONNECTIONS: 5003>, 'Try 10 times, but still failed')

4 participants