task executor issues #12006

Merged
KevinHuSh merged 4 commits into infiniflow:main from concertdictate:debug/task-executor-issues
Dec 18, 2025
Conversation

@concertdictate
Contributor

@concertdictate concertdictate commented Dec 17, 2025

What problem does this PR solve?

Fixes #8706 - InfinityException: TOO_MANY_CONNECTIONS when running multiple task executor workers

Problem Description

When running RAGFlow with 8-16 task executor workers, most workers fail to start properly. Checking logs revealed that workers were stuck/hanging during Infinity connection initialization - only 1-2 workers would successfully register in Redis while the rest remained blocked.

Root Cause

The Infinity SDK ConnectionPool pre-allocates all connections in __init__. With the default max_size=32 and multiple workers (e.g., 16), this creates 16×32=512 connections immediately on startup, exceeding Infinity's default 128 connection limit. Workers hang while waiting for connections that can never be established.

Changes

  1. Prevent Infinity connection storm (rag/utils/infinity_conn.py, rag/svr/task_executor.py)

    • Reduced ConnectionPool max_size from 32 to 4 (sufficient since operations are synchronous)
    • Added staggered startup delay (2s per worker) to spread connection initialization
  2. Handle None children_delimiter (rag/app/naive.py)

    • Use or "" to handle explicitly set None values from parser config
  3. MinerU parser robustness (deepdoc/parser/mineru_parser.py)

    • Use .get() for optional output fields that may be missing
    • Fix DISCARDED block handling: change pass to continue to skip discarded blocks entirely
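Change 2's `or ""` guard deserves a note: `dict.get(key, default)` only falls back to the default when the key is *missing*, not when it is present with an explicit `None`. A minimal sketch of the pattern (`split_children` is a hypothetical helper for illustration, not RAGFlow's actual function):

```python
def split_children(text, parser_config):
    # cfg.get("children_delimiter", "") still returns None when the key is
    # present but explicitly set to None -- the default only covers a
    # *missing* key. `or ""` normalizes both cases to an empty string.
    delimiter = (parser_config or {}).get("children_delimiter") or ""
    return text.split(delimiter) if delimiter else [text]

# Explicit None no longer crashes str.split():
split_children("a|b", {"children_delimiter": None})  # -> ["a|b"]
split_children("a|b", {"children_delimiter": "|"})   # -> ["a", "b"]
```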

Why max_size=4 is sufficient

| Workers | Pool Size | Total Connections | Infinity Limit |
|---------|-----------|-------------------|----------------|
| 16      | 32        | 512               | 128 ❌         |
| 16      | 4         | 64                | 128 ✅         |
| 32      | 4         | 128               | 128 ✅         |
  • All RAGFlow operations are synchronous: get_conn() → operation → release_conn()
  • No parallel docStoreConn operations in the codebase
  • Maximum 1-2 concurrent connections needed per worker; 4 provides safety margin
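The table above is just the product of worker count and per-worker pool size, since the SDK pre-allocates the full pool on startup. A one-line sanity check (the 128 limit is Infinity's default as described in this PR):

```python
INFINITY_CONN_LIMIT = 128  # Infinity's default server-side connection limit

def total_preallocated(workers: int, pool_max_size: int) -> int:
    # The Infinity SDK ConnectionPool opens max_size connections in
    # __init__, so every worker claims its full pool at startup.
    return workers * pool_max_size

total_preallocated(16, 32)  # -> 512, exceeds 128: workers hang
total_preallocated(16, 4)   # -> 64, within the limit
total_preallocated(32, 4)   # -> 128, exactly at the limit
```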

MinerU DISCARDED block bug

When MinerU returns blocks with type: "discarded" (headers, footers, watermarks, page numbers, artifacts), the previous code used pass which left the section variable undefined, causing:

  • UnboundLocalError if DISCARDED is the first block
  • Duplicate content if DISCARDED follows another block (stale value from previous iteration)

Root cause confirmed via MinerU source code:

From [`mineru/utils/enum_class.py`](https://github.com/opendatalab/MinerU/blob/main/mineru/utils/enum_class.py#L14):

```python
class BlockType:
    DISCARDED = 'discarded'
    # VLM 2.5+ also has: HEADER, FOOTER, PAGE_NUMBER, ASIDE_TEXT, PAGE_FOOTNOTE
```

Per the [MinerU documentation](https://opendatalab.github.io/MinerU/reference/output_files/), discarded blocks contain content that should be filtered out for clean text extraction.

Fix: Changed pass to continue to skip discarded blocks entirely.

Testing

  • Verified all 16 workers now register successfully in Redis
  • All workers heartbeating correctly
  • Document parsing works as expected
  • MinerU parsing with DISCARDED blocks no longer crashes

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

user210 added 3 commits December 17, 2025 14:22
Use 'or ""' instead of default param to handle explicit None values
…simultaneously

- Add staggered startup delay (2.0s × worker_num) to spread connection attempts
- Reduce ConnectionPool max_size from 32 to 4 to stay within Infinity connection limit
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. ♾️infinity Pull requests that‘s involved with infinity(DB) 🐞 bug Something isn't working, pull request that fix bug. labels Dec 17, 2025
@concertdictate concertdictate changed the title Debug/task executor issues task executor issues Dec 17, 2025
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Dec 17, 2025
When MinerU returns blocks with type=DISCARDED, the match/case falls through
with 'pass' but still tries to use the 'section' variable which may be unbound
(if DISCARDED is the first block) or stale from a previous iteration.

Change 'pass' to 'continue' to skip the entire block processing for discarded
content, preventing both UnboundLocalError and duplicate section entries.
@concertdictate concertdictate force-pushed the debug/task-executor-issues branch from 8017f1e to fcfc53c Compare December 17, 2025 16:42
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Dec 17, 2025
@KevinHuSh KevinHuSh added the ci Continue Integration label Dec 18, 2025
@KevinHuSh KevinHuSh marked this pull request as draft December 18, 2025 01:38
@KevinHuSh KevinHuSh marked this pull request as ready for review December 18, 2025 01:38
Member

@yuzhichang yuzhichang left a comment

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 18, 2025
@KevinHuSh KevinHuSh merged commit 4dd8cdc into infiniflow:main Dec 18, 2025
2 checks passed
@concertdictate concertdictate deleted the debug/task-executor-issues branch December 18, 2025 07:25
clifftseng pushed a commit to clifftseng/ragflow that referenced this pull request Feb 9, 2026

Labels

🐞 bug Something isn't working, pull request that fix bug. ci Continue Integration ♾️infinity Pull requests that‘s involved with infinity(DB) lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Bug]: InfinityException: (<ErrorCode.TOO_MANY_CONNECTIONS: 5003>, 'Try 10 times, but still failed')

4 participants