
[Bug]: GraphRAG resolution fails with TOO_MANY_CONNECTIONS (Infinity ErrorCode 5003) during dataset-scope operations #14137

@prpercival

Description

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

5923365ef987fb328c977d07c788bc775732bc64bc96140e643a588ddd68ff92

RAGFlow image version

v0.24.0-456-g52442c8eb

Other environment information

- **Hardware**: Kubernetes (bare-metal, Talos Linux), 8 CPU / 16 GiB RAM allocated to Infinity StatefulSet
- **OS type**: Talos Linux (Kubernetes nodes), containers on Debian-based images
- **Document engine**: Infinity v0.7.0-dev5 (`infiniflow/infinity:v0.7.0-dev5`)
- **LLM**: Azure OpenAI (gpt-5.4), Embedding: text-embedding-3-large
- **Dataset**: 28 documents (JSON schemas, ~584 bytes to multi-MB), KB ID `774ce450198711f1b748d19e18b9e406`
- **Infinity config**: `connection_pool_size = 512` (bumped from default 128)

Actual behavior

When running dataset-scope GraphRAG with `resolution: true` and `community: true` across 28 documents, the graph resolution step fails with:

20:31:59 Resolved 5694 candidate pairs, 333 of them are selected to merge.
20:32:31 Graph resolution removed 294 nodes and 1808 edges.
20:32:31 Graph resolution updated pagerank.
20:32:31 [ERROR][Exception]: (<ErrorCode.TOO_MANY_CONNECTIONS: 5003>, 'Try 10 times, but still failed')

The error occurs consistently at the end of graph resolution, after the merge/delete/pagerank operations complete. The resolution itself succeeds (294 nodes removed, 1808 edges removed, pagerank updated), but the subsequent Infinity write operations (likely the community detection step or final graph persist) exhaust the connection pool.

Prometheus metrics from the Infinity pod during the failure:

| Time (CDT) | Infinity CPU (cores) | Notes |
|------------|----------------------|-------|
| 18:02 | 8.0 | Peak CPU (2x the 4-core limit at the time); severe CFS throttling |
| 20:12 | 5.8 | Still saturated during graph resolution |
| 20:27 | 2.7 | Resolution winding down |
| 20:32 | 0.001 | Infinity goes idle; RAGFlow client gave up after TOO_MANY_CONNECTIONS |

The root cause appears to be Kubernetes CFS throttling of the Infinity container, compounded by the bursty nature of graph resolution writes. When Infinity is CPU-throttled, connections queue up server-side waiting for CPU time slices. The resolution burst (333 merges → 294 node deletes + 1808 edge deletes + pagerank update) pushes the number of queued connections past the connection_pool_size limit.
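As a rough illustration of this failure mode (the rates below are illustrative assumptions, not measured values): if a throttled server drains connections more slowly than a burst opens them, the backlog grows linearly with burst duration and eventually exceeds `connection_pool_size`.

```python
# Back-of-envelope model of connection backlog under CPU throttling.
# All rates here are illustrative assumptions, not measurements.

def backlog(arrival_rate: float, service_rate: float, seconds: float) -> float:
    """Connections still queued after a burst: arrivals minus completions, floored at 0."""
    return max(0.0, (arrival_rate - service_rate) * seconds)

# Suppose a throttled Infinity drains ~20 conn/s while the resolution
# burst opens ~60 conn/s for 15 seconds:
queued = backlog(arrival_rate=60, service_rate=20, seconds=15)
print(queued)  # 600.0, already past connection_pool_size = 512
```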

Key observations:

  • The RAGFlow client-side connection pool (infinity_conn_pool.py) is hardcoded to max_size=4 (initial) / max_size=32 (on refresh) — there is no user-configurable option to increase this
  • The server-side connection_pool_size of 512 was already 4x the default of 128, but was still insufficient
  • The error retries 10 times before giving up, suggesting the connection backlog persists for an extended period

Expected behavior

Graph resolution and community detection should complete successfully without hitting connection limits, or should gracefully handle connection exhaustion (e.g., exponential backoff with longer retry windows, or serializing the write operations to reduce concurrent connection demand).

Steps to reproduce

1. Create a knowledge base with 28+ documents (JSON schema files work well to reproduce)
2. Enable GraphRAG with `resolution: true` and `community: true` at the dataset level
3. Enable RAPTOR with `scope: dataset`
4. Parse all documents
5. Trigger dataset-scope GraphRAG (resolution + community)
6. Observe the Knowledge Graph progress panel: resolution resolves ~5000+ candidate pairs, then fails with `TOO_MANY_CONNECTIONS: 5003`
7. Note that in this setup the Infinity `connection_pool_size` is already 512 (4x the default)

Additional information

Workaround: Increase Infinity CPU limits to 8 cores and increase connection_pool_size to 2048. This gives Infinity enough CPU headroom to process connections without CFS throttling causing stale connection pile-up.
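For reference, the workaround corresponds to something like the following. Treat these as hedged sketches: the exact placement of the key in `infinity_conf.toml` may differ by Infinity version (check the sample config shipped with v0.7.0), and the StatefulSet layout is specific to this cluster.

```toml
# infinity_conf.toml: server-side connection pool
# (key placement may vary by Infinity version)
[network]
connection_pool_size = 2048
```

```yaml
# Kubernetes StatefulSet container resources: 8 cores of headroom
# so CFS throttling does not stall connection draining during bursts
resources:
  requests:
    cpu: "8"
    memory: 16Gi
  limits:
    cpu: "8"
    memory: 16Gi
```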

Feature request: Expose the RAGFlow-side Infinity client connection pool size (max_size in ConnectionPool()) as an environment variable so operators can tune it independently of the server-side pool. Currently it is hardcoded in common/doc_store/infinity_conn_pool.py:

conn_pool = ConnectionPool(self.infinity_uri, max_size=4)     # initial
self.conn_pool = ConnectionPool(self.infinity_uri, max_size=32)  # on refresh
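A minimal sketch of the requested knob. The environment variable name `INFINITY_CONN_POOL_SIZE` and the helper are assumptions for illustration, not existing RAGFlow configuration:

```python
import os

def pool_max_size(default: int) -> int:
    """Hypothetical env-var override for the client-side pool size;
    falls back to the current hardcoded defaults (4 initially, 32 on refresh)."""
    raw = os.environ.get("INFINITY_CONN_POOL_SIZE", "")
    try:
        return max(1, int(raw))
    except ValueError:
        return default

# Usage at the two call sites in infinity_conn_pool.py would then look like:
# conn_pool = ConnectionPool(self.infinity_uri, max_size=pool_max_size(4))
# self.conn_pool = ConnectionPool(self.infinity_uri, max_size=pool_max_size(32))
```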

Related issues: #8706, #12006

Metadata

Labels

  • ♾️ infinity (pull requests involved with Infinity DB)
  • 🐞 bug (something isn't working; pull requests that fix bugs)
