feat(parser): support external Docling server via DOCLING_SERVER_URL (#13527)
### What problem does this PR solve?
This PR adds support for parsing PDFs through an external Docling
server, so RAGFlow can connect to remote `docling serve` deployments
instead of relying only on local in-process Docling.
It addresses the feature request in
[#13426](#13426) and aligns
with the external-server usage pattern already used by MinerU.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What is changed?
- Add external Docling server support in `DoclingParser`:
  - Use `DOCLING_SERVER_URL` to enable remote parsing mode.
  - Try `POST /v1/convert/source` first, and fall back to `/v1alpha/convert/source`.
  - Keep existing local Docling behavior when `DOCLING_SERVER_URL` is not set.
- Wire Docling env settings into parser invocation paths:
  - `rag/app/naive.py`
  - `rag/flow/parser/parser.py`
- Add Docling env hints in constants and update docs:
  - `docs/guides/dataset/select_pdf_parser.md`
  - `docs/guides/agent/agent_component_reference/parser.md`
  - `docs/faq.mdx`
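The endpoint fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in `DoclingParser`: the helper names `post_json` and `convert_with_fallback` are hypothetical, the request payload is left opaque because the PR description does not specify it, and treating HTTP 404 as the trigger for the fallback is an assumption.

```python
import json
import urllib.error
import urllib.request

# Endpoint paths named in the PR description, in the order they are tried.
CONVERT_PATHS = ("/v1/convert/source", "/v1alpha/convert/source")


def post_json(url, payload, timeout=60):
    """POST a JSON payload and return the decoded JSON response (stdlib only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def convert_with_fallback(server_url, payload, post=post_json):
    """Try each convert path in order and return (path_used, response).

    Assumption: only HTTP 404 (path not present on an older server) triggers
    the fallback; any other HTTP error is re-raised unchanged.
    """
    base = server_url.rstrip("/")
    last_error = None
    for path in CONVERT_PATHS:
        try:
            return path, post(base + path, payload)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # do not mask real server errors behind the fallback
            last_error = err
    raise RuntimeError(f"no convert endpoint found at {base}") from last_error
```

Passing the transport in as `post` keeps the fallback logic testable without a running Docling server; the actual implementation may use a different HTTP client and different fallback conditions.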
### Why this approach?
This keeps the change focused on one issue and one capability (external
Docling connectivity), without introducing unrelated provider-model
plumbing.
### Validation
- Static checks:
  - `python -m py_compile` on changed Python files
  - `python -m ruff check` on changed Python files
- Functional checks:
  - Remote v1 endpoint path works
  - v1alpha fallback works
  - Local Docling path remains available when server URL is unset
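The mode selection that these functional checks exercise might look like the sketch below. The function name `select_docling_mode` and its return shape are hypothetical, and treating a blank or whitespace-only value as "unset" is an assumption; only the `DOCLING_SERVER_URL` variable name comes from this PR.

```python
import os


def select_docling_mode(environ=None):
    """Return ("remote", base_url) when DOCLING_SERVER_URL is set, else ("local", None).

    Assumption: an empty or whitespace-only value counts as unset.
    """
    env = os.environ if environ is None else environ
    server_url = (env.get("DOCLING_SERVER_URL") or "").strip()
    if server_url:
        # Normalize so endpoint paths can be appended without a double slash.
        return "remote", server_url.rstrip("/")
    return "local", None
```

For example, `select_docling_mode({"DOCLING_SERVER_URL": "http://docling-host:5001"})` selects remote mode, while an empty mapping keeps the local in-process path.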
### Related links
- Feature request: [Support external Docling server (issue #13426)](#13426)
- Compare view for this branch: [main...feat/docling-server](https://github.com/infiniflow/ragflow/compare/main...spider-yamet:ragflow:feat/docling-server?expand=1)
##### Fixes [#13426](#13426)
`docs/faq.mdx` (18 additions, 0 deletions)
@@ -567,6 +567,24 @@ RAGFlow supports MinerU's `vlm-http-client` backend, enabling you to delegate do
 When using the `vlm-http-client` backend, the RAGFlow server requires no GPU, only network connectivity. This enables cost-effective distributed deployment with multiple RAGFlow instances sharing one remote vLLM server.
 :::
 
+### How to use an external Docling Serve server for document parsing?
+
+RAGFlow supports Docling in two modes:
+
+1. **Local Docling** (existing mode): install Docling in the RAGFlow runtime (`USE_DOCLING=true`) and parse in-process.
+2. **External Docling Serve** (remote mode): point RAGFlow to a Docling Serve endpoint.
+- When `DOCLING_SERVER_URL` is set, RAGFlow sends PDFs to Docling Serve using `/v1/convert/source` (and falls back to `/v1alpha/convert/source` for older servers).
+- When `DOCLING_SERVER_URL` is not set, RAGFlow uses local in-process Docling.
+
 ### How to use PaddleOCR for document parsing?
 
 From v0.24.0 onwards, RAGFlow includes PaddleOCR as an optional PDF parser. Please note that RAGFlow acts only as a *remote client* for PaddleOCR, calling the PaddleOCR API to parse PDFs and reading the returned files.
`docs/guides/agent/agent_component_reference/parser.md` (6 additions, 0 deletions)
@@ -65,6 +65,12 @@ Starting from v0.22.0, RAGFlow includes MinerU (≥ 2.6.3) as an optional PDF p
 - If you decide to use a chunking method from the **Built-in** dropdown, ensure it supports PDF parsing, then select **MinerU** from the **PDF parser** dropdown.
 - If you use a custom ingestion pipeline instead, select **MinerU** in the **PDF parser** section of the **Parser** component.
 
+To use an external Docling Serve instance (instead of local in-process Docling), set:
+
+- `DOCLING_SERVER_URL`: The Docling Serve API endpoint (for example, `http://docling-host:5001`).
+
+When `DOCLING_SERVER_URL` is set, RAGFlow sends PDF content to Docling Serve (`/v1/convert/source`, with fallback to `/v1alpha/convert/source`) and ingests the returned markdown/text. If the variable is not set, RAGFlow keeps using local Docling (`USE_DOCLING=true` + installed package) behavior.
+
 :::note
 All MinerU environment variables are optional. When set, these values are used to auto-provision a MinerU OCR model for the tenant on first use. To avoid auto-provisioning, skip the environment variable settings and only configure MinerU from the **Model providers** page in the UI.
`docs/guides/dataset/select_pdf_parser.md` (6 additions, 0 deletions)
@@ -65,6 +65,12 @@ Starting from v0.22.0, RAGFlow includes MinerU (≥ 2.6.3) as an optional PDF p
 - If you decide to use a chunking method from the **Built-in** dropdown, ensure it supports PDF parsing, then select **MinerU** from the **PDF parser** dropdown.
 - If you use a custom ingestion pipeline instead, select **MinerU** in the **PDF parser** section of the **Parser** component.
 
+To use an external Docling Serve instance (instead of local in-process Docling), set:
+
+- `DOCLING_SERVER_URL`: The Docling Serve API endpoint (for example, `http://docling-host:5001`).
+
+When `DOCLING_SERVER_URL` is set, RAGFlow sends PDF content to Docling Serve (`/v1/convert/source`, with fallback to `/v1alpha/convert/source`) and ingests the returned markdown/text. If the variable is not set, RAGFlow keeps using local Docling (`USE_DOCLING=true` + installed package) behavior.
+
 :::note
 All MinerU environment variables are optional. When set, these values are used to auto-provision a MinerU OCR model for the tenant on first use. To avoid auto-provisioning, skip the environment variable settings and only configure MinerU from the **Model providers** page in the UI.