Feature/generic api connector by ahmadintisar · Pull Request #13545 · infiniflow/ragflow

ahmadintisar · 2026-03-11T12:23:30Z

feat: Add Generic REST API Connector

What problem does this PR solve?

RAGFlow supports many specific data source connectors (MySQL, Slack, Google Drive, etc.), but there was no way to connect an arbitrary REST API as a data source. Users with custom or third-party APIs had to write a new connector class for each one.

This PR adds a generic, configuration-driven REST API connector that lets users connect any REST API as a data source entirely through the UI — no code changes needed per API.

Features

Core Connector (`common/data_source/rest_api_connector.py`)

Implements LoadConnector and PollConnector interfaces for full and incremental sync
Configurable authentication: None, API Key (custom header), Bearer Token, Basic Auth
Pluggable pagination: Page-based, Offset-based, Cursor-based, or None
Smart page-size inference from user's query parameters to avoid duplicate/conflicting params
Configurable request delay between pages to prevent API rate limiting
Auto-detection of the items array in JSON responses (items, results, data, records, or first list found)
Advanced field mapping with dot-notation (country.name), array wildcards (newsType[*].name), type hints, and default values
Optional content template rendering ("Title: {title}\nBody: {body}")
HTML stripping for content fields
Stable document IDs via hash128 from a configurable ID field or auto-generated from item content
Pydantic configuration schema with automatic coercion of UI string inputs to dicts/lists
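
As an illustration of the four authentication modes listed above, here is a minimal sketch of how request headers could be derived from the configuration. The function name `build_auth_headers`, its parameters, and the `auth_type` strings are assumptions made for this example; they are not taken from the connector's code.

```python
import base64
from typing import Dict, Optional


def build_auth_headers(
    auth_type: str,
    *,
    api_key_header: str = "X-API-Key",  # assumed default header name
    api_key_value: Optional[str] = None,
    token: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Return HTTP headers for one auth mode (hypothetical helper)."""
    if auth_type == "none":
        return {}
    if auth_type == "api_key":
        if not api_key_value:
            raise ValueError("api_key auth requires a key value")
        # The key travels in a user-chosen custom header.
        return {api_key_header: api_key_value}
    if auth_type == "bearer":
        if not token:
            raise ValueError("bearer auth requires a token")
        return {"Authorization": f"Bearer {token}"}
    if auth_type == "basic":
        if username is None or password is None:
            raise ValueError("basic auth requires username and password")
        # Basic auth is base64("username:password").
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError(f"unsupported auth_type: {auth_type!r}")
```

Rejecting incomplete credentials up front, rather than sending a request that is bound to fail, is one way to tell configuration errors apart from authentication failures.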

Backend Registration (`rag/svr/sync_data_source.py`, `common/constants.py`, `common/data_source/config.py`)

REST_API sync class wired into RAGFlow's func_factory
Full sync (load_from_state) and incremental polling (poll_source) support
Credentials and config passed from task to connector following existing patterns (MySQL, SeaFile, etc.)

Test Connection Endpoint (`api/apps/connector_app.py`)

POST /v1/connector/<id>/test validates config schema, authentication, and API connectivity without triggering a sync
Clear error messages for auth failures vs. config issues

Frontend UI (`web/src/pages/user-setting/data-source/constant/`)

Postman-style configuration: Base URL, Query Parameters (key=value per line), Auth, Content Fields, Metadata Fields, Pagination Type
Auth-type-aware form: fields for API key header/value, Bearer token, or Basic username/password appear only when relevant
Advanced Settings toggle for: Custom Headers, Max Pages, Request Delay, Poll Timestamp Field, Request Body (POST)
Connector icon (SVG) and i18n strings (English)
"Test Connection" button to validate before syncing

Controls & Safety

Configurable max pages safety cap (default: 1000, adjustable in UI)
Configurable request delay between pages (default: 0.5s, adjustable in UI)
Auth errors (401/403) fail immediately without retries; transient errors retry with exponential backoff
Diagnostic logging: auth setup confirmation, request details on failure, content field extraction status
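
The retry behaviour described above could look roughly like the following sketch. `fetch_with_retry`, `AuthError`, `TransientError`, and the `send` callback are hypothetical names introduced for this example. As a simplification, the sketch treats every non-2xx status other than 401/403 as transient; the connector itself may classify errors differently.

```python
import time
from typing import Callable, Tuple


class AuthError(Exception):
    """Hypothetical error for 401/403 responses; never retried."""


class TransientError(Exception):
    """Hypothetical error raised once all retries are exhausted."""


def fetch_with_retry(
    send: Callable[[], Tuple[int, str]],  # returns (status_code, body)
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it succeeds, failing fast on auth errors."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status in (401, 403):
            # Retrying cannot fix bad credentials, so stop immediately.
            raise AuthError(f"authentication failed with HTTP {status}")
        if 200 <= status < 300:
            return body
        if attempt < max_retries:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    raise TransientError(
        f"gave up after {max_retries + 1} attempts (last status {status})"
    )
```

Passing `sleep` as a parameter keeps the backoff schedule testable without actually waiting.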

Type of change

New Feature (non-breaking change which adds functionality)

Visual Screenshots of Features

(Connector can be configured within the external data sources tab)

Configuration Parameters:

Connection can be tested before attaching to dataset:

Ingestion tested with API connector (works perfectly fine):

Search & Retrieval works as well with metadata flow:


codecov · 2026-03-12T06:31:52Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.52%. Comparing base (af40be6) to head (1ac432c).

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #13545   +/-   ##
=======================================
  Coverage   96.52%   96.52%           
=======================================
  Files          10       10           
  Lines         690      690           
  Branches      108      108           
=======================================
  Hits          666      666           
  Misses          8        8           
  Partials       16       16


ahmadintisar · 2026-03-15T12:43:47Z

@Magicbook1108 Hi, could you please review this PR?

Magicbook1108 · 2026-03-16T11:35:43Z

Hello, I tried to list datasets from RAGFlow. Can you help me with this issue?

ahmadintisar · 2026-03-16T12:34:42Z

@Magicbook1108
Thanks for testing! I reviewed your attached screenshots.

The issue is with the field mapping configuration. The RAGFlow datasets API returns items in a data array, where code is just the status code (0 = success), not a content field. That's why your documents show as None.txt with 0 bytes — the connector couldn't find matching fields in the response items.

Suggested Fix

Try this configuration:

| Field | Value |
| --- | --- |
| Content Fields | `name` (or `name,description`); these should be the fields containing the actual text content you want to vectorize |
| Metadata Fields | `chunk_num,document_count` |
| ID Field | `id` (under Advanced Settings) |

The connector auto-detects the items array from the response (data, results, stories, etc.).
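
To make the auto-detection concrete, here is a minimal sketch of one way to locate the items array in a decoded JSON response. The function name `find_items` and the exact lookup order are assumptions for this example: it tries a few well-known keys first and then falls back to the first list it finds.

```python
from typing import Any, List

# Keys tried first, in order; anything else falls back to "first list found".
PREFERRED_KEYS = ("items", "results", "data", "records")


def find_items(payload: Any) -> List[Any]:
    """Return the list of items in a decoded JSON payload (hypothetical helper)."""
    if isinstance(payload, list):
        # Some APIs return a bare top-level array.
        return payload
    if not isinstance(payload, dict):
        return []
    for key in PREFERRED_KEYS:
        if isinstance(payload.get(key), list):
            return payload[key]
    for value in payload.values():
        if isinstance(value, list):
            return value
    return []
```

With a response shaped like `{"code": 0, "data": [...]}`, this returns the `data` list and ignores `code`, which is why `code` cannot serve as a content field.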

Reference Example

Here's how I configured it for a news API that returns this structure:

Sample API Response

```json
{
  "totalStories": 20,
  "stories": [
    {
      "id": 27856,
      "title": "رئيس الوزراء العراقي...",
      "content": "رئيس الوزراء العراقي: استمرار الصراع...",
      "published": "2026-03-15T15:47:13+03:00",
      "category": "أخبار",
      "country": { "id": "108", "name": "العراق", "ISO2": "IQ" },
      "newsType": [{ "name": "عاجل" }],
      "mediaType": "نص",
      "source": "تيليجرام"
    }
  ]
}
```

My configuration:

| Field | Value |
| --- | --- |
| Content Fields | `title,content` |
| Metadata Fields | `category,country.name,country.ISO2,mediaType,source,newsType[*].name` |
| ID Field | `id` (under Advanced Settings) |
| Pagination Type | Page-based (`?page=1&story_per_page=20`) |
| Poll Timestamp Field | `published` (under Advanced Settings, for incremental sync) |

Note: The connector supports dot-notation for nested fields (country.name) and array wildcards (newsType[*].name) to extract values from nested objects and arrays.
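
For readers unfamiliar with this path syntax, the sketch below shows one possible way to resolve such paths against a single item. `extract_path` is a hypothetical helper written for this example and not the connector's implementation. It returns a list for wildcard paths, whereas the connector may join the values into a string.

```python
from typing import Any, List


def extract_path(item: Any, path: str, default: Any = None) -> Any:
    """Resolve a path such as 'country.name' or 'newsType[*].name'."""
    current: List[Any] = [item]
    saw_wildcard = False
    for segment in path.split("."):
        wildcard = segment.endswith("[*]")
        key = segment[:-3] if wildcard else segment
        next_values: List[Any] = []
        for value in current:
            if not isinstance(value, dict) or key not in value:
                continue  # missing keys simply produce no value
            child = value[key]
            if wildcard:
                # Fan out over every element of the array.
                if isinstance(child, list):
                    next_values.extend(child)
            else:
                next_values.append(child)
        current = next_values
        saw_wildcard = saw_wildcard or wildcard
    if not current:
        return default
    # Wildcard paths yield a list; plain paths yield a single value.
    return current if saw_wildcard else current[0]
```

For an item shaped like the sample response, `extract_path(story, "country.ISO2")` would yield the nested string, and `extract_path(story, "newsType[*].name")` would yield one name per array element.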

How I Can Help

Could you share the JSON structure of your API response? I can then tell you exactly which fields to map for Content Fields, Metadata Fields, and ID Field.

Also, if your API returns paginated results, make sure to set the Pagination Type accordingly (Page-based, Offset-based, or Cursor-based) — you currently have it set to None.
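
The difference between the three pagination types can be sketched as the query parameters each one sends per request. The function `next_page_params` and the parameter names (`page`, `page_size`, `offset`, `limit`, `cursor`) are assumptions for illustration; real APIs use their own names, such as `story_per_page` in the example above.

```python
from typing import Any, Dict, Optional


def next_page_params(
    pagination_type: str,
    page_index: int,  # 0-based count of requests made so far
    page_size: int,
    cursor: Optional[str] = None,  # token returned by the previous response
) -> Dict[str, Any]:
    """Return query parameters for one request (hypothetical helper)."""
    if pagination_type == "none":
        return {}
    if pagination_type == "page":
        # Page numbers are assumed to start at 1.
        return {"page": page_index + 1, "page_size": page_size}
    if pagination_type == "offset":
        return {"offset": page_index * page_size, "limit": page_size}
    if pagination_type == "cursor":
        params: Dict[str, Any] = {"limit": page_size}
        if cursor is not None:
            params["cursor"] = cursor  # omitted on the very first request
        return params
    raise ValueError(f"unsupported pagination_type: {pagination_type!r}")
```

With the type set to None, only a single request is made, so any items beyond the first response are never fetched.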

Copilot

Pull request overview

This PR adds a generic, configuration-driven REST API connector that allows users to connect any REST API as a data source through the UI without code changes. It implements full and incremental sync, configurable authentication, pagination, and field mapping.

Changes:

New RestAPIConnector class with support for multiple auth types, pagination strategies, and flexible field extraction
Backend registration in sync_data_source.py and a test-connection endpoint in connector_app.py
Frontend UI forms, i18n strings, and SVG icon for the REST API data source

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| common/data_source/rest_api_connector.py | Core connector implementation with auth, pagination, field mapping |
| common/data_source/config.py | Adds `REST_API` to `DocumentSource` enum |
| common/data_source/__init__.py | Exports `RestAPIConnector` |
| common/constants.py | Adds `REST_API` to `FileSource` enum |
| rag/svr/sync_data_source.py | Wires `REST_API` sync class into the factory |
| api/apps/connector_app.py | Test connection endpoint |
| web/src/pages/user-setting/data-source/constant/index.tsx | Form fields, defaults, and data source info for REST API |
| web/src/pages/user-setting/data-source/hooks.ts | `useTestDataSource` hook |
| web/src/pages/user-setting/data-source/data-source-detail-page/index.tsx | Test Connection button |
| web/src/services/data-source-service.ts | `testDataSource` service call |
| web/src/utils/api.ts | Test endpoint URL |
| web/src/locales/en.ts | English i18n strings |
| web/src/assets/svg/data-source/rest-api.svg | Connector icon |



ahmadintisar · 2026-03-17T14:43:15Z

@yingfeng @Magicbook1108

All comments have been resolved, and CI is passing as well. Feel free to review :)


ahmadintisar · 2026-03-21T18:21:32Z

CI failure is unrelated to this PR — the Build ragflow:nightly step fails on apt install with a 502 Bad Gateway from archive.ubuntu.com while fetching libglvnd0. This is a transient Ubuntu mirror issue. Ruff and Go build steps pass. @Magicbook1108 Could you re-run the job when the mirror is back?

Magicbook1108 · 2026-03-23T02:43:05Z

We will restart CI for you. Please revert these changes.

ahmadintisar · 2026-03-24T15:07:10Z

@Magicbook1108 I have reverted the changes for CI. Please restart the CI!

ahmadintisar · 2026-03-26T08:16:10Z

@Magicbook1108 @yingfeng Could you please complete the review?

ahmadintisar · 2026-03-31T00:43:05Z

@Magicbook1108 @yingfeng Could you please complete the review?


Ahmad Intisar and others added 29 commits March 9, 2026 15:01

API connector backend class

4164e0b

Merge branch 'infiniflow:main' into feature/generic-api-connector

95f81d2

Added config schema, config validation entrypoint, connector integration with config

fdb5d31

Enums updated, connector exported, sync integration and registered in func_factory

172bde6

Web enums integration for rest api

dfb307f

added API validation

f937c29

Added pagination helpers for different methods

9d1bf56

support advanced field mapping for REST API connector (nested paths, array join, type hints, defaults, templates)

4bfe52e

Metadata flow

af27517

Resolve URL param duplication and simplify connector config

f75f2ce

Separate base URL from query params to prevent pagination from injecting duplicate keys (e.g. &page=1&page=1). Add a dedicated query_params field (key=value per line, like Postman) so users no longer embed params in the URL.
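
The idea behind this commit can be illustrated with a short sketch: user parameters are parsed from `key=value` lines into a dictionary, and pagination parameters are merged into that dictionary before the URL is built, so each key appears only once. The helper names `parse_query_params` and `build_url`, and the example URL, are assumptions for this illustration.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def parse_query_params(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines (one per line) into a dict (hypothetical helper)."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def build_url(
    base_url: str,
    user_params: Dict[str, str],
    page_params: Dict[str, Any],
) -> str:
    """Merge user and pagination params; pagination wins on key conflicts."""
    merged = {**user_params, **page_params}
    if not merged:
        return base_url
    return f"{base_url}?{urlencode(merged)}"


params = parse_query_params("page=1\nstory_per_page=20\n")
url = build_url("https://api.example.com/stories", params, {"page": 2})
```

Because the merge happens on a dictionary rather than on the URL string, a key such as `page` can never be emitted twice.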

Remove hardcoded params for api testing, refactor code

37c595e

Merge branch 'infiniflow:main' into feature/generic-api-connector

32025a5

Merge branch 'infiniflow:main' into feature/generic-api-connector

7dd024c

Add REST API connector SVG icon

383703d

Add query parameters in dictionary for pydantic validation

3ca44ec

Merge branch 'infiniflow:main' into feature/generic-api-connector

5110814

Converted helper methods to static

e8dfda3

fix: align REST API connector with codebase conventions and fix date hint bug

55db599

Merge branch 'infiniflow:main' into feature/generic-api-connector

6404543

Max pages safety variable changed to constant

d405da9

Merge branch 'infiniflow:main' into feature/generic-api-connector

ff00a61

Import added

604991e

Merge branch 'infiniflow:main' into feature/generic-api-connector

589f987

Merge branch 'infiniflow:main' into feature/generic-api-connector

eee391c

Added logs for api connection

41c4c0d

New _resolve_page_size() method with smart fallback priority

a927feb

Added rate limiting config in UI

57c4826

Merge branch 'infiniflow:main' into feature/generic-api-connector

34a40b5

removed unwanted config parameters

7549f1a

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Mar 11, 2026

yingfeng added the ci Continue Integration label Mar 12, 2026

yingfeng marked this pull request as draft March 12, 2026 05:38

yingfeng marked this pull request as ready for review March 12, 2026 05:38

yingfeng requested a review from Magicbook1108 March 12, 2026 05:57

yingfeng requested a review from Copilot March 17, 2026 09:09

Copilot started reviewing on behalf of yingfeng March 17, 2026 09:09

Copilot AI reviewed Mar 17, 2026


dosubot bot mentioned this pull request Mar 17, 2026

[Question]: [Question] How to implement text-to-image generation in RAGFlow v23.0? #13653


ahmadintisar and others added 5 commits March 17, 2026 18:55

Add SSRF protection

7550442

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Any 4xx error (except 401, 403, and 429) now raises ConnectorValidationError immediately without retries

bb55443

Unit tests for Rest API Connector (Data Source)

56d2065

Merge branch 'main' into feature/generic-api-connector

dd1fe6f

Fix ruff linting errors in test/unit_test/data_source/test_rest_api_connector.py

1ae285d

ahmadintisar and others added 4 commits March 19, 2026 20:21

Merge branch 'infiniflow:main' into feature/generic-api-connector

7cef3c0

Merge branch 'main' into feature/generic-api-connector

0b1594b

ci: use NEED_MIRROR=0 in Docker builds (avoid Gitee timeouts)

2b3b620

GitHub Actions runners time out cloning infinity resources from Gitee. Default Dockerfile already uses GitHub; workflows were overriding with NEED_MIRROR=1. Set NEED_MIRROR=0 in tests and release workflows.

Merge branch 'main' into feature/generic-api-connector

3248d7b

yingfeng and others added 2 commits March 24, 2026 22:30

Merge branch 'main' into feature/generic-api-connector

1802046

Reverted CI changes

3240497

Merge branch 'infiniflow:main' into feature/generic-api-connector

1ac432c

ahmadintisar wants to merge 43 commits into infiniflow:main from Attili-sys:feature/generic-api-connector
