Refactor: Consolidation WEB API & HTTP API for document infos #14239
JinHai-CN merged 18 commits into infiniflow:main from
Conversation
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
sdk/python/test/test_frontend_api/test_chunk.py (1)

58-63: ⚠️ Potential issue | 🟠 Major

Poll `res["data"]["docs"]`, not `res["data"]`. The migrated endpoint now returns `data = {"total": ..., "docs": [...]}`. Iterating `res["data"]` walks the object's keys, so `doc_info["progress"]` fails instead of checking document completion.

💡 Minimal fix

```diff
- for doc_info in res['data']:
+ for doc_info in res['data']['docs']:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/python/test/test_frontend_api/test_chunk.py` around lines 58-63: the test currently iterates `for doc_info in res['data']`, which walks object keys and makes `doc_info['progress']` fail; change the loop to iterate the docs array returned by the migrated endpoint (`res['data']['docs']`), e.g. read `docs = res['data']['docs']` or loop `for doc_info in res['data']['docs']` when checking `doc_info['progress']` against `doc_count`/`doc_id_list` via `get_docs_info`.
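The failure mode is easy to reproduce without the SDK. In this sketch, `res` is a stand-in for a `get_docs_info()` response; the `{"total": ..., "docs": [...]}` layout is the one described in the comment above:

```python
# Sketch of the shape change flagged above. The response layout
# {"total": ..., "docs": [...]} is taken from the review; the payload
# below is a stand-in for a real get_docs_info() result.
res = {"data": {"total": 2, "docs": [{"progress": 1.0}, {"progress": 0.5}]}}

# Old loop: `for doc_info in res["data"]` walks the dict keys "total" and
# "docs", so doc_info["progress"] indexes into strings and fails.
# New loop: iterate the docs array instead.
completed = [doc for doc in res["data"]["docs"] if doc["progress"] == 1.0]
print(len(completed))  # 1
```

Iterating a dict yields its keys, which is why the old loop broke only after the response gained a second top-level field.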
sdk/python/ragflow_sdk/modules/dataset.py (1)

83-98: ⚠️ Potential issue | 🔴 Critical

`params.append()` will fail: `params` is a dict, not a list. When `ids` is provided, the code calls `params.append(("ids", doc_id))` on a dict object, raising `AttributeError` before the request is sent. The `requests` library accepts a list of tuples for repeated query parameters. Additionally, the new `ids` flow lacks logging, which the coding guidelines require for new flows in Python files.

Fix

```diff
-        params = {
-            "id": id,
-            "name": name,
-            "keywords": keywords,
-            "page": page,
-            "page_size": page_size,
-            "orderby": orderby,
-            "desc": desc,
-            "create_time_from": create_time_from,
-            "create_time_to": create_time_to,
-        }
+        params = [
+            (k, v)
+            for k, v in {
+                "id": id,
+                "name": name,
+                "keywords": keywords,
+                "page": page,
+                "page_size": page_size,
+                "orderby": orderby,
+                "desc": desc,
+                "create_time_from": create_time_from,
+                "create_time_to": create_time_to,
+            }.items()
+            if v is not None
+        ]
         # Handle ids parameter - convert to multiple query params
         if ids:
             for doc_id in ids:
                 params.append(("ids", doc_id))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/python/ragflow_sdk/modules/dataset.py` around lines 83-98: `params` is a dict, so `params.append` will raise an `AttributeError`; build query params as a list of tuples when `ids` is provided (e.g., convert the base params dict to a list of (key, value) pairs, or create a new `params_list` and extend it with `("ids", doc_id)` for each id), then pass that list to `self.get(f"/datasets/{self.id}/documents", params=...)` so repeated ids become repeated query params; also add a debug log (using the module/class logger) before the request that records the ids being included (reference: `params`, `ids`, and `self.get`).
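A minimal runnable sketch of the suggested list-of-tuples construction; the field names mirror the diff above, and the concrete values are illustrative:

```python
# Build query params as a list of (key, value) tuples so repeated "ids"
# entries survive; a dict would silently keep only the last "ids" value.
base = {
    "id": None,
    "name": "report.pdf",  # illustrative values
    "page": 1,
    "page_size": 30,
}
ids = ["doc-a", "doc-b"]

params = [(k, v) for k, v in base.items() if v is not None]
for doc_id in ids:
    params.append(("ids", doc_id))

print(params)
# [('name', 'report.pdf'), ('page', 1), ('page_size', 30), ('ids', 'doc-a'), ('ids', 'doc-b')]
```

Filtering out `None` values also keeps unset optional filters off the query string entirely, rather than sending `id=None`.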
api/apps/restful_apis/document_api.py (1)

435-451: ⚠️ Potential issue | 🟠 Major

`type=filter` is aggregating only the current page. `_get_docs_with_request()` applies `page`/`page_size` before this branch, so the returned filter buckets are built from at most one page of docs while `total` still reflects the wider match set. On larger datasets this will produce incomplete filter counts and misleading UI state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 435-451: the filter branch aggregates only the paginated page because `_get_docs_with_request()` applies `page`/`page_size`; change the logic so that when `request.args.get("type") == "filter"` you fetch the full match set before calling `_aggregate_filters`, e.g. call `_get_docs_with_request` for the same query with pagination disabled (or `page_size` set to the total/match-all) so `docs` contains all matches, then pass that full docs list to `_aggregate_filters` and return `get_json_result` with the original `total`; keep the rest of the response handling (`map_doc_keys`, `IMG_BASE64_PREFIX`, `turn2jsonschema`) unchanged.
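The undercounting can be shown with a toy aggregator. The internals of `_aggregate_filters` are not visible here, so this sketch assumes a simple bucket-by-suffix count; the real function may bucket more fields, but the pagination pitfall is the same:

```python
from collections import Counter

def aggregate_filters(docs):
    # Toy aggregator; the real _aggregate_filters may bucket more fields.
    return Counter(doc["suffix"] for doc in docs)

all_docs = [{"suffix": "pdf"}] * 7 + [{"suffix": "docx"}] * 5  # 12 matches total
page_docs = all_docs[:10]  # what a page_size=10 first page would contain

print(aggregate_filters(page_docs))  # Counter({'pdf': 7, 'docx': 3}), undercounts docx
print(aggregate_filters(all_docs))   # Counter({'pdf': 7, 'docx': 5}), correct
```

The page-sized input produces buckets that sum to 10 while `total` reports 12, which is exactly the inconsistent UI state the comment describes.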
test/testcases/test_web_api/test_document_app/test_document_metadata.py (1)

51-58: ⚠️ Potential issue | 🟡 Minor

Restore the migrated `metadata_summary` web-API coverage. These cases are commented out instead of being adapted to the dataset-scoped GET route, so this refactor drops regression coverage for that changed behavior at the web-test layer. If you want, I can draft the updated pytest cases for the new request shape. Based on learnings "Applies to tests/**/*.py : Add/adjust tests for behavior changes".

Also applies to: 101-108

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py` around lines 51-58: the commented-out auth tests removed coverage for the migrated `metadata_summary` endpoint; restore and adapt them by re-enabling `test_metadata_summary_auth_invalid` and updating calls to `document_metadata_summary` to send the new GET request shape (include `"doc_ids": []` or an appropriate list, and keep `"kb_id"` if still required) so each `INVALID_AUTH_CASES` case asserts `res["code"] == expected_code` and `expected_fragment in res["message"]`; also update the equivalent commented block around lines 101-108 the same way, so both sets use the new request payload shape and the existing helpers (`document_metadata_summary`, `INVALID_AUTH_CASES`) rather than the old payload.
🧹 Nitpick comments (2)
api/db/services/document_service.py (1)
150-151: Add a trace point for the new `doc_ids` branch.

The consolidated document-info flow now has a dedicated ID-filter path, but nothing records when it is used. A small debug log with `kb_id` and the requested ID count would make rollout issues much easier to diagnose without logging the raw IDs. As per coding guidelines, "Add logging for new flows".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/db/services/document_service.py` around lines 150-151: add a debug trace when the `doc_ids` branch is used. Before applying the filter (the block with `if doc_ids: docs = docs.where(cls.model.id.in_(doc_ids))`), emit a debug log that records the `kb_id` and the number of requested IDs (`len(doc_ids)`) but not the raw IDs; use the service/module logger (e.g. `cls.logger` or a module-level logger) and a message like "Applying doc_ids filter for kb_id=%s count=%d", passing `kb_id` and `len(doc_ids)`, so callers can see the new ID-filter path is taken without leaking ID values.
api/apps/restful_apis/document_api.py (1)

439-451: Add trace logging for the new consolidated query modes.

The new `type=filter` and repeated-`ids` branches materially change this endpoint's behavior, but there is no log telling us which branch was taken or how many IDs were supplied. A small structured log here would make this migration much easier to debug in CI and prod. As per coding guidelines, "Add logging for new flows".

Also applies to: 530-534

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 439-451: add structured trace logging to record which consolidated query branch is taken, plus the relevant counts. When `request.args.get("type") == "filter"` (the `_aggregate_filters(docs)` branch), log at trace/debug level that the "filter" branch was taken, including `total` and the number of filters/docs; in the else branch (where `renamed_doc_list` is built via `map_doc_keys` and thumbnail/`source_type`/`parser_config` adjustments), log that the "docs" branch was taken, including `total` and `len(renamed_doc_list)`; also add analogous logs for the repeated-`ids` handling (the branch around lines 530-534) to record the branch and the number of ids supplied. Use the existing logger in this module and include clear keys (e.g. branch="filter" or branch="docs", total=..., docs_count=..., ids_count=...) so the logs are structured and searchable.
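A minimal sketch of the requested trace logging, assuming a module-level logger. The logger name, message text, and key names below are illustrative; the one firm rule carried over from the review is to log counts, never the raw IDs:

```python
import logging

logger = logging.getLogger("document_api")  # illustrative logger name

def log_list_branch(branch, total, docs_count):
    # Structured, grep-able key=value pairs; lazy %-formatting avoids
    # building the string when DEBUG is disabled.
    logger.debug("list documents branch=%s total=%d docs_count=%d",
                 branch, total, docs_count)

def log_ids_branch(ids):
    # Record only how many ids were supplied, not their values.
    logger.debug("list documents branch=ids ids_count=%d", len(ids))
```

With consistent keys (`branch=`, `total=`, `ids_count=`), a single log query can tell which consolidated mode a request took in CI or prod.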
📒 Files selected for processing (11)

- api/apps/document_app.py
- api/apps/restful_apis/document_api.py
- api/db/services/document_service.py
- sdk/python/ragflow_sdk/modules/dataset.py
- sdk/python/test/test_frontend_api/common.py
- sdk/python/test/test_frontend_api/test_chunk.py
- test/testcases/test_web_api/test_common.py
- test/testcases/test_web_api/test_document_app/test_document_metadata.py
- web/src/hooks/use-document-request.ts
- web/src/services/knowledge-service.ts
- web/src/utils/api.ts
```diff
 if doc_id:
     if not DocumentService.query(id=doc_id, kb_id=dataset_id):
         return RetCode.DATA_ERROR, f"You don't own the document {doc_id}.", [], 0
     doc_ids_filter = [doc_id]  # id provided, ignore other filters
 if doc_name and not DocumentService.query(name=doc_name, kb_id=dataset_id):
     return RetCode.DATA_ERROR, f"You don't own the document {doc_name}.", [], 0

 doc_ids = q.getlist("ids")
 if doc_id and len(doc_ids) > 0:
     return RetCode.DATA_ERROR, f"Should not provide both 'id':{doc_id} and 'ids'{doc_ids}"
 if len(doc_ids) > 0:
     doc_ids_filter = doc_ids

 docs, total = DocumentService.get_by_kb_id(dataset_id, page, page_size, orderby, desc, keywords, run_status_converted, types, suffix,
-                                           doc_id=doc_id, name=doc_name, doc_ids_filter=doc_ids_filter, return_empty_metadata=return_empty_metadata)
+                                           name=doc_name, doc_ids=doc_ids_filter, return_empty_metadata=return_empty_metadata)
```
Validate `ids` the same way you validate `id`.
The single-document path rejects a non-owned/nonexistent `id` before querying, but the new repeated-`ids` path just forwards the list into `get_by_kb_id()`. That turns invalid multi-ID lookups from an explicit error into a 200 with an empty docs list, so `id` and `ids` now behave inconsistently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/document_api.py` around lines 523-537: the code
validates a single doc_id but not the list doc_ids; before calling
DocumentService.get_by_kb_id() iterate over q.getlist("ids") (doc_ids) and for
each id call DocumentService.query(id=<id>, kb_id=dataset_id) and if any check
fails return RetCode.DATA_ERROR with a message like "You don't own the document
<id>" (mirroring the single-id branch), then set doc_ids_filter = doc_ids and
proceed; ensure you also preserve the existing mutual-exclusion check between
doc_id and doc_ids.
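A runnable sketch of the suggested validation, mirroring the single-id branch. `RetCode.DATA_ERROR`'s value is assumed for illustration, and `query_fn` is a stub standing in for `DocumentService.query(id=..., kb_id=...)`:

```python
# Sketch of the suggested ids validation, mirroring the single-id branch.
class RetCode:
    DATA_ERROR = 102  # assumed value, for illustration only

def validate_doc_ids(query_fn, doc_ids, dataset_id):
    """Return an error tuple for the first unowned id, or None if all pass."""
    for doc_id in doc_ids:
        if not query_fn(id=doc_id, kb_id=dataset_id):
            return RetCode.DATA_ERROR, f"You don't own the document {doc_id}.", [], 0
    return None  # caller may then set doc_ids_filter = doc_ids and proceed

owned = {"d1", "d2"}
stub_query = lambda id, kb_id: id in owned
print(validate_doc_ids(stub_query, ["d1", "d2"], "kb"))     # None
print(validate_doc_ids(stub_query, ["d1", "d9"], "kb")[1])  # You don't own the document d9.
```

Failing fast on the first unowned id keeps the error shape identical to the existing single-`id` branch, so clients see the same contract for both query forms.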
```python
params = {}
if doc_ids:
    # Multiple IDs
    for id in doc_ids:
        params.append(("ids", id))
elif doc_id:
    # Single ID
    params["id"] = doc_id
```
Fix `params` initialization to support repeated query parameters.

Line 96 initializes `params = {}`, but line 100 calls `.append()` on it. Dicts don't have `.append()`, so the first `doc_ids` request fails with `AttributeError`. The fix is to initialize `params = []` and convert the single-ID branch to append as well:

Fix

```diff
- params = {}
+ params = []
  if doc_ids:
      # Multiple IDs
      for id in doc_ids:
          params.append(("ids", id))
  elif doc_id:
      # Single ID
-     params["id"] = doc_id
+     params.append(("id", doc_id))
```

Additionally, ruff reports 4 whitespace linting violations (W293) at lines 81, 87, 94, 104. Run `ruff check --fix` to resolve them per the coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/python/test/test_frontend_api/common.py` around lines 96 - 103, The bug
is that params is initialized as a dict but used like a list when handling
repeated query params in the function handling doc_ids/doc_id; change params =
{} to params = [] and update the single-ID branch to use params.append(("id",
doc_id)) so both branches build a list of tuples (used for repeated query
parameters), ensuring symbols to edit are params, doc_ids and doc_id; also run
ruff check --fix to address the W293 whitespace violations reported around the
nearby lines.
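The reason a list of tuples is the right container can be shown with the stdlib alone; `requests` serializes a list-of-tuples `params=` argument into the same wire format:

```python
from urllib.parse import urlencode

# A dict can hold only one value per key, so a second "ids" entry would
# overwrite the first. A list of (key, value) tuples preserves repeats.
as_list = [("ids", "id1"), ("ids", "id2"), ("page", 1)]
print(urlencode(as_list))  # ids=id1&ids=id2&page=1

as_dict = {"ids": "id1"}
as_dict["ids"] = "id2"     # the first id is silently lost
print(urlencode(as_dict))  # ids=id2
```

The repeated `ids=...&ids=...` form is exactly what the consolidated `GET /api/v1/datasets/<dataset_id>/documents` endpoint expects.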
```diff
 def test_infos_auth_invalid(self, invalid_auth, expected_code, expected_fragment):
-    res = document_infos(invalid_auth, {"doc_ids": ["doc_id"]})
+    res = document_infos(invalid_auth, "kb_id", {"doc_ids": ["doc_id"]})
     assert res["code"] == expected_code, res
```
Update the remaining `document_infos` tests to the new dataset-scoped contract.
Line 47 still uses `doc_ids` instead of `ids`, and lines 178-180 still use the removed `(auth, payload)` call shape. The invalid-doc case will now request `/datasets/{'doc_ids': ['invalid_id']}/documents`, so it no longer validates the migrated endpoint behavior.
Suggested test update

```diff
- res = document_infos(invalid_auth, "kb_id", {"doc_ids": ["doc_id"]})
+ res = document_infos(invalid_auth, "kb_id", {"ids": ["doc_id"]})
  ...
- def test_infos_invalid_doc_id(self, WebApiAuth):
-     res = document_infos(WebApiAuth, {"doc_ids": ["invalid_id"]})
+ def test_infos_invalid_doc_id(self, WebApiAuth, add_document_func):
+     dataset_id, _ = add_document_func
+     res = document_infos(WebApiAuth, dataset_id, {"ids": ["invalid_id"]})
```

Also applies to: 177-180
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 46 - 48, The test uses the old document-scoped contract: update
test_infos_auth_invalid and the other failing cases to the dataset-scoped
contract by replacing payload key "doc_ids" with "ids" and by changing calls
that currently use the removed (auth, payload) shape to the new (auth,
dataset_id, payload) shape; e.g., call document_infos(invalid_auth, "kb_id",
{"ids": ["doc_id"]}) instead of document_infos(invalid_auth, {"doc_ids":
["doc_id"]}) so the tests hit /datasets/{dataset_id}/documents with the new
payload format.
```typescript
const { data } = await documentFilter(knowledgeId || id);
if (data.code === 0) {
  return data.data;
```
Forward `keywords` to the filter request.

`useFetchDocumentList` still filters the document list by `keywords`, but this call no longer does. Since the consolidated `type=filter` endpoint aggregates from the same request args, the sidebar counts will be computed for the whole dataset whenever a search term is active.

💡 Minimal fix

```diff
- const { data } = await documentFilter(knowledgeId || id);
+ const { data } = await documentFilter(knowledgeId || id, {
+   keywords: debouncedSearchString,
+ });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/src/hooks/use-document-request.ts` around lines 218-220: the call to
documentFilter in use-document-request.ts is not passing the active search term,
so sidebar counts use the whole dataset; change the invocation of documentFilter
(the line calling documentFilter(knowledgeId || id)) to include the current
keywords parameter (forward the keywords variable used by useFetchDocumentList)
in the request payload/args so the filter endpoint receives the search term;
ensure you pass the same keywords shape the filter endpoint expects (e.g., add a
second arg or include keywords on the options object) and keep the existing
response handling (data.code === 0) intact.
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main   #14239      +/-   ##
==========================================
- Coverage   96.66%   95.97%    -0.70%
==========================================
  Files          10       10
  Lines         690      695        +5
  Branches      108      111        +3
==========================================
  Hits          667      667
- Misses          8       11        +3
- Partials       15       17        +2
```

☔ View full report in Codecov by Sentry.
What problem does this PR solve?

Before consolidation:
- Web API: POST /v1/document/infos
- HTTP API: GET /api/v1/datasets/<dataset_id>/documents

After consolidation, RESTful API: GET /api/v1/datasets/<dataset_id>/documents?ids=id1&ids=id2

Type of change