
Refactor: Consolidate WEB API & HTTP API for document infos #14239

Merged
JinHai-CN merged 18 commits into infiniflow:main from xugangqiang:migrate-doc-infos
Apr 21, 2026

Conversation

Contributor

@xugangqiang xugangqiang commented Apr 21, 2026

What problem does this PR solve?

Before consolidation:
  • Web API: POST /v1/document/infos
  • HTTP API: GET /api/v1/datasets/<dataset_id>/documents

After consolidation, a single RESTful API: GET /api/v1/datasets/<dataset_id>/documents?ids=id1&ids=id2
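The repeated ids query parameter follows standard URL encoding for multi-valued keys. A minimal sketch with Python's standard library (the dataset id ds_example is made up for illustration):

```python
from urllib.parse import urlencode

# Repeated "ids" keys become ids=id1&ids=id2, matching the consolidated
# GET /api/v1/datasets/<dataset_id>/documents?ids=id1&ids=id2 shape.
query = urlencode({"ids": ["id1", "id2"]}, doseq=True)
url = f"/api/v1/datasets/ds_example/documents?{query}"  # ds_example is a made-up id
print(url)  # /api/v1/datasets/ds_example/documents?ids=id1&ids=id2
```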

Type of change

  • Refactoring

Contributor

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough


This PR consolidates POST /filter and /infos document endpoints into a unified GET /datasets/{dataset_id}/documents endpoint. It updates the document service signature to use a single doc_ids parameter, refactors the API layer to conditionally return filter aggregates or document lists based on query parameters, and updates corresponding SDK, test, and frontend code to use the new endpoint patterns.

Changes

Cohort / File(s) Summary
Backend API Consolidation
api/apps/document_app.py, api/apps/restful_apis/document_api.py
Removed POST /filter and /infos handlers from document_app.py. Added conditional response logic to list_docs to return aggregated filter data when type=filter query parameter is set, with new _aggregate_filters helper function for metadata/status/suffix counters.
Service Layer Refactoring
api/db/services/document_service.py
Updated DocumentService.get_by_kb_id signature to replace separate doc_id and doc_ids_filter parameters with unified doc_ids parameter, simplifying the filtering interface.
SDK Dataset Module
sdk/python/ragflow_sdk/modules/dataset.py
Added ids parameter to DataSet.list_documents alongside existing id, with validation rejecting simultaneous use of both parameters.
SDK Test Utilities
sdk/python/test/test_frontend_api/common.py, sdk/python/test/test_frontend_api/test_chunk.py
Updated get_docs_info to accept dataset context and support both single/multi document ID parameters, switching from POST /document/infos to GET /datasets/{dataset_id}/documents. Updated test call sites accordingly.
Backend Integration Tests
test/testcases/test_web_api/test_common.py
Updated list_documents, document_filter, and document_infos test helpers to use new GET endpoint signatures with dataset scoping and query parameter passing.
Backend Integration Tests (Metadata)
test/testcases/test_web_api/test_document_app/test_document_metadata.py
Updated test assertions and call sites to match new endpoint signatures; removed tests directly exercising removed get_filter() and doc_infos() handlers.
Frontend Hooks & Services
web/src/hooks/use-document-request.ts, web/src/services/knowledge-service.ts, web/src/utils/api.ts
Removed documentInfos endpoint mapping. Updated documentFilter to use GET request via new getDatasetFilter(datasetId) function returning dynamic URL instead of static string.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

☯️ refactor, size:XL, 🐖api

Suggested reviewers

  • yuzhichang
  • JinHai-CN
  • yingfeng

Poem

🐰 Endpoints hop and consolidate with grace,
Filter and docs find their rightful place,
GET requests dancing where POST once stood,
One unified path—now that's understood! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.74%, below the required threshold of 80.00%. | Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Title check | ✅ Passed | The title accurately describes the main refactoring objective: consolidating Web API and HTTP API endpoints for document information retrieval into a unified RESTful endpoint.
Description check | ✅ Passed | The description provides a clear before/after comparison of the API consolidation. However, the 'Type of change' section shows unchecked boxes and only 'Refactoring' is mentioned without explicit confirmation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@xugangqiang xugangqiang self-assigned this Apr 21, 2026
@xugangqiang xugangqiang added the ci Continue Integration label Apr 21, 2026
@xugangqiang xugangqiang marked this pull request as ready for review April 21, 2026 01:35
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. 🌈 python Pull requests that update Python code 🐖api The modified files are located under directory 'api/apps/sdk' 🧰 typescript Pull requests that update Typescript code labels Apr 21, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
sdk/python/test/test_frontend_api/test_chunk.py (1)

58-63: ⚠️ Potential issue | 🟠 Major

Poll res["data"]["docs"], not res["data"].

The migrated endpoint now returns data = {"total": ..., "docs": [...]}. Iterating res["data"] walks object keys, so doc_info["progress"] fails instead of checking document completion.

💡 Minimal fix
-        for doc_info in res['data']:
+        for doc_info in res['data']['docs']:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/python/test/test_frontend_api/test_chunk.py` around lines 58 - 63, The
test currently iterates over res (from get_docs_info) as for doc_info in
res['data'] which walks object keys and makes doc_info['progress'] fail; change
the loop to iterate the docs array returned by the migrated endpoint
(res['data']['docs']) — e.g. read docs = res['data']['docs'] or loop for
doc_info in res['data']['docs'] when checking doc_info['progress'] compared to
doc_count/doc_id_list in the test using get_docs_info.
sdk/python/ragflow_sdk/modules/dataset.py (1)

83-98: ⚠️ Potential issue | 🔴 Critical

params.append() will fail—params is a dict, not a list.

When ids is provided, the code calls params.append(("ids", doc_id)) on a dict object, raising AttributeError before the request is sent. The requests library accepts a list of tuples for repeated query parameters. Additionally, the new ids flow lacks logging, which the coding guidelines require for new flows in Python files.

Fix
-        params = {
-            "id": id,
-            "name": name,
-            "keywords": keywords,
-            "page": page,
-            "page_size": page_size,
-            "orderby": orderby,
-            "desc": desc,
-            "create_time_from": create_time_from,
-            "create_time_to": create_time_to,
-        }
+        params = [
+            (k, v)
+            for k, v in {
+                "id": id,
+                "name": name,
+                "keywords": keywords,
+                "page": page,
+                "page_size": page_size,
+                "orderby": orderby,
+                "desc": desc,
+                "create_time_from": create_time_from,
+                "create_time_to": create_time_to,
+            }.items()
+            if v is not None
+        ]
         # Handle ids parameter - convert to multiple query params
         if ids:
             for doc_id in ids:
                 params.append(("ids", doc_id))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/python/ragflow_sdk/modules/dataset.py` around lines 83 - 98, The params
variable is a dict so params.append will raise an AttributeError; change the
implementation to build query params as a list of tuples when ids is provided
(e.g., convert the base params dict to a list of (key, value) pairs or create a
new params_list and extend it with ("ids", doc_id) for each id), then pass that
list to self.get(f"/datasets/{self.id}/documents", params=...) so repeated ids
become repeated query params; also add a debug log (using the module/class
logger) before the request that records the ids being included (reference:
params, ids, and self.get).
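For illustration, a standalone sketch of the list-of-tuples approach the comment suggests; the parameter values here are examples, not the SDK's actual defaults:

```python
from urllib.parse import urlencode

# Build params as a list of (key, value) tuples so repeated "ids" keys survive;
# the requests library accepts this same shape for its params= argument.
base = {"id": None, "name": "report", "page": 1}  # example values, not SDK defaults
params = [(k, v) for k, v in base.items() if v is not None]
for doc_id in ["d1", "d2"]:  # hypothetical document ids
    params.append(("ids", doc_id))

print(urlencode(params))  # name=report&page=1&ids=d1&ids=d2
```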
api/apps/restful_apis/document_api.py (1)

435-451: ⚠️ Potential issue | 🟠 Major

type=filter is aggregating only the current page.

_get_docs_with_request() applies page/page_size before this branch, so the returned filter buckets are built from at most one page of docs while total still reflects the wider match set. On larger datasets this will produce incomplete filter counts and misleading UI state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 435 - 451, The filter
branch is aggregating only the paginated page because _get_docs_with_request()
applies page/page_size; change the logic so when request.args.get("type") ==
"filter" you fetch the full match set before calling _aggregate_filters — e.g.,
call _get_docs_with_request for the same query but with pagination disabled (or
page_size set to total/match-all) so docs contains all matches, then pass that
full docs list to _aggregate_filters and return get_json_result with the
original total; keep the rest of the response handling (map_doc_keys,
IMG_BASE64_PREFIX, turn2jsonschema) unchanged.
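As a rough illustration of the fix, the aggregation could run over the full match set with collections.Counter. The function name and field names below are assumptions for the sketch, not the actual _aggregate_filters implementation:

```python
from collections import Counter

def aggregate_filters(docs):
    # Count status/suffix buckets over ALL matched docs, not a single page,
    # so the filter counts stay consistent with the reported total.
    return {
        "run_status_count": dict(Counter(d["run"] for d in docs)),
        "suffix_count": dict(Counter(d["suffix"] for d in docs)),
    }

docs = [
    {"run": "DONE", "suffix": "pdf"},
    {"run": "DONE", "suffix": "docx"},
    {"run": "RUNNING", "suffix": "pdf"},
]
print(aggregate_filters(docs))
```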
test/testcases/test_web_api/test_document_app/test_document_metadata.py (1)

51-58: ⚠️ Potential issue | 🟡 Minor

Restore the migrated metadata_summary web-API coverage.

These cases are commented out instead of being adapted to the dataset-scoped GET route, so this refactor drops regression coverage for that changed behavior at the web-test layer.

If you want, I can draft the updated pytest cases for the new request shape. Based on learnings "Applies to tests/**/*.py : Add/adjust tests for behavior changes".

Also applies to: 101-108

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 51 - 58, The commented-out auth tests removed coverage for the
migrated metadata_summary endpoint; restore and adapt them by re-enabling
test_metadata_summary_auth_invalid and updating calls to
document_metadata_summary to send the new GET request shape (include "doc_ids":
[] or appropriate list and keep "kb_id" if still required) so each
INVALID_AUTH_CASES case asserts res["code"] == expected_code and
expected_fragment in res["message"]; also update the equivalent commented block
around lines covering 101-108 the same way so both sets use the new request
payload shape and existing helpers (document_metadata_summary,
INVALID_AUTH_CASES) rather than the old payload.
🧹 Nitpick comments (2)
api/db/services/document_service.py (1)

150-151: Add a trace point for the new doc_ids branch.

The consolidated document-info flow now has a dedicated ID-filter path, but nothing records when it is used. A small debug log with kb_id and the requested ID count would make rollout issues much easier to diagnose without logging the raw IDs.

As per coding guidelines, "Add logging for new flows".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/db/services/document_service.py` around lines 150 - 151, Add a debug
trace when the doc_ids branch is used: before applying the filter (the block
with "if doc_ids: docs = docs.where(cls.model.id.in_(doc_ids))") emit a debug
log that records the kb_id and the number of requested IDs (len(doc_ids)) but
not the raw IDs; use the service/module logger (e.g., cls.logger or module-level
logger) and a message like "Applying doc_ids filter for kb_id=%s count=%d"
passing kb_id and len(doc_ids) so callers can see the new ID-filter path is
taken without leaking ID values.
api/apps/restful_apis/document_api.py (1)

439-451: Add trace logging for the new consolidated query modes.

The new type=filter and repeated-ids branches materially change this endpoint's behavior, but there is no log telling us which branch was taken or how many IDs were supplied. A small structured log here would make this migration much easier to debug in CI and prod.

As per coding guidelines "Add logging for new flows".

Also applies to: 530-534

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 439 - 451, Add structured
trace logging to record which consolidated query branch is taken and relevant
counts: when evaluating request.args.get("type") == "filter" (the
_aggregate_filters(docs) branch) log at trace/debug level that the "filter"
branch was taken and include total and number of filters/docs; in the else
branch (where renamed_doc_list is built via map_doc_keys and
thumbnail/source_type/parser_config adjustments) log that the "docs" branch was
taken and include total and len(renamed_doc_list); also add analogous logs for
the repeated-`ids` handling mentioned (the branch around lines 530-534) to
record the branch and number of ids supplied. Use the existing logger used in
this module and include clear keys (e.g., branch="filter" or branch="docs",
total=..., docs_count=..., ids_count=...) so logs are structured and searchable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 96fd7e16-a49f-49bd-a686-b4cd443dcfb2

📥 Commits

Reviewing files that changed from the base of the PR and between f269ee9 and 0eb4304.

📒 Files selected for processing (11)
  • api/apps/document_app.py
  • api/apps/restful_apis/document_api.py
  • api/db/services/document_service.py
  • sdk/python/ragflow_sdk/modules/dataset.py
  • sdk/python/test/test_frontend_api/common.py
  • sdk/python/test/test_frontend_api/test_chunk.py
  • test/testcases/test_web_api/test_common.py
  • test/testcases/test_web_api/test_document_app/test_document_metadata.py
  • web/src/hooks/use-document-request.ts
  • web/src/services/knowledge-service.ts
  • web/src/utils/api.ts

Comment on lines +523 to +537
    if doc_id:
        if not DocumentService.query(id=doc_id, kb_id=dataset_id):
            return RetCode.DATA_ERROR, f"You don't own the document {doc_id}.", [], 0
        doc_ids_filter = [doc_id]  # id provided, ignore other filters
    if doc_name and not DocumentService.query(name=doc_name, kb_id=dataset_id):
        return RetCode.DATA_ERROR, f"You don't own the document {doc_name}.", [], 0

    doc_ids = q.getlist("ids")
    if doc_id and len(doc_ids) > 0:
        return RetCode.DATA_ERROR, f"Should not provide both 'id':{doc_id} and 'ids'{doc_ids}"
    if len(doc_ids) > 0:
        doc_ids_filter = doc_ids

    docs, total = DocumentService.get_by_kb_id(dataset_id, page, page_size, orderby, desc, keywords, run_status_converted, types, suffix,
-                                              doc_id=doc_id, name=doc_name, doc_ids_filter=doc_ids_filter, return_empty_metadata=return_empty_metadata)
+                                              name=doc_name, doc_ids=doc_ids_filter, return_empty_metadata=return_empty_metadata)
Contributor


⚠️ Potential issue | 🟠 Major

Validate ids the same way you validate id.

The single-document path rejects a non-owned/nonexistent id before querying, but the new repeated-ids path just forwards the list into get_by_kb_id(). That changes invalid multi-ID lookups from an explicit error into 200 with an empty docs list, so id and ids now behave inconsistently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 523 - 537, The code
validates a single doc_id but not the list doc_ids; before calling
DocumentService.get_by_kb_id() iterate over q.getlist("ids") (doc_ids) and for
each id call DocumentService.query(id=<id>, kb_id=dataset_id) and if any check
fails return RetCode.DATA_ERROR with a message like "You don't own the document
<id>" (mirroring the single-id branch), then set doc_ids_filter = doc_ids and
proceed; ensure you also preserve the existing mutual-exclusion check between
doc_id and doc_ids.
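A sketch of the suggested per-id ownership check, mirroring the single-id branch. DocumentService.query is stubbed here with a plain callable, and all names are illustrative:

```python
def validate_doc_ids(doc_ids, dataset_id, query):
    # Reject the whole request if any id in the list fails the ownership check.
    # `query` stands in for DocumentService.query(id=..., kb_id=...).
    for doc_id in doc_ids:
        if not query(id=doc_id, kb_id=dataset_id):
            return False, f"You don't own the document {doc_id}."
    return True, ""

# Stub that "owns" only doc-1
owned = {"doc-1"}
ok, msg = validate_doc_ids(["doc-1", "doc-2"], "ds1",
                           lambda id, kb_id: id in owned)
print(ok, msg)  # False You don't own the document doc-2.
```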

Comment on lines +96 to +103
    params = {}
    if doc_ids:
        # Multiple IDs
        for id in doc_ids:
            params.append(("ids", id))
    elif doc_id:
        # Single ID
        params["id"] = doc_id
Contributor


⚠️ Potential issue | 🔴 Critical



Fix params initialization to support repeated query parameters.

Line 96 initializes params = {}, but line 100 calls .append() on it. Dicts don't have .append(), so the first doc_ids request fails with AttributeError. The fix is to initialize params = [] and convert the single-ID branch to append as well:

Fix
-    params = {}
+    params = []
     if doc_ids:
         # Multiple IDs
         for id in doc_ids:
             params.append(("ids", id))
     elif doc_id:
         # Single ID
-        params["id"] = doc_id
+        params.append(("id", doc_id))

Additionally, ruff reports 4 whitespace linting violations (W293) at lines 81, 87, 94, 104. Run ruff check --fix to resolve them per the coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/python/test/test_frontend_api/common.py` around lines 96 - 103, The bug
is that params is initialized as a dict but used like a list when handling
repeated query params in the function handling doc_ids/doc_id; change params =
{} to params = [] and update the single-ID branch to use params.append(("id",
doc_id)) so both branches build a list of tuples (used for repeated query
parameters), ensuring symbols to edit are params, doc_ids and doc_id; also run
ruff check --fix to address the W293 whitespace violations reported around the
nearby lines.

Comment on lines 46 to 48
     def test_infos_auth_invalid(self, invalid_auth, expected_code, expected_fragment):
-        res = document_infos(invalid_auth, {"doc_ids": ["doc_id"]})
+        res = document_infos(invalid_auth, "kb_id", {"doc_ids": ["doc_id"]})
         assert res["code"] == expected_code, res
Contributor


⚠️ Potential issue | 🟠 Major

Update the remaining document_infos tests to the new dataset-scoped contract.

Line 47 still uses doc_ids instead of ids, and Lines 178-180 still use the removed (auth, payload) call shape. The invalid-doc case will now request /datasets/{'doc_ids': ['invalid_id']}/documents, so it no longer validates the migrated endpoint behavior.

Suggested test update
-        res = document_infos(invalid_auth, "kb_id", {"doc_ids": ["doc_id"]})
+        res = document_infos(invalid_auth, "kb_id", {"ids": ["doc_id"]})
...
-    def test_infos_invalid_doc_id(self, WebApiAuth):
-        res = document_infos(WebApiAuth, {"doc_ids": ["invalid_id"]})
+    def test_infos_invalid_doc_id(self, WebApiAuth, add_document_func):
+        dataset_id, _ = add_document_func
+        res = document_infos(WebApiAuth, dataset_id, {"ids": ["invalid_id"]})

Also applies to: 177-180

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 46 - 48, The test uses the old document-scoped contract: update
test_infos_auth_invalid and the other failing cases to the dataset-scoped
contract by replacing payload key "doc_ids" with "ids" and by changing calls
that currently use the removed (auth, payload) shape to the new (auth,
dataset_id, payload) shape; e.g., call document_infos(invalid_auth, "kb_id",
{"ids": ["doc_id"]}) instead of document_infos(invalid_auth, {"doc_ids":
["doc_id"]}) so the tests hit /datasets/{dataset_id}/documents with the new
payload format.

Comment on lines +218 to 220
      const { data } = await documentFilter(knowledgeId || id);
      if (data.code === 0) {
        return data.data;
Contributor


⚠️ Potential issue | 🟠 Major

Forward keywords to the filter request.

useFetchDocumentList still filters the document list by keywords, but this call no longer does. Since the consolidated type=filter endpoint aggregates from the same request args, the sidebar counts will be computed for the whole dataset whenever a search term is active.

💡 Minimal fix
-      const { data } = await documentFilter(knowledgeId || id);
+      const { data } = await documentFilter(knowledgeId || id, {
+        keywords: debouncedSearchString,
+      });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      const { data } = await documentFilter(knowledgeId || id);
+      const { data } = await documentFilter(knowledgeId || id, {
+        keywords: debouncedSearchString,
+      });
       if (data.code === 0) {
         return data.data;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/hooks/use-document-request.ts` around lines 218 - 220, The call to
documentFilter in use-document-request.ts is not passing the active search term,
so sidebar counts use the whole dataset; change the invocation of documentFilter
(the line calling documentFilter(knowledgeId || id)) to include the current
keywords parameter (forward the keywords variable used by useFetchDocumentList)
in the request payload/args so the filter endpoint receives the search term;
ensure you pass the same keywords shape the filter endpoint expects (e.g., add a
second arg or include keywords on the options object) and keep the existing
response handling (data.code === 0) intact.


codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.97%. Comparing base (009e538) to head (09c5f02).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
sdk/python/ragflow_sdk/modules/dataset.py 0.00% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14239      +/-   ##
==========================================
- Coverage   96.66%   95.97%   -0.70%     
==========================================
  Files          10       10              
  Lines         690      695       +5     
  Branches      108      111       +3     
==========================================
  Hits          667      667              
- Misses          8       11       +3     
- Partials       15       17       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xugangqiang xugangqiang marked this pull request as draft April 21, 2026 02:03
@xugangqiang xugangqiang marked this pull request as ready for review April 21, 2026 11:01
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 21, 2026
@JinHai-CN JinHai-CN merged commit 2d05475 into infiniflow:main Apr 21, 2026
3 checks passed
@xugangqiang xugangqiang deleted the migrate-doc-infos branch April 22, 2026 07:50

Labels

  • 🐖api The modified files are located under directory 'api/apps/sdk'
  • ci Continue Integration
  • 🌈 python Pull requests that update Python code
  • size:M This PR changes 30-99 lines, ignoring generated files.
  • 🧰 typescript Pull requests that update Typescript code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants