Refactor: Consolidation WEB API & HTTP API for document get_filter by xugangqiang · Pull Request #14230 · infiniflow/ragflow

xugangqiang · 2026-04-20T08:42:55Z

What problem does this PR solve?

Before consolidation:
Web API: POST /v1/document/filter
HTTP API: GET /api/v1/datasets/<dataset_id>/documents

After consolidation:
RESTful API: GET /api/v1/datasets/<dataset_id>/documents?type=filter
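
As a rough illustration of the consolidated call, the sketch below builds the new request URL. The helper name `build_filter_url` and the base URL are assumptions made for this example; only the path and the `type=filter` query parameter come from the description above.

```python
from urllib.parse import quote, urlencode


def build_filter_url(base_url: str, dataset_id: str) -> str:
    """Return the consolidated filter URL for one dataset (hypothetical helper)."""
    # The path and the type=filter parameter follow the PR description;
    # base_url is whatever host the deployment uses (an assumption here).
    path = f"/api/v1/datasets/{quote(dataset_id, safe='')}/documents"
    query = urlencode({"type": "filter"})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    print(build_filter_url("http://localhost:8000", "kb123"))
```

A client that previously sent a POST body to the old route would instead issue a GET to this URL.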

Type of change

Refactoring

coderabbitai · 2026-04-20T08:43:06Z

📝 Walkthrough

Removed the standalone POST /filter endpoint and folded dataset-level filtering into GET /datasets/{id}/documents?type=filter. Updated the backend service signature to use a unified doc_ids parameter, adjusted the API aggregation logic, and updated frontend calls and tests to use the new GET-based filter flow.

Changes

Cohort / File(s)	Summary
Backend: removed standalone filter route `api/apps/document_app.py`	Deleted the `/filter` POST handler and its JSON parsing, auth/validation, and service call; removed now-unused imports.
Backend: listing & aggregation `api/apps/restful_apis/document_api.py`	`list_docs` handles `type=filter` returning `{"total", "filter": _aggregate_filters(docs)}`; adjusted doc-id ownership/filter logic and switched call to `DocumentService.get_by_kb_id(..., doc_ids=...)`; added `_aggregate_filters`.
Backend: service signature `api/db/services/document_service.py`	Changed `get_by_kb_id` signature: removed `doc_id`/`doc_ids_filter`, added `doc_ids` parameter and applied `id IN (...)` filtering.
Tests: adapt to GET filter flow `test/testcases/test_web_api/test_common.py`, `test/testcases/test_web_api/test_document_app/test_document_metadata.py`	Test helpers and call sites switched from POST `/filter` to GET `/datasets/{id}/documents?type=filter`; signatures, request formation, expectations updated; some filter unit tests removed.
Frontend: API and hooks `web/src/utils/api.ts`, `web/src/services/knowledge-service.ts`, `web/src/hooks/use-document-request.ts`	`getDatasetFilter` became a function returning `/datasets/${id}/documents?type=filter`; `documentFilter` changed from POST to GET (empty params); hook no longer forwards explicit `keywords` payload.
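
The `doc_ids` change to `get_by_kb_id` can be pictured with a small, self-contained sketch. It uses the standard-library `sqlite3` module rather than the project's own database layer, and the table layout, column names, and the choice to treat an empty `doc_ids` as "no restriction" are all assumptions made for illustration.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


def get_by_kb_id(
    conn: sqlite3.Connection,
    kb_id: str,
    doc_ids: Optional[Sequence[str]] = None,
) -> List[Tuple[str, str]]:
    """List (id, name) rows for one dataset, optionally restricted to doc_ids."""
    sql = "SELECT id, name FROM document WHERE kb_id = ?"
    params: List[str] = [kb_id]
    if doc_ids:
        # One placeholder per id keeps the query parameterized.
        placeholders = ", ".join("?" for _ in doc_ids)
        sql += f" AND id IN ({placeholders})"
        params.extend(doc_ids)
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with three hypothetical documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE document (id TEXT, kb_id TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO document VALUES (?, ?, ?)",
        [("d1", "kb1", "a.pdf"), ("d2", "kb1", "b.txt"), ("d3", "kb2", "c.pdf")],
    )
    return conn
```

Ids that belong to another dataset are dropped by the `kb_id` condition, so the `IN (...)` clause narrows the result without widening access.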

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant FE as Frontend
    participant API as REST API (document_api)
    participant Service as DocumentService
    participant DB as Database

    FE->>API: GET /datasets/{kb_id}/documents?type=filter
    API->>Service: get_by_kb_id(kb_id, doc_ids, suffix, name, return_empty_metadata)
    Service->>DB: SELECT ... WHERE kb_id=? AND id IN (...) AND other filters
    DB-->>Service: documents with meta_fields
    Service-->>API: documents list
    API->>API: _aggregate_filters(docs) => {suffix, run_status, metadata counts, empty_metadata}
    API-->>FE: 200 {"total": n, "filter": {...}}
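
The aggregation step in the diagram can be sketched roughly as follows. This is not the PR's `_aggregate_filters`; it is a minimal stand-in that assumes each document is a dict with `suffix`, `run`, and `meta_fields` keys (only `meta_fields` is named in the walkthrough) and that the result uses the keys `suffix`, `run_status`, `metadata`, and `empty_metadata`.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def aggregate_filters(docs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count documents per suffix, run status, and metadata value."""
    suffix: Counter = Counter()
    run_status: Counter = Counter()
    metadata: Dict[str, Counter] = defaultdict(Counter)
    empty_metadata = 0

    for doc in docs:
        suffix[doc.get("suffix", "")] += 1
        run_status[str(doc.get("run", ""))] += 1
        meta = doc.get("meta_fields") or {}
        if not meta:
            # Documents without metadata are counted separately.
            empty_metadata += 1
            continue
        for key, value in meta.items():
            # A metadata value may be a scalar or a list of values.
            values = value if isinstance(value, list) else [value]
            for item in values:
                metadata[key][str(item)] += 1

    return {
        "suffix": dict(suffix),
        "run_status": dict(run_status),
        "metadata": {key: dict(counts) for key, counts in metadata.items()},
        "empty_metadata": empty_metadata,
    }


def build_response(docs: list) -> Dict[str, Any]:
    """Shape the payload like the 200 response in the diagram."""
    return {"total": len(docs), "filter": aggregate_filters(docs)}
```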

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Refactor: Consolidation WEB API & HTTP API for document list_docs #14176 — modifies document-listing/filter endpoints and DocumentService call signatures (overlaps backend modules and endpoint shape).
Refactor: Standardize naming convention to camelCase #14079 — adjusts frontend API mappings and document filter calls (related frontend changes to api/utils/hooks).

Suggested labels

☯️ refactor, 🧪 test, 🐖api

Suggested reviewers

yuzhichang
JinHai-CN
yingfeng

Poem

🐰 I hopped from POST to GET today,

Filters folded neat along the way,
Docs and metas counted with delight,
Frontend, API, DB — all aligned right,
A carrot-coded change — quick, clean, and bright! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main refactoring objective: consolidating the document filter endpoints from two separate APIs into one unified endpoint.
Description check	✅ Passed	The PR description addresses the problem being solved (consolidation of two endpoints) and specifies the type of change as Refactoring, meeting the template requirements.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

api/apps/restful_apis/document_api.py (1)

658-667: ⚠️ Potential issue | 🔴 Critical

Potential NameError: doc_ids_filter may be undefined.

At line 660, doc_ids_filter is referenced, but it's only defined inside the if metadata_condition: block at line 635. If metadata_condition is falsy but metadata is truthy, doc_ids_filter will be undefined when checked at line 660.

🐛 Proposed fix

+    doc_ids_filter = None
     if metadata_condition:
         doc_ids_filter = set(meta_filter(metas, convert_conditions(metadata_condition), metadata_condition.get("logic", "and")))
         if metadata_condition.get("conditions") and not doc_ids_filter:
             return RetCode.SUCCESS, "", [], return_empty_metadata

     if metadata:
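
The failure mode is easy to reproduce in isolation. The two functions below are hypothetical reductions of the pattern, not the code in `document_api.py`: the first assigns the variable only inside a branch, the second initializes it first. In Python the resulting exception is `UnboundLocalError`, which is a subclass of `NameError`.

```python
def filter_without_default(metadata_condition, metadata):
    """Reduced version of the reported pattern (hypothetical)."""
    if metadata_condition:
        doc_ids_filter = {"d1"}
    if metadata:
        # doc_ids_filter is unbound here when metadata_condition was falsy.
        return doc_ids_filter is not None
    return False


def filter_with_default(metadata_condition, metadata):
    """Same logic with the variable initialized before the branch."""
    doc_ids_filter = None
    if metadata_condition:
        doc_ids_filter = {"d1"}
    if metadata:
        return doc_ids_filter is not None
    return False


if __name__ == "__main__":
    try:
        filter_without_default(None, {"k": "v"})
    except NameError as exc:
        print(type(exc).__name__)
    print(filter_with_default(None, {"k": "v"}))
```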

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 658 - 667, The code can
raise NameError because doc_ids_filter is only set inside the metadata_condition
branch; initialize it before that logic so it always exists. Specifically,
declare and set doc_ids_filter = None (or an empty set/list as intended) before
the metadata handling block that computes metadata_doc_ids and uses
metadata_condition so subsequent checks and the final return that reference
doc_ids_filter (and the metadata_doc_ids merging logic) are safe; update any
merging logic that assumes set semantics accordingly.

🧹 Nitpick comments (2)

web/src/hooks/use-document-request.ts (1)

211-223: Query key includes unused debouncedSearchString: potential stale cache or unnecessary refetches.

The queryKey at line 214 includes debouncedSearchString, but the documentFilter call at line 218 no longer uses it. This mismatch means:

- Every change to the debounced search string triggers a new filter fetch (an unnecessary network call).
- A cache entry is created per search string even though the response is identical.

Consider removing debouncedSearchString from the query key if filter results are now independent of search input:

♻️ Suggested fix

   const { data } = useQuery({
     queryKey: [
       DocumentApiAction.FetchDocumentFilter,
-      debouncedSearchString,
       knowledgeId,
     ],
     queryFn: async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/hooks/use-document-request.ts` around lines 211 - 223, The queryKey
for useQuery includes debouncedSearchString but documentFilter(knowledgeId ||
id) doesn't use it, causing unnecessary refetches and cache entries; update the
queryKey in useQuery to only include DocumentApiAction.FetchDocumentFilter and
the knowledge identifier (knowledgeId || id) to match the actual dependency of
the queryFn (refer to the useQuery call and documentFilter invocation) so the
hook only refetches when the knowledge id changes.

web/src/services/knowledge-service.ts (1)

153-156: Remove unused methods.documentFilter entry to avoid type mismatch and confusion.

The methods.documentFilter configuration at lines 153-156 is dead code. The actual documentFilter export at line 264 bypasses the methods object entirely and calls request.get directly. Additionally, methods.documentFilter.url is a function (api.getDatasetFilter), which violates the registerServer type requirement of url: string. Since kbService.documentFilter is never called anywhere in the codebase (only the exported documentFilter function is used), removing this entry prevents confusion and potential bugs.

♻️ Suggested cleanup

-  documentFilter: {
-    url: api.getDatasetFilter,
-    method: 'get',
-  },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/services/knowledge-service.ts` around lines 153 - 156, Remove the
dead methods.documentFilter entry from the methods object to avoid the type
mismatch and confusion: delete the block that sets methods.documentFilter with
url: api.getDatasetFilter and method: 'get', since the actual exported
documentFilter function (the standalone export that calls request.get) is used
instead; ensure no other code references methods.documentFilter and keep the
exported documentFilter (which calls request.get and uses api.getDatasetFilter)
intact so registerServer’s requirement of url: string is not violated by a
function-valued api.getDatasetFilter.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 022365c0-de7f-4c04-a265-4aefb7030288

📥 Commits

Reviewing files that changed from the base of the PR and between 78c3583 and 179fea0.

📒 Files selected for processing (8)

api/apps/document_app.py
api/apps/restful_apis/document_api.py
api/db/services/document_service.py
test/testcases/test_web_api/test_common.py
test/testcases/test_web_api/test_document_app/test_document_metadata.py
web/src/hooks/use-document-request.ts
web/src/services/knowledge-service.ts
web/src/utils/api.ts

coderabbitai · 2026-04-20T08:49:47Z

+    @pytest.mark.p2
    def test_filter_missing_kb_id(self, WebApiAuth, add_document_func):
-        _, doc_id = add_document_func
-        res = document_filter(WebApiAuth, {"doc_ids": [doc_id]})
-        assert res["code"] == 101, res
-        assert "KB ID" in res["message"], res
+        kb_id, doc_id = add_document_func
+        res = document_filter(WebApiAuth, "", {"doc_ids": [doc_id]})
+        assert res["code"] == 100, res
+        assert "<MethodNotAllowed '405: Method Not Allowed'>" == res["message"], res


⚠️ Potential issue | 🟡 Minor

Test relies on 405 side-effect rather than explicit validation.

When dataset_id is empty, the URL becomes /api/v1/datasets//documents?type=filter, which results in a 405 Method Not Allowed because the path no longer matches the GET documents route. This is a routing side-effect, not explicit parameter validation.

While this test still catches the error case, the behavior is fragile: it depends on the web framework's routing behavior rather than application-level validation. The original test expected code 101 with a message containing "KB ID", which was explicit validation.

Consider whether the backend should add explicit validation for empty dataset_id in the path parameter, returning a more meaningful error message.
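
One possible shape for such a check is sketched below. The function name, the constant name, and the exact message text are hypothetical; only the code 101 and the "KB ID" wording come from the original test. Note that a handler-level guard like this only runs if the router actually dispatches the request to the handler, so it may not cover the empty-path-segment case on its own.

```python
from typing import Any, Dict, Optional

# Code the original test expected; the constant name is an assumption.
ARGUMENT_ERROR = 101


def validate_dataset_id(dataset_id: Optional[str]) -> Optional[Dict[str, Any]]:
    """Return an error payload for a missing dataset id, else None."""
    if dataset_id is None or not dataset_id.strip():
        return {"code": ARGUMENT_ERROR, "message": "KB ID is required."}
    return None


if __name__ == "__main__":
    print(validate_dataset_id(""))
    print(validate_dataset_id("kb1"))
```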

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py` around lines 151 - 156, The test test_filter_missing_kb_id relies on a 405 routing side-effect when dataset_id is empty; update the documents filter endpoint to explicitly validate the path parameter instead: in the request handler used by document_filter (the route that handles GET /api/v1/datasets/<dataset_id>/documents with type=filter) add a check that dataset_id is non-empty and return the application-level error (code 101 and a message referencing "KB ID") when missing, then adjust the test to expect code 101 and the KB ID message instead of 405; locate the validation logic inside the controller/function that implements the documents filter route and add the guard early before any routing or downstream logic runs.

codecov · 2026-04-20T08:57:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.66%. Comparing base (af2ed41) to head (3a5cd39).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #14230   +/-   ##
=======================================
  Coverage   96.66%   96.66%           
=======================================
  Files          10       10           
  Lines         690      690           
  Branches      108      108           
=======================================
  Hits          667      667           
  Misses          8        8           
  Partials       15       15

xugangqiang added 11 commits April 17, 2026 10:38

merge list_docs api

375edba

fix

1cf35bd

Merge remote-tracking branch 'upstream/main' into merge-doc-list

93454f3

fix

b91b9df

trigger CI

d0bdcb0

trigger CI

f251468

trigger ci

ca02eaa

fix keyword filtering

d52bdba

Add id/name support for list docs

57dfaa7

merge filter

e5fc37c

Merge remote-tracking branch 'upstream/main' into migrate-doc-get-filter

179fea0

xugangqiang added the ci Continue Integration label Apr 20, 2026

xugangqiang marked this pull request as ready for review April 20, 2026 08:43

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 20, 2026

coderabbitai Bot reviewed Apr 20, 2026

fix

3a5cd39

xugangqiang force-pushed the migrate-doc-get-filter branch from a6ca87f to 3a5cd39 April 20, 2026 09:27

xugangqiang self-assigned this Apr 21, 2026

xugangqiang requested review from JinHai-CN, yingfeng and yuzhichang April 21, 2026 05:52

yingfeng merged commit 5ae296a into infiniflow:main Apr 21, 2026
2 checks passed

This was referenced Apr 21, 2026

Refactor: Consolidation WEB API & HTTP API for document get_filter #14248

Merged

Refactor: migrate document change status API #14300

Closed

Refa: unify document create flows under REST documents API #14345

Open

yingfeng merged 12 commits into infiniflow:main from xugangqiang:migrate-doc-get-filter

xugangqiang commented Apr 20, 2026 (edited by JinHai-CN)