
Refactor: Consolidation WEB API & HTTP API for document get_filter #14230

Merged
yingfeng merged 12 commits into infiniflow:main from xugangqiang:migrate-doc-get-filter
Apr 21, 2026

Conversation

@xugangqiang
Contributor

@xugangqiang xugangqiang commented Apr 20, 2026

What problem does this PR solve?

Before consolidation:

  • Web API: POST /v1/document/filter
  • HTTP API: GET /api/v1/datasets/<dataset_id>/documents

After consolidation:

  • RESTful API: GET /api/v1/datasets/<dataset_id>/documents?type=filter
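As a rough illustration of what a caller of the consolidated endpoint would request, the sketch below only builds the URL. The host in `BASE_URL` and the helper name `build_filter_url` are assumptions made for this example; the PR itself specifies only the path and the `type=filter` query parameter.

```python
from urllib.parse import quote, urlencode

# Hypothetical host for illustration; not taken from the PR.
BASE_URL = "http://localhost:9380"


def build_filter_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """Build the consolidated GET URL for a dataset's document filter."""
    # Percent-encode the id so it always stays a single path segment.
    path = f"/api/v1/datasets/{quote(dataset_id, safe='')}/documents"
    # type=filter is the switch described in the PR description.
    return f"{base_url}{path}?{urlencode({'type': 'filter'})}"


print(build_filter_url("abc123"))
```

The old Web API took a POST body; with the GET form, any further filtering would have to travel as query parameters instead.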

Type of change

  • Refactoring

@coderabbitai
Contributor

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

Walkthrough

Removed the standalone POST /filter endpoint and folded dataset-level filtering into GET /datasets/{id}/documents?type=filter. Updated backend service signature to use unified doc_ids, adjusted API aggregation logic, and updated frontend calls and tests to use the new GET-based filter flow.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Backend: removed standalone filter route (api/apps/document_app.py) | Deleted the /filter POST handler and its JSON parsing, auth/validation, and service call; removed now-unused imports. |
| Backend: listing & aggregation (api/apps/restful_apis/document_api.py) | list_docs handles type=filter returning {"total", "filter": _aggregate_filters(docs)}; adjusted doc-id ownership/filter logic and switched call to DocumentService.get_by_kb_id(..., doc_ids=...); added _aggregate_filters. |
| Backend: service signature (api/db/services/document_service.py) | Changed get_by_kb_id signature: removed doc_id/doc_ids_filter, added doc_ids parameter and applied id IN (...) filtering. |
| Tests: adapt to GET filter flow (test/testcases/test_web_api/test_common.py, test/testcases/test_web_api/test_document_app/test_document_metadata.py) | Test helpers and call sites switched from POST /filter to GET /datasets/{id}/documents?type=filter; signatures, request formation, expectations updated; some filter unit tests removed. |
| Frontend: API and hooks (web/src/utils/api.ts, web/src/services/knowledge-service.ts, web/src/hooks/use-document-request.ts) | getDatasetFilter became a function returning /datasets/${id}/documents?type=filter; documentFilter changed from POST to GET (empty params); hook no longer forwards explicit keywords payload. |
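The `doc_ids` change to the service signature can be pictured with a small, self-contained sketch. This is not the project's code: it uses the standard-library `sqlite3` module rather than whatever query layer `DocumentService` is built on, and the table layout, the simplified `get_by_kb_id` signature, and the sample rows are all assumptions for illustration.

```python
import sqlite3
from typing import Optional, Sequence


def get_by_kb_id(
    conn: sqlite3.Connection,
    kb_id: str,
    doc_ids: Optional[Sequence[str]] = None,
) -> list[str]:
    """Return document ids in a dataset, optionally narrowed by doc_ids."""
    sql = "SELECT id FROM document WHERE kb_id = ?"
    params: list[str] = [kb_id]
    if doc_ids:
        # One placeholder per id keeps the IN (...) clause parameterized.
        placeholders = ", ".join("?" for _ in doc_ids)
        sql += f" AND id IN ({placeholders})"
        params.extend(doc_ids)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY, kb_id TEXT)")
conn.executemany(
    "INSERT INTO document VALUES (?, ?)",
    [("d1", "kb1"), ("d2", "kb1"), ("d3", "kb2")],
)

print(get_by_kb_id(conn, "kb1"))
print(get_by_kb_id(conn, "kb1", ["d2", "d3"]))
```

Because the `kb_id` condition is always applied, an id that belongs to another dataset (here `d3`) is dropped even when the caller passes it in `doc_ids`.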

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant FE as Frontend
    participant API as REST API (document_api)
    participant Service as DocumentService
    participant DB as Database

    FE->>API: GET /datasets/{kb_id}/documents?type=filter
    API->>Service: get_by_kb_id(kb_id, doc_ids, suffix, name, return_empty_metadata)
    Service->>DB: SELECT ... WHERE kb_id=? AND id IN (...) AND other filters
    DB-->>Service: documents with meta_fields
    Service-->>API: documents list
    API->>API: _aggregate_filters(docs) => {suffix, run_status, metadata counts, empty_metadata}
    API-->>FE: 200 {"total": n, "filter": {...}}
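The aggregation step in the diagram could look roughly like the following. This is a minimal sketch under stated assumptions, not the `_aggregate_filters` implementation from the PR: the document keys `suffix` and `run`, the handling of list-valued metadata, and the exact output layout are guesses based only on the summary above (`meta_fields` is the one field name the diagram mentions).

```python
from collections import Counter
from typing import Any


def aggregate_filters(docs: list[dict[str, Any]]) -> dict[str, Any]:
    """Count documents per suffix, run status, and metadata value."""
    suffix: Counter = Counter()
    run_status: Counter = Counter()
    metadata: dict[str, Counter] = {}
    empty_metadata = 0
    for doc in docs:
        suffix[doc.get("suffix", "")] += 1      # assumed key name
        run_status[str(doc.get("run", ""))] += 1  # assumed key name
        meta = doc.get("meta_fields") or {}
        if not meta:
            empty_metadata += 1
            continue
        for key, value in meta.items():
            # Treat a list-valued field as several values (an assumption).
            values = value if isinstance(value, list) else [value]
            for item in values:
                metadata.setdefault(key, Counter())[str(item)] += 1
    return {
        "suffix": dict(suffix),
        "run_status": dict(run_status),
        "metadata": {key: dict(counts) for key, counts in metadata.items()},
        "empty_metadata": empty_metadata,
    }


docs = [
    {"suffix": "pdf", "run": "1", "meta_fields": {"author": "a"}},
    {"suffix": "pdf", "run": "0", "meta_fields": {}},
    {"suffix": "txt", "run": "1", "meta_fields": {"author": ["a", "b"]}},
]
response = {"total": len(docs), "filter": aggregate_filters(docs)}
print(response)
```

Doing the counting in the API layer, as the walkthrough describes, means the service only has to return the matching documents once.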

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

☯️ refactor, 🧪 test, 🐖api

Suggested reviewers

  • yuzhichang
  • JinHai-CN
  • yingfeng

Poem

🐰 I hopped from POST to GET today,

Filters folded neat along the way,
Docs and metas counted with delight,
Frontend, API, DB — all aligned right,
A carrot-coded change — quick, clean, and bright! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main refactoring objective: consolidating the document filter endpoints from two separate APIs into one unified endpoint. |
| Description check | ✅ Passed | The PR description addresses the problem being solved (consolidation of two endpoints) and specifies the type of change as Refactoring, meeting the template requirements. |


@xugangqiang xugangqiang added the ci Continue Integration label Apr 20, 2026
@xugangqiang xugangqiang marked this pull request as ready for review April 20, 2026 08:43
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 20, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/apps/restful_apis/document_api.py (1)

658-667: ⚠️ Potential issue | 🔴 Critical

Potential NameError: doc_ids_filter may be undefined.

At line 660, doc_ids_filter is referenced, but it's only defined inside the if metadata_condition: block at line 635. If metadata_condition is falsy but metadata is truthy, doc_ids_filter will be undefined when checked at line 660.

🐛 Proposed fix
+    doc_ids_filter = None
     if metadata_condition:
         doc_ids_filter = set(meta_filter(metas, convert_conditions(metadata_condition), metadata_condition.get("logic", "and")))
         if metadata_condition.get("conditions") and not doc_ids_filter:
             return RetCode.SUCCESS, "", [], return_empty_metadata

     if metadata:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 658 - 667, The code can
raise NameError because doc_ids_filter is only set inside the metadata_condition
branch; initialize it before that logic so it always exists. Specifically,
declare and set doc_ids_filter = None (or an empty set/list as intended) before
the metadata handling block that computes metadata_doc_ids and uses
metadata_condition so subsequent checks and the final return that reference
doc_ids_filter (and the metadata_doc_ids merging logic) are safe; update any
merging logic that assumes set semantics accordingly.
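The failure mode in this finding is easy to reproduce in isolation. The sketch below is a hypothetical reduction, not the code from `document_api.py`: the function names and the intersection-based merge are assumptions, and only the pattern (a name bound in one branch and read in another) comes from the review comment.

```python
def merge_ids_buggy(condition_ids, metadata_ids):
    """doc_ids_filter is bound only when condition_ids is truthy."""
    if condition_ids:
        doc_ids_filter = set(condition_ids)
    if metadata_ids:
        # Unbound here when condition_ids was falsy: raises
        # UnboundLocalError, which is a subclass of NameError.
        if doc_ids_filter is not None:
            return doc_ids_filter & set(metadata_ids)
        return set(metadata_ids)
    return None


def merge_ids_fixed(condition_ids, metadata_ids):
    """Same logic with doc_ids_filter initialized up front."""
    doc_ids_filter = None
    if condition_ids:
        doc_ids_filter = set(condition_ids)
    if metadata_ids:
        if doc_ids_filter is not None:
            return doc_ids_filter & set(metadata_ids)
        return set(metadata_ids)
    return doc_ids_filter


try:
    merge_ids_buggy(None, ["a"])
except NameError as exc:
    print("buggy version failed:", type(exc).__name__)

print(merge_ids_fixed(None, ["a"]))
```

Initializing to `None` rather than an empty set keeps "no condition supplied" distinguishable from "condition matched nothing".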
🧹 Nitpick comments (2)
web/src/hooks/use-document-request.ts (1)

211-223: Query key includes unused debouncedSearchString - potential stale cache or unnecessary refetches.

The queryKey at line 214 includes debouncedSearchString, but the actual documentFilter call at line 218 no longer uses it. This mismatch means:

  1. Every change to the debounced search string triggers a new filter fetch (unnecessary network calls)
  2. Cache entries are created per search string even though the response is identical

Consider removing debouncedSearchString from the query key if filter results are now independent of search input:

♻️ Suggested fix
   const { data } = useQuery({
     queryKey: [
       DocumentApiAction.FetchDocumentFilter,
-      debouncedSearchString,
       knowledgeId,
     ],
     queryFn: async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/hooks/use-document-request.ts` around lines 211 - 223, The queryKey
for useQuery includes debouncedSearchString but documentFilter(knowledgeId ||
id) doesn't use it, causing unnecessary refetches and cache entries; update the
queryKey in useQuery to only include DocumentApiAction.FetchDocumentFilter and
the knowledge identifier (knowledgeId || id) to match the actual dependency of
the queryFn (refer to the useQuery call and documentFilter invocation) so the
hook only refetches when the knowledge id changes.
web/src/services/knowledge-service.ts (1)

153-156: Remove unused methods.documentFilter entry to avoid type mismatch and confusion.

The methods.documentFilter configuration at line 153-156 is dead code. The actual documentFilter export at line 264 bypasses the methods object entirely and calls request.get directly. Additionally, methods.documentFilter.url is a function (api.getDatasetFilter), which violates the registerServer type requirement of url: string. Since kbService.documentFilter is never called anywhere in the codebase (only the exported documentFilter function is used), removing this entry prevents confusion and potential bugs.

♻️ Suggested cleanup
-  documentFilter: {
-    url: api.getDatasetFilter,
-    method: 'get',
-  },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/services/knowledge-service.ts` around lines 153 - 156, Remove the
dead methods.documentFilter entry from the methods object to avoid the type
mismatch and confusion: delete the block that sets methods.documentFilter with
url: api.getDatasetFilter and method: 'get', since the actual exported
documentFilter function (the standalone export that calls request.get) is used
instead; ensure no other code references methods.documentFilter and keep the
exported documentFilter (which calls request.get and uses api.getDatasetFilter)
intact so registerServer’s requirement of url: string is not violated by a
function-valued api.getDatasetFilter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 022365c0-de7f-4c04-a265-4aefb7030288

📥 Commits

Reviewing files that changed from the base of the PR and between 78c3583 and 179fea0.

📒 Files selected for processing (8)
  • api/apps/document_app.py
  • api/apps/restful_apis/document_api.py
  • api/db/services/document_service.py
  • test/testcases/test_web_api/test_common.py
  • test/testcases/test_web_api/test_document_app/test_document_metadata.py
  • web/src/hooks/use-document-request.ts
  • web/src/services/knowledge-service.ts
  • web/src/utils/api.ts

Comment on lines +151 to +156
     @pytest.mark.p2
     def test_filter_missing_kb_id(self, WebApiAuth, add_document_func):
-        _, doc_id = add_document_func
-        res = document_filter(WebApiAuth, {"doc_ids": [doc_id]})
-        assert res["code"] == 101, res
-        assert "KB ID" in res["message"], res
+        kb_id, doc_id = add_document_func
+        res = document_filter(WebApiAuth, "", {"doc_ids": [doc_id]})
+        assert res["code"] == 100, res
+        assert "<MethodNotAllowed '405: Method Not Allowed'>" == res["message"], res

⚠️ Potential issue | 🟡 Minor

Test relies on 405 side-effect rather than explicit validation.

When dataset_id is empty, the URL becomes /api/v1/datasets//documents?type=filter, which results in a 405 Method Not Allowed because no GET route matches the resulting path. This is a routing side-effect, not explicit parameter validation.

While this test still catches the error case, the behavior is fragile - it depends on the web framework's routing behavior rather than application-level validation. The original test expected code 101 with message containing "KB ID", which was explicit validation.

Consider whether the backend should add explicit validation for empty dataset_id in the path parameter, returning a more meaningful error message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 151 - 156, The test test_filter_missing_kb_id relies on a 405
routing side-effect when dataset_id is empty; update the documents filter
endpoint to explicitly validate the path parameter instead: in the request
handler used by document_filter (the route that handles GET
/api/v1/datasets/<dataset_id>/documents with type=filter) add a check that
dataset_id is non-empty and return the application-level error (code 101 and a
message referencing "KB ID") when missing, then adjust the test to expect code
101 and the KB ID message instead of 405; locate the validation logic inside the
controller/function that implements the documents filter route and add the guard
early before any routing or downstream logic runs.
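An explicit check of the kind suggested here might look like the sketch below. Everything in it is hypothetical: the helper name, the message wording, and the plain-dict response shape are assumptions, and only the code 101 and the "KB ID" substring come from the original test's expectations.

```python
from typing import Optional

# Code the original test expected for a missing dataset id.
ARGUMENT_ERROR = 101


def check_dataset_id(dataset_id: Optional[str]) -> Optional[dict]:
    """Return an error payload for a blank dataset id, else None."""
    if dataset_id is None or not dataset_id.strip():
        # Hypothetical message; it only needs to mention "KB ID".
        return {"code": ARGUMENT_ERROR, "message": "KB ID is required."}
    return None


print(check_dataset_id(""))
print(check_dataset_id("kb1"))
```

Whether such a guard is ever reached for an empty path segment depends on the router: if the request is rejected before a handler is selected, a handler-level check alone would not change the 405 response.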

@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.66%. Comparing base (af2ed41) to head (3a5cd39).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14230   +/-   ##
=======================================
  Coverage   96.66%   96.66%           
=======================================
  Files          10       10           
  Lines         690      690           
  Branches      108      108           
=======================================
  Hits          667      667           
  Misses          8        8           
  Partials       15       15           


@xugangqiang xugangqiang force-pushed the migrate-doc-get-filter branch from a6ca87f to 3a5cd39 Compare April 20, 2026 09:27
@xugangqiang xugangqiang self-assigned this Apr 21, 2026
@yingfeng yingfeng merged commit 5ae296a into infiniflow:main Apr 21, 2026
2 checks passed

Labels

  • ci: Continue Integration
  • size:L: This PR changes 100-499 lines, ignoring generated files.

2 participants