Refactor: Consolidation WEB API & HTTP API for document get_filter#14230
Refactor: Consolidation WEB API & HTTP API for document get_filter#14230yingfeng merged 12 commits intoinfiniflow:mainfrom
Conversation
📝 WalkthroughWalkthroughRemoved the standalone Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant FE as Frontend
participant API as REST API (document_api)
participant Service as DocumentService
participant DB as Database
FE->>API: GET /datasets/{kb_id}/documents?type=filter
API->>Service: get_by_kb_id(kb_id, doc_ids, suffix, name, return_empty_metadata)
Service->>DB: SELECT ... WHERE kb_id=? AND id IN (...) AND other filters
DB-->>Service: documents with meta_fields
Service-->>API: documents list
API->>API: _aggregate_filters(docs) => {suffix, run_status, metadata counts, empty_metadata}
API-->>FE: 200 {"total": n, "filter": {...}}
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/apps/restful_apis/document_api.py (1)
658-667:⚠️ Potential issue | 🔴 CriticalPotential
NameError:doc_ids_filtermay be undefined.At line 660,
doc_ids_filteris referenced, but it's only defined inside theif metadata_condition:block at line 635. Ifmetadata_conditionis falsy butmetadatais truthy,doc_ids_filterwill be undefined when checked at line 660.🐛 Proposed fix
+ doc_ids_filter = None if metadata_condition: doc_ids_filter = set(meta_filter(metas, convert_conditions(metadata_condition), metadata_condition.get("logic", "and"))) if metadata_condition.get("conditions") and not doc_ids_filter: return RetCode.SUCCESS, "", [], return_empty_metadata if metadata:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 658 - 667, The code can raise NameError because doc_ids_filter is only set inside the metadata_condition branch; initialize it before that logic so it always exists. Specifically, declare and set doc_ids_filter = None (or an empty set/list as intended) before the metadata handling block that computes metadata_doc_ids and uses metadata_condition so subsequent checks and the final return that reference doc_ids_filter (and the metadata_doc_ids merging logic) are safe; update any merging logic that assumes set semantics accordingly.
🧹 Nitpick comments (2)
web/src/hooks/use-document-request.ts (1)
211-223: Query key includes unuseddebouncedSearchString- potential stale cache or unnecessary refetches.The
queryKeyat line 214 includesdebouncedSearchString, but the actualdocumentFiltercall at line 218 no longer uses it. This mismatch means:
- Every keystroke in the search box triggers a new filter fetch (unnecessary network calls)
- Cache entries are created per search string even though the response is identical
Consider removing
debouncedSearchStringfrom the query key if filter results are now independent of search input:♻️ Suggested fix
const { data } = useQuery({ queryKey: [ DocumentApiAction.FetchDocumentFilter, - debouncedSearchString, knowledgeId, ], queryFn: async () => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/hooks/use-document-request.ts` around lines 211 - 223, The queryKey for useQuery includes debouncedSearchString but documentFilter(knowledgeId || id) doesn't use it, causing unnecessary refetches and cache entries; update the queryKey in useQuery to only include DocumentApiAction.FetchDocumentFilter and the knowledge identifier (knowledgeId || id) to match the actual dependency of the queryFn (refer to the useQuery call and documentFilter invocation) so the hook only refetches when the knowledge id changes.web/src/services/knowledge-service.ts (1)
153-156: Remove unusedmethods.documentFilterentry to avoid type mismatch and confusion.The
methods.documentFilterconfiguration at line 153-156 is dead code. The actualdocumentFilterexport at line 264 bypasses themethodsobject entirely and callsrequest.getdirectly. Additionally,methods.documentFilter.urlis a function (api.getDatasetFilter), which violates theregisterServertype requirement ofurl: string. SincekbService.documentFilteris never called anywhere in the codebase (only the exporteddocumentFilterfunction is used), removing this entry prevents confusion and potential bugs.♻️ Suggested cleanup
- documentFilter: { - url: api.getDatasetFilter, - method: 'get', - },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/services/knowledge-service.ts` around lines 153 - 156, Remove the dead methods.documentFilter entry from the methods object to avoid the type mismatch and confusion: delete the block that sets methods.documentFilter with url: api.getDatasetFilter and method: 'get', since the actual exported documentFilter function (the standalone export that calls request.get) is used instead; ensure no other code references methods.documentFilter and keep the exported documentFilter (which calls request.get and uses api.getDatasetFilter) intact so registerServer’s requirement of url: string is not violated by a function-valued api.getDatasetFilter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`:
- Around line 151-156: The test test_filter_missing_kb_id relies on a 405
routing side-effect when dataset_id is empty; update the documents filter
endpoint to explicitly validate the path parameter instead: in the request
handler used by document_filter (the route that handles GET
/api/v1/datasets/<dataset_id>/documents with type=filter) add a check that
dataset_id is non-empty and return the application-level error (code 101 and a
message referencing "KB ID") when missing, then adjust the test to expect code
101 and the KB ID message instead of 405; locate the validation logic inside the
controller/function that implements the documents filter route and add the guard
early before any routing or downstream logic runs.
---
Outside diff comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 658-667: The code can raise NameError because doc_ids_filter is
only set inside the metadata_condition branch; initialize it before that logic
so it always exists. Specifically, declare and set doc_ids_filter = None (or an
empty set/list as intended) before the metadata handling block that computes
metadata_doc_ids and uses metadata_condition so subsequent checks and the final
return that reference doc_ids_filter (and the metadata_doc_ids merging logic)
are safe; update any merging logic that assumes set semantics accordingly.
---
Nitpick comments:
In `@web/src/hooks/use-document-request.ts`:
- Around line 211-223: The queryKey for useQuery includes debouncedSearchString
but documentFilter(knowledgeId || id) doesn't use it, causing unnecessary
refetches and cache entries; update the queryKey in useQuery to only include
DocumentApiAction.FetchDocumentFilter and the knowledge identifier (knowledgeId
|| id) to match the actual dependency of the queryFn (refer to the useQuery call
and documentFilter invocation) so the hook only refetches when the knowledge id
changes.
In `@web/src/services/knowledge-service.ts`:
- Around line 153-156: Remove the dead methods.documentFilter entry from the
methods object to avoid the type mismatch and confusion: delete the block that
sets methods.documentFilter with url: api.getDatasetFilter and method: 'get',
since the actual exported documentFilter function (the standalone export that
calls request.get) is used instead; ensure no other code references
methods.documentFilter and keep the exported documentFilter (which calls
request.get and uses api.getDatasetFilter) intact so registerServer’s
requirement of url: string is not violated by a function-valued
api.getDatasetFilter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 022365c0-de7f-4c04-a265-4aefb7030288
📒 Files selected for processing (8)
api/apps/document_app.pyapi/apps/restful_apis/document_api.pyapi/db/services/document_service.pytest/testcases/test_web_api/test_common.pytest/testcases/test_web_api/test_document_app/test_document_metadata.pyweb/src/hooks/use-document-request.tsweb/src/services/knowledge-service.tsweb/src/utils/api.ts
| @pytest.mark.p2 | ||
| def test_filter_missing_kb_id(self, WebApiAuth, add_document_func): | ||
| _, doc_id = add_document_func | ||
| res = document_filter(WebApiAuth, {"doc_ids": [doc_id]}) | ||
| assert res["code"] == 101, res | ||
| assert "KB ID" in res["message"], res | ||
| kb_id, doc_id = add_document_func | ||
| res = document_filter(WebApiAuth, "", {"doc_ids": [doc_id]}) | ||
| assert res["code"] == 100, res | ||
| assert "<MethodNotAllowed '405: Method Not Allowed'>" == res["message"], res |
There was a problem hiding this comment.
Test relies on 405 side-effect rather than explicit validation.
When dataset_id is empty, the URL becomes /api/v1/datasets//documents?type=filter, which results in a 405 Method Not Allowed because no route matches. This is a routing side-effect, not explicit parameter validation.
While this test still catches the error case, the behavior is fragile - it depends on the web framework's routing behavior rather than application-level validation. The original test expected code 101 with message containing "KB ID", which was explicit validation.
Consider whether the backend should add explicit validation for empty dataset_id in the path parameter, returning a more meaningful error message.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 151 - 156, The test test_filter_missing_kb_id relies on a 405
routing side-effect when dataset_id is empty; update the documents filter
endpoint to explicitly validate the path parameter instead: in the request
handler used by document_filter (the route that handles GET
/api/v1/datasets/<dataset_id>/documents with type=filter) add a check that
dataset_id is non-empty and return the application-level error (code 101 and a
message referencing "KB ID") when missing, then adjust the test to expect code
101 and the KB ID message instead of 405; locate the validation logic inside the
controller/function that implements the documents filter route and add the guard
early before any routing or downstream logic runs.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #14230 +/- ##
=======================================
Coverage 96.66% 96.66%
=======================================
Files 10 10
Lines 690 690
Branches 108 108
=======================================
Hits 667 667
Misses 8 8
Partials 15 15 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
a6ca87f to
3a5cd39
Compare
What problem does this PR solve?
Before consolidation
Web API: POST /v1/document/filter
Http API - GET /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- GET /api/v1/datasets/<dataset_id>/documents?type=filter
Type of change