Refactor: Consolidate WEB API & HTTP API for document delete API #14254
JinHai-CN merged 3 commits into infiniflow:main from
Conversation
📝 Walkthrough
Consolidates document deletion by removing legacy
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant REST as REST API (document_api.delete_documents)
    participant KB as KnowledgebaseService
    participant Doc as DocumentService
    participant File as FileService / ThreadPool
    Client->>REST: DELETE /datasets/{dataset_id}/documents\njson: { ids } or { delete_all: true }
    REST->>KB: accessible(kb_id, user_id)
    KB-->>REST: accessible / not accessible
    alt accessible
        REST->>Doc: if delete_all -> query(kb_id) else validate ids
        Doc-->>REST: doc IDs (or validation errors)
        REST->>Doc: check_duplicate_ids(doc_ids)
        REST->>File: thread_pool_exec(FileService.delete_docs, doc_ids, tenant_id)
        File-->>REST: success / errors
        REST-->>Client: { deleted: <count> } or error
    else not accessible
        REST-->>Client: auth error (401)
    end
```
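The ids/delete_all contract shown in the diagram can be sketched as a small validation helper. This is a hedged sketch, not the project's code: `resolve_delete_request` and its `(value, error)` return shape are illustrative stand-ins for the real handler and its `get_error_data_result` responses.

```python
def resolve_delete_request(req: dict, dataset_id: str):
    """Validate the ids-XOR-delete_all contract from the request body.

    Illustrative stand-in for the real handler: returns (doc_ids, error)
    instead of a framework error response.
    """
    doc_ids = req.get("ids") or []
    delete_all = req.get("delete_all", False)
    if not delete_all and len(doc_ids) == 0:
        return None, f"should either provide doc ids or set delete_all(true), dataset: {dataset_id}."
    if doc_ids and delete_all:
        return None, f"should not provide both doc ids and delete_all(true), dataset: {dataset_id}."
    # delete_all=True yields an empty list here; the handler then queries
    # all document ids for the dataset.
    return doc_ids, None
```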
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
api/apps/restful_apis/document_api.py (1)
666-748: Add audit logging for the destructive delete flow.

This new endpoint should log the validated delete intent and outcome, without logging full document ID lists. That makes bulk deletes and `delete_all` traceable in production. As per coding guidelines (`**/*.py`: Add logging for new flows).

🪵 Proposed logging additions

```diff
 if delete_all:
     doc_ids = [doc.id for doc in DocumentService.query(kb_id=dataset_id)]
+    logging.info(
+        "Deleting documents from dataset %s: delete_all=%s, count=%s",
+        dataset_id,
+        delete_all,
+        len(doc_ids),
+    )
+
 # make sure each id is unique
 unique_doc_ids, duplicate_messages = check_duplicate_ids(doc_ids, "document")
 if duplicate_messages:
     logging.warning(f"duplicate_messages:{duplicate_messages}")
 else:
     doc_ids = unique_doc_ids
@@
 if errors:
+    logging.warning("Document deletion failed for dataset %s: %s", dataset_id, errors)
     return get_error_data_result(message=str(errors))
+logging.info("Deleted %s documents from dataset %s", len(doc_ids), dataset_id)
 return get_result(data={"deleted": len(doc_ids)})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 666 - 748, Add audit logging to the delete_documents flow to record the validated delete intent and outcome without printing full document id lists: after the access check and after preparing doc_ids (and after check_duplicate_ids) emit an audit log (using logging.info or logging.warning) that includes dataset_id, tenant_id, whether delete_all is true, and the number of documents to delete (len(doc_ids)) and note if duplicates were detected (duplicate_messages present) — do this before calling FileService.delete_docs; then log the deletion outcome after await thread_pool_exec(FileService.delete_docs, ...) including success/failure, count deleted (len(doc_ids) on success) and any error summary (do not include full ids). Reference functions/classes: delete_documents, KnowledgebaseService.accessible, check_duplicate_ids, DocumentService.query, FileService.delete_docs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 88753fc8-625d-4da1-9c5e-c2f4f798b726
📒 Files selected for processing (15)
- api/apps/document_app.py
- api/apps/restful_apis/document_api.py
- api/apps/sdk/doc.py
- api/utils/validation_utils.py
- test/testcases/test_http_api/test_file_management_within_dataset/test_delete_documents.py
- test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py
- test/testcases/test_sdk_api/test_file_management_within_dataset/test_delete_documents.py
- test/testcases/test_web_api/test_chunk_app/test_create_chunk.py
- test/testcases/test_web_api/test_chunk_app/test_update_chunk.py
- test/testcases/test_web_api/test_common.py
- test/testcases/test_web_api/test_document_app/conftest.py
- test/testcases/test_web_api/test_document_app/test_rm_documents.py
- web/src/hooks/use-document-request.ts
- web/src/services/knowledge-service.ts
- web/src/utils/api.ts
💤 Files with no reviewable changes (2)
- test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py
- api/apps/document_app.py
```python
# Get documents to delete
doc_ids = req.get("ids") or []
delete_all = req.get("delete_all", False)
if not delete_all and len(doc_ids) == 0:
    return get_error_data_result(message=f"should either provide doc ids or set delete_all(true), dataset: {dataset_id}. ")

if len(doc_ids) > 0 and delete_all:
    return get_error_data_result(message=f"should not provide both doc ids and delete_all(true), dataset: {dataset_id}. ")
if delete_all:
    doc_ids = [doc.id for doc in DocumentService.query(kb_id=dataset_id)]

# make sure each id is unique
unique_doc_ids, duplicate_messages = check_duplicate_ids(doc_ids, "document")
if duplicate_messages:
    logging.warning(f"duplicate_messages:{duplicate_messages}")
else:
    doc_ids = unique_doc_ids

# Delete documents using existing FileService.delete_docs
errors = await thread_pool_exec(FileService.delete_docs, doc_ids, tenant_id)
```
Preflight document IDs before deleting.
This route checks access to dataset_id, but then passes arbitrary doc_ids to FileService.delete_docs, which deletes by document ID and resolves the document tenant internally. That allows a request scoped to one dataset to delete a document from another dataset if its ID is supplied. Also, duplicate IDs are only logged, so duplicate requests can partially delete and then fail.
Reject duplicates and verify every requested ID belongs to dataset_id before calling FileService.delete_docs.
🛡️ Proposed fix

```diff
 if delete_all:
     doc_ids = [doc.id for doc in DocumentService.query(kb_id=dataset_id)]
 # make sure each id is unique
 unique_doc_ids, duplicate_messages = check_duplicate_ids(doc_ids, "document")
 if duplicate_messages:
-    logging.warning(f"duplicate_messages:{duplicate_messages}")
-else:
-    doc_ids = unique_doc_ids
+    logging.warning(f"duplicate_messages:{duplicate_messages}")
+    return get_error_data_result(message="; ".join(duplicate_messages), code=RetCode.ARGUMENT_ERROR)
+doc_ids = unique_doc_ids
+
+dataset_doc_ids = set(KnowledgebaseService.list_documents_by_ids([dataset_id]))
+missing_doc_ids = [doc_id for doc_id in doc_ids if doc_id not in dataset_doc_ids]
+if missing_doc_ids:
+    return get_error_data_result(message=f"Document not found: {missing_doc_ids[0]}")
 # Delete documents using existing FileService.delete_docs
 errors = await thread_pool_exec(FileService.delete_docs, doc_ids, tenant_id)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/document_api.py` around lines 721 - 740, Reject
duplicate IDs and verify ownership before calling FileService.delete_docs: if
check_duplicate_ids(doc_ids, "document") returns any duplicates, return an error
instead of only logging; then preflight all doc_ids by fetching their records
(e.g., via DocumentService.query or a DocumentService.get_by_ids helper) and
ensure each returned document belongs to the requested dataset_id (and tenant_id
if applicable); if any requested id is missing or belongs to a different
dataset/tenant, return an error listing offending ids; only after duplicates are
absent and all ids are verified to belong to dataset_id call
FileService.delete_docs(doc_ids, tenant_id).
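The preflight described above can be sketched independently of the service layer. Everything here is a hypothetical stand-in: `fetch_kb_ids` plays the role of a `DocumentService.get_by_ids`-style lookup, and the error shape is illustrative.

```python
def preflight_doc_ids(doc_ids, dataset_id, fetch_kb_ids):
    """Verify every requested id exists and belongs to dataset_id.

    fetch_kb_ids(doc_ids) is a hypothetical lookup returning a
    {doc_id: kb_id} mapping for the ids it could find; missing ids
    simply have no entry, so they fail the membership check too.
    """
    found = fetch_kb_ids(doc_ids)
    offending = [d for d in doc_ids if found.get(d) != dataset_id]
    if offending:
        raise ValueError(f"documents missing or outside dataset {dataset_id}: {offending}")
    return doc_ids
```

Only after this returns cleanly would the handler hand `doc_ids` to the actual delete call.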
```diff
 export const useRemoveDocument = () => {
   const queryClient = useQueryClient();
   const { id: datasetId } = useParams();
   const {
     data,
     isPending: loading,
     mutateAsync,
   } = useMutation({
     mutationKey: [DocumentApiAction.RemoveDocument],
     mutationFn: async (documentIds: string | string[]) => {
-      const { data } = await kbService.documentRm({ doc_id: documentIds });
+      const ids = Array.isArray(documentIds) ? documentIds : [documentIds];
+      const { data } = await deleteDocument(datasetId!, ids);
```
Resolve the dataset id the same way as document list/filter.
Line 330 uses datasetId! from useParams(), but this file’s list/filter flows use knowledgeId || id. On routes where the dataset id comes from search params, delete calls /datasets/undefined/documents.
🐛 Proposed fix

```diff
 export const useRemoveDocument = () => {
   const queryClient = useQueryClient();
-  const { id: datasetId } = useParams();
+  const { knowledgeId } = useGetKnowledgeSearchParams();
+  const { id } = useParams();
   const {
     data,
     isPending: loading,
     mutateAsync,
   } = useMutation({
     mutationKey: [DocumentApiAction.RemoveDocument],
     mutationFn: async (documentIds: string | string[]) => {
       const ids = Array.isArray(documentIds) ? documentIds : [documentIds];
-      const { data } = await deleteDocument(datasetId!, ids);
+      const datasetId = knowledgeId || id;
+      if (!datasetId) {
+        throw new Error('Dataset ID is required');
+      }
+      const { data } = await deleteDocument(datasetId, ids);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
export const useRemoveDocument = () => {
  const queryClient = useQueryClient();
  const { knowledgeId } = useGetKnowledgeSearchParams();
  const { id } = useParams();
  const {
    data,
    isPending: loading,
    mutateAsync,
  } = useMutation({
    mutationKey: [DocumentApiAction.RemoveDocument],
    mutationFn: async (documentIds: string | string[]) => {
      const ids = Array.isArray(documentIds) ? documentIds : [documentIds];
      const datasetId = knowledgeId || id;
      if (!datasetId) {
        throw new Error('Dataset ID is required');
      }
      const { data } = await deleteDocument(datasetId, ids);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/src/hooks/use-document-request.ts` around lines 319 - 330, The mutation
in useRemoveDocument uses datasetId from useParams() directly (datasetId!) which
can be undefined for routes that supply the dataset via the knowledgeId || id
pattern used elsewhere; change the dataset resolution to mirror the list/filter
flow (derive datasetId using the same logic as the other hooks — e.g., resolve
const datasetId = knowledgeId || id or read from search params the same way the
document list/filter does) and pass that resolved datasetId into
deleteDocument(documentIds) so deleteDocument is never called with undefined;
update the useRemoveDocument function where mutationFn calls deleteDocument to
use the resolved id.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main   #14254   +/-   ##
=======================================
  Coverage   96.66%   96.66%
=======================================
  Files          10       10
  Lines         690      690
  Branches      108      108
=======================================
  Hits          667      667
  Misses          8        8
  Partials       15       15
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
♻️ Duplicate comments (1)
api/apps/restful_apis/document_api.py (1)
738-743: ⚠️ Potential issue | 🔴 Critical — Deduplication logic is inverted; duplicates fall through unchanged.

When `check_duplicate_ids` finds duplicates (`duplicate_messages` non-empty), the code only logs and leaves `doc_ids` as the original list with duplicates; the `else` branch is the only place `doc_ids = unique_doc_ids` happens, which is the case where there were no duplicates to strip anyway. Result: duplicate IDs are passed to `FileService.delete_docs`, and the second attempt at the same ID will raise `"Document not found!"` (see `api/db/services/file_service.py` L582-625), turning a duplicate-input request into a partial-delete + error response.

This also overlaps with the previously raised ownership/preflight issue, where the suggested fix was to reject duplicates outright rather than silently deduplicate.
🛠️ Minimal fix (always use the deduplicated list)

```diff
-# make sure each id is unique
-unique_doc_ids, duplicate_messages = check_duplicate_ids(doc_ids, "document")
-if duplicate_messages:
-    logging.warning(f"duplicate_messages:{duplicate_messages}")
-else:
-    doc_ids = unique_doc_ids
+# make sure each id is unique
+unique_doc_ids, duplicate_messages = check_duplicate_ids(doc_ids, "document")
+if duplicate_messages:
+    logging.warning(f"duplicate_messages:{duplicate_messages}")
+doc_ids = unique_doc_ids
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 738 - 743, The deduplication branch is inverted: when check_duplicate_ids returns duplicate_messages the code logs but keeps the original doc_ids; change this to reject duplicate input instead of falling through. In the handler where check_duplicate_ids(doc_ids, "document") is called, if duplicate_messages is non-empty return a 400/BadRequest indicating duplicate IDs (include duplicate_messages) and do not call FileService.delete_docs; otherwise set doc_ids = unique_doc_ids and proceed. Ensure you reference check_duplicate_ids, duplicate_messages, unique_doc_ids, doc_ids and FileService.delete_docs when making the change.
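To make the inverted branch concrete, here is a self-contained sketch: a plausible order-preserving `check_duplicate_ids` (the real helper lives in the project's utils and may differ) plus the corrected control flow, which always adopts the deduplicated list and can optionally reject on duplicates as the review recommends.

```python
from collections import Counter

def check_duplicate_ids(ids, kind="document"):
    """Order-preserving sketch: return (unique_ids, duplicate messages)."""
    counts = Counter(ids)
    unique_ids = list(dict.fromkeys(ids))  # preserves first-seen order
    messages = [f"Duplicate {kind} id: {i}" for i, c in counts.items() if c > 1]
    return unique_ids, messages

def dedupe_or_reject(doc_ids, strict=False):
    """Corrected flow: always use the deduplicated list.

    In strict mode the request is refused outright, the safer choice
    for a destructive endpoint.
    """
    unique_ids, messages = check_duplicate_ids(doc_ids)
    if messages and strict:
        raise ValueError("; ".join(messages))
    return unique_ids
```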
🧹 Nitpick comments (1)
api/apps/restful_apis/document_api.py (1)
735-736: `delete_all` materializes all documents in the dataset in memory.

For large datasets, `[doc.id for doc in DocumentService.query(kb_id=dataset_id)]` loads every document row just to extract IDs, then `FileService.delete_docs` processes them serially. Consider batching (page through IDs) or adding a dedicated `DocumentService.list_ids_by_kb` projection to reduce memory/DB load, and/or cap the number of docs deletable per request.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 735 - 736, The current delete_all path materializes every Document row via DocumentService.query(kb_id=dataset_id) which can OOM for large datasets; change this to stream or page IDs and delete in batches: add a new projection/utility like DocumentService.list_ids_by_kb(kb_id, page_size) that yields only document IDs (or modify DocumentService.query to support id-only projection), then call FileService.delete_docs in fixed-size batches (and enforce an optional max_delete cap per request). Update the delete_all branch to iterate the paginated ID generator, collect a batch of IDs, call FileService.delete_docs(batch), and repeat until exhausted.
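A batched delete_all can be sketched with two injected callables; both names are hypothetical, standing in for an id-only projection and for `FileService.delete_docs`:

```python
def delete_in_batches(fetch_ids, delete_batch, batch_size=100, max_delete=None):
    """Delete ids in fixed-size batches instead of materializing them all.

    fetch_ids(limit) -> up to `limit` ids still present (hypothetical
    id-only projection); delete_batch(ids) -> list of error strings.
    Assumes delete_batch removes the ids fetch_ids would return next.
    """
    deleted, errors = 0, []
    while True:
        limit = batch_size if max_delete is None else min(batch_size, max_delete - deleted)
        if limit <= 0:
            break  # per-request cap reached
        ids = fetch_ids(limit)
        if not ids:
            break  # dataset exhausted
        errors.extend(delete_batch(ids))
        deleted += len(ids)
    return deleted, errors
```

The `max_delete` cap corresponds to the "cap the number of docs deletable per request" suggestion above.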
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c83f74ee-2646-4c20-99f9-104abbbb0c4b
📒 Files selected for processing (6)
- api/apps/document_app.py
- api/apps/restful_apis/document_api.py
- test/testcases/test_web_api/test_common.py
- web/src/hooks/use-document-request.ts
- web/src/services/knowledge-service.ts
- web/src/utils/api.ts
💤 Files with no reviewable changes (1)
- api/apps/document_app.py
🚧 Files skipped from review as they are similar to previous changes (4)
- web/src/services/knowledge-service.ts
- web/src/utils/api.ts
- test/testcases/test_web_api/test_common.py
- web/src/hooks/use-document-request.ts
```python
errors = await thread_pool_exec(FileService.delete_docs, doc_ids, tenant_id)

if errors:
    return get_error_data_result(message=str(errors))

return get_result(data={"deleted": len(doc_ids)})
```
Partial-failure reporting loses successful deletions.
FileService.delete_docs iterates IDs and concatenates exception messages into a single string without stopping, so on partial failure it has already deleted some docs. Here any non-empty errors causes a generic error response and len(doc_ids) is never returned, so the caller cannot tell how many documents were actually removed. Consider (a) having delete_docs return (deleted_ids, errors) or (b) at minimum, including the attempted/failed counts in the error payload so clients can reconcile state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/document_api.py` around lines 746 - 751,
FileService.delete_docs currently swallows partial successes by concatenating
exception messages and returning a non-empty errors string so the API never
reports how many docs were actually deleted; change FileService.delete_docs to
return structured results (e.g., (deleted_ids, errors) or (deleted_count,
errors_list)) and update the callsite in document_api.py (where
thread_pool_exec(FileService.delete_docs, doc_ids, tenant_id) is invoked) to
unpack that tuple, return get_result with the deleted count when any docs were
removed, and when returning get_error_data_result include both the failed error
details and the attempted/failed counts so clients can reconcile partial
failures instead of only receiving a generic error.
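The structured result this comment proposes can be sketched like so; `delete_one` is a hypothetical stand-in for the per-document work inside `FileService.delete_docs`:

```python
def delete_docs_structured(doc_ids, delete_one):
    """Delete each id, reporting successes and failures separately.

    delete_one(doc_id) raises on failure. Returning (deleted_ids, errors)
    lets the API report partial success instead of a generic error.
    """
    deleted_ids, errors = [], []
    for doc_id in doc_ids:
        try:
            delete_one(doc_id)
            deleted_ids.append(doc_id)
        except Exception as exc:  # collect and continue, matching current behavior
            errors.append(f"{doc_id}: {exc}")
    return deleted_ids, errors
```

A caller could then return the deleted count alongside the error list, so clients can reconcile partial failures.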
What problem does this PR solve?
Before consolidation:
- Web API: POST /v1/document/rm
- HTTP API: DELETE /api/v1/datasets/<dataset_id>/documents

After consolidation:
- RESTful API: DELETE /api/v1/datasets/<dataset_id>/documents
Type of change