
Refactor: Doc batch change status #14337

Open
xugangqiang wants to merge 9 commits into infiniflow:main from xugangqiang:doc-batch-change-status

Conversation

@xugangqiang
Contributor

What problem does this PR solve?

Before migration (web API):
POST /v1/document/change_status

After consolidation (RESTful API):
POST /api/v1/datasets/<dataset_id>/documents/batch-update-status
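The consolidated endpoint can be exercised with a short script. This is a minimal sketch, not part of the PR: the host/port, bearer-token auth scheme, and the `build_batch_status_request` helper are assumptions; the request-body fields mirror the ones described in this PR.

```python
import json
from urllib import request

BASE_URL = "http://localhost:9380"  # assumed RAGFlow host/port

def build_batch_status_request(dataset_id, doc_ids, status, api_key):
    """Build a POST request against the new dataset-scoped endpoint."""
    url = f"{BASE_URL}/api/v1/datasets/{dataset_id}/documents/batch-update-status"
    body = json.dumps({"doc_ids": list(doc_ids), "status": str(status)}).encode()
    return request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme assumed
        },
    )

req = build_batch_status_request("kb42", ["doc1", "doc2"], 1, "ragflow-key")
# request.urlopen(req) would send it against a running server.
```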

Type of change

  • Refactoring

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

This PR migrates document status change functionality from a document-app-scoped endpoint to a dataset-scoped REST API endpoint, reorganizing backend routing and updating frontend integration layers to pass datasetId consistently throughout.

Changes

Cohort / File(s) Summary
Backend API Refactoring
api/apps/document_app.py, api/apps/restful_apis/document_api.py
Deleted the generic POST /change_status endpoint from document_app and replaced it with a new authenticated POST /datasets/<dataset_id>/documents/batch-update-status endpoint in restful_apis that enforces dataset ownership, validates document existence, and performs per-document status updates with document-store propagation.
Frontend Service Layer
web/src/utils/api.ts, web/src/services/knowledge-service.ts
Updated documentChangeStatus API mapping from a static endpoint to a parameterized dataset-scoped function; removed the method from generic registered services and added explicit changeDocumentsStatus export that targets the new endpoint with kb_id in the request body.
Frontend Hooks & Components
web/src/hooks/use-document-request.ts, web/src/pages/dataset/dataset/dataset-table.tsx, web/src/pages/dataset/dataset/use-bulk-operate-dataset.tsx, web/src/pages/dataset/dataset/use-dataset-table-columns.tsx
Extended document status-change hooks and components to accept and propagate datasetId from knowledge-base context; updated all status-update calls to include datasetId alongside status and documentId.
Test Updates
test/testcases/test_web_api/test_common.py, test/testcases/test_web_api/test_document_app/test_document_metadata.py
Updated test helper document_change_status to require dataset_id parameter and target the new endpoint URL; replaced three unit-style tests with E2E batch status-change tests that validate real document updates and persistence.
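The per-document contract summarized above (per-id success or error, batch never aborts on one bad id) can be sketched as a loop. The function name and the injected `fetch_doc`/`update_doc` callables are illustrative stand-ins for DocumentService, not the PR's literal code:

```python
def batch_update_status(doc_ids, status, fetch_doc, update_doc):
    """Per-document update loop with per-id success/error reporting."""
    result, has_error = {}, False
    for doc_id in doc_ids:
        doc = fetch_doc(doc_id)
        if doc is None:
            # Unknown ids get a per-id error instead of failing the batch.
            result[doc_id] = {"error": "Document not found!"}
            has_error = True
            continue
        if update_doc(doc_id, {"status": str(status)}):
            result[doc_id] = {"status": str(status)}
        else:
            result[doc_id] = {"error": "Database error (Document update)!"}
            has_error = True
    return result, has_error

docs = {"d1": {"id": "d1"}}
result, has_error = batch_update_status(["d1", "missing"], 1, docs.get, lambda *_: True)
```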

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

🧪 test, 🐖 api, ☯️ refactor, 🌈 python

Suggested reviewers

  • JinHai-CN
  • yuzhichang
  • yingfeng

Poem

🐰 A dataset's scope, now crystal clear,
Status updates channeled here!
From doc_app's ancient ways,
To REST's elegant cascading phase—
The refactored path leads all astray... to better days! 📚✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'Refactor: Doc batch change status' accurately describes the main change: migrating the document status endpoint to a dataset-scoped RESTful path.
  • Description check: ✅ Passed. The description covers the key before/after API paths and correctly marks this as a refactoring, though it lacks detailed context about the broader changes across backend and frontend.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.

@xugangqiang xugangqiang added the ci Continue Integration label Apr 23, 2026
@xugangqiang xugangqiang marked this pull request as ready for review April 23, 2026 14:44
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. python Pull requests that update python code labels Apr 23, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/src/hooks/use-document-request.ts (1)

251-273: ⚠️ Potential issue | 🟠 Major

Guard dataset IDs and handle partial status results.

Callers pass optional knowledgeBase?.id through context (via useKnowledgeBaseContext()), but the mutation accepts datasetId: string without a runtime check. This allows undefined values to reach changeDocumentsStatus, creating requests to /datasets/undefined/.... Additionally, cache invalidation and error feedback only occur when code === 0; partial success (some documents updated, others failed) leaves rows stale with no error shown.

🛡️ Proposed mutation hardening
     }: {
       status: boolean;
       documentId: string | string[];
-      datasetId: string;
+      datasetId?: string;
     }) => {
       const ids = Array.isArray(documentId) ? documentId : [documentId];
+      if (!datasetId) {
+        message.error('Dataset ID is required');
+        return { code: 500, message: 'Dataset ID is required' };
+      }
+      if (ids.length === 0 || ids.some((id) => !id)) {
+        message.error('Document ID is required');
+        return { code: 500, message: 'Document ID is required' };
+      }
+
       const { data } = await changeDocumentsStatus({
         kb_id: datasetId,
         doc_ids: ids,
         status: Number(status),
       });
 
-      if (data.code === 0) {
-        message.success(i18n.t('message.modified'));
+      const hasUpdatedDocuments = Object.values(
+        (data.data ?? {}) as Record<string, { status?: string }>,
+      ).some((item) => item.status !== undefined);
+
+      if (data.code === 0 || hasUpdatedDocuments) {
         queryClient.invalidateQueries({
           queryKey: [DocumentApiAction.FetchDocumentList],
         });
       }
+      if (data.code === 0) {
+        message.success(i18n.t('message.modified'));
+      } else {
+        message.error(data.message || 'Failed to update document status');
+      }
       return data;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/hooks/use-document-request.ts` around lines 251 - 273, The mutationFn
in use-document-request.ts passes datasetId straight into changeDocumentsStatus
and only treats data.code === 0 as success; add a runtime guard to validate
datasetId (from callers like useKnowledgeBaseContext) and the computed ids
array, returning or throwing a clear error (and calling message.error) if
missing so you never call changeDocumentsStatus with undefined; after the API
call, inspect the response for partial results (e.g., fields like code,
updated_count, failed_count or per-doc statuses) and: show message.success only
for full success, show message.error or message.warn when any failures occurred,
and always call
queryClient.invalidateQueries([DocumentApiAction.FetchDocumentList]) (or
selectively refresh affected items) so UI rows don’t remain stale; reference
mutationFn, changeDocumentsStatus, queryClient.invalidateQueries, and
message.success/message.error when making these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1107-1132: The code updates Document.status via
DocumentService.update_by_id before updating the external doc store
(settings.docStoreConn.update), which can leave DB state inconsistent if the doc
store update fails; modify the flow so that for chunked documents (getattr(doc,
"chunk_num", 0) > 0) you perform the settings.docStoreConn.update first and only
call DocumentService.update_by_id({"status": str(status)}) after the doc store
update returns truthy, or if you must update DB first then catch failures from
settings.docStoreConn.update and call DocumentService.update_by_id(doc_id,
{"status": previous_status}) to roll back the change (use the current stored
status from `doc`), and ensure error handling sets result[doc_id] and has_error
accordingly; reference functions/objects: DocumentService.update_by_id,
settings.docStoreConn.update, search.index_name(kb.tenant_id), and doc.kb_id.
- Around line 1072-1078: The handler reads req = await get_request_json() and
sets doc_ids = req.get("doc_ids", []) but doesn't validate it; update the logic
around doc_ids (before iterating) to ensure it's either a list of non-empty
string IDs or a single non-empty string (accept and normalize by wrapping into a
list), reject other types and empty lists/IDs by returning
get_error_argument_result with a clear message; keep the existing status
validation and use the same error helper (get_error_argument_result) for
consistency.
- Around line 1134-1136: The except block that sets result[doc_id] and has_error
swallows the traceback; add structured logging of the unexpected per-document
failure by calling the module-level logger (e.g., logger.exception or
logger.error with traceback) inside the except for the function that processes
documents (reference variables doc_id, result, has_error) so the error message
includes the doc_id and full exception/traceback; ensure a logger is
defined/imported in this module if missing.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`:
- Around line 654-669: The test currently asserts the initial status is "1" then
updates to "1", which allows a no-op to pass; change the test to prove an actual
transition by first asserting the current status via document_infos(WebApiAuth,
dataset_id, {"ids": [doc_id]}) is the opposite value (e.g., "0" for
unprocessed), then call document_change_status(WebApiAuth, dataset_id,
{"doc_ids": [doc_id], "status": "1"}) to flip it, and finally assert res["code"]
== 0, res["data"][doc_id]["status"] == "1" and confirm the database reflects the
new status with a subsequent document_infos call; use the existing symbols
document_infos, document_change_status, WebApiAuth, dataset_id, and doc_id to
locate and update the assertions.
- Around line 584-623: The test
TestDocumentBatchChangeStatus.test_change_status_partial_failure_matrix
currently only uses valid ids and asserts len(doc_ids)==3 before the
try/finally, so it doesn't exercise partial failures and can leak documents on
upload failure; update the setup to allow fewer-than-3 uploaded ids (remove or
move the strict assert), ensure all cleanup runs regardless of setup by moving
delete_document into the finally and building the list of to-be-deleted ids from
the actual bulk_upload_documents return value, and extend the test to submit a
mixed batch to document_change_status containing some invalid/nonexistent ids
(e.g., append a fake id) to verify partial-failure handling and that response
data contains success/failure per id.

---

Outside diff comments:
In `@web/src/hooks/use-document-request.ts`:
- Around line 251-273: The mutationFn in use-document-request.ts passes
datasetId straight into changeDocumentsStatus and only treats data.code === 0 as
success; add a runtime guard to validate datasetId (from callers like
useKnowledgeBaseContext) and the computed ids array, returning or throwing a
clear error (and calling message.error) if missing so you never call
changeDocumentsStatus with undefined; after the API call, inspect the response
for partial results (e.g., fields like code, updated_count, failed_count or
per-doc statuses) and: show message.success only for full success, show
message.error or message.warn when any failures occurred, and always call
queryClient.invalidateQueries([DocumentApiAction.FetchDocumentList]) (or
selectively refresh affected items) so UI rows don’t remain stale; reference
mutationFn, changeDocumentsStatus, queryClient.invalidateQueries, and
message.success/message.error when making these changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f848f7d-a05f-4f2e-8c33-d7d0ac6397e7

📥 Commits

Reviewing files that changed from the base of the PR and between d84438f and 9098ca3.

📒 Files selected for processing (12)
  • api/apps/document_app.py
  • api/apps/restful_apis/document_api.py
  • test/testcases/test_web_api/test_common.py
  • test/testcases/test_web_api/test_document_app/test_document_metadata.py
  • web/src/hooks/use-document-request.ts
  • web/src/pages/dataset/dataset/dataset-table.tsx
  • web/src/pages/dataset/dataset/use-bulk-operate-dataset.tsx
  • web/src/pages/dataset/dataset/use-dataset-table-columns.tsx
  • web/src/pages/user-setting/data-source/data-source-detail-page/index.tsx
  • web/src/pages/user-setting/data-source/hooks.ts
  • web/src/services/knowledge-service.ts
  • web/src/utils/api.ts
💤 Files with no reviewable changes (1)
  • api/apps/document_app.py

Comment on lines +1072 to +1078
    req = await get_request_json()
    doc_ids = req.get("doc_ids", [])
    status = str(req.get("status", -1))

    if status not in ["0", "1"]:
        return get_error_argument_result(message=f'"Status" must be either 0 or 1:{status}!')


⚠️ Potential issue | 🟠 Major

Validate doc_ids before iterating.

doc_ids defaults to [] and is not checked for list/non-empty string IDs. A malformed JSON body like "doc_ids": "abc" would iterate characters, while a missing field returns success with no work.

🛡️ Proposed validation
     req = await get_request_json()
-    doc_ids = req.get("doc_ids", [])
+    if not isinstance(req, dict):
+        return get_error_argument_result(message="Request body must be a JSON object.")
+
+    doc_ids = req.get("doc_ids")
     status = str(req.get("status", -1))
 
+    if not isinstance(doc_ids, list) or not doc_ids:
+        return get_error_argument_result(message='"doc_ids" must be a non-empty list.')
+    if any(not isinstance(doc_id, str) or not doc_id for doc_id in doc_ids):
+        return get_error_argument_result(message='"doc_ids" must contain non-empty document IDs.')
+
     if status not in ["0", "1"]:
         return get_error_argument_result(message=f'"Status" must be either 0 or 1:{status}!')
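The failure mode is easy to demonstrate: iterating a string yields its characters. A hedged normalization helper (illustrative name and shape, not the PR's code) that follows the validation sketched above might look like:

```python
def normalize_doc_ids(raw):
    """Accept a list of non-empty string ids, or a single non-empty
    string (wrapped into a list); return None for anything else."""
    if isinstance(raw, str):
        # Wrap a lone id instead of iterating its characters.
        raw = [raw]
    if not isinstance(raw, list) or not raw:
        return None
    if any(not isinstance(d, str) or not d for d in raw):
        return None
    return raw
```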
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1072 - 1078, The handler
reads req = await get_request_json() and sets doc_ids = req.get("doc_ids", [])
but doesn't validate it; update the logic around doc_ids (before iterating) to
ensure it's either a list of non-empty string IDs or a single non-empty string
(accept and normalize by wrapping into a list), reject other types and empty
lists/IDs by returning get_error_argument_result with a clear message; keep the
existing status validation and use the same error helper
(get_error_argument_result) for consistency.

Comment on lines +1107 to +1132
            if not DocumentService.update_by_id(doc_id, {"status": str(status)}):
                result[doc_id] = {"error": "Database error (Document update)!"}
                has_error = True
                continue

            status_int = int(status)
            if getattr(doc, "chunk_num", 0) > 0:
                try:
                    ok = settings.docStoreConn.update(
                        {"doc_id": doc_id},
                        {"available_int": status_int},
                        search.index_name(kb.tenant_id),
                        doc.kb_id,
                    )
                except Exception as exc:
                    msg = str(exc)
                    if "3022" in msg:
                        result[doc_id] = {"error": "Document store table missing."}
                    else:
                        result[doc_id] = {"error": f"Document store update failed: {msg}"}
                    has_error = True
                    continue
                if not ok:
                    result[doc_id] = {"error": "Database error (docStore update)!"}
                    has_error = True
                    continue

⚠️ Potential issue | 🟠 Major

Avoid committing DB status before doc store update succeeds.

For chunked docs, Line 1107 updates the DB first. If docStoreConn.update fails afterward, the API returns partial failure but leaves Document.status changed while search availability remains stale.

🔁 Proposed rollback guard
-            if not DocumentService.update_by_id(doc_id, {"status": str(status)}):
+            db_updated = False
+            if not DocumentService.update_by_id(doc_id, {"status": str(status)}):
                 result[doc_id] = {"error": "Database error (Document update)!"}
                 has_error = True
                 continue
+            db_updated = True
 
             status_int = int(status)
             if getattr(doc, "chunk_num", 0) > 0:
                 try:
                     ok = settings.docStoreConn.update(
@@
                 except Exception as exc:
+                    if db_updated:
+                        DocumentService.update_by_id(doc_id, {"status": current_status})
                     msg = str(exc)
                     if "3022" in msg:
                         result[doc_id] = {"error": "Document store table missing."}
                     else:
                         result[doc_id] = {"error": f"Document store update failed: {msg}"}
@@
                 if not ok:
+                    if db_updated:
+                        DocumentService.update_by_id(doc_id, {"status": current_status})
                     result[doc_id] = {"error": "Database error (docStore update)!"}
                     has_error = True
                     continue
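The store-first ordering the review prefers can be sketched with injected callables standing in for settings.docStoreConn.update and DocumentService.update_by_id; the function name and signatures here are illustrative, not the PR's code:

```python
def set_status_store_first(doc_id, new_status, update_store, update_db):
    """Touch the document store first; commit the DB status only after
    the store update succeeds, so the two can never diverge silently."""
    try:
        if not update_store(doc_id, int(new_status)):
            return {"error": "Database error (docStore update)!"}
    except Exception as exc:
        return {"error": f"Document store update failed: {exc}"}
    # Only now is the DB allowed to change.
    if not update_db(doc_id, {"status": str(new_status)}):
        return {"error": "Database error (Document update)!"}
    return {"status": str(new_status)}

ok_result = set_status_store_first("d1", 1, lambda d, s: True, lambda d, p: True)
```

Documents without chunks would skip `update_store` entirely, mirroring the `chunk_num` guard in the handler.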
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1107 - 1132, The code
updates Document.status via DocumentService.update_by_id before updating the
external doc store (settings.docStoreConn.update), which can leave DB state
inconsistent if the doc store update fails; modify the flow so that for chunked
documents (getattr(doc, "chunk_num", 0) > 0) you perform the
settings.docStoreConn.update first and only call
DocumentService.update_by_id({"status": str(status)}) after the doc store update
returns truthy, or if you must update DB first then catch failures from
settings.docStoreConn.update and call DocumentService.update_by_id(doc_id,
{"status": previous_status}) to roll back the change (use the current stored
status from `doc`), and ensure error handling sets result[doc_id] and has_error
accordingly; reference functions/objects: DocumentService.update_by_id,
settings.docStoreConn.update, search.index_name(kb.tenant_id), and doc.kb_id.

Comment on lines +1134 to +1136
        except Exception as e:
            result[doc_id] = {"error": f"Internal server error: {str(e)}"}
            has_error = True

⚠️ Potential issue | 🟡 Minor

Log unexpected per-document failures.

The broad handler returns an internal-error result but drops the traceback, making this new endpoint hard to diagnose in production.

🪵 Proposed logging
         except Exception as e:
+            logging.exception("Failed to batch update document status for document %s", doc_id)
             result[doc_id] = {"error": f"Internal server error: {str(e)}"}
             has_error = True

As per coding guidelines, **/*.py: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1134 - 1136, The except
block that sets result[doc_id] and has_error swallows the traceback; add
structured logging of the unexpected per-document failure by calling the
module-level logger (e.g., logger.exception or logger.error with traceback)
inside the except for the function that processes documents (reference variables
doc_id, result, has_error) so the error message includes the doc_id and full
exception/traceback; ensure a logger is defined/imported in this module if
missing.

Comment on lines +584 to +623
class TestDocumentBatchChangeStatus:
    @pytest.mark.p2
    def test_change_status_partial_failure_matrix(self, WebApiAuth, add_dataset, ragflow_tmp_dir):
        """
        E2E test for partial failure matrix in batch document status change.

        This test creates multiple documents and verifies that the batch status change
        operation handles various failure scenarios correctly.
        """

        dataset_id = add_dataset

        # Create multiple documents for testing
        doc_ids = bulk_upload_documents(WebApiAuth, dataset_id, 3, ragflow_tmp_dir)
        assert len(doc_ids) == 3, f"Expected 3 documents, got {len(doc_ids)}"

        try:
            # Test batch status change with all valid documents
            # This should succeed since all documents are valid
            res = document_change_status(WebApiAuth, dataset_id, {"doc_ids": doc_ids, "status": "1"})

            # Verify the response structure
            assert res["code"] == 0, f"Expected success code 0, got {res}"
            assert res["data"] is not None, "Response data should not be None"

            # Verify each document status was updated
            for doc_id in doc_ids:
                assert doc_id in res["data"], f"Document {doc_id} should be in response"
                assert res["data"][doc_id]["status"] == "1", f"Document {doc_id} status should be 1"

            # Verify the status was actually updated in the database
            info_res = document_infos(WebApiAuth, dataset_id, {"ids": doc_ids})
            assert info_res["code"] == 0, info_res

            for doc in info_res["data"]["docs"]:
                assert doc["status"] == "1", f"Document {doc['id']} status should be 1 in database"

        finally:
            # Cleanup: delete all documents
            delete_document(WebApiAuth, dataset_id, {"ids": doc_ids})

⚠️ Potential issue | 🟡 Minor

Exercise actual partial failures and make cleanup cover setup failures.

This test only submits valid doc_ids, so it does not cover the partial-failure matrix described by the name/docstring. Also, assert len(doc_ids) == 3 runs before the try/finally, so a short upload result can leave created documents behind.

Suggested cleanup shape
-        # Create multiple documents for testing
-        doc_ids = bulk_upload_documents(WebApiAuth, dataset_id, 3, ragflow_tmp_dir)
-        assert len(doc_ids) == 3, f"Expected 3 documents, got {len(doc_ids)}"
-
+        doc_ids = []
         try:
+            # Create multiple documents for testing
+            doc_ids = bulk_upload_documents(WebApiAuth, dataset_id, 3, ragflow_tmp_dir)
+            assert len(doc_ids) == 3, f"Expected 3 documents, got {len(doc_ids)}"
+
+            # Include at least one missing/unauthorized/cross-dataset document id here
+            # and assert the per-item success/failure contract.
             # Test batch status change with all valid documents
             # This should succeed since all documents are valid
             res = document_change_status(WebApiAuth, dataset_id, {"doc_ids": doc_ids, "status": "1"})
@@
         finally:
             # Cleanup: delete all documents
-            delete_document(WebApiAuth, dataset_id, {"ids": doc_ids})
+            if doc_ids:
+                delete_document(WebApiAuth, dataset_id, {"ids": doc_ids})

Based on learnings, Applies to tests/**/*.py : Add/adjust tests for behavior changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 584 - 623, The test
TestDocumentBatchChangeStatus.test_change_status_partial_failure_matrix
currently only uses valid ids and asserts len(doc_ids)==3 before the
try/finally, so it doesn't exercise partial failures and can leak documents on
upload failure; update the setup to allow fewer-than-3 uploaded ids (remove or
move the strict assert), ensure all cleanup runs regardless of setup by moving
delete_document into the finally and building the list of to-be-deleted ids from
the actual bulk_upload_documents return value, and extend the test to submit a
mixed batch to document_change_status containing some invalid/nonexistent ids
(e.g., append a fake id) to verify partial-failure handling and that response
data contains success/failure per id.

Comment on lines +654 to +669
        # Verify initial status is "0" (unprocessed)
        info_res = document_infos(WebApiAuth, dataset_id, {"ids": [doc_id]})
        assert info_res["code"] == 0, info_res
        assert info_res["data"]["docs"][0]["status"] == "1", "Initial status should be 1"

        # Update status to "1" (processed)
        res = document_change_status(WebApiAuth, dataset_id, {"doc_ids": [doc_id], "status": "1"})

        # Verify success
        assert res["code"] == 0, f"Expected success code 0, got {res}"
        assert res["data"][doc_id]["status"] == "1", "Document status should be 1"

        # Verify the status was actually updated in the database
        info_res = document_infos(WebApiAuth, dataset_id, {"ids": [doc_id]})
        assert info_res["code"] == 0, info_res
        assert info_res["data"]["docs"][0]["status"] == "1", "Document status should be 1 in database"

⚠️ Potential issue | 🟠 Major

Make the success case prove a status transition.

The test asserts the initial status is already "1" and then updates to "1" again, so a no-op implementation can pass. Flip to the opposite valid status before asserting the response and database state.

Suggested fix
-        # Verify initial status is "0" (unprocessed)
+        # Verify initial status and choose the opposite valid target status.
         info_res = document_infos(WebApiAuth, dataset_id, {"ids": [doc_id]})
         assert info_res["code"] == 0, info_res
-        assert info_res["data"]["docs"][0]["status"] == "1", "Initial status should be 1"
+        initial_status = info_res["data"]["docs"][0]["status"]
+        assert initial_status in {"0", "1"}, f"Unexpected initial status: {initial_status}"
+        target_status = "0" if initial_status == "1" else "1"
 
-        # Update status to "1" (processed)
-        res = document_change_status(WebApiAuth, dataset_id, {"doc_ids": [doc_id], "status": "1"})
+        # Update status to the opposite value.
+        res = document_change_status(WebApiAuth, dataset_id, {"doc_ids": [doc_id], "status": target_status})
 
         # Verify success
         assert res["code"] == 0, f"Expected success code 0, got {res}"
-        assert res["data"][doc_id]["status"] == "1", "Document status should be 1"
+        assert res["data"][doc_id]["status"] == target_status, f"Document status should be {target_status}"
 
         # Verify the status was actually updated in the database
         info_res = document_infos(WebApiAuth, dataset_id, {"ids": [doc_id]})
         assert info_res["code"] == 0, info_res
-        assert info_res["data"]["docs"][0]["status"] == "1", "Document status should be 1 in database"
+        assert info_res["data"]["docs"][0]["status"] == target_status, f"Document status should be {target_status} in database"

Based on learnings, Applies to tests/**/*.py : Add/adjust tests for behavior changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 654 - 669, The test currently asserts the initial status is "1"
then updates to "1", which allows a no-op to pass; change the test to prove an
actual transition by first asserting the current status via
document_infos(WebApiAuth, dataset_id, {"ids": [doc_id]}) is the opposite value
(e.g., "0" for unprocessed), then call document_change_status(WebApiAuth,
dataset_id, {"doc_ids": [doc_id], "status": "1"}) to flip it, and finally assert
res["code"] == 0, res["data"][doc_id]["status"] == "1" and confirm the database
reflects the new status with a subsequent document_infos call; use the existing
symbols document_infos, document_change_status, WebApiAuth, dataset_id, and
doc_id to locate and update the assertions.

@codecov

codecov Bot commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.97%. Comparing base (d84438f) to head (e9ffa1d).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14337      +/-   ##
==========================================
- Coverage   97.41%   95.97%   -1.44%     
==========================================
  Files          10       10              
  Lines         695      695              
  Branches      111      111              
==========================================
- Hits          677      667      -10     
- Misses          7       11       +4     
- Partials       11       17       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@xugangqiang xugangqiang force-pushed the doc-batch-change-status branch from 9098ca3 to e9ffa1d Compare April 24, 2026 01:43
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (5)
api/apps/restful_apis/document_api.py (3)

1132-1134: ⚠️ Potential issue | 🟡 Minor

Duplicate of prior feedback: log per-document exceptions.

The broad except still swallows the traceback. As per coding guidelines (**/*.py: Add logging for new flows), add logging.exception(...) here for production diagnosability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1132 - 1134, In the
except block that sets result[doc_id] and has_error, add a
logging.exception(...) call to record the full traceback and context for this
per-document failure (include doc_id in the log message) instead of only storing
the error string; this ensures the exception handler around the loop (where
result, doc_id, and has_error are used) logs diagnostic details for production
while preserving the existing result[doc_id] assignment and has_error flag.
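The change the prompt describes can be sketched as a stand-alone loop; `apply_status_updates` and `update_fn` are hypothetical stand-ins for the real per-document update in the endpoint, shown only to illustrate where `logging.exception` belongs:

```python
import logging

logger = logging.getLogger(__name__)

def apply_status_updates(doc_ids, update_fn):
    """Hypothetical helper mirroring the batch-update loop.

    On failure it both records a per-document error string and logs
    the full traceback, so production logs keep the diagnostic detail.
    """
    result, has_error = {}, False
    for doc_id in doc_ids:
        try:
            update_fn(doc_id)
            result[doc_id] = {"status": "ok"}
        except Exception as exc:
            # logging.exception attaches the traceback automatically
            logger.exception("Failed to update status for document %s", doc_id)
            result[doc_id] = {"error": str(exc)}
            has_error = True
    return result, has_error
```

The stored error string stays intact for the API response; only the log line gains the traceback.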

1070-1075: ⚠️ Potential issue | 🟠 Major

Duplicate of prior feedback: validate doc_ids.

doc_ids is still not validated; a missing field silently returns success with result = {}, and a string body would iterate characters. See the prior comment on this block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1070 - 1075, Validate the
doc_ids from get_request_json(): ensure req contains "doc_ids", that doc_ids is
a list (not a string) and not empty, and that each element is an expected ID
type (e.g., int or str) before proceeding; if validation fails, return
get_error_argument_result with a clear message; update the block that currently
reads doc_ids = req.get("doc_ids", []) to perform these checks and reject
invalid inputs to avoid iterating over a string or silently succeeding.
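A minimal validator along the lines the prompt describes might look like this (`validate_doc_ids` is a hypothetical helper, not an existing function in the codebase):

```python
def validate_doc_ids(req):
    """Return (doc_ids, error_message) for a batch-update payload.

    Rejects a missing field, a non-list value (a bare string would
    otherwise be iterated character by character), an empty list,
    and non-string entries.
    """
    doc_ids = req.get("doc_ids")
    if doc_ids is None:
        return None, "`doc_ids` is required."
    if not isinstance(doc_ids, list) or not doc_ids:
        return None, "`doc_ids` must be a non-empty list."
    if not all(isinstance(d, str) and d for d in doc_ids):
        return None, "Every entry in `doc_ids` must be a non-empty string."
    return doc_ids, None
```

On failure the endpoint would feed the message into `get_error_argument_result` and return early instead of silently succeeding with an empty result.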

1105-1130: ⚠️ Potential issue | 🟠 Major

Duplicate of prior feedback: DB commit before doc store update can desync state.

The DB status is still updated before docStoreConn.update, leaving Document.status changed if the doc store call raises or returns falsy. Per the earlier review, either update the doc store first or roll back on failure. Also note: since status is already coerced to str on line 1072, the inner str(status) on line 1105 is redundant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1105 - 1130, The code
updates the DB via DocumentService.update_by_id before calling
settings.docStoreConn.update which can leave Document.status inconsistent if the
doc store update fails; change the flow to perform the doc store update
(settings.docStoreConn.update using doc_id, available_int/status_int,
search.index_name(kb.tenant_id), doc.kb_id) first and only call
DocumentService.update_by_id to persist the status after confirming the doc
store update succeeded, or if you prefer to keep the DB-first approach implement
an atomic rollback that resets Document.status on doc store failure; also remove
the redundant str(status) in the DocumentService.update_by_id call (status is
already a string earlier) to avoid unnecessary coercion.
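The doc-store-first ordering with a best-effort rollback can be sketched like this; `doc_store_update` and `db_update` are stand-ins for `settings.docStoreConn.update` and `DocumentService.update_by_id`, and the two-valued status assumption ("0"/"1") is taken from the surrounding discussion:

```python
def update_status_consistently(doc_id, status, doc_store_update, db_update):
    """Update the doc store first, then the DB, so a doc-store failure
    never leaves Document.status changed; if the DB write fails after
    the doc store succeeded, revert the doc store as a best effort.
    """
    if not doc_store_update(doc_id, status):
        return {"error": "Doc store update failed; DB left untouched."}
    if not db_update(doc_id, {"status": status}):
        previous = "0" if status == "1" else "1"
        doc_store_update(doc_id, previous)  # best-effort rollback
        return {"error": "DB update failed; doc store reverted."}
    return {"status": status}
```

Either ordering works as long as the side that can be rolled back is written last; here the DB write is last because it is the cheaper one to retry.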
test/testcases/test_web_api/test_document_app/test_document_metadata.py (2)

584-623: ⚠️ Potential issue | 🟡 Minor

Duplicate of prior feedback: this test does not actually cover partial failures, and cleanup can leak docs.

The test name promises a partial-failure matrix but submits only valid doc_ids; additionally, assert len(doc_ids) == 3 runs before the try/finally, so a short upload result will leak created docs. See the prior review comment on this block for the recommended restructure (move the assert inside the try and include a mix of invalid/cross-dataset ids).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 584 - 623, The test
TestDocumentBatchChangeStatus.test_change_status_partial_failure_matrix
currently only submits valid doc_ids and asserts length before the try/finally
which can leak docs; change it to (1) move the assert len(doc_ids) == 3 inside
the try block after creation so cleanup always runs even when upload fails, (2)
construct a mixed doc_ids payload that includes valid IDs from
bulk_upload_documents(WebApiAuth, dataset_id, 3, ragflow_tmp_dir), plus at least
one invalid id and one id from another dataset to exercise partial failures, (3)
update assertions against document_change_status(WebApiAuth, dataset_id,
{"doc_ids": doc_ids, "status": "1"}) to expect partial success/failure entries
and verify database state via document_infos(WebApiAuth, dataset_id, {"ids":
valid_doc_ids}) only for the valid IDs, and (4) make delete_document(WebApiAuth,
dataset_id, {"ids": created_doc_ids}) robust by deleting only the actual created
IDs (track created_doc_ids) and wrapping cleanup in try/except to avoid leaks.

644-669: ⚠️ Potential issue | 🟠 Major

Duplicate of prior feedback: no-op success test + inconsistent docstring.

The comment on line 654 says "Verify initial status is '0' (unprocessed)" but the assertion on line 657 expects "1", then the update targets "1" as well. As flagged previously, flip to the opposite valid status so an accidental no-op implementation cannot pass this test, and update the comment to match.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`
around lines 644 - 669, The test test_change_status_all_success currently
asserts an initial status that contradicts its comment and then updates to the
same status (allowing a no-op to pass); change the test so the initial status
and comment are consistent and the status_update in document_change_status is
the opposite value to force a real change (e.g., verify initial status via
document_infos is "1" and update using document_change_status to "0", then
assert res["code"] == 0 and that res["data"][doc_id]["status"] == "0" and the
subsequent document_infos shows "0"). Update the docstring/comment to reflect
the correct initial status and the target status change, and use the existing
helpers document_infos and document_change_status and the test function name
test_change_status_all_success to locate where to modify assertions and the
update payload.
🧹 Nitpick comments (1)
api/apps/restful_apis/document_api.py (1)

1136-1138: Reconsider RetCode.SERVER_ERROR for non-server partial failures.

Returning SERVER_ERROR even when the only failing items are client-attributable (e.g., Document not found, Document not found in this dataset.) muddies error semantics for clients who cannot distinguish a backend incident from an invalid input id. Consider returning a 4xx-style code (e.g., RetCode.DATA_ERROR) when all errors are per-document not-found/ownership issues, and reserving SERVER_ERROR for true infrastructure failures (DB/docStore).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1136 - 1138, The current
branch always returns RetCode.SERVER_ERROR when has_error is true; change it so
get_json_result uses a 4xx RetCode (e.g., RetCode.DATA_ERROR) if all recorded
errors are client-caused (check the per-item error messages in result for
strings like "Document not found" or "Document not found in this dataset." or an
equivalent error code), and only return RetCode.SERVER_ERROR when any error
indicates a server/infrastructure failure (DB/docStore exceptions). Update the
conditional around has_error (the return using get_json_result(data=result,
message="Partial failure", code=RetCode.SERVER_ERROR)) to inspect result’s error
entries and choose RetCode.DATA_ERROR vs RetCode.SERVER_ERROR accordingly.
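One way to pick the code, sketched with plain integers standing in for `RetCode.DATA_ERROR` and `RetCode.SERVER_ERROR` (the real enum values live in the RAGFlow codebase):

```python
# "Document not found" also prefixes "Document not found in this dataset."
CLIENT_ERROR_PREFIXES = ("Document not found",)

def pick_ret_code(result, success=0, data_error=102, server_error=500):
    """Return a 4xx-style code when every failure is client-attributable,
    and a server error when any failure looks like an infrastructure fault.
    """
    errors = [v["error"] for v in result.values() if "error" in v]
    if not errors:
        return success
    if all(e.startswith(CLIENT_ERROR_PREFIXES) for e in errors):
        return data_error
    return server_error
```

Matching on message prefixes is fragile; a per-item machine-readable error code would make this classification sturdier.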
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1096-1099: The warning uses doc.kb_id instead of the actual
document identifier, causing misleading logs; update the log in the error branch
(where variables doc_id and dataset_id are used, alongside result and has_error)
to log the actual document id (doc_id) and dataset id, and optionally include
doc.kb_id for extra context so the message reads something like "Document
{doc_id} (kb_id={doc.kb_id}) not in dataset {dataset_id}" while leaving the rest
of the error handling (setting result[doc_id], has_error=True, continue)
unchanged.
- Around line 1077-1079: The access check for batch status updates uses
KnowledgebaseService.query(id=dataset_id, tenant_id=tenant_id) which restricts
to the owner, but similar batch endpoints (update_metadata and delete_documents)
use KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id) to
allow team members; change the check to use
KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id) so team
members can perform batch status updates, or if stricter owner-only behavior is
intended, add a clarifying comment above this block explaining why query is
required instead of accessible.

In `@web/src/pages/dataset/dataset/use-bulk-operate-dataset.tsx`:
- Around line 88-97: The onChangeStatus handler currently calls
setDocumentStatus with datasetId: knowledgeBase?.id which may be undefined;
update the onChangeStatus function to guard against absent knowledgeBase?.id by
early-returning (or disabling the UI action) and show a clear user
error/notification instead of calling setDocumentStatus when knowledgeBase?.id
is falsy; specifically, in the onChangeStatus callback (referencing
onChangeStatus, setDocumentStatus, selectedRowKeys, knowledgeBase?.id) check for
a valid knowledgeBase.id and only invoke setDocumentStatus with { status:
enabled, documentId: selectedRowKeys, datasetId: knowledgeBase.id } when
present.

---

Duplicate comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1132-1134: In the except block that sets result[doc_id] and
has_error, add a logging.exception(...) call to record the full traceback and
context for this per-document failure (include doc_id in the log message)
instead of only storing the error string; this ensures the exception handler
around the loop (where result, doc_id, and has_error are used) logs diagnostic
details for production while preserving the existing result[doc_id] assignment
and has_error flag.
- Around line 1070-1075: Validate the doc_ids from get_request_json(): ensure
req contains "doc_ids", that doc_ids is a list (not a string) and not empty, and
that each element is an expected ID type (e.g., int or str) before proceeding;
if validation fails, return get_error_argument_result with a clear message;
update the block that currently reads doc_ids = req.get("doc_ids", []) to
perform these checks and reject invalid inputs to avoid iterating over a string
or silently succeeding.
- Around line 1105-1130: The code updates the DB via
DocumentService.update_by_id before calling settings.docStoreConn.update which
can leave Document.status inconsistent if the doc store update fails; change the
flow to perform the doc store update (settings.docStoreConn.update using doc_id,
available_int/status_int, search.index_name(kb.tenant_id), doc.kb_id) first and
only call DocumentService.update_by_id to persist the status after confirming
the doc store update succeeded, or if you prefer to keep the DB-first approach
implement an atomic rollback that resets Document.status on doc store failure;
also remove the redundant str(status) in the DocumentService.update_by_id call
(status is already a string earlier) to avoid unnecessary coercion.

In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`:
- Around line 584-623: The test
TestDocumentBatchChangeStatus.test_change_status_partial_failure_matrix
currently only submits valid doc_ids and asserts length before the try/finally
which can leak docs; change it to (1) move the assert len(doc_ids) == 3 inside
the try block after creation so cleanup always runs even when upload fails, (2)
construct a mixed doc_ids payload that includes valid IDs from
bulk_upload_documents(WebApiAuth, dataset_id, 3, ragflow_tmp_dir), plus at least
one invalid id and one id from another dataset to exercise partial failures, (3)
update assertions against document_change_status(WebApiAuth, dataset_id,
{"doc_ids": doc_ids, "status": "1"}) to expect partial success/failure entries
and verify database state via document_infos(WebApiAuth, dataset_id, {"ids":
valid_doc_ids}) only for the valid IDs, and (4) make delete_document(WebApiAuth,
dataset_id, {"ids": created_doc_ids}) robust by deleting only the actual created
IDs (track created_doc_ids) and wrapping cleanup in try/except to avoid leaks.
- Around line 644-669: The test test_change_status_all_success currently asserts
an initial status that contradicts its comment and then updates to the same
status (allowing a no-op to pass); change the test so the initial status and
comment are consistent and the status_update in document_change_status is the
opposite value to force a real change (e.g., verify initial status via
document_infos is "1" and update using document_change_status to "0", then
assert res["code"] == 0 and that res["data"][doc_id]["status"] == "0" and the
subsequent document_infos shows "0"). Update the docstring/comment to reflect
the correct initial status and the target status change, and use the existing
helpers document_infos and document_change_status and the test function name
test_change_status_all_success to locate where to modify assertions and the
update payload.

---

Nitpick comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1136-1138: The current branch always returns RetCode.SERVER_ERROR
when has_error is true; change it so get_json_result uses a 4xx RetCode (e.g.,
RetCode.DATA_ERROR) if all recorded errors are client-caused (check the per-item
error messages in result for strings like "Document not found" or "Document not
found in this dataset." or an equivalent error code), and only return
RetCode.SERVER_ERROR when any error indicates a server/infrastructure failure
(DB/docStore exceptions). Update the conditional around has_error (the return
using get_json_result(data=result, message="Partial failure",
code=RetCode.SERVER_ERROR)) to inspect result’s error entries and choose
RetCode.DATA_ERROR vs RetCode.SERVER_ERROR accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f44df773-bd6e-45b1-9d80-00258913066e

📥 Commits

Reviewing files that changed from the base of the PR and between 9098ca3 and e9ffa1d.

📒 Files selected for processing (10)
  • api/apps/document_app.py
  • api/apps/restful_apis/document_api.py
  • test/testcases/test_web_api/test_common.py
  • test/testcases/test_web_api/test_document_app/test_document_metadata.py
  • web/src/hooks/use-document-request.ts
  • web/src/pages/dataset/dataset/dataset-table.tsx
  • web/src/pages/dataset/dataset/use-bulk-operate-dataset.tsx
  • web/src/pages/dataset/dataset/use-dataset-table-columns.tsx
  • web/src/services/knowledge-service.ts
  • web/src/utils/api.ts
💤 Files with no reviewable changes (1)
  • api/apps/document_app.py
✅ Files skipped from review due to trivial changes (1)
  • web/src/pages/dataset/dataset/dataset-table.tsx
🚧 Files skipped from review as they are similar to previous changes (3)
  • web/src/utils/api.ts
  • web/src/pages/dataset/dataset/use-dataset-table-columns.tsx
  • web/src/hooks/use-document-request.ts

Comment on lines +1077 to +1079
# Verify dataset ownership
if not KnowledgebaseService.query(id=dataset_id, tenant_id=tenant_id):
    return get_error_data_result(message="You don't own the dataset.")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify which access check is used across dataset-scoped routes in the restful_apis module
rg -nP -C2 'KnowledgebaseService\.(accessible|query)\(' api/apps/restful_apis/

Repository: infiniflow/ragflow

Length of output: 6975


🏁 Script executed:

sed -n '1050,1090p' api/apps/restful_apis/document_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 1614


🏁 Script executed:

sed -n '1030,1055p' api/apps/restful_apis/document_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 871


🏁 Script executed:

sed -n '1010,1035p' api/apps/restful_apis/document_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 1235


🏁 Script executed:

sed -n '400,450p' api/apps/restful_apis/document_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 2564


🏁 Script executed:

sed -n '850,900p' api/apps/restful_apis/document_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 2175


🏁 Script executed:

sed -n '900,950p' api/apps/restful_apis/document_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 1838


🏁 Script executed:

sed -n '948,980p' api/apps/restful_apis/document_api.py | cat -n

Repository: infiniflow/ragflow

Length of output: 1649


Align access check with similar batch operations for consistency.

The endpoints update_metadata (line 958) and delete_documents (line 432) use KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id), granting team members access to batch operations. This endpoint uses the stricter KnowledgebaseService.query(id=dataset_id, tenant_id=tenant_id), restricting to the owner. If batch status updates should be accessible to team members like delete/update operations, switch to accessible; otherwise, consider adding a comment explaining the intentionally stricter access requirement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1077 - 1079, The access
check for batch status updates uses KnowledgebaseService.query(id=dataset_id,
tenant_id=tenant_id) which restricts to the owner, but similar batch endpoints
(update_metadata and delete_documents) use
KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id) to allow
team members; change the check to use
KnowledgebaseService.accessible(kb_id=dataset_id, user_id=tenant_id) so team
members can perform batch status updates, or if stricter owner-only behavior is
intended, add a clarifying comment above this block explaining why query is
required instead of accessible.
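The behavioral difference between the two checks can be illustrated with a toy stub (this is a stand-in model, not the real `KnowledgebaseService`):

```python
class KnowledgebaseServiceStub:
    """Toy model of the two access checks discussed above."""
    _OWNER = {"kb1": "tenant_a"}
    _TEAM = {"kb1": {"tenant_a", "tenant_b"}}

    @classmethod
    def query(cls, id, tenant_id):
        # Owner-only: returns rows only when tenant_id owns the KB
        return [id] if cls._OWNER.get(id) == tenant_id else []

    @classmethod
    def accessible(cls, kb_id, user_id):
        # Team-aware: the owner or any team member passes
        return user_id in cls._TEAM.get(kb_id, set())
```

Under this model a team member (`tenant_b`) passes `accessible` but fails `query`, which is exactly the asymmetry the comment flags between this endpoint and `update_metadata`/`delete_documents`.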

Comment on lines +1096 to +1099
logging.warning(f"Document {doc.kb_id} not in dataset {dataset_id}")
result[doc_id] = {"error": "Document not found in this dataset."}
has_error = True
continue

⚠️ Potential issue | 🟡 Minor

Fix misleading log message.

The warning logs doc.kb_id labeled as "Document ... not in dataset", which prints the document's kb_id rather than the document id. This makes partial-failure diagnosis harder for the new endpoint.

🪵 Proposed fix
-            if doc.kb_id != dataset_id:
-                logging.warning(f"Document {doc.kb_id} not in dataset {dataset_id}")
+            if doc.kb_id != dataset_id:
+                logging.warning(f"Document {doc_id} (kb_id={doc.kb_id}) not in dataset {dataset_id}")
                 result[doc_id] = {"error": "Document not found in this dataset."}
                 has_error = True
                 continue

As per coding guidelines, **/*.py: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/restful_apis/document_api.py` around lines 1096 - 1099, The warning
uses doc.kb_id instead of the actual document identifier, causing misleading
logs; update the log in the error branch (where variables doc_id and dataset_id
are used, alongside result and has_error) to log the actual document id (doc_id)
and dataset id, and optionally include doc.kb_id for extra context so the
message reads something like "Document {doc_id} (kb_id={doc.kb_id}) not in
dataset {dataset_id}" while leaving the rest of the error handling (setting
result[doc_id], has_error=True, continue) unchanged.

Comment on lines 88 to 97
 const onChangeStatus = useCallback(
   (enabled: boolean) => {
-    setDocumentStatus({ status: enabled, documentId: selectedRowKeys });
+    setDocumentStatus({
+      status: enabled,
+      documentId: selectedRowKeys,
+      datasetId: knowledgeBase?.id,
+    });
   },
-  [selectedRowKeys, setDocumentStatus],
+  [selectedRowKeys, setDocumentStatus, knowledgeBase],
 );

⚠️ Potential issue | 🟡 Minor

Guard against undefined datasetId.

If knowledgeBase is not yet available, datasetId will be undefined, producing a request URL like /api/v1/datasets/undefined/documents/batch-update-status that silently fails on the backend. Consider early-returning (or disabling the action) when knowledgeBase?.id is absent so the user gets a clear signal.

🛡️ Proposed guard
   const onChangeStatus = useCallback(
     (enabled: boolean) => {
+      if (!knowledgeBase?.id) {
+        return;
+      }
       setDocumentStatus({
         status: enabled,
         documentId: selectedRowKeys,
         datasetId: knowledgeBase?.id,
       });
     },
     [selectedRowKeys, setDocumentStatus, knowledgeBase],
   );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/pages/dataset/dataset/use-bulk-operate-dataset.tsx` around lines 88 -
97, The onChangeStatus handler currently calls setDocumentStatus with datasetId:
knowledgeBase?.id which may be undefined; update the onChangeStatus function to
guard against absent knowledgeBase?.id by early-returning (or disabling the UI
action) and show a clear user error/notification instead of calling
setDocumentStatus when knowledgeBase?.id is falsy; specifically, in the
onChangeStatus callback (referencing onChangeStatus, setDocumentStatus,
selectedRowKeys, knowledgeBase?.id) check for a valid knowledgeBase.id and only
invoke setDocumentStatus with { status: enabled, documentId: selectedRowKeys,
datasetId: knowledgeBase.id } when present.

@wangq8 wangq8 self-requested a review April 24, 2026 09:51
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Apr 24, 2026

Labels

ci Continue Integration lgtm This PR has been approved by a maintainer python Pull requests that update python code size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants