Skip to content

fix: defer KG rebuild during batch document deletion#2819

Open
he-yufeng wants to merge 2 commits intoHKUDS:mainfrom
he-yufeng:fix/batch-delete-deferred-rebuild
Open

fix: defer KG rebuild during batch document deletion#2819
he-yufeng wants to merge 2 commits intoHKUDS:mainfrom
he-yufeng:fix/batch-delete-deferred-rebuild

Conversation

@he-yufeng
Copy link
Contributor

Summary

When deleting N documents via background_delete_documents(), each adelete_by_doc_id() call triggers a full rebuild_knowledge_from_chunks(). For entities shared across documents, this means the same summaries get recomputed N times. On an 85-document batch, the issue author measured ~75x redundant LLM calls.

This PR adds a skip_rebuild parameter to adelete_by_doc_id() (defaults to False for backwards compat). The batch deletion loop passes skip_rebuild=True, collects rebuild targets from each result, and does a single combined rebuild at the end.

Changes:

  • lightrag/base.pyDeletionResult gains optional entities_to_rebuild / relationships_to_rebuild fields
  • lightrag/lightrag.pyadelete_by_doc_id() conditionally skips rebuild when skip_rebuild=True, populates rebuild targets on the result
  • lightrag/api/routers/document_routes.pybackground_delete_documents() aggregates targets across loop, single rebuild_knowledge_from_chunks() call at the end
  • tests/test_batch_delete_deferred_rebuild.py — 4 tests covering both modes

Single-doc deletion (direct adelete_by_doc_id() calls) is completely unaffected — same code path as before.

Closes #2795

@danielaskdd
Copy link
Collaborator

Deferring the refactoring of entity relationships is a sound approach; however, there are several issues in the current PR that need to be addressed:

  1. Redundant Tasks in Overlapping Pools: If multiple documents scheduled for deletion share common entities, those entities might be tagged for both "Refactor" and "Delete" pools simultaneously. In such cases, the refactoring task becomes redundant and should be invalidated or skipped.
  2. Lack of Robust Error Handling: The LLM-driven KG rebuild stage is prone to failure. If an error occurs during this phase, the pipeline should abort without removing the file from doc_status. This ensures the system maintains state and allows the user to re-trigger the deletion process later, and idempotency for deletion operation should be ensured.

@he-yufeng
Copy link
Contributor Author

Addressed the feedback — latest push includes:

  • Pruned stale entities/relations from the rebuild pool (checks graph existence before queueing)
  • Added error handling around the deferred rebuild so a failure there doesn't take down the whole delete operation
  • Cleaned up the pool merging logic to avoid duplicates across overlapping batches

@danielaskdd
Copy link
Collaborator

This PR conflicts with PR #2826 ("make document deletion retry-safe"). Please resolve the conflicts before we proceed with the review.

When deleting N documents via background_delete_documents(), each call
to adelete_by_doc_id() was triggering a full rebuild_knowledge_from_chunks().
For shared entities spanning multiple docs, this means the same summaries
get recomputed N times -- 75x wasted LLM calls on an 85-doc batch.

Added skip_rebuild parameter to adelete_by_doc_id(). When set, the
per-doc rebuild is skipped and the affected entities/relationships are
returned in DeletionResult so the caller can aggregate them. The batch
delete loop now collects all targets and does a single rebuild pass at
the end.

Backwards compatible: skip_rebuild defaults to False, so single-doc
deletion and direct API callers behave exactly as before.

Closes HKUDS#2795
…failure

Address review feedback on two issues:

1. When multiple documents share entities, a later deletion may fully remove
   an entity that an earlier deletion tagged for rebuild.  Before the deferred
   rebuild, verify each entity/relation still exists in the graph and drop any
   that were already deleted — avoids wasted LLM calls and potential errors.

2. adelete_by_doc_id(skip_rebuild=True) now keeps doc_status alive.  The batch
   caller only removes doc_status after a successful rebuild.  If rebuild fails,
   docs remain visible so the user can re-trigger deletion (idempotent).
@he-yufeng he-yufeng force-pushed the fix/batch-delete-deferred-rebuild branch from 7e74326 to 7c7f84b Compare March 24, 2026 02:55
@he-yufeng
Copy link
Contributor Author

Rebased onto main, conflicts with #2826 resolved.

@danielaskdd
Copy link
Collaborator

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c7f84b884

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +3971 to 3973
if not skip_rebuild:
await self.doc_status.delete([doc_id])
await self.full_docs.delete([doc_id])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve rebuild metadata when deferring cleanup

When skip_rebuild=True, this path skips doc_status deletion but still proceeds after stage 10, where full_entities/full_relations are removed before returning success. If the later deferred rebuild fails in background_delete_documents, a retry cannot reconstruct entities_to_rebuild/relationships_to_rebuild because the per-doc graph metadata is already gone, so the promised “re-trigger deletion to retry” flow cannot actually repair the KG and can leave stale summaries indefinitely.

Useful? React with 👍 / 👎.

Comment on lines +2146 to +2150
elif not rebuild_ok:
keep_msg = (
f"Keeping doc_status for {len(successful_deletions)} docs "
f"because rebuild failed — re-trigger deletion to retry"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mark deferred rebuild failures as failed deletions

In the rebuild-failure branch, the code logs a warning but never reclassifies the previously successful_deletions as failed. The finally block then emits a completion summary from these counters, so users can see a fully successful batch even though doc cleanup was intentionally withheld due to rebuild failure; this makes operational status inaccurate and can hide that manual retry is still required.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Performance]: batch document deletion triggers N redundant knowledge graph rebuilds

2 participants