fix: optimize batch document deletion - eliminate N redundant KG rebuilds #2812
yang1002378395-cmyk wants to merge 2 commits into HKUDS:main from
…ilds

- Add skip_rebuild parameter to adelete_by_doc_id() (default: False)
- Add entities_to_rebuild/relationships_to_rebuild fields to DeletionResult
- Refactor background_delete_documents() to collect all rebuild data
- Call rebuild_knowledge_from_chunks() once after all deletions
- Fixes HKUDS#2795: 75x performance improvement for batch deletions

Backward compatible: skip_rebuild=False maintains existing behavior for single document deletions and external API callers.
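The commit message says `background_delete_documents()` collects all rebuild data, but not how the per-document maps are combined. The sketch below is one hypothetical way to do it, assuming each `entities_to_rebuild` map goes from an entity name to the chunk IDs that still support it, and that deletions run one after another. Under those assumptions, a naive `dict.update` could keep an entry recorded early in the batch that still points at chunks a later deletion removed. `merge_rebuild_maps` is an illustrative helper, not a LightRAG function.

```python
def merge_rebuild_maps(results):
    """Hypothetical helper: combine per-document rebuild maps from one batch.

    results: list of (deleted_chunk_ids, entities_to_rebuild) in deletion order,
    where entities_to_rebuild maps entity name -> surviving chunk ids (assumed shape).
    """
    merged = {}
    deleted = set()
    for deleted_chunks, entities in results:
        deleted |= deleted_chunks
        for name, chunks in entities.items():
            # A later deletion saw a more recent state, so its entry replaces the earlier one.
            merged[name] = set(chunks)
    # Remove chunks deleted anywhere in the batch; entities left with no chunks
    # have nothing to rebuild from, so they are dropped.
    cleaned = {name: chunks - deleted for name, chunks in merged.items()}
    return {name: chunks for name, chunks in cleaned.items() if chunks}


batch = [
    ({"c1"}, {"E": {"c2", "c3"}, "F": {"c2"}}),  # doc A deleted chunk c1
    ({"c2"}, {"E": {"c3"}}),                      # doc B deleted c2, F's last chunk
]
print(merge_rebuild_maps(batch))  # prints {'E': {'c3'}}
```

Entity F is dropped because its only remaining chunk was deleted by a later document in the same batch.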
Code review

Found 1 issue:
The new Phase 2 block calls …

LightRAG/lightrag/api/routers/document_routes.py, lines 2084 to 2091 in 5c6aff8

🤖 Generated with Claude Code

If this code review was useful, please react with 👍. Otherwise, react with 👎.
@jules review
This PR implements the fix, but when skip_rebuild=True it returns entities_to_rebuild and relationships_to_rebuild early from adelete_by_doc_id(). Returning early skips "Step 9. Delete from full_entities and full_relations storage" in adelete_by_doc_id(), so that cleanup side effect never happens for batch deletions. PR 2819 fixes this exact issue: it defers the rebuild step but lets adelete_by_doc_id() run through Step 9, then returns the rebuild data along with the final DeletionResult. It also includes tests (tests/test_batch_delete_deferred_rebuild.py).
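The ordering problem described in this comment can be sketched as follows: the rebuild is made conditional, but the function does not return before the "Step 9" cleanup. `ToyStore` and `delete_doc` are hypothetical in-memory stand-ins, not LightRAG's storage classes or the code from either PR; the earlier deletion steps are elided.

```python
import asyncio


class ToyStore:
    """Hypothetical stand-in for the graph plus full_entities/full_relations storage."""

    def __init__(self):
        self.full_entities = {"doc-1": ["E1", "E2"]}
        self.full_relations = {"doc-1": [("E1", "E2")]}
        self.rebuild_calls = 0

    async def rebuild(self, entities):
        self.rebuild_calls += 1


async def delete_doc(store, doc_id, skip_rebuild=False):
    # Earlier steps (elided) remove chunks and work out what must be rebuilt.
    entities_to_rebuild = {"E1": {"chunk-9"}}

    if not skip_rebuild:
        await store.rebuild(entities_to_rebuild)
    # No early return when skip_rebuild=True: that would skip the cleanup below.

    # "Step 9": drop this document's entries from full_entities/full_relations.
    store.full_entities.pop(doc_id, None)
    store.full_relations.pop(doc_id, None)

    # Hand the rebuild data back with the final result so a batch caller can defer it.
    return entities_to_rebuild


store = ToyStore()
pending = asyncio.run(delete_doc(store, "doc-1", skip_rebuild=True))
print(store.rebuild_calls, "doc-1" in store.full_entities, pending)
# prints: 0 False {'E1': {'chunk-9'}}
```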
Summary
- Add skip_rebuild parameter to adelete_by_doc_id() (default False for backward compatibility)
- Refactor background_delete_documents() to collect rebuild data from all deletions
- Call rebuild_knowledge_from_chunks() once after all deletions complete

Performance Impact
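The speedup comes from the number of knowledge-graph rebuilds: one per deleted document before, one per batch after. The sketch below only counts those calls; `ToyRAG`, the simplified `DeletionResult`, and the `background_delete_documents(rag, doc_ids)` signature are hypothetical in-memory stand-ins that mirror the names in this PR, not LightRAG's real classes or signatures. The rebuild itself is a no-op counter, so this illustrates call counts, not timings.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class DeletionResult:
    # Simplified, hypothetical stand-in: only the fields this PR touches.
    doc_id: str
    status: str
    entities_to_rebuild: dict = field(default_factory=dict)
    relationships_to_rebuild: dict = field(default_factory=dict)


class ToyRAG:
    """Hypothetical in-memory model of the deletion flow, not LightRAG itself."""

    def __init__(self, doc_entities):
        # doc_id -> {entity name: chunk ids left after deleting that doc}
        self.doc_entities = doc_entities
        self.rebuild_calls = 0

    async def rebuild_knowledge_from_chunks(self, entities, relationships):
        # Stands in for the expensive KG rebuild; here we only count calls.
        self.rebuild_calls += 1

    async def adelete_by_doc_id(self, doc_id, skip_rebuild=False):
        entities = self.doc_entities.pop(doc_id, {})
        relationships = {}
        if not skip_rebuild:
            # Default path keeps the old behaviour: rebuild immediately.
            await self.rebuild_knowledge_from_chunks(entities, relationships)
        # In this sketch the rebuild data is always returned.
        return DeletionResult(doc_id, "success", entities, relationships)


async def background_delete_documents(rag, doc_ids):
    """Hypothetical batch path: defer every rebuild, then rebuild once."""
    all_entities, all_relationships = {}, {}
    for doc_id in doc_ids:
        result = await rag.adelete_by_doc_id(doc_id, skip_rebuild=True)
        # Naive merge for illustration; later results overwrite earlier keys.
        all_entities.update(result.entities_to_rebuild)
        all_relationships.update(result.relationships_to_rebuild)
    if all_entities or all_relationships:
        await rag.rebuild_knowledge_from_chunks(all_entities, all_relationships)


docs = {"a": {"E1": {"c1"}}, "b": {"E2": {"c2"}}, "c": {"E3": {"c3"}}}

one_by_one = ToyRAG(dict(docs))
for doc_id in docs:
    asyncio.run(one_by_one.adelete_by_doc_id(doc_id))

batched = ToyRAG(dict(docs))
asyncio.run(background_delete_documents(batched, list(docs)))

print(one_by_one.rebuild_calls, batched.rebuild_calls)  # prints: 3 1
```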
Test Plan
- … skip_rebuild=False)

Fixes #2795
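The Test Plan survives only in part. As a sketch of how the "rebuild once per batch" property could be checked without a real storage backend, the test below injects `unittest.mock.AsyncMock` objects for the per-document delete and the rebuild. `delete_batch` is a hypothetical simplification of the batch path, and plain dicts stand in for `DeletionResult`; this is not the test file referenced in the review.

```python
import asyncio
from unittest.mock import AsyncMock


async def delete_batch(doc_ids, delete_one, rebuild):
    """Hypothetical batch deleter: defer per-document rebuilds, rebuild once."""
    entities, relationships = {}, {}
    for doc_id in doc_ids:
        result = await delete_one(doc_id, skip_rebuild=True)
        entities.update(result["entities_to_rebuild"])
        relationships.update(result["relationships_to_rebuild"])
    if entities or relationships:
        await rebuild(entities, relationships)


def test_batch_rebuilds_once():
    delete_one = AsyncMock(side_effect=lambda doc_id, skip_rebuild: {
        "entities_to_rebuild": {f"E-{doc_id}": {f"c-{doc_id}"}},
        "relationships_to_rebuild": {},
    })
    rebuild = AsyncMock()
    asyncio.run(delete_batch(["a", "b", "c"], delete_one, rebuild))

    assert delete_one.await_count == 3
    # Every per-document call deferred its rebuild.
    assert all(c.kwargs == {"skip_rebuild": True} for c in delete_one.await_args_list)
    rebuild.assert_awaited_once()
    entities, _ = rebuild.await_args.args
    assert set(entities) == {"E-a", "E-b", "E-c"}


def test_empty_batch_skips_rebuild():
    delete_one, rebuild = AsyncMock(), AsyncMock()
    asyncio.run(delete_batch([], delete_one, rebuild))
    rebuild.assert_not_awaited()


test_batch_rebuilds_once()
test_empty_batch_skips_rebuild()
```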