feat: batch Neo4j graph writes with UNWIND queries #2816
ndcorder wants to merge 1 commit into HKUDS:main
Conversation
Replace individual upsert_node/upsert_edge calls with batched UNWIND queries during the merge phase. Nodes are grouped by entity_type, and large batches are chunked at 500 items. Also adds has_nodes_batch for bulk existence checks.
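The grouping and chunking described above can be sketched as follows. This is an illustration, not the PR's actual code: the chunk size of 500 and the grouping by entity_type come from the description, while the query text, property names, and helper names are assumptions.

```python
from itertools import islice

CHUNK_SIZE = 500  # assumption: matches the batch limit stated in the PR


def chunked(items, size=CHUNK_SIZE):
    """Yield successive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def group_by_entity_type(nodes):
    """Group node dicts by entity_type so each batch shares one label."""
    groups = {}
    for node in nodes:
        groups.setdefault(node["entity_type"], []).append(node)
    return groups


def build_node_upsert_query(entity_type):
    """Hypothetical UNWIND query: one round-trip merges a whole chunk."""
    return (
        "UNWIND $nodes AS node "
        f"MERGE (n:`{entity_type}` {{entity_id: node.entity_id}}) "
        "SET n += node.properties"
    )
```

Because every node in a chunk carries the same label, each chunk can be sent as a single query with the node list bound to the `$nodes` parameter.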
Thanks for your interest in LightRAG and your contributions. However, I noticed a critical concurrency issue (a race condition) introduced by deferring the graph writes. In the current implementation of lightrag/operate.py, the _locked_process_entity_name function (and similarly _locked_process_edges) acquires an application-level distributed lock using `async with get_storage_keyed_lock(...)`. By deferring the upsert_node out of the loop and into a batch at the end of the phase, after all locks have been released, this breaks the "Read-Modify-Write" atomic cycle: two workers can both read the current state of the same entity under their own lock acquisitions, merge independently, and then the later batched write silently overwrites the earlier one.
This could lead to severe data loss and inconsistencies when multiple documents are processed concurrently and contain the same entities/edges. To maintain concurrent safety while still benefiting from batching, we might need to rely entirely on DB-level atomic merge operations (e.g., using ON CREATE SET and ON MATCH SET in Cypher) and skip the application-level read before writing. (Though this might break compatibility with other GraphDB implementations.) This PR will not be merged until you find a way to work it out. Thanks!
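The DB-level alternative suggested above could be sketched like this. The Cypher is an illustration only: the Entity label, the description property, and the `<SEP>` delimiter are assumptions, but the shape shows how ON CREATE SET / ON MATCH SET move the merge logic inside the database transaction, so no application-level read is needed and no lock window is left open.

```python
def build_atomic_merge_query():
    """Hypothetical Cypher where the read-modify-write happens server-side:
    concatenating descriptions runs atomically inside the MERGE, so
    concurrent workers cannot overwrite each other's updates."""
    return (
        "UNWIND $nodes AS node "
        "MERGE (n:Entity {entity_id: node.entity_id}) "
        "ON CREATE SET n.description = node.description "
        "ON MATCH SET n.description = "
        "n.description + '<SEP>' + node.description"
    )
```

The trade-off the comment notes still applies: this pushes merge semantics into backend-specific query text, which other graph storage implementations would each have to replicate.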
Currently every entity and relationship gets its own Neo4j session + transaction + MERGE query during the merge phase. For a document producing 80 entities and 120 relations, that's 200+ round-trips.
This adds batch_upsert_nodes/batch_upsert_edges to the graph storage interface, with a default loop fallback so other backends aren't affected. Neo4JStorage overrides them with UNWIND-based Cypher. The merge phase in operate.py now collects results and writes them in bulk instead of one at a time.
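The default-fallback pattern described above might look like the following. The method names batch_upsert_nodes and upsert_node come from the PR description; the class name, signatures, and tuple layout are assumptions for illustration.

```python
class BaseGraphStorage:
    """Minimal sketch of the storage interface (not the actual class)."""

    async def upsert_node(self, node_id, node_data):
        raise NotImplementedError

    async def batch_upsert_nodes(self, nodes):
        # Default fallback: loop over single upserts so backends that
        # don't override this method keep their existing behavior.
        for node_id, node_data in nodes:
            await self.upsert_node(node_id, node_data)
```

A backend like Neo4JStorage would override batch_upsert_nodes with a single UNWIND query, while every other backend inherits the loop unchanged.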
Should help with #1387, #1957, #1648, #2264.