
feat: skip LLM summary for entities with unchanged descriptions #2817

Closed
ndcorder wants to merge 1 commit into HKUDS:main from ndcorder:feat/skip-unchanged-entity-summaries

Conversation


@ndcorder ndcorder commented Mar 21, 2026

When re-ingesting a document or doing incremental updates, most entities already have the same descriptions. Right now we still call the LLM to re-summarize them every time, which is wasteful.

This adds an early return in _merge_nodes_then_upsert: if all incoming descriptions are already present on the existing node, we skip the summary call and only update source tracking.
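The proposed check could be sketched roughly as follows. This is a minimal illustration, not the actual LightRAG code: the `GRAPH_FIELD_SEP` separator and the `description` field layout are assumptions about how descriptions are stored on a node.

```python
# Sketch of the proposed early-return check. GRAPH_FIELD_SEP and the
# "description" field layout are assumptions about LightRAG's node
# storage, not verified internals.
GRAPH_FIELD_SEP = "<SEP>"

def should_skip_summary(existing_node, incoming_descriptions):
    """Return True when every incoming description already exists on the
    node, so the LLM summarization call can be skipped."""
    if existing_node is None:
        return False  # brand-new entity: a summary is still needed
    existing = set(existing_node.get("description", "").split(GRAPH_FIELD_SEP))
    return set(incoming_descriptions).issubset(existing)
```

In `_merge_nodes_then_upsert`, this would gate the summarization step: when it returns `True`, the existing summary is reused and only source-tracking fields are updated.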

Check incoming descriptions against what's already on the node.
If nothing new was added, reuse the existing summary instead of
calling the LLM again. Saves a lot of time on re-ingestion.
@ndcorder force-pushed the feat/skip-unchanged-entity-summaries branch from ad4c882 to 70c5a9a on March 21, 2026 at 09:34
@danielaskdd (Collaborator)

Hi, thanks for the contribution!

However, we have some concerns about the effectiveness of this optimization. Since incoming_descriptions are generated by the LLM during the extraction phase, their wording will almost certainly vary slightly on each run, even for the same entity in the same context.

Because of these variations, an exact string comparison like incoming_descriptions.issubset(existing_descriptions) is highly unlikely to evaluate to True in a real-world scenario (unless the exact same LLM extraction cache is hit).

Could you provide any practical testing or metrics showing that this optimization actually triggers and skips the summarization step effectively in your use cases?

An alternative approach might be to use vector embeddings to check for semantic similarity between the new and existing descriptions:

  • Pros: It would correctly identify when a new description adds no new information, regardless of wording changes.
  • Cons: It would introduce additional overhead by requiring vector database queries/embedding calculations during the merge step, which might negate the performance benefits of skipping the LLM summarization call.
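The embedding-based alternative could look something like the following. This is a hedged sketch: the 0.9 similarity threshold is arbitrary, and the embedding model itself is out of scope here, so the functions operate on precomputed vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def adds_new_information(new_vec, existing_vecs, threshold=0.9):
    """Treat a new description as redundant (i.e. adding no new
    information) if its embedding is highly similar to any existing
    description's embedding. Threshold 0.9 is an arbitrary example."""
    return all(cosine_similarity(new_vec, v) < threshold for v in existing_vecs)
```

As noted in the cons above, each incoming description would need an embedding computed during the merge step, which is exactly the overhead that might cancel out the saved summarization call.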

We'd love to hear your thoughts on this!

@ndcorder (Author)

I think I was too tunnel-visioned when I originally made this PR; on reflection, it isn't necessary.

If the cache is enabled, the summary LLM call itself would also be cached, so the skip logic is redundant. And if the cache is off, descriptions vary too much for exact matching to help. Either way it doesn't add real value. I'm sorry for wasting your time!

@ndcorder ndcorder closed this Mar 26, 2026
@danielaskdd (Collaborator)

You are welcome.
