
feat: skip LLM summary for entities with unchanged descriptions #2817

Closed
ndcorder wants to merge 1 commit into HKUDS:main from ndcorder:feat/skip-unchanged-entity-summaries

Conversation


@ndcorder ndcorder commented Mar 21, 2026

When re-ingesting a document or doing incremental updates, most entities already have the same descriptions. Right now we still call the LLM to re-summarize them every time, which is wasteful.

This adds an early return in _merge_nodes_then_upsert: if all incoming descriptions are already present on the existing node, we skip the summary call and only update source tracking.
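The proposed check could be sketched roughly as follows. This is a minimal illustration, not the actual LightRAG code: the `GRAPH_FIELD_SEP` separator and the `description` field layout are assumptions about how descriptions are stored on a node.

```python
# Sketch of the proposed early-return check. GRAPH_FIELD_SEP and the
# "description" field layout are assumptions about LightRAG's node
# storage, not verified internals.
GRAPH_FIELD_SEP = "<SEP>"

def should_skip_summary(existing_node, incoming_descriptions):
    """Return True when every incoming description already exists on the
    node, so the LLM summarization call can be skipped."""
    if existing_node is None:
        return False  # brand-new entity: a summary is still needed
    existing = set(existing_node.get("description", "").split(GRAPH_FIELD_SEP))
    return set(incoming_descriptions).issubset(existing)
```

In `_merge_nodes_then_upsert`, this would gate the summarization step: when it returns `True`, the existing summary is reused and only source-tracking fields are updated.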

Check incoming descriptions against what's already on the node.
If nothing new was added, reuse the existing summary instead of
calling the LLM again. Saves a lot of time on re-ingestion.
@ndcorder force-pushed the feat/skip-unchanged-entity-summaries branch from ad4c882 to 70c5a9a on March 21, 2026 at 09:34
@danielaskdd (Collaborator)

Hi, thanks for the contribution!

However, we have some concerns about the effectiveness of this optimization. Since incoming_descriptions are generated by the LLM during the extraction phase, their wording will almost certainly vary slightly on each run, even for the same entity in the same context.

Because of these variations, an exact string comparison like incoming_descriptions.issubset(existing_descriptions) is highly unlikely to evaluate to True in a real-world scenario (unless the exact same LLM extraction cache is hit).

Could you provide any practical testing or metrics showing that this optimization actually triggers and skips the summarization step effectively in your use cases?

An alternative approach might be to use vector embeddings to check for semantic similarity between the new and existing descriptions:

  • Pros: It would correctly identify when a new description adds no new information, regardless of wording changes.
  • Cons: It would introduce additional overhead by requiring vector database queries/embedding calculations during the merge step, which might negate the performance benefits of skipping the LLM summarization call.
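The embedding-based alternative could look something like the following. This is a hedged sketch: the 0.9 similarity threshold is arbitrary, and the embedding model itself is out of scope here, so the functions operate on precomputed vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def adds_new_information(new_vec, existing_vecs, threshold=0.9):
    """Treat a new description as redundant (i.e. adding no new
    information) if its embedding is highly similar to any existing
    description's embedding. Threshold 0.9 is an arbitrary example."""
    return all(cosine_similarity(new_vec, v) < threshold for v in existing_vecs)
```

As noted in the cons above, each incoming description would need an embedding computed during the merge step, which is exactly the overhead that might cancel out the saved summarization call.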

We'd love to hear your thoughts on this!

@ndcorder (Author)

I think I was too tunnel-visioned when I originally made this PR; on reflection, it isn't necessary.

If the cache is enabled, the summary LLM call itself would also be cached, so the skip logic is redundant. And if the cache is off, descriptions vary too much for exact matching to help. Either way it doesn't add real value. I'm sorry for wasting your time!

@ndcorder ndcorder closed this Mar 26, 2026
@danielaskdd (Collaborator)

You are welcome.
