feat: skip LLM summary for entities with unchanged descriptions #2817
ndcorder wants to merge 1 commit into HKUDS:main from
Check incoming descriptions against what's already on the node. If nothing new was added, reuse the existing summary instead of calling the LLM again. This avoids repeated summary calls, which saves time on re-ingestion.
ad4c882 to 70c5a9a
Hi, thanks for the contribution! However, we have some concerns regarding the effectiveness of this optimization. Since … Because of these variations, an exact string comparison like …

Could you provide any practical testing or metrics showing that this optimization actually triggers and skips the summarization step in your use cases?

An alternative approach might be to use vector embeddings to check for semantic similarity between the new and existing descriptions: …

We'd love to hear your thoughts on this!
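The embedding-based alternative suggested in the review could look roughly like the sketch below. It assumes the caller has already embedded each description with some embedding model (not shown); `is_semantically_covered` and the `0.9` threshold are hypothetical and would need tuning on real data. An incoming description counts as "already covered" if its cosine similarity to at least one existing description meets the threshold.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantically_covered(
    incoming: Sequence[Vector],
    existing: Sequence[Vector],
    threshold: float = 0.9,  # hypothetical cut-off, not a tested value
) -> bool:
    """True if every incoming embedding is close to at least one existing one."""
    if not existing:
        # With no existing descriptions, only an empty incoming batch is covered.
        return not incoming
    return all(
        max(cosine_similarity(new, old) for old in existing) >= threshold
        for new in incoming
    )


# Toy 3-dimensional "embeddings" standing in for real model output.
existing = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(is_semantically_covered([[0.95, 0.05, 0.0]], existing))  # True
print(is_semantically_covered([[0.0, 0.0, 1.0]], existing))    # False
```

Unlike exact matching, this tolerates rewording, but it trades one LLM call for embedding calls plus a threshold that can wrongly treat genuinely new facts as duplicates.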
I think I had tunnel vision when I originally opened this PR, because the change isn't necessary. If the cache is enabled, the summary LLM call itself is already cached, so the skip logic is redundant. If the cache is off, descriptions vary too much for exact matching to help. Either way, it doesn't add real value. I'm sorry for wasting your time!
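To illustrate why the cache makes the skip redundant, here is a hypothetical sketch of an LLM response cache keyed on a hash of the prompt. It is not LightRAG's actual cache implementation; `CachedSummarizer`, the prompt format, and `fake_llm` are stand-ins. The point is that identical descriptions produce an identical prompt, so the second summary request is served from the cache without a new LLM call.

```python
import hashlib
from typing import Callable, Dict, List


class CachedSummarizer:
    """Wraps an LLM call with a response cache keyed on a SHA-256 of the prompt.

    Hypothetical sketch: `llm` stands in for any prompt -> completion function.
    """

    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm
        self._cache: Dict[str, str] = {}
        self.llm_calls = 0  # counts real (uncached) LLM invocations

    def summarize(self, entity: str, descriptions: List[str]) -> str:
        # Same entity + same descriptions -> same prompt -> same cache key.
        prompt = f"Summarize entity {entity}:\n" + "\n".join(descriptions)
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.llm_calls += 1
            self._cache[key] = self._llm(prompt)
        return self._cache[key]


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"summary ({len(prompt)} chars of input)"


summarizer = CachedSummarizer(fake_llm)
summarizer.summarize("Alice", ["Alice works at Acme."])
summarizer.summarize("Alice", ["Alice works at Acme."])  # cache hit
print(summarizer.llm_calls)  # 1
```

When the descriptions change, both this cache and an exact-match skip miss, so the skip only ever saves a call that the cache would already have saved.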
You are welcome.
When re-ingesting a document or doing incremental updates, most entities already have the same descriptions. Right now we still call the LLM to re-summarize them every time, which is wasteful.
This adds an early return in _merge_nodes_then_upsert: if all incoming descriptions are already present on the existing node, we skip the summary call and only update source tracking.
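A minimal sketch of the early-return check, assuming the existing node stores its descriptions as one string joined by a separator (taken here to be `<SEP>`) and that incoming descriptions arrive as a list of strings. `should_skip_summary` is a hypothetical helper, not the code actually added to `_merge_nodes_then_upsert`.

```python
from typing import Iterable

# Assumed separator between descriptions stored on a node.
SEP = "<SEP>"


def should_skip_summary(existing_description: str, incoming: Iterable[str]) -> bool:
    """Return True if every incoming description is already on the node.

    Uses exact string matching after stripping surrounding whitespace,
    so any rewording counts as a new description.
    """
    existing = {d.strip() for d in existing_description.split(SEP) if d.strip()}
    incoming_set = {d.strip() for d in incoming if d.strip()}
    # Subset check: nothing new was added, so the existing summary can be reused.
    return incoming_set <= existing


node = "Alice is a researcher.<SEP>Alice works at Acme."
print(should_skip_summary(node, ["Alice works at Acme."]))        # True
print(should_skip_summary(node, ["Alice is employed by Acme."]))  # False
```

The second call shows the limitation raised in review: a paraphrased description of the same fact defeats exact matching, so the summary call still happens.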