Open
Labels
bug: Something isn't working
Description
When calling await knowledge.ainsert() with different text_content values, the system generates the same content_hash for every entry. This causes new data to overwrite existing data instead of being added as unique records.
Steps to Reproduce
- Initialize the Knowledge instance:
  knowledge = Knowledge(vector_db=vector_db)
- Call
  await knowledge.ainsert(text_content="xxx")
  multiple times in a loop or sequence.
- Ensure that each text_content string is different.
Agent Configuration (if applicable)
No response
Expected Behavior
- A unique content_hash should be generated for each distinct text_content.
- All entries should be stored independently.
Actual Behavior
- The content_hash generated is identical for every insertion, regardless of the differing text_content.
- Consequently, new entries overwrite the previous data instead of being appended.
Screenshots or Logs (if applicable)
No response
Environment
- OS: Windows 11
- Agno Version: v2.5.9
Possible Solutions (optional)
No response
Additional Context
I suspect the issue lies in the _build_content_hash function (lines 2167–2183 in knowledge.py). The current logic uses an if-elif chain, which means if content.file_data.type exists, the code skips calculating the hash for content.file_data.content entirely. This causes different content with the same type to generate identical hashes.
Current Code Logic
# For file_data, always add filename, type, size, or content for uniqueness
if content.file_data.filename:
    hash_parts.append(content.file_data.filename)
elif content.file_data.type:
    # Problem: If 'type' exists, it appends the type and skips the content hash below
    hash_parts.append(content.file_data.type)
elif content.file_data.size is not None:
    hash_parts.append(str(content.file_data.size))
else:
    # Fallback: use the content for uniqueness
    # Include type information to distinguish str vs bytes
    content_type = "str" if isinstance(content.file_data.content, str) else "bytes"
    content_bytes = (
        content.file_data.content.encode()
        if isinstance(content.file_data.content, str)
        else content.file_data.content
    )
    content_hash = hashlib.sha256(content_bytes).hexdigest()[:16]  # Use first 16 chars
    hash_parts.append(f"{content_type}:{content_hash}")
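To illustrate the suspected collision and one possible direction for a fix, here is a minimal standalone sketch. It does not use Agno itself: the FileData dataclass, the helper names, the ":" join used to combine parts, and the "Text" type value are all assumptions made for the demonstration. Only the attribute names (filename, type, size, content) and the branch order come from the snippet above.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class FileData:
    """Hypothetical stand-in for the file_data object; not Agno's class."""
    content: Union[str, bytes]
    filename: Optional[str] = None
    type: Optional[str] = None
    size: Optional[int] = None


def content_part(fd: FileData) -> str:
    # Same content digest as the fallback branch in the snippet above.
    content_type = "str" if isinstance(fd.content, str) else "bytes"
    content_bytes = fd.content.encode() if isinstance(fd.content, str) else fd.content
    return f"{content_type}:{hashlib.sha256(content_bytes).hexdigest()[:16]}"


def parts_elif_chain(fd: FileData) -> List[str]:
    # Mirrors the reported logic: the first matching branch wins,
    # so the content is only hashed when nothing else is set.
    if fd.filename:
        return [fd.filename]
    elif fd.type:
        return [fd.type]
    elif fd.size is not None:
        return [str(fd.size)]
    return [content_part(fd)]


def parts_always_content(fd: FileData) -> List[str]:
    # Possible alternative (an assumption, not the project's fix):
    # add whatever metadata exists, then always add the content digest.
    parts: List[str] = []
    if fd.filename:
        parts.append(fd.filename)
    if fd.type:
        parts.append(fd.type)
    if fd.size is not None:
        parts.append(str(fd.size))
    parts.append(content_part(fd))
    return parts


def digest(parts: List[str]) -> str:
    # Hypothetical way of combining the parts into one hash.
    return hashlib.sha256(":".join(parts).encode()).hexdigest()


# Two different texts that share the same (assumed) type value.
a = FileData(content="first text", type="Text")
b = FileData(content="second text", type="Text")

print(digest(parts_elif_chain(a)) == digest(parts_elif_chain(b)))
print(digest(parts_always_content(a)) == digest(parts_always_content(b)))
```

With the if-elif chain, both inputs reduce to the same single part (the type), so their digests match. When the content digest is always appended, they differ. A change like this would also alter the hashes of entries that were stored previously, which may matter for existing data.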