fix: resolve KnowledgeGraphNode hash collision due to template rendering bugs #709

Open
3em0 wants to merge 1 commit into pingcap:main from 3em0:fix/kg-node-hash-collision

3em0 commented Mar 27, 2026

VUL-01 — KnowledgeGraphNode Hash Collision (Severity: High)

Category: Content collision / Hash invalidation
Affected: backend/app/rag/retrievers/knowledge_graph/schema.py (KnowledgeGraphNode.hash, get_content(), _get_entities_str(), _get_relationships_str())

Root Cause

Two independent bugs combine to make KnowledgeGraphNode.hash and get_content() completely ignore actual entity and relationship data:

Bug 1 — Template syntax mismatch (L199-208):
Templates used Jinja2 double-brace syntax ({{ name }}) but were rendered with Python str.format(), which treats {{ as a literal { escape. All format keyword arguments were silently ignored — every entity and relationship rendered to the same constant string regardless of content.
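
The escape behaviour is easy to reproduce in isolation. The sketch below uses made-up template text (the real DEFAULT_ENTITY_TMPL wording may differ) and shows that `str.format()` collapses doubled braces into literal braces and never consults the keyword arguments:

```python
# Hypothetical template text for illustration; the real template wording may differ.
jinja_style = "- Name: {{ name }}\n  Description: {{ description }}"
format_style = "- Name: {name}\n  Description: {description}"

alice = {"name": "Alice", "description": "An engineer"}
evil = {"name": "Evil1", "description": "Injected entity"}

# Doubled braces are escapes in str.format(), so the template contains no
# placeholders at all and the unused keyword arguments are dropped without error.
buggy_alice = jinja_style.format(**alice)
buggy_evil = jinja_style.format(**evil)

# Single braces are real replacement fields, so the data reaches the output.
fixed_alice = format_style.format(**alice)
fixed_evil = format_style.format(**evil)

print(repr(buggy_alice))
print(buggy_alice == buggy_evil)
print(fixed_alice == fixed_evil)
```

Because `str.format()` accepts unused keyword arguments, nothing in the buggy path raises or warns, which is likely why the mismatch went unnoticed.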

Bug 2 — Wrong template reference (L286):
_get_relationships_str() used self.entity_template instead of self.relationship_template. Relationship fields (rag_description, weight, last_modified_at, meta) were passed as kwargs to the entity template which has no matching placeholders — all silently discarded.
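
A small sketch of how the two bugs interact, using assumed template strings and sample relationship values (none of these literals come from schema.py). While the entity template has no real placeholders, feeding it relationship kwargs succeeds and drops them; once the braces are corrected, the same call would fail with a KeyError, which suggests both changes need to land together:

```python
# Hypothetical stand-ins for the entity template before and after the brace fix.
entity_tmpl_buggy = "- Name: {{ name }}\n  Description: {{ description }}"
entity_tmpl_fixed = "- Name: {name}\n  Description: {description}"

# Sample relationship fields, named after those listed in the PR description.
relationship_kwargs = {
    "rag_description": "Alice -> Bob: mentors",
    "weight": 3,
    "last_modified_at": "2026-03-27",
    "meta": {"source": "doc-1"},
}

# Escaped braces: no placeholders, so every relationship field is discarded.
silently_wrong = entity_tmpl_buggy.format(**relationship_kwargs)

# Real placeholders: the wrong template now fails loudly on the first
# field it cannot find among the relationship kwargs.
try:
    entity_tmpl_fixed.format(**relationship_kwargs)
    missing_field = None
except KeyError as exc:
    missing_field = exc.args[0]

print(repr(silently_wrong))
print(missing_field)
```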

Attack Chain

Attacker controls two knowledge bases KB-A and KB-B:

KB-A: query="X", entities=[Alice, Bob]   → hash = SHA256("...\n{ name }\n\n{ name }\n...")
KB-B: query="X", entities=[Evil1, Evil2] → hash = SHA256("...\n{ name }\n\n{ name }\n...")
                                                   ↑ identical!

→ Two completely different knowledge graph nodes produce the same hash
→ In fusion retrieval, Evil1/Evil2 nodes may replace Alice/Bob nodes, or vice versa
→ Retrieved knowledge graph content does not match what the hash identifies

Impact

  • hash is effectively: f(query_text, len(entities), len(relationships)) — actual content never enters the computation
  • Vector store node deduplication based on hash will incorrectly merge semantically different KG nodes
  • get_content() returns constant placeholder text instead of real entity/relationship data, affecting any downstream consumer (rerankers, tracing, logging)
  • RAG system may return context from the wrong knowledge base
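
The first bullet can be illustrated with a reduced model of the buggy path. The helper names and the content layout below are assumptions made for the sketch, not the actual get_content() implementation; only the use of SHA-256 over rendered text follows the description above:

```python
import hashlib

# Hypothetical buggy template; doubled braces render as constant text.
BUGGY_ENTITY_TMPL = "- Name: {{ name }}\n  Description: {{ description }}"


def buggy_content(query: str, entities: list) -> str:
    """Render the query plus one constant block per entity (illustrative layout)."""
    rendered = "\n\n".join(BUGGY_ENTITY_TMPL.format(**e) for e in entities)
    return f"{query}\n{rendered}"


def content_hash(content: str) -> str:
    """SHA-256 hex digest of the rendered content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


kb_a = [
    {"name": "Alice", "description": "engineer"},
    {"name": "Bob", "description": "analyst"},
]
kb_b = [
    {"name": "Evil1", "description": "injected"},
    {"name": "Evil2", "description": "injected"},
]
kb_c = kb_a[:1]  # same content as KB-A but a different entity count

hash_a = content_hash(buggy_content("X", kb_a))
hash_b = content_hash(buggy_content("X", kb_b))
hash_c = content_hash(buggy_content("X", kb_c))

print(hash_a == hash_b)  # same query and count: digests match
print(hash_a == hash_c)  # different count: digests differ
```

In this model the digest changes only when the query or the number of rendered items changes, which matches the f(query_text, len(entities), len(relationships)) characterisation.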

CVSS estimate: 7.5 (High) — No authentication required, impacts data integrity and availability.

Fix

| # | Location | Change |
|---|----------|--------|
| 1 | DEFAULT_ENTITY_TMPL (L199-202) | {{ name }} → {name}, {{ description }} → {description} |
| 2 | DEFAULT_RELATIONSHIP_TMPL (L203-208) | {{ rag_description }} → {rag_description}, etc. |
| 3 | _get_relationships_str() (L286) | self.entity_template.format( → self.relationship_template.format( |

Verification

Tested with two KnowledgeGraphNode instances sharing the same query and entity/relationship counts but completely different content:

  • Before fix: Both produce hash 5bfcc6e84d7dc0f987e19626d09c3fbf... (collision)
  • After fix: c5838f01... vs fd7e8c14... (no collision, content correctly differentiated)
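
A regression check along these lines could look like the sketch below. SketchKGNode is a hypothetical, heavily simplified stand-in for KnowledgeGraphNode (the real class has more fields and different template text); it only mirrors the three changes listed under Fix:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SketchKGNode:
    """Hypothetical, simplified stand-in for KnowledgeGraphNode."""

    query: str
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    # Single-brace placeholders, as in the corrected templates (wording assumed).
    entity_template: str = "- Name: {name}\n  Description: {description}"
    relationship_template: str = "- Description: {rag_description}\n  Weight: {weight}"

    def _get_entities_str(self) -> str:
        return "\n\n".join(self.entity_template.format(**e) for e in self.entities)

    def _get_relationships_str(self) -> str:
        # Renders with the relationship template, not the entity template.
        return "\n\n".join(
            self.relationship_template.format(**r) for r in self.relationships
        )

    def get_content(self) -> str:
        return "\n".join(
            [self.query, self._get_entities_str(), self._get_relationships_str()]
        )

    @property
    def hash(self) -> str:
        return hashlib.sha256(self.get_content().encode("utf-8")).hexdigest()


# Same query and same entity/relationship counts, different content.
node_a = SketchKGNode(
    query="X",
    entities=[{"name": "Alice", "description": "engineer"}],
    relationships=[{"rag_description": "Alice mentors Bob", "weight": 3}],
)
node_b = SketchKGNode(
    query="X",
    entities=[{"name": "Evil1", "description": "injected"}],
    relationships=[{"rag_description": "Evil1 controls Evil2", "weight": 3}],
)

print(node_a.hash != node_b.hash)
```

Asserting on content as well as on the digest (for example, that "Alice" appears in node_a.get_content()) would catch a future regression in either template independently of the hash.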

🤖 Generated with Claude Code

…ing bugs

Two bugs in KnowledgeGraphNode caused hash/get_content to ignore actual
entity and relationship data, producing identical outputs for completely
different knowledge graph nodes.

Bug 1: Templates used Jinja2 double-brace syntax ({{ name }}) but were
rendered with Python str.format(), which treats {{ as literal {. All
format kwargs were silently ignored.

Bug 2: _get_relationships_str() used self.entity_template instead of
self.relationship_template, so relationship data never reached the output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 27, 2026

Someone is attempting to deploy a commit to the pingcap Team on Vercel.

A member of the Team first needs to authorize it.

