feat: add entity deduplication after graph building #141

Open
Stayfoool wants to merge 1 commit into 666ghj:main from Stayfoool:feature/entity-deduplication

Stayfoool commented Mar 11, 2026

Hi @666ghj

I noticed that during graph building, Zep sometimes creates duplicate
entity nodes for the same real-world entity (e.g. "特朗普" ("Trump") and
"美国总统特朗普" ("US President Trump") appear as separate nodes). This
reduces the accuracy of the knowledge graph.

This PR adds an automatic entity deduplication step after graph building,
using name similarity pre-filtering + type compatibility check + LLM
confirmation to identify and merge duplicates.
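
The filtering order described above could look roughly like the sketch below. It is not the code in `entity_deduplicator.py`: the entity fields (`name`, `labels`), the 0.6 threshold, the containment rule, and the `confirm` callback (a stand-in for the LLM call) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Entity = Dict[str, object]  # hypothetical shape: {"name": str, "labels": [str, ...]}


def name_similarity(a: str, b: str) -> float:
    """Return 1.0 if one name contains the other, else a difflib ratio in [0, 1]."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    if a in b or b in a:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def types_compatible(types_a: Sequence[str], types_b: Sequence[str]) -> bool:
    """Untyped entities match anything; typed entities must share a label."""
    return not types_a or not types_b or bool(set(types_a) & set(types_b))


def find_duplicate_pairs(
    entities: List[Entity],
    confirm: Callable[[Entity, Entity], bool],
    threshold: float = 0.6,
) -> List[Tuple[str, str]]:
    """Run the cheap filters first so `confirm` (the costly step) sees few pairs."""
    pairs = []
    for a, b in combinations(entities, 2):
        if name_similarity(a["name"], b["name"]) < threshold:
            continue  # layer 1: name similarity pre-filter
        if not types_compatible(a.get("labels", []), b.get("labels", [])):
            continue  # layer 2: type compatibility check
        if confirm(a, b):  # layer 3: LLM confirmation (stubbed by the caller)
            pairs.append((a["name"], b["name"]))
    return pairs


if __name__ == "__main__":
    sample = [
        {"name": "特朗普", "labels": ["Person"]},
        {"name": "美国总统特朗普", "labels": ["Person"]},
        {"name": "特朗普大厦", "labels": ["Building"]},
    ]
    # A stub that always confirms; a real pipeline would ask the LLM here.
    print(find_duplicate_pairs(sample, confirm=lambda a, b: True))
```

Ordering the layers from cheapest to most expensive keeps the number of LLM calls proportional to the number of plausible pairs rather than to all pairs.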

Would appreciate it if you could take a look when you have time.
Happy to make any changes based on your feedback. Thanks!

Summary

  • Add entity deduplication service that identifies and merges duplicate
    nodes in the knowledge graph after building (e.g. "特朗普" ("Trump") vs
    "美国总统特朗普" ("US President Trump"))
  • When merging duplicate nodes, migrates all edges from removed nodes
    to the primary node before deletion, preserving graph connectivity
  • Three-layer filtering: name similarity pre-filter → type compatibility
    check → LLM confirmation
  • Integrate into the graph build pipeline automatically (runs during the 80%-90% progress stage)
  • Add standalone POST /api/graph/deduplicate endpoint for manual dedup
  • Display dedup report in frontend showing which entities were merged
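
The edge migration in the second bullet can be illustrated with a small in-memory sketch. It does not use the Zep client or reflect the PR's actual code: the `Edge` and `Graph` classes and the `merge_nodes` helper are hypothetical, and dropping self-loops created by a merge is an assumed policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    name: str


@dataclass
class Graph:
    nodes: set[str] = field(default_factory=set)
    edges: set[Edge] = field(default_factory=set)


def merge_nodes(graph: Graph, primary: str, duplicates: Iterable[str]) -> None:
    """Re-point every edge that touches a duplicate at `primary`, then delete the duplicates."""
    dup = set(duplicates) - {primary}
    migrated: set[Edge] = set()
    for edge in graph.edges:
        touched = edge.source in dup or edge.target in dup
        src = primary if edge.source in dup else edge.source
        tgt = primary if edge.target in dup else edge.target
        if touched and src == tgt:
            continue  # assumed policy: drop self-loops produced by the merge
        migrated.add(Edge(src, tgt, edge.name))  # the set collapses identical edges
    graph.edges = migrated
    graph.nodes -= dup  # remove nodes only after their edges have been moved


if __name__ == "__main__":
    g = Graph(
        nodes={"特朗普", "美国总统特朗普", "白宫"},
        edges={Edge("美国总统特朗普", "白宫", "works_at")},
    )
    merge_nodes(g, primary="特朗普", duplicates=["美国总统特朗普"])
    print(g.nodes, g.edges)
```

Migrating edges before deleting nodes is what preserves connectivity: if the duplicate were deleted first, any relationship attached only to it would be lost.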

Changes

  • New: backend/app/services/entity_deduplicator.py
  • Modified: backend/app/api/graph.py
  • Modified: backend/app/services/__init__.py
  • Modified: frontend/src/views/Process.vue
  • Modified: frontend/src/views/MainView.vue
  • Modified: frontend/src/components/Step1GraphBuild.vue

Closes #145

dosubot bot added the labels size:XL (This PR changes 500-999 lines, ignoring generated files.) and enhancement (New feature or request) on Mar 11, 2026

Development

Successfully merging this pull request may close these issues.

Duplicate entity nodes exist in the knowledge graph