
Fuzzy Mapping for User Profiles #89

@VivekVinushanth


Overview

This project introduces fuzzy profile matching within the Customer Data Service (CDS) to detect and suggest potential duplicate user profiles based on attribute similarity rather than exact matches.
It aims to extend the existing deterministic unification system with probabilistic matching and threshold-based decision-making, optionally requiring admin or user approval before profiles are merged.


Objectives

  • Develop a fuzzy matching mechanism to identify potential duplicate profiles.
  • Integrate fuzzy match detection into the existing profile unification workflow.
  • Enable configurable thresholds for different confidence levels (auto-merge, approval-required).
  • Optionally implement an approval workflow for admins or users to confirm or reject merges.

Scope of Work

1. Fuzzy Matching Core

  • Evaluate suitable fuzzy matching algorithms (e.g., Levenshtein distance, Jaro-Winkler similarity, cosine similarity, ML-based embedding comparisons).
  • Implement similarity-based comparison for key identity attributes (e.g., name, email, phone, location).
  • Introduce configurable thresholds to classify match confidence levels:
    • High-confidence (Auto-Merge)
    • Medium-confidence (Approval Required)
    • Low-confidence (Ignore)
  • Store match scores and thresholds in the CDS database for audit and debugging.
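The matching core above can be sketched roughly as follows. This is a minimal illustration, not the CDS implementation. It assumes profiles are plain dictionaries with hypothetical `id`, `name`, `email`, and `phone` keys. It uses normalized Levenshtein similarity (one of the candidate algorithms) for every attribute and combines the per-attribute scores with hypothetical weights. The 0.90 and 0.70 thresholds are placeholders, not recommendations.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical weights and thresholds; the issue leaves these configurable.
ATTRIBUTE_WEIGHTS = {"name": 0.4, "email": 0.4, "phone": 0.2}
HIGH_THRESHOLD = 0.90    # score at or above this: auto-merge
MEDIUM_THRESHOLD = 0.70  # at or above this (and below HIGH): approval required


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def normalize(attribute: str, value: str) -> str:
    """Fold case and whitespace; keep only digits for phone numbers."""
    if attribute == "phone":
        return re.sub(r"\D", "", value)
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def classify(score: float) -> str:
    """Map a score to the high / medium / low confidence bands."""
    if score >= HIGH_THRESHOLD:
        return "AUTO_MERGE"
    if score >= MEDIUM_THRESHOLD:
        return "APPROVAL_REQUIRED"
    return "IGNORE"


@dataclass
class MatchResult:
    """Record meant to be persisted so scores and thresholds can be audited later."""
    profile_a: str
    profile_b: str
    score: float
    decision: str
    attribute_scores: Dict[str, float] = field(default_factory=dict)
    high_threshold: float = HIGH_THRESHOLD
    medium_threshold: float = MEDIUM_THRESHOLD


def match_profiles(a: dict, b: dict) -> Optional[MatchResult]:
    """Weighted average over attributes present in both profiles; None if none overlap."""
    scores = {
        attr: similarity(normalize(attr, a[attr]), normalize(attr, b[attr]))
        for attr in ATTRIBUTE_WEIGHTS
        if a.get(attr) and b.get(attr)
    }
    if not scores:
        return None
    total_weight = sum(ATTRIBUTE_WEIGHTS[attr] for attr in scores)
    score = sum(ATTRIBUTE_WEIGHTS[attr] * s for attr, s in scores.items()) / total_weight
    return MatchResult(a["id"], b["id"], score, classify(score), scores)
```

Take "Jon Smith" versus "John Smith" with the same email. This sketch scores the pair at about 0.95 (name similarity 0.9, email similarity 1.0), which puts it in the auto-merge band. Changing one email to "jsmith@example.com" lowers the score to about 0.88, so the pair would need approval instead. Averaging only over attributes present in both profiles means missing data is not penalized. Whether that behavior is desirable is a design decision for the project.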

2. Unification Workflow Integration

  • Integrate the fuzzy matching logic into the existing profile unification pipeline.
  • Extend the unification service to support probabilistic decisions.
  • Implement actions based on confidence levels:
    • Auto-Merge: Automatically unify profiles whose score exceeds the high threshold.
    • Approval Queue: Add medium-confidence matches to a review queue for manual verification.
    • Ignore: Take no merge action on low-confidence matches.
  • Update APIs to expose fuzzy match results and thresholds for debugging and visibility.
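Threshold-based routing in the unification pipeline could look something like the sketch below. It assumes a match score in [0, 1] is already available, for example from a similarity engine. The in-memory `merged` list and `review_queue` stand in for the real CDS merge call and a persistent queue. All names here (`Thresholds`, `UnificationRouter`, `debug_view`) are hypothetical and are not part of any existing CDS API.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Tuple


class Decision(str, Enum):
    AUTO_MERGE = "auto_merge"
    APPROVAL_REQUIRED = "approval_required"
    IGNORE = "ignore"


@dataclass(frozen=True)
class Thresholds:
    high: float = 0.90
    medium: float = 0.70

    def __post_init__(self) -> None:
        # Reject configurations where the bands overlap or fall outside [0, 1].
        if not 0.0 <= self.medium <= self.high <= 1.0:
            raise ValueError("thresholds must satisfy 0 <= medium <= high <= 1")

    def classify(self, score: float) -> Decision:
        if score >= self.high:
            return Decision.AUTO_MERGE
        if score >= self.medium:
            return Decision.APPROVAL_REQUIRED
        return Decision.IGNORE


@dataclass
class ReviewItem:
    profile_a: str
    profile_b: str
    score: float
    status: str = "pending"  # later set to approved/rejected by a reviewer


@dataclass
class UnificationRouter:
    thresholds: Thresholds
    merged: List[Tuple[str, str]] = field(default_factory=list)
    review_queue: List[ReviewItem] = field(default_factory=list)

    def route(self, profile_a: str, profile_b: str, score: float) -> Decision:
        decision = self.thresholds.classify(score)
        if decision is Decision.AUTO_MERGE:
            # Stand-in for the existing CDS merge step.
            self.merged.append((profile_a, profile_b))
        elif decision is Decision.APPROVAL_REQUIRED:
            self.review_queue.append(ReviewItem(profile_a, profile_b, score))
        return decision

    def debug_view(self) -> str:
        """JSON payload that a debugging/visibility endpoint could return."""
        return json.dumps({
            "thresholds": asdict(self.thresholds),
            "merged": self.merged,
            "review_queue": [asdict(item) for item in self.review_queue],
        })
```

Validating the thresholds when they are loaded, rather than at match time, means a misconfigured deployment fails fast instead of silently auto-merging.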

3. Approval Flow (Optional Phase)

  • Implement a lightweight approval mechanism (admin/user UI or API endpoint) to confirm or reject potential merges.
  • Record approval actions, timestamps, and decision metadata for auditability.
  • Ensure approvals and rejections are stored in an immutable log for traceability.
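A hash chain is one common way to make the approval log tamper-evident. Each entry stores the SHA-256 hash of the previous entry, so editing any past record breaks verification. The issue does not prescribe a technique; the sketch below assumes this one. `ApprovalLog`, its fields, and the `approve`/`reject` action names are hypothetical. A hash chain only *detects* tampering. Real immutability would still require append-only storage or equivalent database controls.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ApprovalLog:
    entries: List[dict] = field(default_factory=list)

    def append(self, match_id: str, reviewer: str, action: str, reason: str = "") -> dict:
        if action not in ("approve", "reject"):
            raise ValueError(f"unsupported action {action!r}")
        body = {
            "match_id": match_id,
            "reviewer": reviewer,
            "action": action,
            "reason": reason,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS_HASH,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check each link to the previous entry."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```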

Expected Outcomes

  • A Fuzzy Matching Engine integrated into CDS with configurable matching algorithms and thresholds.
  • Improved profile unification accuracy by identifying probabilistic matches.
  • Optional admin/user approval workflows for controlled merges.
  • Enhanced auditability and data integrity in customer profile management.
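Configurable matching algorithms could be modelled as per-attribute rules. Each rule names a registered comparison function and a weight. The sketch below is an assumption about how such configuration might be validated, not a defined CDS format. The rule keys, the `exact` comparator, and the character-bigram cosine comparator are illustrative choices; cosine similarity is one of the candidate algorithms listed under Fuzzy Matching Core.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


def exact(a: str, b: str) -> float:
    """1.0 when values match after trimming and case folding, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def bigram_cosine(a: str, b: str) -> float:
    """Cosine similarity between character-bigram count vectors."""
    def grams(s: str) -> Counter:
        s = f" {s.strip().lower()} "  # pad so first/last characters form bigrams
        return Counter(s[i:i + 2] for i in range(len(s) - 1))

    va, vb = grams(a), grams(b)
    dot = sum(count * vb[g] for g, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Hypothetical registry of comparison functions selectable by name in configuration.
ALGORITHMS: Dict[str, Callable[[str, str], float]] = {
    "exact": exact,
    "bigram_cosine": bigram_cosine,
}


@dataclass(frozen=True)
class AttributeRule:
    attribute: str
    algorithm: str
    weight: float

    def __post_init__(self) -> None:
        if self.algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm {self.algorithm!r} for {self.attribute!r}")
        if self.weight <= 0:
            raise ValueError(f"weight for {self.attribute!r} must be positive")

    def compare(self, a: str, b: str) -> float:
        return ALGORITHMS[self.algorithm](a, b)


def load_rules(config: Dict[str, dict]) -> List[AttributeRule]:
    """Build validated rules from a mapping, e.g. one parsed from JSON or YAML."""
    return [
        AttributeRule(attribute, spec["algorithm"], float(spec["weight"]))
        for attribute, spec in config.items()
    ]
```

Because comparators are registered by name, adding Jaro-Winkler or an embedding-based comparator later means registering one more function. The rule format does not need to change.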

Deliverables

  • Fuzzy matching core implementation and integration tests.
  • Updated unification workflow with threshold-based decisioning.
  • (Optional) Approval mechanism for merge confirmation.
  • Documentation covering algorithms, thresholds, and API usage.
  • Evaluation report comparing deterministic vs fuzzy unification accuracy.
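The issue does not name a metric for the evaluation report. One standard option, assumed here, is to compare each approach's predicted duplicate pairs against a hand-labelled set using precision, recall, and F1. The function names below are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def canonical(pairs: Iterable[Pair]) -> Set[Pair]:
    """Treat (a, b) and (b, a) as the same candidate duplicate pair."""
    return {tuple(sorted(p)) for p in pairs}


def evaluate(predicted: Iterable[Pair], labelled: Iterable[Pair]) -> Dict[str, float]:
    pred, gold = canonical(predicted), canonical(labelled)
    tp = len(pred & gold)  # correctly identified duplicates
    fp = len(pred - gold)  # merges that should not happen
    fn = len(gold - pred)  # duplicates that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

As a hypothetical example, suppose there are three labelled duplicate pairs. A deterministic run that finds only one of them scores precision 1.0 and recall about 0.33. A fuzzy run that finds two of them plus one false positive scores about 0.67 on both. Precision reflects wrongful merges, which matter most for the auto-merge band. Recall reflects missed duplicates.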
