benzsevern edited this page May 10, 2026 · 12 revisions

GoldenMatch Wiki

Deduplicate records, match across sources, and maintain golden records.

pip install goldenmatch
goldenmatch demo
goldenmatch setup

What's new in v1.12

Path Y (negative evidence on exact matchkeys) delivers the largest single-release jump in suite history. The DQbench ER composite rises from 66.99 to 91.04 (+24.05pp) because adversarial-collision pairs are now filtered at the exact-matchkey level: when two distinct entities share the same name and email, the pair is scrubbed if phone and address disagree.
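The negative-evidence idea described above can be illustrated with a small, self-contained sketch. This is not GoldenMatch's implementation or API: the record layout, the helper names (`exact_matchkey`, `disagrees`, `scrub`), and the rule "drop the pair only when both phone and address disagree" are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical records: ids 1 and 2 are the same person; id 3 is a distinct
# entity that happens to share name and email (an adversarial collision).
records = [
    {"id": 1, "name": "Ana Ruiz", "email": "ana@example.com",
     "phone": "555-0101", "address": "12 Oak St"},
    {"id": 2, "name": "Ana Ruiz", "email": "ana@example.com",
     "phone": "555-0101", "address": "12 Oak Street"},
    {"id": 3, "name": "Ana Ruiz", "email": "ana@example.com",
     "phone": "555-0999", "address": "88 Pine Ave"},
]

def exact_matchkey(rec):
    """Exact matchkey assumed here: normalized name + email."""
    return (rec["name"].lower(), rec["email"].lower())

def disagrees(a, b, field):
    """True only when both values are present and differ."""
    return bool(a[field]) and bool(b[field]) and a[field] != b[field]

def candidate_pairs(recs):
    """All pairs whose exact matchkeys are identical."""
    return [(a, b) for a, b in combinations(recs, 2)
            if exact_matchkey(a) == exact_matchkey(b)]

def scrub(pairs, negative_fields=("phone", "address")):
    """Drop pairs where every negative-evidence field disagrees."""
    kept = []
    for a, b in pairs:
        if all(disagrees(a, b, f) for f in negative_fields):
            continue  # strong negative evidence: treat as a collision
        kept.append((a["id"], b["id"]))
    return kept

pairs = candidate_pairs(records)
kept_pairs = scrub(pairs)
print(kept_pairs)
```

All three records collide on the matchkey, so three candidate pairs are produced. Only the pair (1, 2) survives, because its phone numbers agree even though the address strings differ.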

| Dataset | F1 (zero-config v1.12) | Hand-tuned ceiling |
| --- | --- | --- |
| DBLP-ACM | 0.964 | 0.918 (above ceiling) |
| Febrl3 | 0.944 | 0.971 (97% of ceiling) |
| NCVR | 0.972 | |
| DQbench ER (no LLM) | 91.04 | 95.30 (with LLM) |

DQbench tier breakdown: T1 89.3% / T2 97.5% / T3 85.5%. The v1.10, v1.11, and v1.12 progression is documented on the Auto-Config Controller page.

See release notes: v1.10.0 (5 indicators), v1.11.0 (NE foundation), v1.12.0 (Path Y).

Getting Started

| Page | What you'll learn |
| --- | --- |
| Installation | pip install with optional extras |
| Quick Start | 5 usage scenarios from zero-config to database sync |
| Auto-Config Controller | Zero-config, controller iteration, refit rules, indicators, negative evidence (v1.8 → v1.12) |
| Interactive TUI | Gold-themed terminal UI with keyboard shortcuts |

Core Concepts

| Page | What it covers |
| --- | --- |
| Pipeline Overview | Ingest, Block, Score, Cluster, Golden Records |
| Blocking Strategies | 8+ strategies: static, adaptive, multi-pass, ANN, canopy, learned |
| Python API | `import goldenmatch as gm`: 95 exports, zero-config to advanced |
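To make the five pipeline stages listed above (Ingest, Block, Score, Cluster, Golden Records) concrete, here is a minimal standard-library sketch. It deliberately does not use the `goldenmatch` API. The sample rows, the zip-code blocking key, the 0.85 threshold, and the "longest non-empty value wins" survivorship rule are all illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Ingest: hypothetical rows standing in for a loaded file.
rows = [
    {"id": 0, "name": "Jon Smith", "zip": "10001", "email": ""},
    {"id": 1, "name": "John Smith", "zip": "10001", "email": "john@example.com"},
    {"id": 2, "name": "Mary Jones", "zip": "10001", "email": "mary@example.com"},
    {"id": 3, "name": "Mary Jones", "zip": "94105", "email": ""},
]

def block(rows, key="zip"):
    """Block: group rows by a static key so only nearby rows are compared."""
    blocks = defaultdict(list)
    for r in rows:
        blocks[r[key]].append(r)
    return blocks

def score(a, b):
    """Score: fuzzy name similarity in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def cluster(rows, matches):
    """Cluster: connected components over matched pairs (union-find)."""
    parent = {r["id"]: r["id"] for r in rows}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for r in rows:
        groups[find(r["id"])].append(r)
    return list(groups.values())

def golden(group):
    """Golden record: per field, keep the longest value in the cluster."""
    fields = [f for f in group[0] if f != "id"]
    return {f: max((r[f] for r in group), key=len) for f in fields}

matches = [
    (a["id"], b["id"])
    for members in block(rows).values()
    for a, b in combinations(members, 2)
    if score(a, b) >= 0.85
]
golden_records = [golden(g) for g in cluster(rows, matches)]
print(golden_records)
```

Rows 0 and 1 merge into one golden record. Rows 2 and 3 share a name but sit in different zip blocks, so they are never compared, which is the kind of recall loss that multi-pass or adaptive blocking is meant to address.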

Advanced Features

| Page | What it does |
| --- | --- |
| GPU Routing & Vertex AI | Managed embeddings via Google Cloud |
| Database Integration | Incremental Postgres sync with golden record versioning |
| LLM Boost | Claude/GPT-4 labeling + fine-tuning |
| Learning Memory | Persistent corrections + threshold learning (v1.6.0) |
| dbt Integration | Post-hooks, Postgres sync, Snowflake/BigQuery recipes |

Reference

| Page | Details |
| --- | --- |
| Benchmarks | 97.2% DBLP-ACM, 72.2% Abt-Buy, 92.4% PPRL FEBRL4 |
| Comparison | vs dedupe, Splink, Zingg, Ditto |
| Architecture | Project structure and module map |

CLI Commands

| Command | Description |
| --- | --- |
| `goldenmatch demo` | Built-in demo with sample data |
| `goldenmatch setup` | Interactive setup wizard |
| `goldenmatch dedupe FILE [...]` | Deduplicate files |
| `goldenmatch match TARGET --against REF` | Match across files |
| `goldenmatch sync --table TABLE` | Sync against Postgres |
| `goldenmatch watch --table TABLE` | Live stream mode |
| `goldenmatch schedule --every 1h FILE` | Scheduled runs |
| `goldenmatch serve FILE [...]` | REST API server |
| `goldenmatch mcp-serve FILE [...]` | MCP server (Claude Desktop) |
| `goldenmatch rollback RUN_ID` | Undo a previous run |
| `goldenmatch unmerge RECORD_ID` | Remove a record from its cluster |
| `goldenmatch runs` | List run history |
| `goldenmatch init` | Interactive config wizard |
| `goldenmatch interactive FILE [...]` | Launch TUI |
| `goldenmatch profile FILE` | Profile data quality |
| `goldenmatch evaluate FILE --gt GT.csv` | Evaluate against ground truth (P/R/F1) |
| `goldenmatch incremental BASE --new NEW` | Match new records against existing base |
| `goldenmatch analyze-blocking FILE` | Analyze and suggest blocking strategies |
| `goldenmatch label FILE --config --gt` | Interactively label pairs to build ground truth CSV |
| `goldenmatch config save/load/list/show` | Manage config presets |
| `goldenmatch memory stats/learn/export/import/show` | Manage Learning Memory store (v1.6.0) |
| `goldenmatch compare-clusters A.json B.json` | Compare two clustering outcomes (CCMS) |
| `goldenmatch sensitivity FILE --sweep ...` | Parameter sensitivity analysis |
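For readers unfamiliar with the P/R/F1 figures that `goldenmatch evaluate` reports, the sketch below shows one common way to compute pairwise precision, recall, and F1 for entity resolution. It is an assumption that pairwise metrics are what the command uses; the function name `pairwise_prf` and the sample pairs are hypothetical, and the ground-truth CSV format is not modeled here.

```python
def pairwise_prf(predicted, truth):
    """Pairwise precision, recall, and F1 over unordered record-ID pairs."""
    # frozenset makes (1, 2) and (2, 1) compare equal.
    pred = {frozenset(p) for p in predicted}
    gold = {frozenset(p) for p in truth}
    tp = len(pred & gold)  # pairs predicted and present in ground truth
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical predicted matches and ground-truth matches.
predicted = [(1, 2), (2, 3), (4, 5)]
truth = [(2, 1), (4, 5), (6, 7), (8, 9)]

p, r, f1 = pairwise_prf(predicted, truth)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Two of the three predicted pairs are correct (precision 2/3) and two of the four true pairs are found (recall 1/2), giving an F1 of 4/7.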

Key Dedupe Flags

| Flag | Description |
| --- | --- |
| `--llm-auto` | Auto-enable LLM scorer + memory when API key available |
| `--anomalies` | Detect fake/suspicious records |
| `--preview` | Show what will change before writing |
| `--diff` / `--diff-html` | Before/after change report |
| `--dashboard` | Data quality dashboard (HTML) |
| `--html-report` | Detailed match report with charts |
| `--chunked` | Large dataset mode |
| `--llm-boost` | LLM-powered accuracy boost |
| `s3://` / `gs://` / `az://` | Cloud storage ingest |
| `--daemon` | Run watch mode as a background service with health endpoint |
