Home
Deduplicate records, match across sources, and maintain golden records.
pip install goldenmatch
goldenmatch demo
goldenmatch setup

v1.12 introduces Path Y (negative evidence on exact matchkeys), the largest single-release gain in the suite's history: the DQbench ER composite rose from 66.99 to 91.04 (+24.05 pp). Path Y filters adversarial-collision pairs at the exact-matchkey level. A candidate pair that shares name and email is now dropped when its phone and address disagree, which removes collisions between distinct entities that happen to share those keys.
| Dataset | Zero-config v1.12 (F1; DQbench: composite score) | Hand-tuned ceiling |
|---|---|---|
| DBLP-ACM | 0.964 | 0.918 (zero-config beats it) |
| Febrl3 | 0.944 | 0.971 (97% of ceiling) |
| NCVR | 0.972 | n/a |
| DQbench ER composite (no LLM) | 91.04 | 95.30 (with LLM) |
DQbench tier breakdown: T1 89.3%, T2 97.5%, T3 85.5%. The progression across v1.10, v1.11, and v1.12 is documented on the Auto-Config Controller page.
See the release notes: v1.10.0 (5 indicators), v1.11.0 (negative-evidence foundation), v1.12.0 (Path Y).
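The Path Y rule described above can be illustrated with a small, self-contained Python sketch. It is not GoldenMatch's implementation or API: the field names (`name`, `email`, `phone`, `address`), the normalization, the function names, and the exact rule (a pair is dropped only when phone and address are both present on both records and each one differs; a missing value counts as no evidence) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def norm(value):
    """Trim and lowercase a string; treat non-strings and blank strings as missing (None)."""
    if not isinstance(value, str):
        return None
    value = value.strip().lower()
    return value or None


def exact_matchkey_pairs(records, key_fields):
    """Group records on an exact matchkey and yield every pair inside each group."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(norm(rec.get(f)) for f in key_fields)
        if all(key):  # records missing any key field never join a matchkey group
            groups[key].append(rec)
    for members in groups.values():
        yield from combinations(members, 2)


def has_negative_evidence(a, b, evidence_fields):
    """Assumed rule: True only if every evidence field is present on both records and differs."""
    for field in evidence_fields:
        va, vb = norm(a.get(field)), norm(b.get(field))
        if va is None or vb is None or va == vb:
            return False  # a missing or agreeing field means no negative evidence
    return True


def filter_exact_pairs(records, key_fields=("name", "email"), evidence_fields=("phone", "address")):
    """Split exact-matchkey pairs into kept and dropped (adversarial-collision) id pairs."""
    kept, dropped = [], []
    for a, b in exact_matchkey_pairs(records, key_fields):
        target = dropped if has_negative_evidence(a, b, evidence_fields) else kept
        target.append((a["id"], b["id"]))
    return kept, dropped


records = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "phone": "555-0100", "address": "1 Main St"},
    {"id": 2, "name": "ann lee", "email": "ANN@example.com", "phone": "555-0100", "address": "1 Main St."},
    {"id": 3, "name": "Ann Lee", "email": "ann@example.com", "phone": "555-0199", "address": "9 Oak Ave"},
]
kept, dropped = filter_exact_pairs(records)
print("kept:", kept, "dropped:", dropped)
```

All three records share one name and email. The pair (1, 2) survives because the phones agree, even though the address strings differ slightly; the pairs involving record 3 disagree on both phone and address and are dropped.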
| Page | What you'll learn |
|---|---|
| Installation | pip install with optional extras |
| Quick Start | 5 usage scenarios from zero-config to database sync |
| Auto-Config Controller | Zero-config, controller iteration, refit rules, indicators, negative evidence (v1.8 → v1.12) |
| Interactive TUI | Gold-themed terminal UI with keyboard shortcuts |
| Page | What it covers |
|---|---|
| Pipeline Overview | Ingest, Block, Score, Cluster, Golden Records |
| Blocking Strategies | 8+ strategies, including static, adaptive, multi-pass, ANN, canopy, and learned |
| Python API | `import goldenmatch as gm`: 95 exports, from zero-config to advanced |
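To make the Ingest, Block, Score, Cluster, Golden Records flow concrete, here is a toy end-to-end sketch in plain Python. Every function name, the `zip` blocking key, the `difflib` name similarity, the 0.85 threshold, and the "longest value wins" survivorship rule are hypothetical choices for illustration; they are not GoldenMatch's internals or its Python API.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def ingest(rows):
    # Ingest: normalize string fields (trim + lowercase) and attach a stable id.
    return [{"id": i, **{k: v.strip().lower() for k, v in row.items()}} for i, row in enumerate(rows)]


def block(records, key_fn):
    # Block: only records that share a blocking key are ever compared.
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_fn(rec)].append(rec)
    return blocks.values()


def score(a, b):
    # Score: string similarity of the name field, in [0, 1].
    return SequenceMatcher(None, a["name"], b["name"]).ratio()


def cluster(ids, matched_pairs):
    # Cluster: connected components over matched pairs, via union-find.
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matched_pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for i in ids:
        groups[find(i)].append(i)
    return sorted(groups.values())


def golden_record(members):
    # Golden record: keep the longest value per field (a toy survivorship rule).
    fields = [f for f in members[0] if f != "id"]
    return {f: max((m[f] for m in members), key=len) for f in fields}


def run(rows, threshold=0.85):
    records = ingest(rows)
    by_id = {r["id"]: r for r in records}
    pairs = [
        (a["id"], b["id"])
        for blk in block(records, key_fn=lambda r: r["zip"])
        for a, b in combinations(blk, 2)
        if score(a, b) >= threshold
    ]
    clusters = cluster(list(by_id), pairs)
    return [golden_record([by_id[i] for i in c]) for c in clusters]


rows = [
    {"name": "Jon Smith", "zip": "10001", "email": ""},
    {"name": "John Smith", "zip": "10001", "email": "john@example.com"},
    {"name": "Jane Doe", "zip": "94105", "email": "jane@example.com"},
]
print(run(rows))
```

"Jon Smith" and "John Smith" land in the same zip block and score about 0.95, so they merge into one golden record that takes the non-empty email; "Jane Doe" is never compared with them because her zip differs.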
| Page | What it does |
|---|---|
| GPU Routing & Vertex AI | Managed embeddings via Google Cloud |
| Database Integration | Incremental Postgres sync with golden record versioning |
| LLM Boost | Claude/GPT-4 labeling + fine-tuning |
| Learning Memory | Persistent corrections + threshold learning (v1.6.0) |
| dbt Integration | Post-hooks, Postgres sync, Snowflake/BigQuery recipes |
| Page | Details |
|---|---|
| Benchmarks | 97.2% DBLP-ACM, 72.2% Abt-Buy, 92.4% PPRL FEBRL4 |
| Comparison | vs dedupe, Splink, Zingg, Ditto |
| Architecture | Project structure and module map |
| Command | Description |
|---|---|
| `goldenmatch demo` | Built-in demo with sample data |
| `goldenmatch setup` | Interactive setup wizard |
| `goldenmatch dedupe FILE [...]` | Deduplicate files |
| `goldenmatch match TARGET --against REF` | Match across files |
| `goldenmatch sync --table TABLE` | Sync against Postgres |
| `goldenmatch watch --table TABLE` | Live stream mode |
| `goldenmatch schedule --every 1h FILE` | Scheduled runs |
| `goldenmatch serve FILE [...]` | REST API server |
| `goldenmatch mcp-serve FILE [...]` | MCP server (Claude Desktop) |
| `goldenmatch rollback RUN_ID` | Undo a previous run |
| `goldenmatch unmerge RECORD_ID` | Remove a record from its cluster |
| `goldenmatch runs` | List run history |
| `goldenmatch init` | Interactive config wizard |
| `goldenmatch interactive FILE [...]` | Launch TUI |
| `goldenmatch profile FILE` | Profile data quality |
| `goldenmatch evaluate FILE --gt GT.csv` | Evaluate against ground truth (P/R/F1) |
| `goldenmatch incremental BASE --new NEW` | Match new records against existing base |
| `goldenmatch analyze-blocking FILE` | Analyze and suggest blocking strategies |
| `goldenmatch label FILE --config --gt` | Interactively label pairs to build ground truth CSV |
| `goldenmatch config save/load/list/show` | Manage config presets |
| `goldenmatch memory stats/learn/export/import/show` | Manage Learning Memory store (v1.6.0) |
| `goldenmatch compare-clusters A.json B.json` | Compare two clustering outcomes (CCMS) |
| `goldenmatch sensitivity FILE --sweep ...` | Parameter sensitivity analysis |
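`goldenmatch evaluate` reports precision, recall, and F1 against ground truth. This page does not define how those metrics are computed or what `GT.csv` contains; a common convention in entity resolution is pairwise scoring, where every pair of records placed in the same cluster counts as one predicted match. The sketch below assumes that convention and represents clusters as lists of record ids; the function names are hypothetical.

```python
from itertools import combinations


def clusters_to_pairs(clusters):
    # Expand each cluster into its unordered within-cluster record pairs.
    pairs = set()
    for members in clusters:
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_prf(predicted_clusters, true_clusters):
    # Pairwise precision, recall, and F1 of predicted clusters vs. ground truth.
    pred = clusters_to_pairs(predicted_clusters)
    true = clusters_to_pairs(true_clusters)
    tp = len(pred & true)  # pairs that are matched in both
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(pairwise_prf([[1, 2, 3], [4]], [[1, 2], [3, 4]]))
```

With predicted clusters `[[1, 2, 3], [4]]` and true clusters `[[1, 2], [3, 4]]`, only the pair (1, 2) is shared, so precision is 1/3, recall is 1/2, and F1 is 0.4.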
| Flag | Description |
|---|---|
| `--llm-auto` | Auto-enable the LLM scorer and memory when an API key is available |
| `--anomalies` | Detect fake/suspicious records |
| `--preview` | Show what will change before writing |
| `--diff` / `--diff-html` | Before/after change report |
| `--dashboard` | Data quality dashboard (HTML) |
| `--html-report` | Detailed match report with charts |
| `--chunked` | Large dataset mode |
| `--llm-boost` | LLM-powered accuracy boost |
| `s3://` / `gs://` / `az://` paths | Cloud storage ingest |
| `--daemon` | Run watch mode as a background service with a health endpoint |
⚡ GoldenMatch — Entity resolution toolkit | PyPI | GitHub | Open in Colab | MIT License
pip install goldenmatch
npm install goldenmatch