benzsevern
diff --git a/README.md b/README.md
Lines changed: 66 additions & 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
Lines changed: 33 additions & 0 deletions
diff --git a/docs/python-api.md b/docs/python-api.md
Lines changed: 141 additions & 0 deletions
diff --git a/docs/quick-start.md b/docs/quick-start.md
Lines changed: 24 additions & 0 deletions
diff --git a/examples/README.md b/examples/README.md
Lines changed: 2 additions & 0 deletions
@@ -29,6 +29,8 @@ goldenmatch dedupe customers.csv
 [![DQBench ER](https://img.shields.io/badge/DQBench%20ER-95.30-gold)](https://github.com/benzsevern/dqbench)
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/benzsevern/goldenmatch/blob/main/scripts/gpu_colab_notebook.ipynb)
 
+> **v1.5.0 is out** — auto-config now runs a preflight + postflight verification layer. Bibliographic and domain-extracted schemas no longer crash under zero-config, remote-asset scorers are demoted by default, and every `DedupeResult` carries an inspectable `postflight_report`. See [Auto-Config Verification](#auto-config-verification-v150).
+
 ---
 
 ## Why GoldenMatch?
@@ -191,6 +193,70 @@ result = gm.dedupe("products.csv", fuzzy={"title": 0.80}, llm_scorer=True)
 result = gm.dedupe("huge.parquet", exact=["email"], backend="ray")
 ```
 
+### Auto-Config Verification (v1.5.0)
+
+Zero-config used to crash on bibliographic and domain-extracted schemas — auto-config would emit a matchkey referencing `__title_key__` without enabling `config.domain`, and the pipeline would raise `ValueError: Missing required columns`. v1.5.0 closes the gap with a preflight + postflight verification layer that runs automatically around `auto_configure_df`.
+
+**Preflight** (`gm.preflight`) runs 6 checks at the end of `auto_configure_df`:
+
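+- column resolution (every column referenced by blocking or matchkeys must exist or be recoverable)
+- domain auto-repair (enables `config.domain` when a domain-extracted column such as `__title_key__` is missing)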
+- cardinality bounds on exact matchkeys (drops near-unique and near-constant keys)
+- block-size sanity (flags blocks that would stall the scorer)
+- remote-asset demotion (any `embedding`, `record_embedding`, or cross-encoder rerank is demoted unless you pass `allow_remote_assets=True`)
+- confidence-gated weight capping (low-confidence fields cap at weight 0.3)
+
+Unrepairable issues raise `ConfigValidationError` with the full `PreflightReport` attached as `err.report`. Repaired issues stay on the report as `findings` with `repaired=True`.
+
+**Postflight** (`gm.postflight`) runs 4 signals after scoring, before clustering:
+
+- score-distribution histogram + bimodality detection (auto-nudges threshold on clear bimodality)
+- blocking-recall estimate (gated at 10K+ rows)
+- preliminary cluster sizes + oversized-cluster bottleneck pair
+- threshold-band overlap percentage (advises `--llm-auto` when overlap > 20% and LLM is off)
+
+The report attaches to `DedupeResult.postflight_report` / `MatchResult.postflight_report`.
+
+```python
+import goldenmatch as gm
+import polars as pl
+
+df = pl.read_csv("bibliography.csv")
+
+# Zero-config -- preflight + postflight run automatically
+result = gm.dedupe_df(df)
+
+# Inspect the preflight report (private-by-convention underscore)
+for finding in result.config._preflight_report.findings:
+    print(f"[{finding.severity}] {finding.check}: {finding.message}")
+
+# Inspect postflight signals (public)
+sig = result.postflight_report.signals
+print(f"Scored {sig['total_pairs_scored']} pairs")
+print(f"Threshold overlap: {sig['threshold_overlap_pct']:.1%}")
+print(f"Oversized clusters: {len(sig['oversized_clusters'])}")
+```
+
+**Offline by default.** Remote-asset scorers are demoted unless you opt in:
+
+```python
+cfg = gm.auto_configure_df(df, allow_remote_assets=True)  # loads cross-encoder etc.
+```
+
+**Strict mode for parity runs.** `strict=True` still computes postflight signals and emits advisories, but skips threshold adjustments. Use it for DQBench, regression suites, and any run whose output must be reproducible:
+
+```python
+cfg = gm.auto_configure_df(df, strict=True)
+```
+
+**New classifier smarts in v1.5.0:**
+
+- Columns with a cardinality ratio ≥ 0.95 are classified as `identifier`, not `phone` / `zip` / `numeric`.
+- New `year` col_type routes to blocking, not scoring.
+- New `multi_name` col_type handles comma/semicolon-delimited author-style fields.
+- Low-confidence fields (< 0.5) cap at weight 0.3.
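As a rough illustration of the last bullet, the cap can be read as a clamp that applies only to low-confidence fields. The function name `cap_weight` and its keyword defaults are hypothetical, chosen for this sketch and not taken from GoldenMatch; only the 0.5 confidence cutoff and the 0.3 cap come from the text above.

```python
def cap_weight(weight: float, confidence: float,
               min_confidence: float = 0.5, cap: float = 0.3) -> float:
    """Clamp a matchkey field weight when profile confidence is low.

    Hypothetical helper: mirrors the documented rule, not the library source.
    """
    if confidence < min_confidence:
        # Low-confidence field: never let it weigh more than the cap.
        return min(weight, cap)
    # Confident field: weight is left untouched.
    return weight
```

Under this reading, a weight that is already below the cap is left unchanged even when confidence is low.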
+
+See `examples/verification_inspection.py` and `examples/strict_mode_parity.py` for runnable walkthroughs.
+
 ### Privacy-Preserving Linkage
 
 ```python
 
@@ -364,3 +364,36 @@ Or auto-generate from data:
 ```python
 config = gm.auto_configure([("data.csv", "source")])
 ```
+
+---
+
+## Verification (v1.5.0)
+
+`auto_configure_df` runs **preflight** at the end of config generation: 6 checks that resolve referenced columns, auto-repair missing domain-extracted columns, drop useless-cardinality exact matchkeys, flag oversized blocks, demote remote-asset scorers, and cap low-confidence weights. Unrepairable issues raise `ConfigValidationError`; the full report is attached to the exception as `err.report`.
+
+The pipeline runs **postflight** after scoring and before clustering — 4 signals (score histogram + bimodality, blocking recall, cluster sizes + bottleneck pairs, threshold-band overlap) that can auto-nudge the threshold on clear bimodality and attach the report to `DedupeResult.postflight_report` / `MatchResult.postflight_report`.
+
+Two new kwargs on `auto_configure_df`:
+
+```python
+import goldenmatch as gm
+
+# Offline-safe (default): remote-asset scorers demoted, postflight may adjust threshold
+cfg = gm.auto_configure_df(df)
+
+# Opt in to cross-encoder rerank / embedding scorers
+cfg = gm.auto_configure_df(df, allow_remote_assets=True)
+
+# Strict: compute signals + advisories, but suppress auto-adjustments (DQBench, regression)
+cfg = gm.auto_configure_df(df, strict=True)
+```
+
+The preflight report is available on the returned config (underscore is private-by-convention but stable across v1.5.x):
+
+```python
+cfg = gm.auto_configure_df(df)
+for finding in cfg._preflight_report.findings:
+    print(f"[{finding.severity}] {finding.check}: {finding.message}")
+```
+
+See the [Verification section in the Python API docs](python-api.html#verification-v150) for the full `preflight` / `postflight` signatures and the `PostflightSignals` schema.
@@ -462,6 +462,7 @@ class DedupeResult:
     stats: dict                       # total_records, total_clusters, match_rate
     scored_pairs: list[tuple]         # (id_a, id_b, score) tuples
     config: GoldenMatchConfig
+    postflight_report: PostflightReport | None  # v1.5.0: signals + advisories + adjustments
 
     def to_csv(path, which="golden")  # Write results to CSV
     match_rate: float                 # Property: percentage of dupes
@@ -479,6 +480,7 @@ class MatchResult:
     matched: pl.DataFrame | None      # Matched target records with scores
     unmatched: pl.DataFrame | None    # Unmatched target records
     stats: dict
+    postflight_report: PostflightReport | None  # v1.5.0: signals + advisories + adjustments
 
     def to_csv(path)
 ```
@@ -680,9 +682,148 @@ gm.profile_dataframe(df) -> dict
 
 ```python
 gm.auto_configure(file_specs) -> GoldenMatchConfig
+gm.auto_configure_df(
+    df,
+    llm_provider=None,
+    domain_config=None,
+    llm_auto=False,
+    strict=False,              # v1.5.0
+    allow_remote_assets=False, # v1.5.0
+) -> GoldenMatchConfig
 gm.suggest_threshold(df, matchkey) -> float
 ```
 
+New v1.5.0 kwargs on `auto_configure_df`:
+
+- `strict` — compute postflight signals and emit advisories, but suppress auto-adjustments (threshold nudges, etc.). Use for DQBench / regression / reproducibility runs.
+- `allow_remote_assets` — permit `embedding`, `record_embedding`, and cross-encoder rerank scorers. Default `False` demotes them so auto-config is offline-safe and never triggers a surprise HuggingFace download.
+
+The returned config carries `config._preflight_report: PreflightReport` (underscore — private-by-convention but stable across v1.5.x).
+
+---
+
+## Verification (v1.5.0)
+
+The preflight + postflight layer validates an auto-generated config against the data it was built for. `auto_configure_df` runs preflight automatically at the end; the pipeline runs postflight automatically after scoring. Both are also callable directly.
+
+### preflight
+
+```python
+gm.preflight(
+    df: pl.DataFrame,
+    config: GoldenMatchConfig,
+    *,
+    profiles: list[ColumnProfile] | None = None,
+    allow_remote_assets: bool = False,
+) -> PreflightReport
+```
+
+Runs 6 checks on `(df, config)`:
+
+1. **Column resolution** — every column referenced by blocking/matchkeys exists, or is a pipeline-synthesized `__mk_*`, or is a domain-extracted column recoverable by enabling `config.domain` (auto-repaired when a domain profile was stashed during auto-config).
+2. **Exact-matchkey cardinality** — drops keys with ratio >= 0.99 (near-unique, no pair ever agrees) or < 0.01 (near-constant, produces giant blocks).
+3. **Block-size sanity** — samples blocking keys and flags blocks that would stall the scorer.
+4. **Remote-asset demotion** — `embedding` / `record_embedding` / cross-encoder rerank scorers are demoted unless `allow_remote_assets=True`.
+5. **Weight confidence capping** — matchkey fields with profile confidence < 0.5 cap at weight 0.3 (requires `profiles` kwarg).
+6. **Domain auto-repair** — when a column like `__title_key__` is missing but a domain profile is available, enables `config.domain` so the pipeline produces the column at runtime.
+
+`preflight` auto-repairs what it can (setting `finding.repaired = True`) and records unrepairable issues as `severity="error"` findings. `auto_configure_df` raises `ConfigValidationError` if `report.has_errors` is true.
+
+```python
+report = gm.preflight(df, config)
+for f in report.findings:
+    print(f"[{f.severity}] {f.check}: {f.message} (repaired={f.repaired})")
+```
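The exact-matchkey cardinality check (check 2) can be pictured with a small, library-independent sketch. The names `cardinality_ratio` and `classify_exact_key` are hypothetical, and the ratio is assumed here to be distinct non-null values divided by non-null rows; only the 0.99 and 0.01 cutoffs come from the description above.

```python
from typing import Hashable, Sequence

NEAR_UNIQUE = 0.99    # documented cutoff: ratio >= 0.99 is dropped
NEAR_CONSTANT = 0.01  # documented cutoff: ratio < 0.01 is dropped


def cardinality_ratio(values: Sequence[Hashable]) -> float:
    """Distinct non-null values over non-null rows (assumed definition)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    return len(set(non_null)) / len(non_null)


def classify_exact_key(values: Sequence[Hashable]) -> str:
    """Label a candidate exact matchkey column by its cardinality ratio."""
    ratio = cardinality_ratio(values)
    if ratio >= NEAR_UNIQUE:
        return "near_unique"    # almost every value is distinct
    if ratio < NEAR_CONSTANT:
        return "near_constant"  # a handful of values cover all rows
    return "ok"
```

For example, a column of 1,000 distinct IDs would be labelled `near_unique`, while a column holding 100 distinct values over 1,000 rows (ratio 0.1) would pass.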
+
+### postflight
+
+```python
+gm.postflight(
+    df: pl.DataFrame,
+    config: GoldenMatchConfig,
+    *,
+    pair_scores: list[tuple[int, int, float]],
+    current_threshold: float | None = None,
+) -> PostflightReport
+```
+
+Runs 4 signals on scored pairs:
+
+- **Score histogram + bimodality** — if the score distribution is clearly bimodal (valley depth ratio < 0.5) and the valley is > 0.05 away from the current threshold, emits a `PostflightAdjustment` nudging the threshold to the valley. Suppressed under `strict=True`.
+- **Blocking recall estimate** — gated at >= 10K rows; returns `"deferred"` below that.
+- **Preliminary cluster sizes + oversized-cluster bottleneck pair** — p50/p95/p99/max plus a list of oversized clusters with their weakest edge.
+- **Threshold-band overlap** — fraction of pairs within 0.02 of the threshold. Advises `--llm-auto` when > 20% and LLM scorer is off.
+
+```python
+# `scored` is a list of (id_a, id_b, score) tuples, for example `DedupeResult.scored_pairs`
+report = gm.postflight(df, config, pair_scores=scored, current_threshold=0.85)
+print(report.signals["threshold_overlap_pct"])
+for adj in report.adjustments:
+    print(f"{adj.field}: {adj.from_value} -> {adj.to_value} ({adj.reason})")
+```
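To make the threshold-band overlap signal concrete, here is a minimal sketch of how such a fraction could be computed from scored pairs. The function names, the inclusive band comparison, and the advisory wording are assumptions made for illustration; only the 0.02 band, the 20% limit, and the "LLM scorer is off" condition come from the bullets above.

```python
from typing import Optional, Sequence, Tuple

Pair = Tuple[int, int, float]  # (id_a, id_b, score)


def threshold_overlap_pct(pair_scores: Sequence[Pair],
                          threshold: float,
                          band: float = 0.02) -> float:
    """Fraction of scored pairs whose score lies within `band` of the threshold."""
    if not pair_scores:
        return 0.0
    in_band = sum(1 for _, _, score in pair_scores
                  if abs(score - threshold) <= band)
    return in_band / len(pair_scores)


def overlap_advisory(overlap: float, llm_enabled: bool,
                     limit: float = 0.20) -> Optional[str]:
    """Return a hint when many pairs sit near the threshold and no LLM scorer is on."""
    if overlap > limit and not llm_enabled:
        return f"{overlap:.0%} of pairs are near the threshold; consider --llm-auto"
    return None
```

A high overlap means many match decisions hinge on small score differences, which is the situation the `--llm-auto` advisory is meant to flag.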
+
+### Report shapes
+
+```python
+@dataclass
+class PreflightFinding:
+    check: str                    # "missing_column" | "cardinality" | "block_size" |
+                                  # "remote_asset" | "weight_confidence"
+    severity: str                 # "error" | "warning" | "info"
+    subject: str                  # column / matchkey name
+    message: str
+    repaired: bool
+    repair_note: str | None
+
+@dataclass
+class PreflightReport:
+    findings: list[PreflightFinding]
+    config_was_modified: bool
+    has_errors: bool              # property: True if any unrepaired error
+
+class ConfigValidationError(Exception):
+    report: PreflightReport       # full report attached for programmatic inspection
+
+@dataclass
+class PostflightAdjustment:
+    field: str                    # e.g. "threshold"
+    from_value: Any
+    to_value: Any
+    reason: str
+    signal: str                   # which signal motivated the change
+
+@dataclass
+class PostflightReport:
+    signals: PostflightSignals    # TypedDict, schema below
+    adjustments: list[PostflightAdjustment]
+    advisories: list[str]
+```
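The `has_errors` comment above describes a derived property rather than a stored field. A minimal sketch of how such a property could be written, assuming "unrepaired error" means `severity == "error"` with `repaired` false; the default values are added for convenience, and this is an illustration rather than the library's source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PreflightFinding:
    check: str
    severity: str                 # "error" | "warning" | "info"
    subject: str
    message: str
    repaired: bool = False        # default is an assumption of this sketch
    repair_note: Optional[str] = None


@dataclass
class PreflightReport:
    findings: List[PreflightFinding] = field(default_factory=list)
    config_was_modified: bool = False

    @property
    def has_errors(self) -> bool:
        # Only errors that preflight could not repair count.
        return any(f.severity == "error" and not f.repaired
                   for f in self.findings)
```

Under this reading, a report whose only error-level finding was repaired would not cause `auto_configure_df` to raise.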
+
+### PostflightSignals schema
+
+The `signals` dict is a stable TypedDict contract (defined in `goldenmatch/core/autoconfig_verify.py`):
+
+```python
+class PostflightSignals(TypedDict):
+    score_histogram: ScoreHistogram           # {"bins": list[float], "counts": list[int]}
+    blocking_recall: float | Literal["deferred"]  # "deferred" when <10K rows
+    block_size_percentiles: BlockSizePercentiles  # {"p50", "p95", "p99", "max"}
+    threshold_overlap_pct: float              # fraction of pairs within 0.02 of threshold
+    total_pairs_scored: int
+    current_threshold: float
+    preliminary_cluster_sizes: ClusterSizePercentiles
+        # {"p50", "p95", "p99", "max", "count"}
+    oversized_clusters: list[OversizedCluster]
+        # each: {"cluster_id": int, "size": int, "bottleneck_pair": [int, int]}
+```
+
+`ScoreHistogram`, `BlockSizePercentiles`, `ClusterSizePercentiles`, and `OversizedCluster` are all TypedDicts; import them from `goldenmatch.core.autoconfig_verify` if you want to type-check consumer code.
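Because `blocking_recall` is either a float or the literal `"deferred"`, consumer code has to branch before formatting or comparing it. A small sketch of one way to handle that; the alias `BlockingRecall` and the helper `describe_blocking_recall` are hypothetical names used only for this example.

```python
from typing import Literal, Union

# Mirrors the documented field type: float | Literal["deferred"]
BlockingRecall = Union[float, Literal["deferred"]]


def describe_blocking_recall(value: BlockingRecall) -> str:
    """Render the blocking-recall signal without assuming it is numeric."""
    if isinstance(value, str):
        # Below the 10K-row gate the estimate is not computed.
        return "blocking recall: deferred (fewer than 10K rows)"
    return f"blocking recall: {value:.1%}"
```

Checking with `isinstance` rather than comparing against a number keeps the numeric branch safe for type checkers.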
+
+### Where the reports live
+
+- `config._preflight_report: PreflightReport | None` — set by `auto_configure_df`. Underscore-prefixed, documented as private-by-convention; stable contract.
+- `DedupeResult.postflight_report: PostflightReport | None` — set by the pipeline after scoring.
+- `MatchResult.postflight_report: PostflightReport | None` — same for match flows.
+
 ---
 
 ## Active learning
 
@@ -49,6 +49,30 @@ result.golden  # Polars DataFrame of canonical records
 
 ---
 
+## Inspecting the verification report (v1.5.0)
+
+Zero-config runs attach a `PostflightReport` to the result — score-distribution signals, cluster-size percentiles, threshold-band overlap, plus any auto-applied adjustments and human-readable advisories.
+
+```python
+result = gm.dedupe_df(df)
+if result.postflight_report:
+    for adv in result.postflight_report.advisories:
+        print(f"advisory: {adv}")
+    for adj in result.postflight_report.adjustments:
+        print(f"adjusted {adj.field}: {adj.from_value} -> {adj.to_value} ({adj.reason})")
+```
+
+The auto-generated config also carries a `PreflightReport` for the checks that ran during `auto_configure_df`:
+
+```python
+for finding in result.config._preflight_report.findings:
+    print(f"[{finding.severity}] {finding.check}: {finding.message}")
+```
+
+See [Verification](python-api.html#verification-v150) in the Python API docs for the full signatures and signal schema.
+
+---
+
 ## Match two files
 
 ```python
 
@@ -26,6 +26,8 @@ python basic_dedupe.py
 | `agent_demo.py` | Autonomous ER agent with confidence gating and review queue | goldenmatch |
 | `benchmark.py` | DQBench ER benchmark (precision, recall, F1, throughput) | goldenmatch, dqbench |
 | `equipment_dedup.py` | Equipment/auction dedup: multi-pass blocking, ANN fallback, weighted fuzzy, LLM calibration | goldenmatch, OPENAI_API_KEY |
+| `verification_inspection.py` | v1.5.0 preflight + postflight walkthrough -- inspect findings, signals, advisories, and adjustments | goldenmatch |
+| `strict_mode_parity.py` | v1.5.0 `strict=True` for deterministic parity / regression runs | goldenmatch |
 
 ## For Coding AIs