Commit cc8ba4c
feat(js): auto-config verification layer (preflight + postflight) for goldenmatch-js v0.3.0 (#48)
* feat(js-profiler): add confidence field to ColumnProfile
Prerequisite for upcoming autoconfig verification layer (Check 6 + weight
cap in classifier fixes). Matches Python's confidence semantics:
0.9 (both heuristics agree), 0.7 (one heuristic), 0.3 (string fallthrough).
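A minimal sketch of the three confidence tiers. It assumes each column gets at most one col_type vote from a name heuristic and one from a value heuristic. HeuristicVotes and scoreConfidence are hypothetical names, not the goldenmatch-js API. The tie-break used when the two votes disagree (prefer the value vote) is also an assumption.

```typescript
// Hypothetical per-column votes; the real profiler's heuristics are not shown in the commit.
interface HeuristicVotes {
  nameSuggests: string | null;  // col_type implied by the column name, if any
  valueSuggests: string | null; // col_type implied by sampled values, if any
}

interface ConfidenceResult {
  colType: string;
  confidence: number;
}

function scoreConfidence(votes: HeuristicVotes): ConfidenceResult {
  const { nameSuggests, valueSuggests } = votes;
  if (nameSuggests !== null && nameSuggests === valueSuggests) {
    return { colType: nameSuggests, confidence: 0.9 }; // both heuristics agree
  }
  // One heuristic fired, or the two disagree. Assumption: prefer the value-based vote.
  const single = valueSuggests ?? nameSuggests;
  if (single !== null) {
    return { colType: single, confidence: 0.7 };
  }
  return { colType: "string", confidence: 0.3 }; // string fallthrough
}
```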
* feat(js-autoconfig): cardinality guard for identifier classification
* feat(js-autoconfig): extend ID_NAME_PATTERNS for voter_reg_num / account_no / guid / uuid
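A sketch of how a cardinality guard can gate the name-pattern match. It assumes the guard means "a column named like an ID only counts as an identifier when its values are mostly distinct". The regexes and the 0.9 cutoff are illustrative assumptions. Only the pattern targets (voter_reg_num, account_no, guid, uuid) come from the commit.

```typescript
// Illustrative regexes; the real ID_NAME_PATTERNS list may differ.
const ID_NAME_PATTERNS: RegExp[] = [
  /(^|_)id$/i,
  /voter_reg_num/i,
  /account_no/i,
  /guid/i,
  /uuid/i,
];

// Assumed guard: a name match only counts when values are mostly distinct.
const MIN_ID_CARDINALITY = 0.9;

function cardinalityRatio(values: unknown[]): number {
  const present = values.filter((v) => v !== null && v !== undefined && v !== "");
  if (present.length === 0) return 0;
  return new Set(present.map((v) => String(v))).size / present.length;
}

function looksLikeIdentifier(name: string, values: unknown[]): boolean {
  const nameHit = ID_NAME_PATTERNS.some((re) => re.test(name));
  return nameHit && cardinalityRatio(values) >= MIN_ID_CARDINALITY;
}
```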
* feat(js-autoconfig): add year col_type (blocking signal for biblio data)
* feat(js-autoconfig): multi_name col_type for delimited author-style fields
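These are illustrative value checks for the two new col_types. The year range, the 90% and 50% vote shares, and the delimiter set (semicolon, pipe, " and ") are assumptions. Commas are deliberately not treated as separators, so "Last, First" names stay whole.

```typescript
// Assumed plausible year range 1500-2099.
const YEAR_RE = /^(1[5-9]\d\d|20\d\d)$/;
// Assumed author-list separators: ";", "|", or a lowercase " and ".
const NAME_DELIMS = /\s*(?:;|\|| and )\s*/;

function share(values: string[], pred: (v: string) => boolean): number {
  if (values.length === 0) return 0;
  return values.filter(pred).length / values.length;
}

function looksLikeYear(values: string[]): boolean {
  return share(values, (v) => YEAR_RE.test(v.trim())) >= 0.9;
}

function looksLikeMultiName(values: string[]): boolean {
  // multi_name when at least half the values split into 2+ non-empty parts.
  const multi = (v: string) => v.split(NAME_DELIMS).filter((p) => p.length > 0).length >= 2;
  return share(values, multi) >= 0.5;
}
```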
* feat(js-autoconfig): cap weight at 0.3 for low-confidence fields
* feat(js-verify): add autoconfigVerify module skeleton with types
Types, dataclasses, ConfigValidationError, makePreflightReport factory,
stripConventionPrivate utility. preflight/postflight bodies stubbed;
real implementations land in Phase 2 and Phase 3.
* feat(js-verify): add underscore escape-hatch fields + postflightReport to results
GoldenMatchConfig gains _preflightReport / _strictAutoconfig /
_domainProfile as non-readonly optional fields (minimum escape hatch for
strict-readonly config; see spec §7). DedupeResult and MatchResult gain
optional readonly postflightReport. The list is closed: future internal
state uses the side-table pattern instead.
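This is a type-level sketch of the escape hatch. The underscore fields are optional and writable on an otherwise readonly config, and postflightReport stays readonly on the result. The report shapes are placeholders. stripConventionPrivateSketch assumes that stripConventionPrivate simply drops underscore-prefixed keys, as its name suggests.

```typescript
// Placeholder report shapes; only the field names come from the commit message.
interface PreflightReport { findings: string[] }
interface PostflightReport { signals: Record<string, unknown> }

interface ConfigSketch {
  readonly matchkeys: readonly string[];
  // Escape-hatch fields: optional, underscore-prefixed, deliberately NOT readonly.
  _preflightReport?: PreflightReport;
  _strictAutoconfig?: boolean;
  _domainProfile?: string;
}

interface DedupeResultSketch {
  readonly clusters: readonly (readonly number[])[];
  readonly postflightReport?: PostflightReport; // optional and readonly on results
}

// Assumed behavior of stripConventionPrivate: drop underscore-prefixed keys.
function stripConventionPrivateSketch<T extends object>(obj: T): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    if (!key.startsWith("_")) out[key] = value;
  }
  return out;
}
```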
* feat(js-verify): re-export autoconfigVerify public symbols
* feat(js-verify): preflight Check 1 — column resolution + domain auto-repair
- Add DOMAIN_EXTRACTED_COLS constant to domain.ts (__brand__, __model__,
__version__) so preflight can distinguish 'missing but producible' column
references from hard errors.
- Replace preflight stub with a real implementation that walks every
matchkey + blocking key reference, flags anything not present in the
first row, and auto-repairs config.domain when _domainProfile is stashed
on the config and a __<col>__ reference turns up.
- Keep types.ts -> autoconfigVerify.ts runtime cycle broken by using only
'import type' from types.ts inside autoconfigVerify.ts.
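A minimal sketch of Check 1 under simplifying assumptions:

- referenced columns arrive pre-flattened;
- presence is tested against the first row only;
- the repair writes a hypothetical { enabled, profile } shape into config.domain.

Only DOMAIN_EXTRACTED_COLS, _domainProfile, and the __<col>__ convention come from the commit. The profile name "products" in the test is made up.

```typescript
// Columns the domain extractor can produce even though they are absent from raw rows.
const DOMAIN_EXTRACTED_COLS = new Set(["__brand__", "__model__", "__version__"]);

// Hypothetical shapes for illustration only.
interface Finding {
  check: string;
  severity: "error" | "warning";
  repaired: boolean;
  detail: string;
}

interface ConfigFragment {
  referencedColumns: string[]; // matchkey + blocking key columns, already flattened
  domain?: { enabled: boolean; profile: string };
  _domainProfile?: string; // stashed by autoconfig when domain detection was confident
}

function checkColumnResolution(cfg: ConfigFragment, firstRow: Record<string, unknown>): Finding[] {
  const findings: Finding[] = [];
  for (const col of cfg.referencedColumns) {
    if (col in firstRow) continue; // column resolves
    const profile = cfg._domainProfile;
    if (DOMAIN_EXTRACTED_COLS.has(col) && profile !== undefined) {
      // Missing but producible: turn on domain extraction instead of failing.
      cfg.domain = { enabled: true, profile };
      findings.push({ check: "column_resolution", severity: "warning", repaired: true, detail: `${col} via domain extraction` });
    } else {
      findings.push({ check: "column_resolution", severity: "error", repaired: false, detail: `missing column ${col}` });
    }
  }
  return findings;
}
```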
* feat(js-verify): preflight Checks 2 & 3 — cardinality bounds on exact matchkeys
Drop exact matchkeys whose referenced column is either near-unique
(cardinality_ratio >= 0.99 — every row its own block) or near-constant
(ratio <= 0.01 — one giant block). Both drops are repaired warnings.
If the drops empty the matchkey list, emit a no_matchkeys_remain error
so ConfigValidationError fires downstream instead of producing a silent
no-op run.
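A sketch of Checks 2 and 3 that uses the thresholds from the commit: a ratio of 0.99 or more drops the matchkey as near-unique, and a ratio of 0.01 or less drops it as near-constant. cardinality_high and no_matchkeys_remain are the codes named in the commit. cardinality_low and the Finding shape are assumptions. The sketch also treats the list it receives as the whole matchkey list.

```typescript
// Hypothetical shapes for illustration only.
interface ExactMatchkey { name: string; column: string }
interface Finding { code: string; severity: "error" | "warning"; repaired: boolean; matchkey?: string }

const NEAR_UNIQUE = 0.99;   // every row its own block
const NEAR_CONSTANT = 0.01; // one giant block

function cardinalityRatio(values: unknown[]): number {
  if (values.length === 0) return 0;
  return new Set(values.map((v) => String(v))).size / values.length;
}

function applyCardinalityBounds(
  matchkeys: ExactMatchkey[],
  rows: Array<Record<string, unknown>>,
): { kept: ExactMatchkey[]; findings: Finding[] } {
  const kept: ExactMatchkey[] = [];
  const findings: Finding[] = [];
  for (const mk of matchkeys) {
    const ratio = cardinalityRatio(rows.map((row) => row[mk.column]));
    if (ratio >= NEAR_UNIQUE) {
      findings.push({ code: "cardinality_high", severity: "warning", repaired: true, matchkey: mk.name });
    } else if (ratio <= NEAR_CONSTANT) {
      findings.push({ code: "cardinality_low", severity: "warning", repaired: true, matchkey: mk.name });
    } else {
      kept.push(mk);
    }
  }
  // Emptying the list is unrepairable: surface it so ConfigValidationError can fire.
  if (matchkeys.length > 0 && kept.length === 0) {
    findings.push({ code: "no_matchkeys_remain", severity: "error", repaired: false });
  }
  return { kept, findings };
}
```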
* feat(js-verify): preflight Check 4 — block-size sanity
Walks every BlockingKeyConfig, groups a <= 10k-row sample by raw field
concatenation, and warns when the p99 block size exceeds 5000 (scoring
will be slow) or the p50 is <2 (blocking is too selective; most records
end up alone). Raw-value grouping is a coarse proxy for the real
transform-applied blocker but catches the typical 'everyone has the
same state' / 'ID column used as blocker' failures.
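A sketch of the block-size check. It uses raw-value grouping, the 10k sample cap, and the p99 > 5000 and p50 < 2 thresholds from the commit. The head-of-data sampling, the nearest-rank percentile, and the warning codes are assumptions.

```typescript
type Row = Record<string, unknown>;

const SAMPLE_CAP = 10000;
const P99_MAX = 5000; // above this, scoring will be slow
const P50_MIN = 2;    // below this, most records sit alone in their block

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

function blockSizeWarnings(rows: Row[], keyFields: string[]): string[] {
  const sample = rows.slice(0, SAMPLE_CAP); // assumption: head-of-data sample
  const blockSizes = new Map<string, number>();
  for (const row of sample) {
    // Raw-value concatenation: no blocking transforms are applied.
    const key = keyFields.map((f) => String(row[f] ?? "")).join("\u001f");
    blockSizes.set(key, (blockSizes.get(key) ?? 0) + 1);
  }
  const sizes = Array.from(blockSizes.values()).sort((a, b) => a - b);
  if (sizes.length === 0) return [];
  const warnings: string[] = [];
  if (percentile(sizes, 99) > P99_MAX) warnings.push("block_p99_too_large");
  if (percentile(sizes, 50) < P50_MIN) warnings.push("block_p50_too_small");
  return warnings;
}
```

Grouping on raw values keeps the check to one pass over the sample and one Map. The cost is that blocking transforms are ignored, which is the trade-off the commit describes.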
* feat(js-verify): preflight Check 5 — demote remote-asset scorers
Walks every matchkey field and demotes/drops scorers that depend on
remote model downloads:
- 'embedding' -> 'ensemble' (in-place scorer swap)
- 'record_embedding' fields dropped entirely
- weighted matchkey rerank=true -> false
Skipped entirely when allowRemoteAssets=true or llmScorer.enabled=true
(caller has already committed to remote round-trips). Matchkeys that
lose all their fields to the demotion emit remote_asset_matchkey_empty
and are removed.
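A sketch of the three demotions and the skip condition. The Matchkey and MkField shapes are hypothetical, and jaro_winkler stands in for any local scorer. The "remote_asset_matchkey_empty:<name>" string format is an assumption; the code name itself comes from the commit.

```typescript
type Scorer = "exact" | "jaro_winkler" | "embedding" | "record_embedding" | "ensemble";
interface MkField { field: string; scorer: Scorer; weight?: number }
interface Matchkey { name: string; type: "exact" | "weighted"; fields: MkField[]; rerank?: boolean }
interface RemoteOpts { allowRemoteAssets?: boolean; llmScorerEnabled?: boolean }

function demoteRemoteScorers(mks: Matchkey[], opts: RemoteOpts): { kept: Matchkey[]; findings: string[] } {
  // Caller already committed to remote round-trips: leave everything as-is.
  if (opts.allowRemoteAssets || opts.llmScorerEnabled) return { kept: mks, findings: [] };
  const kept: Matchkey[] = [];
  const findings: string[] = [];
  for (const mk of mks) {
    const fields: MkField[] = mk.fields
      .filter((f) => f.scorer !== "record_embedding") // dropped entirely
      .map((f): MkField => (f.scorer === "embedding" ? { ...f, scorer: "ensemble" } : f)); // in-place swap
    if (fields.length === 0) {
      findings.push(`remote_asset_matchkey_empty:${mk.name}`);
      continue; // matchkey removed
    }
    kept.push({ ...mk, fields, rerank: mk.type === "weighted" ? false : mk.rerank });
  }
  return { kept, findings };
}
```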
* feat(js-verify): preflight Check 6 — cap weight for low-confidence fields
Looks up each weighted matchkey field against the passed-in profiles
(optional — no-op when profiles absent). If the classifier confidence
is <0.5 and the configured weight is >0.5, cap the weight at 0.5. This
keeps a field that was classified with low confidence from dominating
the weighted score just because it happened to land with a high weight
during autoconfig construction.
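A sketch of Check 6 that uses the 0.5 confidence and weight thresholds from the commit. ColumnProfile is reduced here to name and confidence; the real type carries more fields, and the lookup by column name is an assumption.

```typescript
// Reduced, hypothetical shapes.
interface ColumnProfile { name: string; confidence: number }
interface WeightedField { field: string; weight: number }

function capLowConfidenceWeights(
  fields: WeightedField[],
  profiles?: ColumnProfile[],
): { fields: WeightedField[]; capped: string[] } {
  if (!profiles) return { fields, capped: [] }; // no-op when profiles are absent
  const confidenceByName = new Map(profiles.map((p) => [p.name, p.confidence] as const));
  const capped: string[] = [];
  const out = fields.map((f) => {
    const conf = confidenceByName.get(f.field);
    if (conf !== undefined && conf < 0.5 && f.weight > 0.5) {
      capped.push(f.field);
      return { ...f, weight: 0.5 }; // low-confidence field may not dominate the score
    }
    return f;
  });
  return { fields: out, capped };
}
```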
* feat(js-verify): integrate preflight into autoConfigure(Rows)
- Extend AutoconfigOptions with optional strict + allowRemoteAssets flags.
- Detect domain from columns and stash _domainProfile on the config when
confidence >0.7, so preflight Check 1 can auto-repair __<col>__ references.
- Run preflight at the end of autoConfigureRows with the profiled
ColumnProfile list as 'profiles' and the caller's allowRemoteAssets
setting; throw ConfigValidationError on unrepairable errors.
- Stamp _preflightReport on the returned config; stamp _strictAutoconfig
only when strict=true.
- Update two pre-existing autoconfig tests to reflect preflight's new
effect: exact_email / exact_phone matchkeys built from 100%-unique
fixtures are now dropped as cardinality_high (repaired warning in
the preflight report, weighted_identity still present).
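A sketch of the final step of autoConfigureRows as described above. It throws on unrepairable errors, always stamps _preflightReport, and stamps _strictAutoconfig only when strict=true. ConfigValidationErrorSketch, finalizeAutoconfig, and the ConfigLike shape are hypothetical stand-ins. This version copies the config rather than mutating it, which is a design assumption.

```typescript
// Hypothetical shapes; the real PreflightReport and ConfigValidationError are richer.
interface Finding { code: string; severity: "error" | "warning"; repaired: boolean }
interface PreflightReport { findings: Finding[] }
type ConfigLike = {
  [key: string]: unknown;
  _preflightReport?: PreflightReport;
  _strictAutoconfig?: boolean;
};

function unrepairedErrors(report: PreflightReport): Finding[] {
  return report.findings.filter((f) => f.severity === "error" && !f.repaired);
}

class ConfigValidationErrorSketch extends Error {
  readonly report: PreflightReport;
  constructor(report: PreflightReport) {
    super(`autoconfig preflight failed: ${unrepairedErrors(report).map((f) => f.code).join(", ")}`);
    this.name = "ConfigValidationErrorSketch";
    this.report = report;
  }
}

function finalizeAutoconfig(config: ConfigLike, report: PreflightReport, strict: boolean): ConfigLike {
  if (unrepairedErrors(report).length > 0) throw new ConfigValidationErrorSketch(report);
  const stamped: ConfigLike = { ...config, _preflightReport: report };
  if (strict) stamped._strictAutoconfig = true; // only stamped when strict=true
  return stamped;
}
```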
* feat(js-verify): postflight score histogram + bimodality threshold nudge
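The commit names only the signal, so this is a guess at one common shape for a bimodality nudge. It histograms the pair scores, finds the emptiest bin between the two tallest peaks, and moves the threshold toward that bin by a bounded step. The 20 bins, the "half the smaller peak" depth test, and the 0.05 step cap are all assumptions.

```typescript
function histogram(scores: number[], bins: number): number[] {
  const counts = new Array<number>(bins).fill(0);
  for (const s of scores) {
    const i = Math.min(bins - 1, Math.max(0, Math.floor(s * bins)));
    counts[i] += 1;
  }
  return counts;
}

// Center of the deepest bin between the two tallest local peaks, or null if not bimodal.
function valleyThreshold(scores: number[], bins = 20): number | null {
  const h = histogram(scores, bins);
  const peaks = Array.from({ length: bins }, (_, i) => i).filter(
    (i) => h[i] > 0 && (i === 0 || h[i] >= h[i - 1]) && (i === bins - 1 || h[i] >= h[i + 1]),
  );
  if (peaks.length < 2) return null;
  const [a, b] = peaks.sort((x, y) => h[y] - h[x]).slice(0, 2).sort((x, y) => x - y);
  const mid = (a + b) / 2;
  let valley = a;
  for (let i = a + 1; i < b; i++) {
    // Lowest bin wins; among equally low bins, prefer the one nearest the midpoint.
    const closer = Math.abs(i - mid) < Math.abs(valley - mid);
    if (h[i] < h[valley] || (h[i] === h[valley] && closer)) valley = i;
  }
  // Require a real dip: the valley must be under half the smaller peak.
  if (h[valley] >= 0.5 * Math.min(h[a], h[b])) return null;
  return (valley + 0.5) / bins;
}

function nudgeThreshold(current: number, scores: number[], maxStep = 0.05): number {
  const target = valleyThreshold(scores);
  if (target === null) return current; // unimodal: leave the threshold alone
  const step = Math.max(-maxStep, Math.min(maxStep, target - current));
  return current + step;
}
```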
* feat(js-verify): blocking-recall signal (deferred sentinel)
* feat(js-verify): preliminary cluster-size signal with bottleneck pair
* feat(js-verify): postflight threshold-band overlap signal
* feat(js-verify): postflight orchestrator complete + signals-schema contract test
* feat(js-verify): run postflight in runDedupe/runMatch pipelines
_applyPostflight helper shared between both pipelines (mirrors Python's
_apply_postflight). isPreflightReport guard rejects stale/wrong-type
objects. Threshold adjustments apply to pairScores before clustering;
empty-pair case logs an advisory. runMatchPipeline threads postflightReport
through from the delegated runDedupePipeline result.
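A sketch of two pieces of _applyPostflight. The first is a type guard that rejects stale or wrong-type report objects. The second applies a threshold adjustment to pair scores before clustering and logs an advisory on the empty-pair path. The kind discriminant and the [rowA, rowB, score] tuple are assumptions. Reading "adjustment" as re-filtering pairs at the new cutoff is also an assumption.

```typescript
// Hypothetical shapes for illustration only.
interface PreflightReportLike { kind: "preflight"; findings: unknown[] }
type PairScore = [number, number, number]; // [rowA, rowB, score]

// Reject stale or wrong-type objects before trusting them.
function isPreflightReport(x: unknown): x is PreflightReportLike {
  if (typeof x !== "object" || x === null) return false;
  const r = x as { kind?: unknown; findings?: unknown };
  return r.kind === "preflight" && Array.isArray(r.findings);
}

// Apply a postflight threshold adjustment to pair scores before clustering.
function applyThresholdAdjustment(pairs: PairScore[], adjusted: number | null): PairScore[] {
  if (adjusted === null) return pairs;
  if (pairs.length === 0) {
    console.warn("postflight: no scored pairs, threshold adjustment skipped"); // advisory only
    return pairs;
  }
  const threshold: number = adjusted;
  return pairs.filter(([, , score]) => score >= threshold);
}
```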
* test(parity): export Python autoconfigVerify fixtures for TS parity
* test(js-verify): parity harness vs Python autoconfigVerify fixtures
* test(js-verify): property-based invariants for preflight / postflight
* docs(js-examples): verificationInspection + strictModeParity examples
* release: goldenmatch-js v0.3.0 — autoconfig verification layer
README: new Verification section.
Version: 0.1.0 -> 0.3.0.
JSDoc: every exported symbol in autoconfigVerify.ts documented.
1 parent 1ced18d
27 files changed
Lines changed: 4117 additions & 35 deletions
File tree
- packages/goldenmatch-js
  - examples
  - src/core
  - tests
    - parity
    - unit
- tests/parity