Commit 0e89d64

Merge pull request #118 from harvard-lil/vintage-benchmark-vector
Vintage-audience benchmark vector + ADR
2 parents ff06813 + ef79d96 commit 0e89d64

10 files changed

Lines changed: 467 additions & 0 deletions

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
# The Vintage Audience: a Kept Benchmark for Metadata-Over-Data Reading

**Date:** 2026-06-22
**Status:** Accepted (benchmark landed; features deliberately deferred)

## Context

Binoc is tuned for one audience today: readers comparing two snapshots of the
*same* dataset who want every change explained, down to every edited cell and
every added row. Call them the **same-data** audience.

There is a second, latent audience: readers comparing two **vintages**, meaning
two editions of the same published dataset (a yearly facilities register, a
re-released survey). A vintage reader cares about the *shape* of the data: did a
column appear, did a categorical vocabulary shift. They deliberately do **not**
want to read the bulk cell/row churn. For them it is noise, possibly millions
of rows of it.

We are not ready to serve the vintage audience. The same-data experience needs
to be bulletproof first, and inviting vintage feedback now would split our
attention and muddy how the project is perceived before the core is solid. But
we do need confidence that the *engine* does not foreclose the vintage audience:
when we choose to open that channel, it should be a matter of configuration and
plugins, not a re-architecture. The risk we want to retire is a silent one: that
some assumption baked into the controller, the IR, or the correspondence engine
quietly presumes the same-data stance.

## Decision

Land a **kept benchmark vector**, `test-vectors/csv-vintage-benchmark`, that
exercises the vintage stance end to end and stays green, instead of building any
vintage feature. The vector is a two-CSV "published dataset" across two editions:
`facilities.csv` gains a `region` column and one row's `status` moves to a new
category value (`decommissioned`); `inspections.csv` changes only in its data
(edited scores, appended rows). A markdown `groups` config expresses the vintage
stance as significance: schema/vocabulary tags form the high-priority group, and
bulk cell/row tags form the low-priority group.

The benchmark confirms what already works. Because significance is a renderer
concern (per [2026-03-09 renderer config](2026-03-09-renderer_config.md)), and
because `classify_tags` promotes a node to the highest-priority group among its
tags, the schema-touching `facilities.csv` floats up to the structural section
while the pure-data `inspections.csv` sinks to the bulk section. That
file-granularity separation is the best vintage view binoc offers today, and it
is pure config: the type-ignorant controller (AGENTS rule 1) never participates.
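
The promotion rule fits in a few lines. This is a minimal Rust sketch and not binoc's implementation. The `Group` struct, the group names, and each group's tag list are hypothetical stand-ins for the markdown `groups` config. Only the node tags are copied from the benchmark's changeset snapshot.

```rust
// Hypothetical sketch: `Group` and this `classify_tags` signature are
// illustrative, not binoc's actual types.

/// One significance group from a renderer config.
struct Group {
    name: &'static str,
    tags: &'static [&'static str],
}

/// Return the name of the first (highest-priority) group that contains any of
/// the node's tags. `groups` must be ordered from highest to lowest priority.
fn classify_tags(groups: &[Group], node_tags: &[&str]) -> Option<&'static str> {
    groups
        .iter()
        .find(|g| node_tags.iter().any(|t| g.tags.contains(t)))
        .map(|g| g.name)
}

fn main() {
    // Hypothetical config mirroring the benchmark's two rendered sections.
    let groups = [
        Group {
            name: "Schema & vocabulary changes",
            tags: &["binoc.schema-change", "binoc.column-addition"],
        },
        Group {
            name: "Bulk data updates",
            tags: &["binoc.cell-change", "binoc.row-addition"],
        },
    ];
    // Node tags as recorded in the changeset snapshot.
    let facilities = ["binoc.cell-change", "binoc.column-addition", "binoc.schema-change"];
    let inspections = ["binoc.cell-change", "binoc.row-addition"];
    // Prints: facilities.csv -> Some("Schema & vocabulary changes")
    println!("facilities.csv -> {:?}", classify_tags(&groups, &facilities));
    // Prints: inspections.csv -> Some("Bulk data updates")
    println!("inspections.csv -> {:?}", classify_tags(&groups, &inspections));
}
```

The decision is made once per node. As a result, `facilities.csv` lands in the structural section as a whole, and its cell edit goes with it.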

The vector ships two renderings side by side. `expected-output/changelog.snap`
is the real, harness-checked engine output. `VINTAGE-IDEAL.md` is a
hand-authored target that is *not* harness-checked. The benchmark is kept green
so the gap between the two stays visible and measurable. The ideal names three
gaps, each reachable without touching the controller, the IR, or the
correspondence engine:

1. **Within-node keep/drop.** The `region` addition and the `status` cell edit
   in `facilities.csv` are edits on one node, so the renderer cannot surface the
   structural change while holding the cell edit back. The fix is a
   config-driven, edit-level keep/drop filter in the renderer. The data path
   already carries `EditProjection.visible`; today only writers set it. This is
   the smallest unlock, and it lives entirely in the renderer.

2. **Vocabulary as a first-class change.** The `active -> decommissioned` shift
   is reported as an ordinary `binoc.cell-change`, not as "the `status`
   vocabulary gained a value." The fix is a plugin `EditListWriter` over
   `tabular_v1` that diffs the distinct-value set of each categorical column and
   emits a `binoc.vocabulary-change` edit. That is a plugin pack, exactly as the
   standard library is (AGENTS rule 2).

3. **Summary statistics over enumeration.** The bulk section enumerates every
   changed cell and added row; a vintage reader wants "4 -> 6 rows, 3 cells
   changed." The fix is the same plugin emitting an aggregate via
   `Edit::with_summary` or a dataset-level `GlobalClaim`. The seam already
   carries such facts (stdlib uses `with_summary` for binary string-diffs), but
   no rule emits a tabular roll-up yet.
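
Gap 1 is small enough to sketch. This is a hypothetical renderer-side filter and assumes an `EditProjection` with a `verb` string and a `visible` flag. Only `visible` is named in this ADR, and the real struct and config shape may differ. The drop list stands in for a vintage config.

```rust
// Hypothetical sketch: a minimal stand-in for binoc's `EditProjection`.
// Only the `visible` flag is named in the ADR; `verb` is assumed here.
struct EditProjection {
    verb: String,
    visible: bool,
}

/// Renderer-side keep/drop: hide every edit whose verb is on the drop list.
/// It only ever clears `visible`, so edits a writer already hid stay hidden.
fn apply_keep_drop(edits: &mut [EditProjection], drop_verbs: &[&str]) {
    for edit in edits.iter_mut() {
        if drop_verbs.contains(&edit.verb.as_str()) {
            edit.visible = false;
        }
    }
}

fn main() {
    // The three edit verbs recorded on facilities.csv in the changeset snapshot.
    let mut edits: Vec<EditProjection> = ["tabular.set_headers", "tabular.add_column", "tabular.edit_cell"]
        .iter()
        .map(|verb| EditProjection { verb: verb.to_string(), visible: true })
        .collect();

    // Hypothetical vintage config: drop cell- and row-level churn.
    apply_keep_drop(&mut edits, &["tabular.edit_cell", "tabular.append_rows"]);

    let shown: Vec<&str> = edits.iter().filter(|e| e.visible).map(|e| e.verb.as_str()).collect();
    println!("{:?}", shown); // ["tabular.set_headers", "tabular.add_column"]
}
```

The filter never sets `visible` back to true. This keeps it additive with respect to writers: config can hide more, but cannot resurface an edit that a writer chose to suppress.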

The conclusion we are recording: the vintage-vs-same-data distinction is a
renderer-config + plugin-pack concern, which is the architecture's whole thesis.
The minimum to open the channel is one renderer-local filter plus one plugin
pack, with no engine surgery. The channel is provably clear, and we are choosing
not to walk through it yet.

## Alternatives Considered

**Build the metadata-only filter and a sample statistics plugin now.** This is
the natural next step, and the benchmark is designed to make it cheap. We are
deferring it for social and focus reasons, not technical ones: shipping a vintage
surface would invite vintage feedback before the same-data experience is solid.
The benchmark captures the design so the work is shovel-ready when we choose it.

**Write the rationale as prose only, with no vector.** A document can claim the
engine is ready; a passing benchmark proves it and keeps proving it. Without an
executable artifact, a future change could quietly regress the vintage stance
(e.g., bake a same-data assumption into a writer) with nothing to catch it.

**Make the benchmark aspirational: hand-author the ideal as the gold file.**
A snapshot that encodes output the engine does not produce would fail CI, forcing
us either to disable the test (dead weight) or to special-case it (harness
complexity). Instead, the harness-checked snapshot tracks reality and a separate,
unchecked `VINTAGE-IDEAL.md` holds the target. The benchmark stays honest and
green, and the gap is documented rather than asserted.

**Promote columns (and their vocabularies) to first-class IR nodes now.** This
would make within-node significance and vocabulary diffing fall out naturally,
but it is a substantial IR change in service of an audience we are deliberately
not yet serving. The benchmark's gap analysis shows the same outcomes are
reachable with a plugin writer emitting tagged edits, deferring any IR commitment
until the vintage audience is real.

docs/adr/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Newer entries appear first. Each entry shows its date and current status. Create

| Date | Title | Status |
|---|---|---|
| 2026-06-22 | [The Vintage Audience: a Kept Benchmark for Metadata-Over-Data Reading](2026-06-22-vintage_audience_and_metadata_only_benchmark.md) | Accepted (benchmark landed; features deferred) |
| 2026-06-15 | [Tiered Artifact Metadata: Column, Table, and a `parser_metadata_v1` Artifact](2026-06-15-tiered_artifact_metadata.md) | Implemented (channels + producers in CFM-80; rendering + significance in CFM-82) |
| 2026-06-15 | [The Engine Overhaul, Told Whole: Single-Tree to Correspondence-First](2026-06-15-engine_overhaul_retrospective.md) | Retrospective |
| 2026-06-15 | [Partition Identities: a JIT, Format-Owned Capability for N↔M Correspondence (CFM-72)](2026-06-15-partition_identities_jit_format_capability.md) | Implemented |
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
# Vintage benchmark: target experience

This file is the north star for the *vintage* (different-edition) audience. It is
**not** checked by the harness. It is the hand-authored target that
`expected-output/changelog.snap` should converge toward as the vintage story
improves. Compare the two whenever you touch tabular significance, vocabulary
detection, or summary statistics.

A vintage reader is comparing two editions of the same published dataset. They
care about the *shape* of the data (did a column appear, did a category
vocabulary shift), and they deliberately do **not** want to read the bulk
cell/row churn. (This is the opposite stance from the same-data-with-edits
reader binoc is primarily tuned for today, who wants every cell.)

## What binoc renders today

See `expected-output/changelog.snap`. Abbreviated:

```
## Schema & vocabulary changes
- facilities.csv: Column added: 'region'; 1 cell changed
  - row 2, column 'status': 'active' -> 'decommissioned'
  - Set Headers: ...; Add Column: 'region' ...
## Bulk data updates
- inspections.csv: 2 rows added; 3 cells changed
  - row 1, column 'score': '82' -> '85'
  - ... every changed cell and added row, in full ...
```

The file-level separation is right. Three things fall short.

## What great looks like

```
# Changelog: 2021 edition → 2022 edition

## Schema & vocabulary changes
- facilities.csv
  - Column added: 'region' (4 values: north, east, south, west)
  - Vocabulary 'status' gained a value: 'decommissioned'
    (now: active, inactive, decommissioned)

## Bulk data updates — summarized, not enumerated
- facilities.csv: 4 rows, 1 cell changed
- inspections.csv: 4 → 6 rows (+2), 3 cells changed
```
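
The bulk-section lines in the target are plain counts over each file's edit list. Below is a minimal sketch of that roll-up. It assumes a hypothetical `TabularEdit` enum that mirrors the `tabular.add_column`, `tabular.edit_cell`, and `tabular.append_rows` verbs from the changeset snapshot. The starting row count is supplied by the caller. Only appended rows are counted; deletions are out of scope. In the real design this fact would travel through `Edit::with_summary` or a `GlobalClaim`, which the sketch does not model.

```rust
// Hypothetical sketch: `TabularEdit` mirrors three verbs from the changeset
// snapshot; it is not binoc's edit type.
enum TabularEdit {
    AddColumn,                   // tabular.add_column (schema, not counted here)
    EditCell,                    // tabular.edit_cell
    AppendRows { count: usize }, // tabular.append_rows
}

/// Summarize one file's edits as a single vintage-style line.
/// `from_rows` is the row count of the older edition, supplied by the caller.
fn roll_up(from_rows: usize, edits: &[TabularEdit]) -> String {
    let mut cells: usize = 0;
    let mut added: usize = 0;
    for edit in edits {
        match edit {
            TabularEdit::AddColumn => {}
            TabularEdit::EditCell => cells += 1,
            TabularEdit::AppendRows { count } => added += *count,
        }
    }
    let cell_word = if cells == 1 { "cell" } else { "cells" };
    if added == 0 {
        format!("{} rows, {} {} changed", from_rows, cells, cell_word)
    } else {
        format!(
            "{} → {} rows (+{}), {} {} changed",
            from_rows,
            from_rows + added,
            added,
            cells,
            cell_word
        )
    }
}

fn main() {
    // Edit shapes from the changeset snapshot (set_headers omitted);
    // starting row counts of 4 taken from the target above.
    let facilities = [TabularEdit::AddColumn, TabularEdit::EditCell];
    let inspections = [
        TabularEdit::EditCell,
        TabularEdit::EditCell,
        TabularEdit::EditCell,
        TabularEdit::AppendRows { count: 2 },
    ];
    println!("- facilities.csv: {}", roll_up(4, &facilities));
    println!("- inspections.csv: {}", roll_up(4, &inspections));
}
```

Run as written, it prints the two bulk lines of the target: `- facilities.csv: 4 rows, 1 cell changed` and `- inspections.csv: 4 → 6 rows (+2), 3 cells changed`.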

## The three gaps between today and the target

1. **Within-node significance / edit-level keep/drop.**
   The `region` addition and the `status` cell edit in `facilities.csv` are
   edits on one node, so the renderer cannot put the structural change in the
   top section and hold the cell back. The vintage reader still sees the cell
   bullet.
   *Needs:* a config-driven, edit-level keep/drop on the renderer (the data path
   already has `EditProjection.visible`, but only writers set it). This is the
   single smallest unlock, and it lives entirely in the renderer, with no engine
   or IR change.

2. **Vocabulary as a first-class change.**
   `active → decommissioned` is reported as `binoc.cell-change`, not as "the
   `status` vocabulary gained a value." Columns are not first-class nodes, and
   distinct-value-set diffing does not exist yet.
   *Needs:* a plugin `EditListWriter` over `tabular_v1` that computes the set of
   distinct values per categorical column on each side and emits the set-delta
   as a tagged edit (`binoc.vocabulary-change`). No engine change is required:
   it is a plugin pack, exactly like the standard library is.

3. **Summary statistics instead of enumeration.**
   The bulk section dumps every changed cell and added row. A vintage reader
   wants "4 → 6 rows, 3 cells changed."
   *Needs:* the same plugin writer emitting an aggregate via `Edit::with_summary`
   (or `GlobalClaim` for a dataset-level roll-up). The seam already carries such
   facts (binoc-stdlib uses `with_summary` for binary string-diffs today), but no
   rule emits a tabular roll-up yet.
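
The set-delta in gap 2 can be sketched with ordered sets. This is a hypothetical helper, not the `EditListWriter` trait. It takes one column's values from each edition as plain string slices. The `VocabularyChange` struct is an illustrative stand-in for whatever a `binoc.vocabulary-change` edit would carry. Deciding which columns count as categorical is left out.

```rust
use std::collections::BTreeSet;

// Hypothetical sketch: `VocabularyChange` is an illustrative payload for a
// `binoc.vocabulary-change` edit, not a type binoc defines.
#[derive(Debug, PartialEq)]
struct VocabularyChange {
    column: String,
    added: Vec<String>,
    removed: Vec<String>,
}

/// Diff the distinct-value sets of one column across two editions.
/// Returns None when both editions use the same set of values, even if
/// individual cells moved between existing values.
fn vocabulary_delta(column: &str, from: &[&str], to: &[&str]) -> Option<VocabularyChange> {
    let before: BTreeSet<&str> = from.iter().copied().collect();
    let after: BTreeSet<&str> = to.iter().copied().collect();
    // BTreeSet::difference yields values in sorted order, keeping output stable.
    let added: Vec<String> = after.difference(&before).map(|v| v.to_string()).collect();
    let removed: Vec<String> = before.difference(&after).map(|v| v.to_string()).collect();
    if added.is_empty() && removed.is_empty() {
        None
    } else {
        Some(VocabularyChange { column: column.to_string(), added, removed })
    }
}

fn main() {
    // Illustrative `status` values only, not the benchmark CSV's actual contents.
    let from = ["active", "active", "inactive", "active"];
    let to = ["active", "decommissioned", "inactive", "active"];
    println!("{:?}", vocabulary_delta("status", &from, &to));
    // Some(VocabularyChange { column: "status", added: ["decommissioned"], removed: [] })
}
```

Sorted sets make the emitted delta deterministic regardless of row order. That matters when the result ends up in a harness-checked snapshot.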

## Why this benchmark exists

It demonstrates that the *engine* does not foreclose the vintage audience. The
target above is reachable with (1) one renderer-local keep/drop filter and (2)
one plugin pack that emits vocabulary and statistic facts. Neither touches the
type-ignorant controller, the IR, or the correspondence engine. The vintage vs.
same-data distinction is a renderer-config + plugin-pack concern, which is the
architecture's whole thesis (AGENTS rules 1 and 3).

It is kept as a passing benchmark so the gap stays visible and measurable. We
are deliberately **not** building the unlocks yet (we want to nail the
same-data audience first), but the channel is provably clear.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
source: binoc-stdlib/src/test_vectors.rs
expression: "&md"
---
# Changelog: snapshot-a → snapshot-b

## Schema & vocabulary changes

- **facilities.csv**: Column added: 'region'; 1 cell changed
  - Changed cells
    - row 2, column 'status': 'active' -> 'decommissioned'
  - Set Headers: from: ["facility_id","name","status"]; to: ["facility_id","name","status","region"]
  - Add Column: name: 'region'; values: {"total_values":4,"truncated":false,"values":["north","east","west","south"]}

## Bulk data updates

- **inspections.csv**: 2 rows added; 3 cells changed
  - Changed cells
    - row 1, column 'score': '82' -> '85'
    - row 3, column 'score': '90' -> '91'
    - row 4, column 'score': '68' -> '70'
  - Rows added
    - row 5: 'I104', 'F001', '88'
    - row 6: 'I105', 'F002', '73'
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
---
source: binoc-stdlib/src/test_vectors.rs
expression: "&stable_changeset"
---
{
  "from_snapshot": "snapshot-a",
  "to_snapshot": "snapshot-b",
  "claims": [],
  "root": {
    "action": "modify",
    "item_type": "directory",
    "path": "",
    "children": [
      {
        "action": "modify",
        "item_type": "tabular",
        "path": "facilities.csv",
        "sources": [
          {
            "path": "facilities.csv",
            "side": "from",
            "evidence": "binoc.pair.name",
            "action": "modify"
          }
        ],
        "summary": [
          {
            "text": "Column added: 'region'; 1 cell changed"
          }
        ],
        "tags": [
          "binoc.cell-change",
          "binoc.column-addition",
          "binoc.schema-change"
        ],
        "details": {
          "edits": [
            {
              "params": {
                "from": [
                  "facility_id",
                  "name",
                  "status"
                ],
                "to": [
                  "facility_id",
                  "name",
                  "status",
                  "region"
                ]
              },
              "verb": "tabular.set_headers"
            },
            {
              "params": {
                "name": "region",
                "values": {
                  "total_values": 4,
                  "truncated": false,
                  "values": [
                    "north",
                    "east",
                    "west",
                    "south"
                  ]
                }
              },
              "verb": "tabular.add_column"
            },
            {
              "params": {
                "column": "status",
                "from": "active",
                "row": 1,
                "to": "decommissioned"
              },
              "verb": "tabular.edit_cell"
            }
          ]
        }
      },
      {
        "action": "modify",
        "item_type": "tabular",
        "path": "inspections.csv",
        "sources": [
          {
            "path": "inspections.csv",
            "side": "from",
            "evidence": "binoc.pair.name",
            "action": "modify"
          }
        ],
        "summary": [
          {
            "text": "2 rows added; 3 cells changed"
          }
        ],
        "tags": [
          "binoc.cell-change",
          "binoc.row-addition"
        ],
        "details": {
          "edits": [
            {
              "params": {
                "column": "score",
                "from": "82",
                "row": 0,
                "to": "85"
              },
              "verb": "tabular.edit_cell"
            },
            {
              "params": {
                "column": "score",
                "from": "90",
                "row": 2,
                "to": "91"
              },
              "verb": "tabular.edit_cell"
            },
            {
              "params": {
                "column": "score",
                "from": "68",
                "row": 3,
                "to": "70"
              },
              "verb": "tabular.edit_cell"
            },
            {
              "params": {
                "rows": [
                  {
                    "total_values": 3,
                    "truncated": false,
                    "values": [
                      "I104",
                      "F001",
                      "88"
                    ]
                  },
                  {
                    "total_values": 3,
                    "truncated": false,
                    "values": [
                      "I105",
                      "F002",
                      "73"
                    ]
                  }
                ],
                "start": 4
              },
              "verb": "tabular.append_rows"
            }
          ]
        }
      }
    ]
  }
}
