| 1 | +# The Vintage Audience: a Kept Benchmark for Metadata-Over-Data Reading |
| 2 | + |
| 3 | +**Date:** 2026-06-22 |
| 4 | +**Status:** Accepted (benchmark landed; features deliberately deferred) |
| 5 | + |
| 6 | +## Context |
| 7 | + |
| 8 | +Binoc is tuned for one audience today: readers comparing two snapshots of the |
| 9 | +*same* dataset who want every change explained — every edited cell, every added |
| 10 | +row. Call them the **same-data** audience. |
| 11 | + |
| 12 | +There is a second, latent audience: readers comparing two **vintages** — two |
| 13 | +editions of the same published dataset (a yearly facilities register, a |
| 14 | +re-released survey). A vintage reader cares about the *shape* of the data: did a |
| 15 | +column appear, did a categorical vocabulary shift. They deliberately do **not** |
| 16 | +want to read the bulk cell/row churn — for them it is noise, possibly millions |
| 17 | +of rows of it. |
| 18 | + |
| 19 | +We are not ready to serve the vintage audience. The same-data experience needs |
| 20 | +to be bulletproof first, and inviting vintage feedback now would divide our |
| 21 | +attention and muddy our positioning before the core is solid. But we do need confidence |
| 22 | +that the *engine* does not foreclose the vintage audience — that when we choose |
| 23 | +to open that channel, it is a matter of configuration and plugins, not a |
| 24 | +re-architecture. The risk we want to retire is a silent one: that something |
| 25 | +baked into the controller, the IR, or the correspondence engine quietly |
| 26 | +assumes the same-data stance. |
| 27 | + |
| 28 | +## Decision |
| 29 | + |
| 30 | +Land a **kept benchmark vector**, `test-vectors/csv-vintage-benchmark`, that |
| 31 | +exercises the vintage stance end to end and stays green, rather than building any |
| 32 | +vintage feature. The vector is a two-CSV "published dataset" across two editions: |
| 33 | +`facilities.csv` gains a `region` column and one row's `status` moves to a new |
| 34 | +category value (`decommissioned`); `inspections.csv` changes only in its data |
| 35 | +(edited scores, appended rows). A markdown `groups` config expresses the vintage |
| 36 | +stance as significance — schema/vocabulary tags are the high-priority group, bulk |
| 37 | +cell/row tags the low-priority group. |
| 38 | + |
| 39 | +The benchmark confirms what already works: because significance is a renderer |
| 40 | +concern (per [2026-03-09 renderer config](2026-03-09-renderer_config.md)) and |
| 41 | +`classify_tags` promotes a node to the highest-priority group among its tags, the |
| 42 | +schema-touching `facilities.csv` floats up to the structural section while the |
| 43 | +pure-data `inspections.csv` sinks to the bulk section. That file-granularity |
| 44 | +separation is the best vintage view binoc offers today, and it is pure |
| 45 | +config — the type-ignorant controller (AGENTS rule 1) never participates. |
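
The promotion rule described above can be sketched as follows. This is a minimal illustration, not binoc's actual `classify_tags`: the `Group` struct, the `classify_node` and `vintage_groups` names, the numeric `priority` field, and the tags `binoc.column-addition` and `binoc.row-addition` are all assumptions made for the example. Only `binoc.cell-change` and `binoc.vocabulary-change` appear in this ADR.

```rust
/// A significance group as a renderer config might express it.
/// Hypothetical shape: lower `priority` means more significant.
#[derive(Debug, Clone, PartialEq)]
pub struct Group {
    pub name: String,
    pub priority: u32,
    pub tags: Vec<String>,
}

/// Return the name of the highest-priority group that owns any of the
/// node's tags, or `None` when no group matches.
pub fn classify_node<'a>(groups: &'a [Group], node_tags: &[&str]) -> Option<&'a str> {
    groups
        .iter()
        // Keep only groups that share at least one tag with the node.
        .filter(|g| g.tags.iter().any(|t| node_tags.contains(&t.as_str())))
        // Among the matches, the smallest priority number wins.
        .min_by_key(|g| g.priority)
        .map(|g| g.name.as_str())
}

/// The vintage stance as two groups: structure first, bulk churn last.
/// Tag names other than `binoc.cell-change` and
/// `binoc.vocabulary-change` are assumed for illustration.
pub fn vintage_groups() -> Vec<Group> {
    vec![
        Group {
            name: "structural".to_string(),
            priority: 0,
            tags: vec![
                "binoc.column-addition".to_string(),
                "binoc.vocabulary-change".to_string(),
            ],
        },
        Group {
            name: "bulk".to_string(),
            priority: 1,
            tags: vec![
                "binoc.cell-change".to_string(),
                "binoc.row-addition".to_string(),
            ],
        },
    ]
}
```

Under these assumptions, a node tagged with both a column addition and a cell change classifies as `structural`, which is why a single schema-touching edit is enough to float the whole file up, and also why the cell edit on that same node cannot be held back at this granularity.
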
| 46 | + |
| 47 | +The vector ships two renderings side by side: `expected-output/changelog.snap` |
| 48 | +is the real, harness-checked engine output; `VINTAGE-IDEAL.md` is a hand-authored |
| 49 | +target that is *not* harness-checked. The benchmark is kept green so the gap |
| 50 | +between the two stays visible and measurable. The ideal names three gaps, each |
| 51 | +reachable without touching the controller, the IR, or the correspondence engine: |
| 52 | + |
| 53 | +1. **Within-node keep/drop.** A CSV's `region` addition and its `status` cell |
| 54 | + edit are edits on one node, so the renderer cannot surface the structural |
| 55 | + change while holding the cell back. The fix is a config-driven, edit-level |
| 56 | + keep/drop filter in the renderer. The data path already carries |
| 57 | + `EditProjection.visible`; today only writers set it. This is the smallest |
| 58 | + unlock and it lives entirely in the renderer. |
| 59 | + |
| 60 | +2. **Vocabulary as a first-class change.** The `active -> decommissioned` shift |
| 61 | + is reported as an ordinary `binoc.cell-change`, not as "the `status` |
| 62 | + vocabulary gained a value." The fix is a plugin `EditListWriter` over |
| 63 | + `tabular_v1` that diffs the distinct-value set of each categorical column and |
| 64 | +   emits a `binoc.vocabulary-change` edit. That writer ships as a plugin pack, |
| 65 | +   exactly as the standard library does (AGENTS rule 2). |
| 66 | + |
| 67 | +3. **Summary statistics over enumeration.** The bulk section enumerates every |
| 68 | + changed cell and added row; a vintage reader wants "4 -> 6 rows, 3 cells |
| 69 | + changed." The fix is the same plugin emitting an aggregate via |
| 70 | + `Edit::with_summary` or a dataset-level `GlobalClaim`. The seam already |
| 71 | + carries such facts — stdlib uses `with_summary` for binary string-diffs — but |
| 72 | + no rule emits a tabular roll-up yet. |
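
Gaps 2 and 3 both reduce to small computations over a column's values and a few counts. The sketch below shows the core of each, assuming the plugin can see a categorical column as a slice of strings. `VocabularyChange`, `diff_vocabulary`, and `summarize` are hypothetical names, and nothing here uses the real `EditListWriter`, `tabular_v1`, or `Edit::with_summary` interfaces, whose signatures this ADR does not state.

```rust
use std::collections::BTreeSet;

/// The distinct-value delta for one categorical column.
/// Hypothetical payload for a `binoc.vocabulary-change` edit.
#[derive(Debug, PartialEq)]
pub struct VocabularyChange {
    pub column: String,
    pub added: Vec<String>,
    pub removed: Vec<String>,
}

/// Diff the distinct-value sets of a column across two editions.
/// Returns `None` when the vocabulary is unchanged, regardless of how
/// many individual cells moved between existing values.
pub fn diff_vocabulary(column: &str, old: &[&str], new: &[&str]) -> Option<VocabularyChange> {
    // BTreeSet deduplicates and keeps the output order deterministic.
    let old_set: BTreeSet<&str> = old.iter().copied().collect();
    let new_set: BTreeSet<&str> = new.iter().copied().collect();

    let added: Vec<String> = new_set.difference(&old_set).map(|v| v.to_string()).collect();
    let removed: Vec<String> = old_set.difference(&new_set).map(|v| v.to_string()).collect();

    if added.is_empty() && removed.is_empty() {
        None
    } else {
        Some(VocabularyChange {
            column: column.to_string(),
            added,
            removed,
        })
    }
}

/// Build the roll-up line a vintage reader wants in place of an
/// enumeration of every changed cell and added row.
pub fn summarize(old_rows: usize, new_rows: usize, cells_changed: usize) -> String {
    format!("{} -> {} rows, {} cells changed", old_rows, new_rows, cells_changed)
}
```

For a `status` column whose values go from `active`, `active`, `closed` to `active`, `decommissioned`, `closed` (sample values invented for this sketch), `diff_vocabulary` reports `decommissioned` as added and nothing removed, and `summarize(4, 6, 3)` produces the roll-up string quoted in gap 3.
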
| 73 | + |
| 74 | +The conclusion we are recording: the vintage-vs-same-data distinction is a |
| 75 | +renderer-config + plugin-pack concern, which is the architecture's whole thesis. |
| 76 | +The minimum to open the channel is one renderer-local filter plus one plugin |
| 77 | +pack. No engine surgery. The benchmark shows the channel is clear, and we are |
| 78 | +choosing not to walk through it yet. |
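
To make the "one renderer-local filter" concrete, here is a rough sketch of an edit-level keep/drop pass. It assumes an `EditProjection` reduced to a tag and the `visible` flag. The real struct certainly carries more, and `apply_keep_drop`, `visible_tags`, and the `binoc.column-addition` tag are hypothetical.

```rust
/// Simplified stand-in for the renderer's per-edit projection.
/// Only the `visible` field is named in this ADR; `tag` is assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct EditProjection {
    pub tag: String,
    pub visible: bool,
}

/// Hide every edit whose tag is in the config's drop list.
/// Edits already hidden by a writer stay hidden; this pass only
/// clears visibility and never restores it.
pub fn apply_keep_drop(edits: &mut [EditProjection], drop_tags: &[&str]) {
    for edit in edits.iter_mut() {
        if drop_tags.contains(&edit.tag.as_str()) {
            edit.visible = false;
        }
    }
}

/// Collect the tags of the edits that remain visible, in order.
pub fn visible_tags(edits: &[EditProjection]) -> Vec<&str> {
    edits
        .iter()
        .filter(|e| e.visible)
        .map(|e| e.tag.as_str())
        .collect()
}
```

Because the pass runs over edits rather than nodes, a single CSV node can show its column addition while its cell changes are held back, which is the within-node separation that group classification alone cannot provide.
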
| 79 | + |
| 80 | +## Alternatives Considered |
| 81 | + |
| 82 | +**Build the metadata-only filter and a sample statistics plugin now.** This is |
| 83 | +the natural next step and the benchmark is designed to make it cheap. We are |
| 84 | +deferring it for social and focus reasons, not technical ones: shipping a vintage |
| 85 | +surface would invite vintage feedback before the same-data experience is solid. |
| 86 | +The benchmark captures the design so the work is shovel-ready when we choose it. |
| 87 | + |
| 88 | +**Write the rationale as prose only, with no vector.** A document can claim the |
| 89 | +engine is ready; a passing benchmark proves it and keeps proving it. Without an |
| 90 | +executable artifact, a future change could quietly regress the vintage stance |
| 91 | +(e.g., bake a same-data assumption into a writer) with nothing to catch it. |
| 92 | + |
| 93 | +**Make the benchmark aspirational — hand-author the ideal as the gold file.** |
| 94 | +A snapshot that encodes output the engine does not produce would fail CI, forcing |
| 95 | +us to either disable the test (dead weight) or special-case it (harness |
| 96 | +complexity). Instead the harness-checked snapshot tracks reality and a separate, |
| 97 | +unchecked `VINTAGE-IDEAL.md` holds the target. The benchmark stays honest and |
| 98 | +green, and the gap is documented rather than asserted. |
| 99 | + |
| 100 | +**Promote columns (and their vocabularies) to first-class IR nodes now.** This |
| 101 | +would make within-node significance and vocabulary diffing fall out naturally, |
| 102 | +but it is a substantial IR change in service of an audience we are deliberately |
| 103 | +not yet serving. The benchmark shows the same outcomes are reachable with a |
| 104 | +plugin writer emitting tagged edits, deferring any IR commitment until the |
| 105 | +vintage audience is real. |