PLAN.md — closing all `INNOVATION.md` gaps + shipping all 8 ideas

Input: INNOVATION.md (16 items: 8 workflow gaps in Part 1, 8 emerging-viz ideas in Part 2). Output: a sequenced, dependency-aware rollout plan that closes every item without breaking Glyph's invariants (deterministic, spec-only, agent-driven, snapshot byte-identical, Apache 2.0, no telemetry).

The plan is grouped into six batches (≈ 1 session each). Total estimated effort: ~4,500 LOC across ~20 PRs, on top of the 421 tests + 7 packages already on main.

TL;DR — the rollout in one table

Batch	Items	Effort	Surface delta	Test delta	Why this order
A. Calibration foundation	1.3, 1.6, 2.3, 2.6	~600 LOC, 4 PRs	+1 verb (`glyph_suggest_scale`); otherwise in-place changes	+12 tests	All low-risk, all use already-shipped substrate (`provenance.sampleRows`, handle store). Lifts every chart Glyph emits at once.
B. Spec ergonomics + CI	1.7, 1.8, 2.8	~700 LOC, 3 PRs	+2 verbs (`glyph_spec_diff`, `glyph_spec_patch`)	+18 tests	Foundation for code-review-as-chart-review. 1.7 → 2.8 hard dep.
C. Agent workflow	1.1, 1.4, 2.5	~900 LOC, 4 PRs	+1 verb (`glyph_macro_replay`), +1 callback hook	+20 tests	Adds the LLM-pluggable planner + disambiguation + macro replay. The "production agent" batch.
D. Multi-modal + multi-agent	2.1, 2.4	~1,000 LOC, 3 PRs (incl. preview-server UI)	+2 verbs (`glyph_modality_sync`, `glyph_whyboard_diff`)	+14 tests	Highest user-visible polish; depends on B's spec_diff.
E. Audit + governance	2.2, 2.7	~900 LOC, 3 PRs	+1 verb (`glyph_audit_spec`), +1 metric field	+24 tests	Enterprise-grade. Each is its own module.
F. Streaming + telemetry	1.2, 1.5	~400 LOC, 2 PRs	+1 capability (streaming responses)	+10 tests	Smaller batch, mostly transport work.
Total	16 items	~4,500 LOC	+7 verbs, +1 capability, +1 hook	+98 tests

After all six batches: ~519 tests, ~43 MCP tools, 7 packages. Scoreboard moves from ~100/100 to ~108/100 effective (the rubric is hard-capped at 100; the extra points reflect capabilities that D3, Vega-Lite, and Tableau lack).

Dependency map

Drawn as a DAG of "needs" relationships. Items not on this map are independent (start any time).

                         ┌─────────────────────────────────┐
                         │      Substrate already shipped  │
                         │  (DataHandle.provenance + URI,  │
                         │   link_group bus, audit log,    │
                         │   metric registry, Whyboard)    │
                         └────────────────┬────────────────┘
                                          │
        ┌─────────────────────────────────┼─────────────────────────────────┐
        ▼                                 ▼                                 ▼
  Batch A (independent)            Batch B (1.7 → 2.8)              Batch C (1.1 → 1.4)
  1.3 budget   ─┐                  1.7 spec_diff ──┬──→ 2.8 CI gate    1.1 LLM planner ─→ 1.4 disambig
  1.6 TTL      │                  1.8 spec_patch ─┘                  2.5 macro replay (uses audit log + 1.8)
  2.3 unc.     │
  2.6 scale    ┘
                                                                  Batch E (independent)
                                                                  2.2 audit
                                                                  2.7 causal

   Batch D (depends on 1.7 + 2.4 reuses Whyboard)             Batch F (independent)
   2.1 multi-modal sync ─── reuses link_group                  1.2 streaming
   2.4 whyboard diff ──── needs 1.7 spec_diff pattern          1.5 engagement

Batches A, B, C, E, F are mostly independent; D needs 1.7. So a small team could parallelize A + B + C + E + F immediately, with D landing after B's spec_diff is in.

Per-item plan

Each item below lists: goal · scope · files · surface delta · tests · effort · risk.

Batch A — Calibration foundation

1.3 — Context-window budget management

Goal: stop blowing up agent context windows when verbs return many rows.

Scope: add an optional budget_tokens (and/or limit_rows) argument to every row-returning verb (glyph_query, glyph_drill, glyph_anomaly, glyph_drift, glyph_decompose, glyph_forecast). The MCP server samples / truncates in DuckDB before serializing. Returns a sentinel { truncated, total, returned, sample_strategy: "top" | "bottom" | "stratified" }.
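
A minimal sketch of the truncation envelope described above. `applyRowBudget` is a hypothetical helper, not a Glyph API: the plan truncates inside DuckDB before serializing, whereas this version works in memory purely to show the sentinel shape. Treating "stratified" as evenly spaced picks is an assumption; a real stratified sample would need a grouping key.

```typescript
type SampleStrategy = "top" | "bottom" | "stratified";

interface TruncationInfo {
  truncated: boolean;
  total: number;
  returned: number;
  sample_strategy: SampleStrategy;
}

// Hypothetical helper, not Glyph's API. The plan truncates inside DuckDB
// before serializing; this in-memory version only illustrates the envelope.
function applyRowBudget<Row>(
  rows: Row[],
  limitRows?: number,
  strategy: SampleStrategy = "top",
): { rows: Row[]; budget: TruncationInfo } {
  const total = rows.length;
  if (limitRows === undefined || total <= limitRows) {
    return {
      rows,
      budget: { truncated: false, total, returned: total, sample_strategy: strategy },
    };
  }
  const limit = Math.max(0, Math.floor(limitRows));
  let kept: Row[] = [];
  if (strategy === "top") {
    kept = rows.slice(0, limit);
  } else if (strategy === "bottom") {
    kept = rows.slice(total - limit);
  } else {
    // Assumption: evenly spaced picks stand in for "stratified".
    for (let i = 0; i < limit; i++) {
      kept.push(rows[Math.floor((i * total) / limit)]);
    }
  }
  return {
    rows: kept,
    budget: { truncated: true, total, returned: kept.length, sample_strategy: strategy },
  };
}
```

Because the argument is optional and an under-budget result comes back untouched with `truncated: false`, existing callers would see no behavior change.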

Files: packages/mcp/src/server.ts (verb handlers), packages/duckdb/src/engine.ts (TABLESAMPLE helper), small util module.

Surface delta: backward-compatible — existing calls still work; the budget arg is optional.

Tests: ~6 — one per verb verifying truncation kicks in at the threshold + the sentinel shape.

Effort: ~150 LOC. Risk: low.

1.6 — Handle TTL + auto-GC

Goal: long sessions accumulate 50+ derived handles. Auto-evict stale ones.

Scope: every DataHandle gets a last_accessed_at field. A small reaper sweep runs on each MCP serial-chain entry (so it's pay-as-you-go): handles unused for >N minutes with no descendant handles in the session get GC'd. Optional keep: true flag on glyph_publish pins a handle.
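
The eviction rule above could be sketched roughly as follows. `HandleEntry`, `parentId`, and `findEvictable` are illustrative names assumed for this example; only `lastAccessedAt` and the `keep` flag come from the plan.

```typescript
interface HandleEntry {
  id: string;
  parentId?: string; // lineage: the handle this one was derived from (assumed field)
  lastAccessedAt: number; // epoch milliseconds
  keep?: boolean; // pinned, e.g. via keep: true on glyph_publish
}

// Hypothetical sweep: returns the ids that are stale, unpinned, and have no
// descendant handle still present in the session.
function findEvictable(handles: HandleEntry[], now: number, ttlMs: number): string[] {
  const hasChild: Record<string, boolean> = {};
  for (const h of handles) {
    if (h.parentId !== undefined) hasChild[h.parentId] = true;
  }
  return handles
    .filter((h) => !h.keep && !hasChild[h.id] && now - h.lastAccessedAt > ttlMs)
    .map((h) => h.id);
}
```

In this sketch a stale parent only becomes evictable on a later sweep, after its last child has itself been evicted, which keeps lineage intact without any extra bookkeeping.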

Files: packages/mcp/src/state.ts (Map → richer struct), packages/core/src/spec/types.ts (DataHandle gains lastAccessedAt), packages/mcp/src/server.ts (touch on access).

Surface delta: none — internal. Optional glyph_handles_gc(force?: boolean) verb for diagnostics.

Tests: ~3 — TTL elapses → eviction; pinned handle survives; lineage children keep parent alive.

Effort: ~120 LOC. Risk: low.

2.3 — Embedded uncertainty rendering ⭐⭐⭐⭐⭐

Goal: every chart Glyph emits shows confidence by default, not by opt-in.

Scope: extend the compiler to read handle.provenance.sampleRows (already shipped since PR32) and inject visual signals:

Bar marks with low sample → hatched fill via <pattern> defs.
Line marks with per-x-bucket variance → 95% CI ribbon overlay (computed from the rows feeding the line as mean ± 1.96 × standard error, with the standard error derived from stat.std).
Point marks with low sample → reduced opacity.
Corner badge when confidence ≠ "high": small text "n=12 · confidence: low" in the top-right.
Opt-out: spec.interactive.uncertainty: false.
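
As one possible reading of the CI-ribbon bullet, the band per x-bucket can be computed as mean ± 1.96 × std/√n. The sketch below assumes the sample standard deviation (n − 1 denominator) and a normal approximation; `ciRibbon` and `RibbonPoint` are hypothetical names, and whether the shipped compiler uses the standard error or the raw standard deviation is not stated in this plan.

```typescript
interface RibbonPoint {
  x: string;
  mean: number;
  lower: number;
  upper: number;
  n: number;
}

// Hypothetical helper: one ribbon point per x-bucket with at least two rows.
function ciRibbon(buckets: Record<string, number[]>): RibbonPoint[] {
  const ribbon: RibbonPoint[] = [];
  for (const x of Object.keys(buckets)) {
    const ys = buckets[x];
    const n = ys.length;
    if (n < 2) continue; // sample std is undefined for a single row
    let sum = 0;
    for (const y of ys) sum += y;
    const mean = sum / n;
    let squares = 0;
    for (const y of ys) squares += (y - mean) * (y - mean);
    const std = Math.sqrt(squares / (n - 1));
    const half = (1.96 * std) / Math.sqrt(n); // 95% half-width under a normal approximation
    ribbon.push({ x, mean, lower: mean - half, upper: mean + half, n });
  }
  return ribbon;
}
```

Buckets with a single row are skipped rather than drawn with a zero-width band, so a thin ribbon is never mistaken for high confidence.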

Files: packages/core/src/compiler/compile.ts (new pass: injectUncertainty), packages/core/src/render/svg.ts (pattern defs, badge), packages/core/src/scenegraph/types.ts (new optional uncertainty field on Scene).

Surface delta: none from the MCP side; the change is purely in rendered output.

Tests: ~5 — low-sample bar hatched; line CI ribbon emitted; corner badge present; opt-out skips all of it; snapshot byte-identity holds when no provenance is set (existing baselines stay green).

Effort: ~250 LOC. Risk: medium (touches the renderer; snapshot tests must stay byte-identical when provenance is unset).

2.6 — LLM-assisted scale tuning

Goal: agents pick the wrong scale (linear for $100→$10M); suggest log/sqrt/diverging when warranted.

Scope: new pure-fn suggestScale({ schema, rows, encoding }) → array of { field, suggestion, reason, confidence }. Heuristics:

max/min ratio > 100 on a quantitative field → suggest log.
Symmetric around zero → suggest diverging.
Long-tailed (skewness > 2) → suggest sqrt or pow.
Highly clustered (CV < 0.1) → no change.

New MCP verb glyph_suggest_scale(handle_id, encoding?) that wraps it.
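
The heuristics above might look like this for a single quantitative field. This is a sketch under assumptions: `suggestScaleForField` is a hypothetical per-field helper rather than the planned `suggestScale` signature, the "symmetric around zero" test (extents within a factor of two of each other) is a guess at a workable threshold, the confidence values are placeholders, and skewness is the population moment coefficient.

```typescript
type ScaleKind = "log" | "sqrt" | "diverging";

interface ScaleHint {
  field: string;
  suggestion: ScaleKind;
  reason: string;
  confidence: number; // placeholder values in this sketch
}

function suggestScaleForField(field: string, values: number[]): ScaleHint | null {
  const n = values.length;
  if (n < 2) return null;
  let min = values[0];
  let max = values[0];
  let sum = 0;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
    sum += v;
  }
  const mean = sum / n;
  let m2 = 0; // second central moment
  let m3 = 0; // third central moment
  for (const v of values) {
    const d = v - mean;
    m2 += (d * d) / n;
    m3 += (d * d * d) / n;
  }
  const std = Math.sqrt(m2);
  if (std === 0) return null;
  // Highly clustered (CV < 0.1): leave the scale alone.
  if (mean !== 0 && std / Math.abs(mean) < 0.1) return null;
  if (min < 0 && max > 0) {
    const balance = -min / max; // assumption: "symmetric" means 0.5 < balance < 2
    if (balance > 0.5 && balance < 2) {
      return { field, suggestion: "diverging", reason: "values straddle zero roughly symmetrically", confidence: 0.7 };
    }
  }
  if (min > 0 && max / min > 100) {
    return { field, suggestion: "log", reason: `max/min ratio is ${Math.round(max / min)}`, confidence: 0.8 };
  }
  if (m3 / (std * std * std) > 2) {
    return { field, suggestion: "sqrt", reason: "long right tail (skewness > 2)", confidence: 0.6 };
  }
  return null;
}
```

The clustered check runs first as an early exit; the remaining rules follow the order in which they are listed above.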

Files: packages/core/src/scales-suggest/index.ts (new module), packages/mcp/src/server.ts (verb).

Surface delta: +1 MCP verb. (Total 36 → 37.)

Tests: ~4 — log suggestion fires on $100→$10M; diverging on ±50 around zero; no suggestion on tight range; the explanation field is informative.

Effort: ~150 LOC. Risk: low.

Batch B — Spec ergonomics + CI

1.7 — Spec diff

Goal: structured diff between two specs ("what changed and why").

Scope: pure-fn diffSpecs(a, b) → { added, removed, changed, summary }. Uses RFC 6902 JSON Patch as the wire format for changed. The summary is a one-sentence narrative.
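
To make the output shape concrete, here is a rough sketch of a structural diff that emits RFC 6902 operations. It is not the shipped `diffSpecs`: the split of operations into `added` / `removed` / `changed` by op type, the index-wise array comparison, and the mechanical summary sentence are all assumptions made for illustration.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type PatchOp =
  | { op: "add"; path: string; value: Json }
  | { op: "remove"; path: string }
  | { op: "replace"; path: string; value: Json };

// JSON Pointer (RFC 6901) escaping for object keys.
const escapeToken = (t: string): string => t.replace(/~/g, "~0").replace(/\//g, "~1");
const isObject = (v: Json): v is { [key: string]: Json } =>
  typeof v === "object" && v !== null && !Array.isArray(v);

function diffJson(a: Json, b: Json, path = "", ops: PatchOp[] = []): PatchOp[] {
  if (Array.isArray(a) && Array.isArray(b)) {
    const shared = Math.min(a.length, b.length);
    for (let i = 0; i < shared; i++) diffJson(a[i], b[i], `${path}/${i}`, ops);
    // Appends go in ascending order, removals in descending order, so the
    // ops stay valid when applied one after another.
    for (let i = shared; i < b.length; i++) ops.push({ op: "add", path: `${path}/${i}`, value: b[i] });
    for (let i = a.length - 1; i >= shared; i--) ops.push({ op: "remove", path: `${path}/${i}` });
  } else if (isObject(a) && isObject(b)) {
    for (const k of Object.keys(a)) {
      const p = `${path}/${escapeToken(k)}`;
      if (k in b) diffJson(a[k], b[k], p, ops);
      else ops.push({ op: "remove", path: p });
    }
    for (const k of Object.keys(b)) {
      if (!(k in a)) ops.push({ op: "add", path: `${path}/${escapeToken(k)}`, value: b[k] });
    }
  } else if (a !== b) {
    ops.push({ op: "replace", path, value: b });
  }
  return ops;
}

function diffSpecs(a: Json, b: Json) {
  const ops = diffJson(a, b);
  return {
    added: ops.filter((o) => o.op === "add"),
    removed: ops.filter((o) => o.op === "remove"),
    changed: ops.filter((o) => o.op === "replace"),
    summary:
      ops.length === 0
        ? "No changes."
        : `${ops.length} change(s): ${ops.map((o) => `${o.op} ${o.path}`).join(", ")}.`,
  };
}
```

A production summary would presumably be more narrative than this list of paths, for example naming the SQL change when `data.transform` differs.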

Files: packages/core/src/spec-diff/index.ts, exported from @glyph/core.

Surface delta: +1 MCP verb (glyph_spec_diff(spec_a, spec_b)).

Tests: ~8 — adding a layer, removing a layer, changing an encoding field, changing data.transform (narrative names the SQL change), no changes (empty diff), filtering on the layer level, metric channels ({metric: ...} swap).

Effort: ~250 LOC. Risk: low — pure function.

1.8 — Spec patch (incremental edit)

Goal: refining a chart shouldn't cost a full spec regeneration.

Scope: glyph_spec_patch(handle_id, patches: JsonPatchOp[]) applies RFC 6902 patches to the spec that produced handle_id, re-runs the pipeline with the patched spec, returns the new handle. Lineage records the patch as relation: "transform".
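
The patch-application step could be sketched like this. Only the `add`, `replace`, and `remove` operations of RFC 6902 are covered, whole-document paths are rejected for brevity, and `applyPatches` is a hypothetical name; the pipeline re-run and lineage recording are out of scope here.

```typescript
type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };
type JsonPatchOp =
  | { op: "add"; path: string; value: JsonValue }
  | { op: "replace"; path: string; value: JsonValue }
  | { op: "remove"; path: string };

function isRecord(v: JsonValue): v is { [key: string]: JsonValue } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function fail(op: JsonPatchOp): never {
  throw new Error(`invalid patch: ${op.op} ${op.path}`);
}

function applyPatches(spec: JsonValue, patches: JsonPatchOp[]): JsonValue {
  // Work on a deep copy so the stored originating spec is never mutated.
  const root: JsonValue = JSON.parse(JSON.stringify(spec));
  for (const op of patches) {
    if (op.path.charAt(0) !== "/") fail(op); // whole-document ops not handled in this sketch
    const tokens = op.path
      .split("/")
      .slice(1)
      .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
    const last = tokens[tokens.length - 1];
    let parent: JsonValue = root;
    for (const t of tokens.slice(0, -1)) {
      if (Array.isArray(parent) && /^\d+$/.test(t) && Number(t) < parent.length) parent = parent[Number(t)];
      else if (isRecord(parent) && t in parent) parent = parent[t];
      else fail(op);
    }
    if (Array.isArray(parent)) {
      // "-" means "end of array" and is only meaningful for add.
      const i = last === "-" ? parent.length : /^\d+$/.test(last) ? Number(last) : -1;
      const max = op.op === "add" ? parent.length : parent.length - 1;
      if (i < 0 || i > max) fail(op);
      if (op.op === "add") parent.splice(i, 0, op.value);
      else if (op.op === "replace") parent[i] = op.value;
      else parent.splice(i, 1);
    } else if (isRecord(parent)) {
      if (op.op !== "add" && !(last in parent)) fail(op);
      if (op.op === "remove") delete parent[last];
      else parent[last] = op.value;
    } else {
      fail(op);
    }
  }
  return root;
}
```

Because the patches run against a copy, a failing operation leaves the original spec untouched, which matches the "invalid patch returns clear error" test case.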

Files: packages/mcp/src/server.ts, packages/mcp/src/state.ts (must store the originating spec per handle).

Surface delta: +1 MCP verb. ServerState gains a specByHandle Map.

Tests: ~5 — replace x encoding, add a layer, remove a layer, replace data.source via patch, invalid patch returns clear error.

Effort: ~250 LOC. Risk: low.

2.8 — Spec-as-code CI gate ⭐⭐⭐⭐

Goal: a chart change in a PR shows up as a visual diff the reviewer can see.

Scope: new CLI command glyph diff <spec.json> <baseline.svg> that exits with status 0 if the rendered SVG matches the baseline; otherwise it exits non-zero and prints a unified diff URL. Then a GitHub Action template at .github/actions/glyph-visual-diff/action.yml that calls it and posts the diff to the PR.

Files: packages/cli/src/diff.ts, .github/actions/glyph-visual-diff/, docs in README.md.

Surface delta: +1 CLI subcommand. No new MCP verb (it's a CI concern).

Tests: ~5 — identical SVG returns 0; differing SVG returns non-zero with diff; binary PNG comparison via pixelmatch; CLI flags (--threshold, --output html).

Effort: ~200 LOC. Risk: low.

Batch C — Agent workflow

1.1 — LLM-driven planner

Goal: replace the heuristic Story Agent planner (PR41) with an LLM-pluggable one.

Scope: planStoryHeuristic stays as the fallback. New: planStory({ planner: "heuristic" | "callback", callback?: PlanCallback }). The callback receives intent + schema + rows-sample, returns the same StoryPlan shape. Host agent supplies the callback (Glyph stays LLM-agnostic).

For the MCP verb: optional planner_hint: "llm" argument; when set, the server returns a special awaiting_planner_response marker the host fulfills via a new glyph_story_provide_plan(plan_id, nodes[]) verb.
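
A sketch of the in-process callback path, under stated assumptions: the `StoryPlan` and `PlanInput` shapes are invented for illustration, the callback is synchronous here although a real LLM-backed one would presumably be async, and throwing on a malformed plan (rather than silently falling back) is a design guess. The `planStoryHeuristic` body is a stand-in, not the shipped planner.

```typescript
interface StoryNode {
  kind: string;
  args: Record<string, unknown>;
}
interface StoryPlan {
  nodes: StoryNode[];
}
interface PlanInput {
  intent: string;
  schema: string[];
  rowsSample: Record<string, unknown>[];
}
type PlanCallback = (input: PlanInput) => StoryPlan;
interface PlanOptions {
  planner: "heuristic" | "callback";
  callback?: PlanCallback;
}

// Stand-in for the shipped heuristic planner.
function planStoryHeuristic(input: PlanInput): StoryPlan {
  return { nodes: [{ kind: "describe", args: { fields: input.schema } }] };
}

function isStoryPlan(value: unknown): value is StoryPlan {
  if (typeof value !== "object" || value === null) return false;
  const nodes = (value as { nodes?: unknown }).nodes;
  if (!Array.isArray(nodes)) return false;
  return nodes.every(
    (n) =>
      typeof n === "object" &&
      n !== null &&
      typeof n.kind === "string" &&
      typeof n.args === "object" &&
      n.args !== null,
  );
}

function planStory(input: PlanInput, options: PlanOptions = { planner: "heuristic" }): StoryPlan {
  if (options.planner === "callback" && options.callback) {
    // The host owns the LLM; Glyph only validates what comes back.
    const plan: unknown = options.callback(input);
    if (!isStoryPlan(plan)) throw new Error("planner callback returned a malformed plan");
    return plan;
  }
  return planStoryHeuristic(input); // default and fallback path
}
```

Validating the returned plan at the boundary keeps the rest of the pipeline identical for both planners, since each yields the same `StoryPlan` shape.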

Files: packages/mcp/src/story.ts, packages/mcp/src/server.ts.

Surface delta: +1 MCP verb (glyph_story_provide_plan).

Tests: ~6 — heuristic stays default; LLM callback path round-trips a custom plan; provide_plan validates the node shape; rejects malformed plans.

Effort: ~250 LOC. Risk: medium (new server-side state machine).

1.4 — Disambiguation on ambiguous intent

Goal: when the user's intent leaves multiple field choices possible, ask back instead of guessing.

Scope: extend glyph_story_plan to optionally emit { status: "needs_clarification", questions: [{ field, prompt, options[] }] } when the planner's confidence on a chosen field is below a threshold. Host renders the questions; user picks; host calls glyph_story_clarify(plan_id, answers[]) which re-plans with the choices pinned.
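
The threshold check could be sketched as below. Everything except the `needs_clarification` envelope is an assumption: `resolveSlot`, the `FieldCandidate` shape, the interpretation of `field` as the slot being asked about, and the rule that a lone candidate is chosen without asking.

```typescript
interface FieldCandidate {
  name: string;
  confidence: number; // assumed to lie in [0, 1]
}
interface ClarificationQuestion {
  field: string; // the slot being asked about, e.g. "y"
  prompt: string;
  options: string[];
}
type SlotResolution =
  | { status: "ok"; field: string; chosen: string }
  | { status: "needs_clarification"; questions: ClarificationQuestion[] };

// Hypothetical helper: decides whether one encoding slot needs a question.
function resolveSlot(
  slot: string,
  candidates: FieldCandidate[],
  threshold: number,
  pinned?: string,
): SlotResolution {
  // An answer supplied through the clarify verb wins outright.
  if (pinned !== undefined) return { status: "ok", field: slot, chosen: pinned };
  if (candidates.length === 0) throw new Error(`no candidate fields for ${slot}`);
  const ranked = candidates.slice().sort((a, b) => b.confidence - a.confidence);
  // With a single candidate there is nothing to ask about.
  if (ranked.length === 1 || ranked[0].confidence >= threshold) {
    return { status: "ok", field: slot, chosen: ranked[0].name };
  }
  return {
    status: "needs_clarification",
    questions: [
      {
        field: slot,
        prompt: `Which field should drive "${slot}"?`,
        options: ranked.map((c) => c.name),
      },
    ],
  };
}
```

Re-planning after `glyph_story_clarify` then amounts to calling the same function with the user's answer passed as `pinned`.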

Files: packages/mcp/src/story.ts, packages/mcp/src/server.ts.

Surface delta: +1 MCP verb (glyph_story_clarify).

Tests: ~4 — confidence-low triggers clarification; clarify with answer resolves; ambiguous field with no options falls back gracefully.

Effort: ~200 LOC. Risk: low.

2.5 — Workflow capture & replay ⭐⭐⭐⭐

Goal: capture a user's exploration as a runnable macro.

Scope: every MCP verb call is already auditable (see PR40's audit log). Extend the audit log so it carries enough to replay (the original args + spec, not just a summary). New CLI: glyph macro capture --since <iso> exports a .glyph-macro.json. New MCP verb glyph_macro_replay(macro, params?) walks it forward against new data, substituting {{params.X}} placeholders.
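
A sketch of the placeholder substitution and replay loop. The macro shape (`{ name, version: 1, steps: [{ verb, args }] }`) follows the status board entry for this item; `substituteParams`, `replayMacro`, and the `dispatch` callback are hypothetical names, and resolving every step before dispatching any of them is one way to get the fail-fast behavior the tests call for.

```typescript
interface MacroStep {
  verb: string;
  args: Record<string, unknown>;
}
interface Macro {
  name: string;
  version: 1;
  steps: MacroStep[];
}
type MacroParams = Record<string, string | number>;
type Dispatch = (verb: string, args: Record<string, unknown>) => unknown;

// Recursively replaces {{params.X}} inside every string of a JSON-like value.
function substituteParams(value: unknown, params: MacroParams): unknown {
  if (typeof value === "string") {
    return value.replace(/\{\{params\.(\w+)\}\}/g, (_match: string, key: string) => {
      if (!(key in params)) throw new Error(`missing macro param: ${key}`);
      return String(params[key]);
    });
  }
  if (Array.isArray(value)) return value.map((v) => substituteParams(v, params));
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(value)) {
      out[k] = substituteParams((value as Record<string, unknown>)[k], params);
    }
    return out;
  }
  return value;
}

function replayMacro(macro: Macro, params: MacroParams, dispatch: Dispatch): unknown[] {
  // Resolve all steps first so a broken macro fails before any verb runs.
  const resolved = macro.steps.map((step) => ({
    verb: step.verb,
    args: substituteParams(step.args, params) as Record<string, unknown>,
  }));
  return resolved.map((step) => dispatch(step.verb, step.args));
}
```

Note that substitution is textual, so a numeric parameter lands in the args as part of a string; a typed placeholder syntax would be needed to pass real numbers through.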

Files: packages/mcp/src/memory.ts (audit table extension), packages/cli/src/macro.ts, packages/mcp/src/server.ts.

Surface delta: +1 MCP verb (glyph_macro_replay), +1 CLI subcommand (glyph macro).

Tests: ~6 — capture a 5-step session; replay with same data → same handles; replay with new data → new handles with same shape; param substitution; broken macro fails fast with a clear error.

Effort: ~400 LOC. Risk: medium (replay semantics need to handle MCP verb evolution).

Batch D — Multi-modal + multi-agent

2.1 — Conversational drill + multi-modal sync ⭐⭐⭐⭐⭐

Goal: SVG + table + narrative kept in sync. Click in one → updates the other two.

Scope:

Server side: a new glyph_modality_sync(group, selection, modality) verb (companion to glyph_linked_publish). Carries modality: "chart" | "table" | "narrative" so consumers can avoid echoes.
glyph_render gains an optional modalities: ["chart", "table", "narrative"] array; the response carries SVG + rows-sample + auto-generated glyph_explain narrative in one envelope.
@glyph/preview-server HTML wraps three panes (chart / table / narrative) bound to the same link_group. Click → propagates filter → all panes re-render.
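
The echo-avoidance idea can be illustrated with a tiny in-memory bus. `LinkGroupBus` is a hypothetical stand-in for the shipped link_group bus, and the `selection` payload shape is assumed; the only part taken from the plan is that each event carries its originating `modality` and that subscribers drop events from their own modality.

```typescript
type Modality = "chart" | "table" | "narrative";

interface LinkedEvent {
  group: string;
  modality: Modality; // the pane where the interaction originated
  selection: { field: string; values: string[] }; // assumed payload shape
}
type LinkedHandler = (event: LinkedEvent) => void;

class LinkGroupBus {
  private subscribers: { group: string; modality: Modality; handler: LinkedHandler }[] = [];

  subscribe(group: string, modality: Modality, handler: LinkedHandler): void {
    this.subscribers.push({ group, modality, handler });
  }

  publish(event: LinkedEvent): void {
    for (const s of this.subscribers) {
      // Skip panes of the originating modality so a click never echoes back.
      if (s.group === event.group && s.modality !== event.modality) s.handler(event);
    }
  }
}
```

Filtering at publish time, as done here, is one option; the plan's wording ("subscribers filter out echoes") suggests the check may instead live on the consumer side.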

Files: packages/mcp/src/server.ts, packages/mcp/src/linked.ts (extend LinkedEvent with modality), packages/preview-server/src/* (UI), packages/live/src/index.ts (table + narrative hydrators).

Surface delta: +1 MCP verb. +1 spec field (modalities).

Tests: ~6 — render returns all three modalities; click in table modality emits event with modality=table; subscribers filter out echoes from their own modality; integration test with two clients on shared group.

Effort: ~600 LOC. Risk: medium (multi-pane UI in @glyph/preview-server).

2.4 — Multi-agent answer comparison ⭐⭐⭐⭐

Goal: side-by-side Whyboards expose disagreement between two agents.

Scope: pure-fn diffWhyboards(a, b) → { agreed_nodes, only_in_a, only_in_b, conflicting }. New MCP verb glyph_whyboard_diff(board_a, board_b). Preview-server gets a "compare" mode that renders two boards side-by-side with diff highlighting.
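
A sketch of the comparison, assuming each Whyboard node has a stable `id` that is comparable across agents and a `summary` string; those two fields, the `sourceHandle` name, and the shape of `conflicting` entries are illustrative guesses rather than the shipped types.

```typescript
interface WhyNode {
  id: string;
  summary: string;
}
interface Whyboard {
  sourceHandle: string;
  nodes: WhyNode[];
}
interface WhyboardDiff {
  agreed_nodes: string[];
  only_in_a: string[];
  only_in_b: string[];
  conflicting: { id: string; a: string; b: string }[];
}

function diffWhyboards(a: Whyboard, b: Whyboard): WhyboardDiff {
  if (a.sourceHandle !== b.sourceHandle) {
    throw new Error(`cannot compare whyboards built from ${a.sourceHandle} and ${b.sourceHandle}`);
  }
  const summariesB: Record<string, string> = {};
  for (const n of b.nodes) summariesB[n.id] = n.summary;
  const seenInA: Record<string, boolean> = {};
  const diff: WhyboardDiff = { agreed_nodes: [], only_in_a: [], only_in_b: [], conflicting: [] };
  for (const n of a.nodes) {
    seenInA[n.id] = true;
    if (!(n.id in summariesB)) diff.only_in_a.push(n.id);
    else if (summariesB[n.id] === n.summary) diff.agreed_nodes.push(n.id);
    else diff.conflicting.push({ id: n.id, a: n.summary, b: summariesB[n.id] });
  }
  for (const n of b.nodes) {
    if (!seenInA[n.id]) diff.only_in_b.push(n.id);
  }
  return diff;
}
```

Exact string equality on summaries is the bluntest possible conflict test; two agents that phrase the same finding differently would show up as conflicting under this sketch.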

Files: packages/mcp/src/whyboard.ts (diff fn), packages/mcp/src/server.ts (verb), packages/preview-server/src/* (compare UI).

Surface delta: +1 MCP verb.

Tests: ~5 — identical boards yield empty diff; one extra branch → only_in_a; same branch with different summary → conflicting; whyboard_diff with mismatched source handles errors clearly.

Effort: ~400 LOC. Risk: low (pure-fn + small UI).

Batch E — Audit + governance

2.2 — Adversarial chart auditing ⭐⭐⭐⭐⭐

Goal: a linter for misleading charts. Catches the 10 most common visual manipulations.

Scope: new @glyph/audit package or core module with auditSpec(spec, scene) → AuditFinding[]. Rules (initial set):

Truncated y-axis — quantitative y with scale.domain not starting at 0 (severity: high for bar; medium for line).
Dual-axis correlation — two layers on dual y-axis with mismatched units (severity: medium).
Cherry-picked palette — diverging palette without explicit midpoint declared (severity: low; surfacing).
Logarithmic without disclosure — scale.type: log without title/subtitle/footnote naming it (severity: high).
Excessive aggregation — bar chart of mean over < 5 rows per bucket (severity: medium).
Time axis with gaps not flagged — temporal x missing intermediate periods (severity: low).
Color count > 8 — categorical color encoding with too many bins (severity: low).
Low contrast text on background — title / labels fail WCAG AA (severity: medium).
3D effects / chart junk — n/a in Glyph (we don't have these), but we audit for width:height aspect ratio < 0.5 or > 3 (severity: low).
Mismatched data window — data.transform filters narrow data but the title says "all-time" (this needs LLM; skip in v0).

Each finding has { rule_id, severity, message, suggestion? }. The materializer optionally runs the audit; strictness: "warn" | "error" | "off" spec field controls behavior.
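
As an illustration of one rule plus the strictness switch, here is a sketch of the truncated y-axis check. `AuditableSpec` is a pared-down, hypothetical spec shape, the rule id string is invented, and having `"error"` throw on any finding is an assumption about how blocking would work; the `"warn"` default follows Decision 3 later in this plan.

```typescript
type Severity = "low" | "medium" | "high";
type Strictness = "warn" | "error" | "off";

interface AuditFinding {
  rule_id: string;
  severity: Severity;
  message: string;
  suggestion?: string;
}

// Pared-down spec shape for this sketch only.
interface AuditableSpec {
  mark: string;
  encoding: { y?: { type: string; scale?: { domain?: [number, number] } } };
  strictness?: Strictness;
}

function auditTruncatedY(spec: AuditableSpec): AuditFinding[] {
  const y = spec.encoding.y;
  const domain = y?.scale?.domain;
  if (!y || y.type !== "quantitative" || !domain || domain[0] === 0) return [];
  // The plan only assigns severities for bar and line marks.
  if (spec.mark !== "bar" && spec.mark !== "line") return [];
  return [
    {
      rule_id: "truncated-y-axis",
      severity: spec.mark === "bar" ? "high" : "medium",
      message: `y-axis starts at ${domain[0]} instead of 0`,
      suggestion: "start scale.domain at 0 or annotate the axis break",
    },
  ];
}

function runAudit(spec: AuditableSpec): AuditFinding[] {
  const strictness = spec.strictness || "warn"; // advisory by default
  if (strictness === "off") return [];
  const findings = auditTruncatedY(spec);
  if (strictness === "error" && findings.length > 0) {
    throw new Error(findings.map((f) => `${f.rule_id}: ${f.message}`).join("; "));
  }
  return findings;
}
```

Further rules would slot in as additional functions with the same `(spec) => AuditFinding[]` signature, concatenated inside `runAudit`.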

Files: packages/audit/src/index.ts (new package), MCP verb glyph_audit_spec(spec), integration into glyph_render.

Surface delta: +1 MCP verb, +1 package, +1 spec field (strictness).

Tests: ~12 — one per rule + integration (truncated-axis bar errors when strictness: "error"; warns otherwise; off skips).

Effort: ~500 LOC. Risk: low — purely additive.

2.7 — Causal-aware viz

Goal: visually distinguish correlated from asserted-causal relationships.

Scope:

Metric registry gains an optional causal_of?: string[] array on each metric definition. e.g. mrr.causal_of = ["new_customers", "avg_price", "churn"]. The metric layer's compiler reads this.
When the spec encodes a metric against a known-cause field, the renderer emits a → causal badge near the legend.
When the encoding is a known non-cause (or unknown), badge is ↔ correlated.
A new MCP verb glyph_causal_graph() returns the registered DAG so consumers can render the full graph separately.
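
A sketch of how the registry could be turned into a graph with cycle detection, reading `causal_of` as "the declared causes of this metric", as the `mrr` example suggests. The function names and return shape here are illustrative and may differ from the shipped `buildCausalGraph`.

```typescript
interface MetricDefinition {
  name: string;
  causal_of?: string[]; // declared causes of this metric
}
interface CausalGraph {
  edges: { from: string; to: string }[]; // cause -> metric
  cycles: string[][];
}

function buildCausalGraph(metrics: MetricDefinition[]): CausalGraph {
  const causes: Record<string, string[]> = {};
  const edges: { from: string; to: string }[] = [];
  for (const m of metrics) {
    causes[m.name] = m.causal_of || [];
    for (const c of causes[m.name]) edges.push({ from: c, to: m.name });
  }
  // Depth-first search; meeting a node that is still "visiting" means a cycle.
  const state: Record<string, "visiting" | "done"> = {};
  const cycles: string[][] = [];
  const visit = (node: string, trail: string[]): void => {
    if (state[node] === "done") return;
    if (state[node] === "visiting") {
      cycles.push(trail.slice(trail.indexOf(node)).concat(node));
      return;
    }
    state[node] = "visiting";
    for (const c of causes[node] || []) visit(c, trail.concat(node));
    state[node] = "done";
  };
  for (const m of metrics) visit(m.name, []);
  return { edges, cycles };
}

// Badge text for a metric encoded against a given field.
function causalBadge(metric: MetricDefinition, field: string): string {
  return (metric.causal_of || []).indexOf(field) >= 0 ? "→ causal" : "↔ correlated";
}
```

Reporting cycles instead of throwing lets the caller decide whether a circular causal claim is a registry error or merely something to surface to the user.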

Files: packages/core/src/metrics/index.ts (extend MetricDefinition), packages/core/src/render/svg.ts (badge), packages/mcp/src/server.ts (verb).

Surface delta: +1 MCP verb, +1 optional metric field.

Tests: ~5 — causal badge fires when encoding matches a known cause; correlated badge otherwise; glyph_causal_graph returns the DAG; cycle detection (don't loop on circular causal claims).

Effort: ~400 LOC. Risk: low.

Batch F — Streaming + telemetry

1.2 — Streaming partial results

Goal: don't block the agent's turn while a 100k-row chart materializes.

Scope: MCP SDK supports streaming responses. Extend the heaviest verbs (glyph_render, glyph_query, glyph_anomaly, glyph_whyboard, glyph_story_execute) to emit progressive chunks:

glyph_render: first chunk = the spec parse + handle id; second = scene compile; third = SVG (whole). Lets the agent display "rendering…" instantly.
glyph_query: chunks of N rows.
glyph_anomaly / glyph_drift / glyph_decompose / glyph_forecast: same.

Each chunk carries { kind: "progress" | "result" | "done", at, payload }.
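
The chunk envelope could be exercised with a transport-agnostic sketch like the one below. It deliberately avoids the MCP SDK: `streamRows` pushes chunks into a plain `emit` callback, the `payload` fields and the ISO-8601 reading of `at` are assumptions, and `collectRows` shows how a non-streaming client might still receive the full result.

```typescript
type ChunkKind = "progress" | "result" | "done";

interface StreamChunk<Row> {
  kind: ChunkKind;
  at: string; // assumed to be an ISO-8601 timestamp
  payload: { rows?: Row[]; sent?: number; total: number }; // assumed payload shape
}

// Hypothetical helper: emits rows in batches, then a final "done" chunk.
function streamRows<Row>(
  rows: Row[],
  chunkSize: number,
  emit: (chunk: StreamChunk<Row>) => void,
): void {
  if (chunkSize < 1) throw new Error("chunkSize must be at least 1");
  const total = rows.length;
  for (let offset = 0; offset < total; offset += chunkSize) {
    const batch = rows.slice(offset, offset + chunkSize);
    emit({
      kind: "result",
      at: new Date().toISOString(),
      payload: { rows: batch, sent: offset + batch.length, total },
    });
  }
  emit({ kind: "done", at: new Date().toISOString(), payload: { total } });
}

// Fallback for non-streaming clients: drain the stream into one array.
function collectRows<Row>(rows: Row[], chunkSize: number): Row[] {
  const all: Row[] = [];
  streamRows(rows, chunkSize, (chunk) => {
    if (chunk.kind === "result" && chunk.payload.rows) all.push(...chunk.payload.rows);
  });
  return all;
}
```

Keeping the streaming path and the buffered path on the same producer is what would make the mode switch backward-compatible.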

Files: packages/mcp/src/server.ts (handler refactors), MCP SDK's streaming primitives.

Surface delta: no new verbs; existing ones gain optional streaming mode. Backward-compatible.

Tests: ~6 — streamed render reaches "done"; consumer with non-streaming client gets the full result still; large query streams N chunks.

Effort: ~250 LOC. Risk: medium (MCP transport details).

1.5 — Engagement signals (local-only)

Goal: which charts did the human actually look at?

Scope: opt-in browser-side telemetry. @glyph/preview-server records viewed_at, time_in_focus_ms, clicked_rows[] per handle. Stored locally in ~/.glyph/engagement.duckdb — never phoned home. glyph_engagement(handle_id?) returns the local stats.

The Story Agent's planner reads engagement as a bias signal: "the user always skips the forecast panel and lingers on the drift panel" → next session, weight drift higher.

Files: packages/preview-server/src/engagement.ts, packages/mcp/src/server.ts (verb), packages/mcp/src/memory.ts (new table).

Surface delta: +1 MCP verb (glyph_engagement). Telemetry stays local (Glyph's no-phone-home commitment intact).

Tests: ~4 — view event recorded; click event recorded; aggregate query returns expected stats; opt-out flag respected.

Effort: ~150 LOC. Risk: low.

Sequencing rationale + recommended rollout

Why batches in this order:

Batch A first — every item is independent + low-risk + lifts every chart immediately. Highest impact-per-LOC ratio.
Batch B second — unlocks "spec-as-code" workflow (1.7 → 2.8). After this batch, charts become reviewable like code in PRs.
Batch C third — the "production agent" batch. LLM-pluggable planner + macro replay is what makes Glyph attractive to teams running recurring analytics.
Batch D fourth — depends on B's spec_diff. Pulls the polish into the human-facing UX layer.
Batch E fifth — enterprise readiness. The audit module is the gate every legal / compliance team will demand before deploying agent-driven analytics in regulated contexts.
Batch F last — smallest batch; mostly transport-level work. Sequencing it last lets the upper-batch verbs be the streaming candidates.

Per-batch session plan: each batch is roughly the right size for a single focused multi-PR session. Batches A and F are smaller (1 session each); B, C, D, E are larger but still single-session feasible.

Risk + decision points

The plan above assumes Glyph keeps its design invariants. Three decisions worth flagging:

Decision 1: telemetry. Even local-only telemetry (1.5) is a tonal shift. Today Glyph's pitch is "no telemetry, ever." Local-only is honest but the marketing must be clear: ~/.glyph/engagement.duckdb is opt-in, never phones home, stays on the user's machine. If we ship it, flag it loudly.

Decision 2: LLM in the planner. PR41's planner is heuristic. Batch C makes it LLM-pluggable. We never include an LLM ourselves — the host supplies it. But this introduces a verb (glyph_story_provide_plan) whose ergonomics are LLM-shaped. Keep the heuristic planner as a first-class peer so Glyph stays usable without an LLM.

Decision 3: audit strictness. Batch E's audit module can be advisory ("warn") or blocking ("error"). The default should be "warn" — strictness: "error" is opt-in. We don't want Glyph silently refusing to render charts.

What's NOT in this plan (deliberately)

Full Astro docs site (Session B in NEXT-SESSIONS.md). The static site/index.html is good enough for now; Astro lifts distribution 9 → 11 but is a session by itself.
@glyph/webgl renderer (Session D). 1M-mark headroom. Useful but only ~2 of D3's gallery entries genuinely need it.
Rust port + Python wheel. Multi-week. Distribution play, not architecture.
D3-gap closures from D3-COMPARISON.md — polar coordinates (🚨 Gap 1), hierarchy data shape (🚨 Gap 2), morph transitions (🚨 Gap 3), contour stat (🚨 Gap 4), force simulation (🚨 Gap 5). Each is its own session-sized effort and competes with — rather than complements — Glyph's agent-affordance lead. See that doc for the "right call" analysis.

The deliberate omissions matter as much as the inclusions. Glyph wins by being the agent-affordance tool that's also a good viz library, not by being a D3 superset.

What's shipping in this PR vs. the rest

This PR (PR60) ships:

This document (PLAN.md)
Batch A starter items — the low-risk, independent ones that don't need any new architecture:
- 1.6 Handle TTL + auto-GC
- 1.3 Budget management (limit_rows on row-returning verbs)
- 2.6 Scale tuning (glyph_suggest_scale)
Batch B starter — 1.7 spec_diff (pure-fn, no dependencies)

These four items together close ~25% of the plan in one PR. The remaining 12 items are tracked in this document as the rollout for future sessions.

Status board (updated after each PR)

Item	Status	PR	Notes
1.1 LLM planner	✅ PR69	69	`glyph_story_plan` accepts `planner_hint: "llm"` → returns awaiting-state plan with schema context. `glyph_story_provide_plan(plan_id, nodes)` fulfills it. Per-kind args validated at provide-plan time.
1.2 Streaming	🟡 partial (PR72)	72	Milestone-level progress via the MCP SDK's `notifications/progress` channel. `glyph_render` emits 4 milestones (parse → materialize → compile → render); `glyph_query` emits 2. Backward-compatible: no progressToken = no notifications. Open follow-up: row-batch streaming ("every 1000 rows") and progressive scenegraph emit ("10k of 50k marks") from INNOVATION.md §1.2 are not yet wired — they need a streaming-renderer rewrite. The current PR enables the "abort-early UX" at milestone boundaries only.
1.3 Budget management	✅ this PR	60	starter batch
1.4 Disambiguation	✅ PR62	62	Heuristic planner emits `clarification_questions`; `glyph_story_clarify` pins answers onto the plan.
1.5 Engagement signals	✅ PR71	71	Local-only telemetry table in `~/.glyph/memory.duckdb`. `glyph_engagement_record(handle_id, kind, value?, detail?)` + `glyph_engagement_query(handle_id?, kind?, aggregate?)`. Never transmitted off the machine.
1.6 Handle TTL	✅ this PR	60	starter batch
1.7 Spec diff	✅ this PR	60	starter batch
1.8 Spec patch	✅ PR62	62	RFC 6902 applier in `@glyph/core/spec-diff/patch.ts`; `glyph_spec_patch` re-runs pipeline + emits new handle.
2.1 Multi-modal sync	⏸ deferred	—	UI work in @glyph/preview-server — needs design + visual review before code lands. Server-side hooks already in place (link_group bus).
2.2 Chart auditing	✅ PR63	63	`@glyph/core/audit` with 8 implemented rules (truncated y, log disclosure, dual-axis, excessive aggregation, diverging palette midpoint, color count, aspect ratio, stacked negatives). `glyph_audit_spec` verb.
2.3 Uncertainty rendering	✅ PR61	61	hatched bars + dim points + corner badge; provenance plumbed through compileSpec; opt-out via spec.interactive.uncertainty=false
2.4 Multi-agent compare	✅ PR62	62	`diffWhyboards` pure-fn; `glyph_whyboard_diff` verb. Compare two agents' Whyboards branch-by-branch.
2.5 Workflow capture	✅ PR70	70	Pure-fn Macro JSON shape (`{name, version:1, steps:[{verb,args}]}`). `glyph_macro_replay(macro, params?)` walks the macro with `{{params.X}}` substitution. v0 supports render/describe/query — mutating verbs intentionally excluded.
2.6 Scale tuning	✅ this PR	60	starter batch
2.7 Causal-aware viz	✅ PR64	64	`causal_of` field on MetricDefinition; `buildCausalGraph` pure-fn with cycle detection; `glyph_causal_graph` verb.
2.8 Spec-as-code CI	✅ PR64	64	`glyph diff` CLI subcommand with unified diff + HTML/MD output; GitHub Action template under .github/actions/glyph-visual-diff/.

D3 fix-ups (PR65) — closing gaps without architecture changes

The D3-COMPARISON.md "right call" is to not chase gallery coverage at the cost of Glyph's invariants. PR65 ships the same-architecture wins only:

Item	Status	Notes
`powScale` (pow(0.5)/pow(2)/pow(3))	✅ PR65	Sign-preserving for negative domains. Replaces ad-hoc sqrt callsites.
`thresholdScale`	✅ PR65	Explicit-breakpoint bucketing for color encoding (`[0, 25, 50, 75, 100] → 4 buckets`).
`quantileScale`	✅ PR65	Sample-based rank bucketing.
`linearRegression` (OLS)	✅ PR65	Pure-fn fit returning slope, intercept, R², and two endpoints to render as an overlay layer.
`glyph_regression` MCP verb	✅ PR65	Server-side wrapper over a handle's rows.
polar coords (Gap 1)	❌ skip	Architectural — needs new spec field + scale type + mark type. Per user constraint.
hierarchy data shape (Gap 2)	❌ skip	Architectural — needs new data shape on DataHandle. Per user constraint.
force / contour / morph / 3D / custom marks	❌ skip	All architectural — see D3-COMPARISON.md "right call".

The deliberately-skipped marks (ribbon, errorbar standalone, step-line, slope, bump, beeswarm, hexbin) can each be added in single-PR follow-ups when a real use-case demands them — none requires architectural changes. Each is ~50–150 LOC.

This plan is dependency-aware, scope-honest, and explicitly avoids over-engineering. Every item closes a real INNOVATION.md gap; nothing here breaks Glyph's existing invariants.
