What Phase 3 adds. Where Phase 0 (PR1–9) proved the wedge and post-mvp.md plans Phase 1 (grammar fill-in) + Phase 2 (reach), Phase 3 turns Glyph into a data-understanding substrate for analysts, business users, and multi-agent systems. Two halves:
- Seven innovation gaps (§1–§7) — features that move Glyph from "chart library" to "decision substrate".
- The GDF protocol (§8) — the wire format that makes agents share data natively, not via JSON-stringified blobs.
Audience for this doc: people building agent workflows on top of Glyph; people asking "can a non-analyst use this?"; people designing multi-agent systems.
- §0 What changes vs Phase 0/1/2
- §1 Gap — Semantic / metric layer
- §2 Gap — Self-explaining charts
- §3 Gap — Diagnostic primitives
- §4 Gap — Action surfaces
- §5 Gap — Agent-graph roles
- §6 Gap — Persistent memory + named views
- §7 Gap — Trust signals + row-level provenance
- §8 Protocol — GDF (Glyph Data Flow)
- §9 Agent-graph topology
- §10 Sequencing
- §11 Non-goals
- §12 North-star metrics
- §13 Gating criteria
Phase 0 (shipped) and the Phase 1/2 plan in post-mvp.md answer:
- Can an LLM write a Glyph spec that renders a useful chart? (Phase 0 ✓)
- Does the grammar cover the standard chart taxonomy? (Phase 1)
- Does it reach Python and high-mark-count rendering? (Phase 2)
Phase 3 answers a fundamentally different question:
Can a small team of agents — and the analyst or business user driving them — go from raw data to a defensible decision and an executed action, without leaving Glyph?
To make that real, Glyph needs primitives that today only exist in scattered BI products, dbt, observability tools, and bespoke notebooks. We pull them together under one declarative substrate. Crucially: every Phase 3 addition is built on the existing QueryHandle primitive. Nothing new at the bottom; everything new at the surface.
Today an agent learns the schema fresh on every glyph_describe call. revenue is just a column; MRR, churn rate, active customer exist only in the analyst's head. Each new agent (or new turn) re-derives them, often subtly differently. This is the single biggest reason business users distrust agent-driven analysis.
As an analyst, I want to define our company's metrics once — MRR, churn rate, "active customer" — so every agent and every chart uses the same definitions, automatically. When the finance team updates the MRR formula, every dashboard updates.
A glyph.metrics.yaml at the repo (or workspace) root, optionally registered via MCP:
metrics:
  mrr:
    description: "Monthly recurring revenue, excluding one-time charges."
    sql: "SUM(amount) FILTER (WHERE type = 'subscription')"
    grain: monthly
    dimensions: [plan_tier, region, month]
  churn_rate:
    description: "% of customers who cancelled in a period."
    sql: "COUNT(*) FILTER (WHERE status = 'cancelled') / NULLIF(COUNT(*), 0)"
    requires: [cohort_month]
  active_customer:
    description: "Customer with any event in last 30 days."
    sql: "EXISTS (SELECT 1 FROM events WHERE customer_id = c.id AND ts > now() - INTERVAL '30 days')"

Spec extensions:
{
  "data": { "source": "warehouse.customers" },
  "layers": [{
    "mark": "line",
    "encoding": { "x": "month", "y": { "metric": "mrr" } }
  }]
}

Plus a new MCP verb glyph_metrics(prefix?) returning the available metric registry. The compiler rewrites { "metric": "mrr" } → the SQL expression at materialize time.
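The rewrite of { "metric": "mrr" } into SQL could look roughly like the sketch below. The `MetricDef` and `FieldEncoding` types and the `resolveEncoding` function are hypothetical names introduced for illustration; only the registry contents and the encoding shape come from the text above.

```typescript
// Hypothetical types: the source defines the YAML registry and the
// { "metric": "mrr" } encoding, not these TypeScript shapes.
interface MetricDef {
  description: string;
  sql: string;
}

// An encoding channel is either a plain column name or a metric reference.
type FieldEncoding = string | { metric: string };

// Resolve one channel to a SQL select expression.
// Plain strings are treated as column names and passed through unchanged.
function resolveEncoding(enc: FieldEncoding, registry: Map<string, MetricDef>): string {
  if (typeof enc === "string") {
    return enc;
  }
  const def = registry.get(enc.metric);
  if (def === undefined) {
    // Fail loudly rather than let an agent improvise its own definition.
    throw new Error(`unknown metric: ${enc.metric}`);
  }
  return `${def.sql} AS ${enc.metric}`;
}

const metricRegistry = new Map<string, MetricDef>();
metricRegistry.set("mrr", {
  description: "Monthly recurring revenue, excluding one-time charges.",
  sql: "SUM(amount) FILTER (WHERE type = 'subscription')",
});

const ySql = resolveEncoding({ metric: "mrr" }, metricRegistry);
const xSql = resolveEncoding("month", metricRegistry);
```

Rejecting unknown metric names at resolve time is one way to keep every agent on the governed definition; whether the real compiler errors or falls back is not stated in this plan.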
- Analyst writes the definition once; agent never reinvents it.
- Business user trusts the number because it traces back to a named, governed definition.
- Mirrors the dbt / Cube / LookML semantic layer pattern that BI tools have proven, but bound to a chart spec instead of a separate config product.
Every agent in the graph consults the same registry. A diagnostician agent's glyph_drill and a decision agent's glyph_act agree on what "MRR" means. No metric drift between agents.
Business users don't read charts well. Today the agent renders a chart and then separately describes it in prose. Two passes, both costly. Charts forwarded to non-analyst stakeholders get read wrong.
As a business user, when an agent shows me a chart, I want the salient observations in plain English right beside it. I don't want to squint at axes; I want to know what to act on.
A new MCP verb glyph_explain(handle_id) that runs a fixed pipeline against the rendered view:
| Step | What it computes |
|---|---|
| 1. Top-line | extent, max value + label, min value + label, recent direction |
| 2. Compositional | top-3 contributing groups by share (uses color / facet) |
| 3. Anomaly | marks > 2σ from their segment mean — surface inline |
| 4. Temporal | if x is temporal: period-over-period delta + trend strength |
Returns structured text:
{
  "headline": "Rides peaked at 8am (260/hr), 6× the 3am low.",
  "highlights": [
    "Weekday 7–9am contributes 30% of daily volume.",
    "Hour 17 is an outlier: 240 rides (+2.4σ vs the weekday baseline)."
  ],
  "questions": [
    "Why does hour 17 spike on weekdays specifically?",
    "Is the 11pm tail (60 rides) driven by airport pickups?"
  ]
}

The questions array is the secret weapon for agent graphs: it is the prompt the next agent picks up.
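Step 3 of the pipeline (marks more than 2σ from their segment mean) can be sketched as a pure function over the rendered marks. This is an illustration under assumptions: the `Mark` and `Anomaly` shapes and `findAnomalies` are hypothetical, and the use of the population standard deviation is a choice made here, not something the plan specifies.

```typescript
interface Mark {
  key: string;
  segment: string;
  value: number;
}

interface Anomaly {
  key: string;
  z: number;
}

// Flag marks whose z-score against their own segment exceeds the threshold.
// Assumption: population standard deviation (divide by n).
function findAnomalies(marks: ReadonlyArray<Mark>, threshold: number = 2): Anomaly[] {
  // Group values by segment.
  const bySegment = new Map<string, number[]>();
  for (const m of marks) {
    const vals = bySegment.get(m.segment);
    if (vals === undefined) {
      bySegment.set(m.segment, [m.value]);
    } else {
      vals.push(m.value);
    }
  }
  // Mean and standard deviation per segment.
  const stats = new Map<string, { mean: number; sd: number }>();
  bySegment.forEach((vals, segment) => {
    const mean = vals.reduce((a, b) => a + b, 0) / vals.length;
    const variance = vals.reduce((a, b) => a + (b - mean) * (b - mean), 0) / vals.length;
    stats.set(segment, { mean, sd: Math.sqrt(variance) });
  });
  const out: Anomaly[] = [];
  for (const m of marks) {
    const s = stats.get(m.segment);
    if (s === undefined || s.sd === 0) continue; // flat segments have no outliers
    const z = (m.value - s.mean) / s.sd;
    if (Math.abs(z) > threshold) out.push({ key: m.key, z });
  }
  // Deterministic order: largest |z| first, ties broken by key.
  out.sort((a, b) =>
    Math.abs(b.z) - Math.abs(a.z) || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return out;
}

// Nine baseline hours at 100 rides and one spike at 240.
const marks: Mark[] = [];
for (let h = 8; h < 17; h++) {
  marks.push({ key: `h${h}`, segment: "weekday", value: 100 });
}
marks.push({ key: "h17", segment: "weekday", value: 240 });

const anomalies = findAnomalies(marks);
```

Because the function has no randomness and sorts with a tie-breaker, the same marks always yield the same highlights, which is the determinism property the audit-trail claim relies on.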
- For non-analyst users this is the highest-leverage feature in Phase 3.
- Stops the agent from being a charting tool; starts it being a reading assistant.
- Adds an audit trail: the highlights are deterministic from the rendered view, so the same chart + same Glyph version always yields the same insights.
The questions array is structured fuel for the orchestrator: it's where a diagnostician agent gets handed off to next. Pure data → narrative → next question, all inside the protocol.
"Revenue dropped 12% this week" is the question every business user asks first. Today the agent renders another chart and guesses. Real diagnosis is anomaly detection + drift + cohort decomposition — every team rebuilds this from scratch.
As an analyst handed a "MRR dropped 8%" question, I want a single agent call that decomposes the drop by region, plan tier, and customer cohort, and tells me which contributed most. Today I write five queries by hand.
| Verb | What it returns | SQL shape |
|---|---|---|
| glyph_anomaly(handle_id, threshold?) | rows > N σ from group mean, ranked | WHERE ABS(z) > threshold over windowed mean/stddev |
| glyph_drift(handle_id, periodA, periodB) | per-group contribution to the delta between two periods | window functions; ranks descending by abs(contribution) |
| glyph_decompose(handle_id, metric, factors) | mix-shift / Simpson decomposition: how much of Δmetric is volume vs rate vs mix | row-of-row totals with weighting |
| glyph_forecast(handle_id, horizon) | Holt-Winters or seasonal-naive baseline; flags rendered values that fall outside the predicted band | DuckDB UDF / in-engine math |
Each verb returns:
- rows: the diagnostic rows (top-N or full)
- a new handle_id (derived; lineage chained, see §8)
- a rendered chart (the diagnosis visualized)
- an explanation field, same shape as glyph_explain
- Analyst's "why did X change?" pipeline collapses from hours to one call.
- Business user gets a causal story, not a descriptive one.
- All deterministic SQL: same input → same output. Snapshot-testable.
This is what a diagnostician agent specializes in. The verb names map 1:1 to its skill. Hand-off in: handle_id. Hand-off out: new handle_id with the diagnosis + an explanation. Composes directly with action agents.
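As an illustration of what glyph_drift is described as returning (per-group contribution to the delta between two periods, ranked by absolute contribution), here is a minimal in-memory sketch. The plan implements this in SQL with window functions; the `GroupValue` and `DriftRow` types and the `drift` function are hypothetical stand-ins used only to show the arithmetic.

```typescript
interface GroupValue {
  group: string;
  value: number;
}

interface DriftRow {
  group: string;
  delta: number; // periodB value minus periodA value
  share: number; // fraction of the total delta attributable to this group
}

function drift(
  periodA: ReadonlyArray<GroupValue>,
  periodB: ReadonlyArray<GroupValue>
): DriftRow[] {
  // Accumulate (B - A) per group; groups missing from one period count as 0.
  const deltas = new Map<string, number>();
  for (const r of periodA) deltas.set(r.group, (deltas.get(r.group) || 0) - r.value);
  for (const r of periodB) deltas.set(r.group, (deltas.get(r.group) || 0) + r.value);

  let total = 0;
  deltas.forEach((d) => { total += d; });

  const rows: DriftRow[] = [];
  deltas.forEach((delta, group) => {
    rows.push({ group, delta, share: total === 0 ? 0 : delta / total });
  });
  // Rank descending by absolute contribution; ties broken by group name.
  rows.sort((a, b) =>
    Math.abs(b.delta) - Math.abs(a.delta) ||
    (a.group < b.group ? -1 : a.group > b.group ? 1 : 0));
  return rows;
}

// Made-up numbers: total falls from 180 to 162.
const lastWeek: GroupValue[] = [
  { group: "US", value: 100 },
  { group: "EU", value: 50 },
  { group: "APAC", value: 30 },
];
const thisWeek: GroupValue[] = [
  { group: "US", value: 80 },
  { group: "EU", value: 52 },
  { group: "APAC", value: 30 },
];

const driftRows = drift(lastWeek, thisWeek);
```

A share above 1 (as for US here) means one group more than explains the total drop while another group partially offsets it, which is exactly the story a descriptive chart tends to hide.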
PR9 closed the chart → SQL → chart loop. But the business loop is chart → SQL → action (email, ticket, CRM update, alert, downstream MCP tool). Today the agent copies 12 customer IDs into a separate prompt and hopes it gets routed.
As a sales ops user, when an agent surfaces the 12 customers at highest churn risk, I want one click (or one agent call) to email them all, plus a Linear ticket auto-filed with the screenshot. The chart is the input to the action.
A. Declarative actions on the spec
{
  "data": { "source": "warehouse.customers" },
  "layers": [...],
  "interactive": { "key": "customer_id" },
  "actions": [
    {
      "label": "Email risk team",
      "tool": "intercom_send_email",
      "argMap": { "customer_ids": "$selection.keys", "template": "churn-risk" }
    },
    {
      "label": "Create Linear issue",
      "tool": "linear_create_issue",
      "argMap": {
        "title": "Churn risk: $selection.count customers",
        "description": "$selection.summary"
      }
    }
  ]
}

@glyph/live renders these as buttons next to the chart; the SVG itself carries them as <metadata> for offline agents.
B. MCP verb
glyph_act(handle_id, action, selection) — server-side equivalent. The selection is { equals | between | in } (same shape as glyph_drill). The verb invokes the named tool via the host MCP plane, returning the tool's result.
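One plausible reading of how an argMap is turned into tool arguments is sketched below. The placeholder names ($selection.keys, $selection.count, $selection.summary) come from the spec example; the substitution rules, the `Selection` shape, and `resolveArgMap` are assumptions made for illustration.

```typescript
// Hypothetical shape for a resolved selection.
interface Selection {
  keys: string[];
  summary: string;
}

type ArgValue = string | string[];

// Assumed rules: a value that is exactly "$selection.keys" becomes the key
// array; $selection.count and $selection.summary are interpolated into text.
function resolveArgMap(
  argMap: { [name: string]: string },
  selection: Selection
): { [name: string]: ArgValue } {
  const out: { [name: string]: ArgValue } = {};
  for (const name of Object.keys(argMap)) {
    const template = argMap[name];
    if (template === undefined) continue;
    if (template === "$selection.keys") {
      out[name] = selection.keys.slice();
      continue;
    }
    out[name] = template
      .replace(/\$selection\.count/g, () => String(selection.keys.length))
      .replace(/\$selection\.summary/g, () => selection.summary);
  }
  return out;
}

const selection: Selection = {
  keys: ["c-101", "c-102", "c-103"],
  summary: "3 enterprise customers with declining usage",
};

const emailArgs = resolveArgMap(
  { customer_ids: "$selection.keys", template: "churn-risk" },
  selection
);
const issueArgs = resolveArgMap(
  {
    title: "Churn risk: $selection.count customers",
    description: "$selection.summary",
  },
  selection
);
```

Keeping this step a pure function of (argMap, selection) means the operator agent can show the exact arguments it is about to send before any side effect happens.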
- Analyst stops being a relay between chart and CRM.
- Business user gets the shortest path from observation to operation.
- The chart becomes operational interface, not a viewer.
The operator agent is the one that calls glyph_act. Its skill is small (~100 tokens); it does one thing well. The orchestrator hands a handle_id + an action choice; the operator executes. Clean separation of cognition (other agents) from side-effects (operator only).
You named agent graphs. Today each agent that touches Glyph starts a fresh @glyph/mcp process — its own DuckDB instance, its own handle store. Handoff between an exploration agent → a diagnosis agent → an action agent loses every QueryHandle. State doesn't compound across agents the way it compounds across turns.
As an orchestrator agent, I want to dispatch sub-tasks to specialised agents (explorer, diagnostician, operator) and have them share data via handles, not by re-uploading rows.
A. Shared session protocol. @glyph/mcp gains a --session-id <uuid> flag. Multiple agents pointing at the same session id share one engine + handle registry. The transport is on-disk DuckDB (ATTACH) by default; UNIX socket for hot paths. A coordinator agent can hand a handle_id to a worker agent and it just works.
B. Role-aware skills. Five new skills, small and focused:
| Skill | Role | Specialises in | Skill size |
|---|---|---|---|
| skills/glyph-explorer/ | Explorer | glyph_describe, glyph_render (initial questions) | ~150 tokens |
| skills/glyph-diagnostician/ | Diagnostician | glyph_anomaly, glyph_drift, glyph_decompose, glyph_forecast | ~200 tokens |
| skills/glyph-operator/ | Operator | glyph_act + selection consolidation only | ~120 tokens |
| skills/glyph-narrator/ | Narrator | glyph_explain, summary writing, downstream-question generation | ~150 tokens |
| skills/glyph-orchestrator/ | Orchestrator | dispatching to the above, holding the lineage DAG | ~250 tokens |
Each role-skill is small so agent context stays cheap; the orchestrator's job is mostly to choose which sub-agent to call next based on the previous agent's questions output.
- Analyst doesn't think about agent topology; they just see a coherent answer.
- Business user gets faster, more accurate answers because each agent does one thing.
This is the gap that's unique to multi-agent — single-agent users don't need it. Without it, multi-agent Glyph workflows have to reconstruct state at every handoff. With it, the agent graph becomes a fan-out of specialists that share a data layer.
Today every @glyph/mcp session starts blank. The user re-explains what "active customer" means, re-builds the same weekly KPI chart, re-derives the same cohorts. Agents have no working memory.
As an analyst, I want to say "show me the weekly MRR dashboard from last Monday" and have the agent recall it — same spec, same metric definitions, same filters.
A ~/.glyph/memory.duckdb file (naturally) holding:
| Table | Purpose |
|---|---|
| saved_views | named specs the user wants to recall: glyph_memory_save("weekly-mrr", spec), glyph_memory_recall("weekly-mrr") |
| metric_defs | overlaps with Gap §1; enables per-user override |
| phrase_rewrites | "our customers" → WHERE tenant_id = 'us-prod' AND status != 'internal' |
| column_corrections | when the user corrects an agent ("rides is ordinal, not quantitative"), remember it |
MCP verbs: glyph_memory_save, glyph_memory_recall, glyph_memory_list, glyph_memory_forget.
Scope: per-user (or per-project via a --memory-path override). Explicitly local-first — no telemetry, no cloud, no shared state across hosts unless the user chooses to commit memory.duckdb to their repo.
- Knowledge compounds across sessions. The agent learns the user's vocabulary.
- The user stops feeling like they're explaining themselves to a stranger every Monday morning.
Memory is what turns the orchestrator agent into a consistent orchestrator. Without persistent state, every agent graph is a one-shot; with it, the graph remembers the project it's working on.
A chart shows MRR = $4.2M. Business user is about to forward it to the board. Two questions they can't answer:
- Is the data fresh?
- Which rows produced this number?
If the answer is "stale by 6 days" or "filtered out 30% of records because of a join bug," the decision is wrong.
As an exec reviewing a chart before a board meeting, I want a visible freshness stamp, a confidence rating per mark, and the ability to click a bar and see the underlying rows.
A. Per-mark provenance. Each SceneMark gains an optional provenance: { sourceRows, filteredOut, freshness, confidence }. glyph_describe returns these for the source; the compiler propagates them through transforms. SVG renderer emits data-provenance="..." attrs and an optional <glyph-trust> overlay (a small badge in the corner of the chart).
B. Confidence flags. When a bar's underlying sample is < 30 rows, the compiler tags it lowSample: true and the renderer styles it differently (hatched fill). Agent sees this and surfaces it in glyph_explain output.
C. Lineage walk-through. glyph_lineage(handle_id, mark_key) — MCP verb returning the chain source file → SQL transform → row IDs → mark. Click-through to the actual rows that produced a number.
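The trust signals in A and B can be derived mechanically from the provenance fields. The sketch below assumes that sourceRows counts the rows that survived filtering; that interpretation, the `TrustFlags` shape, and the `filteredShare` and `ageDays` names are hypothetical. Only the field names and the 30-row threshold come from the text.

```typescript
const LOW_SAMPLE_THRESHOLD = 30; // from the text: fewer than 30 rows is low-sample

interface MarkProvenance {
  sourceRows: number;  // assumption: rows that actually fed the mark
  filteredOut: number; // rows dropped before aggregation
  freshness: string;   // ISO timestamp of the underlying read
}

// Hypothetical derived flags a renderer or glyph_explain could surface.
interface TrustFlags {
  lowSample: boolean;
  filteredShare: number; // fraction of candidate rows that were filtered out
  ageDays: number;       // how stale the underlying read is
}

function trustFlags(p: MarkProvenance, now: Date): TrustFlags {
  const candidates = p.sourceRows + p.filteredOut;
  const ageMs = now.getTime() - new Date(p.freshness).getTime();
  return {
    lowSample: p.sourceRows < LOW_SAMPLE_THRESHOLD,
    filteredShare: candidates === 0 ? 0 : p.filteredOut / candidates,
    ageDays: ageMs / 86400000, // milliseconds per day
  };
}

// Mirrors the failure story above: stale by 6 days, 30% of records filtered.
const boardChart = trustFlags(
  { sourceRows: 70, filteredOut: 30, freshness: "2024-01-01T00:00:00Z" },
  new Date("2024-01-07T00:00:00Z")
);

const thinBar = trustFlags(
  { sourceRows: 12, filteredOut: 0, freshness: "2024-01-07T00:00:00Z" },
  new Date("2024-01-07T00:00:00Z")
);
```

How these flags map onto the high / medium / low confidence rating is left open here, since the plan does not define the rule.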
- No business user will hand a chart to their CEO without trust signals.
- This is the gap between technically correct and deployable.
The decision agent needs the confidence rating before recommending an action; the operator agent needs the provenance walk-through if the action turns out wrong. Without these, the agent graph has nothing to audit.
The connective tissue that makes §1–§7 work in a multi-agent setting.
QueryHandle is already the right primitive — it's a named, schema-described, queryable, deterministic dataset. Today it lives in one DuckDB process. GDF promotes it to a cross-process, cross-agent DataHandle addressed by URI.
interface DataHandle {
  // ── Identity ──────────────────────────────────────────────────
  uri: string; // "gdf://<session>/<id>", globally addressable
  version: number; // bumps when the underlying data changes
  // ── What it is (cheap; agents reason on this first) ──────────
  schema: ReadonlyArray<{
    name: string;
    type: string;
    suggested: "quantitative" | "ordinal" | "nominal" | "temporal";
    nullable: boolean;
  }>;
  rowCount: number;
  // ── Where it came from (lineage; never optional) ─────────────
  lineage: {
    parents: ReadonlyArray<{
      uri: string;
      relation: "transform" | "filter" | "join" | "agg";
    }>;
    sql: string; // the deterministic SQL that produced it
    producer: {
      agent: string;
      tool: string;
      sessionId: string;
      at: string; // ISO timestamp
    };
  };
  // ── Whether to trust it (links to Gap §7) ────────────────────
  provenance: {
    freshness: string; // ISO timestamp of the underlying read
    sampleRows: number;
    filteredOut: number;
    confidence: "high" | "medium" | "low";
  };
  // ── Where the bytes are (multiple transports, same handle) ──
  binding: {
    kind: "duckdb-view" | "arrow-ipc" | "arrow-flight" | "parquet-uri";
    location: string;
  };
  // ── Live? ────────────────────────────────────────────────────
  subscribable: boolean;
  subscriptionUri?: string;
}

| Transport | When | Performance |
|---|---|---|
| In-process (DuckDB view) | One MCP server, one agent | Zero copy; same as today |
| Local IPC (Arrow IPC over UNIX socket / shared .duckdb via ATTACH) | Multiple agents on one host | Zero-copy via memfd / SharedArrayBuffer; ~µs / row |
| Networked (Arrow Flight gRPC) | Distributed agent graph | Streaming columnar; LAN-saturating throughput |
Same wire format. The resolver picks the cheapest transport that works for a given URI.
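The "cheapest transport that works" rule could be as simple as comparing where the producer and consumer live. This is a sketch under assumptions: the `Endpoint` type and `pickTransport` are hypothetical, and the three return values are taken from the binding kinds named in the interface.

```typescript
type TransportKind = "duckdb-view" | "arrow-ipc" | "arrow-flight";

// Hypothetical description of where an agent's engine runs.
interface Endpoint {
  host: string;
  processId: string;
}

// Prefer in-process, then local IPC, then the network.
function pickTransport(producer: Endpoint, consumer: Endpoint): TransportKind {
  const sameHost = producer.host === consumer.host;
  if (sameHost && producer.processId === consumer.processId) {
    return "duckdb-view"; // one MCP server, one engine: zero copy
  }
  if (sameHost) {
    return "arrow-ipc"; // multiple agents on one host
  }
  return "arrow-flight"; // distributed agent graph
}

const dataAgent: Endpoint = { host: "laptop", processId: "4021" };
const sameProcess = pickTransport(dataAgent, { host: "laptop", processId: "4021" });
const sameHostPeer = pickTransport(dataAgent, { host: "laptop", processId: "4188" });
const remotePeer = pickTransport(dataAgent, { host: "gpu-box", processId: "77" });
```

The consumer never names a transport itself; it only presents a URI, which is what keeps the wire format identical across all three rows of the table.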
| Verb | Purpose | Cost |
|---|---|---|
| gdf.publish(handle) | Promote a local handle to a peer-visible URI | O(1) |
| gdf.peek(uri, limit?) | Schema + N sample rows (defaults to 0) | O(1) for schema, O(N) for rows |
| gdf.subscribe(uri) | Bind a remote handle into the local engine; optionally listen for version bumps | O(schema); rows on demand |
| gdf.derive(uri, sql) | New handle from existing one; lineage auto-chained | O(materialization) |
| gdf.lineage(uri, depth?) | Walk the lineage DAG; returns a tree of {uri, sql, producer, at} | O(depth) |
| gdf.unbind(uri) | Release; refcount-managed so it's safe across agents | O(1) |
Six verbs cover the entire agent-graph data plane.
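To make the publish / derive / lineage relationship concrete, here is a small in-memory sketch of a handle registry. It is not the planned implementation (which sits on DuckDB): the `HandleRegistry` class, its method signatures, the generated `derived-N` ids, and the SQL strings are all hypothetical, and the lineage nodes carry only uri and sql rather than the full {uri, sql, producer, at}.

```typescript
interface HandleRecord {
  uri: string;
  sql: string;
  parents: string[]; // URIs of immediate parents only
}

interface LineageNode {
  uri: string;
  sql: string;
  parents: LineageNode[];
}

class HandleRegistry {
  private session: string;
  private handles = new Map<string, HandleRecord>();
  private nextId = 1;

  constructor(session: string) {
    this.session = session;
  }

  // Promote a named source to a peer-visible URI.
  publish(name: string, sql: string): string {
    const uri = `gdf://${this.session}/${name}`;
    this.handles.set(uri, { uri, sql, parents: [] });
    return uri;
  }

  // New handle from an existing one; the parent link is recorded automatically.
  derive(parentUri: string, sql: string): string {
    const parent = this.mustGet(parentUri);
    const uri = `gdf://${this.session}/derived-${this.nextId++}`;
    this.handles.set(uri, { uri, sql, parents: [parent.uri] });
    return uri;
  }

  // Walk the lineage DAG back toward sources, up to `depth` hops.
  lineage(uri: string, depth: number = Infinity): LineageNode {
    const h = this.mustGet(uri);
    const parents = depth <= 0 ? [] : h.parents.map((p) => this.lineage(p, depth - 1));
    return { uri: h.uri, sql: h.sql, parents };
  }

  private mustGet(uri: string): HandleRecord {
    const h = this.handles.get(uri);
    if (h === undefined) throw new Error(`handle not found: ${uri}`);
    return h;
  }
}

const handleRegistry = new HandleRegistry("s1");
const customersUri = handleRegistry.publish("customers", "SELECT * FROM customers");
const weeklyUri = handleRegistry.derive(customersUri, "SELECT week, SUM(amount) AS mrr FROM parent GROUP BY week");
const decompUri = handleRegistry.derive(weeklyUri, "SELECT region, tier, mrr FROM parent");
const lineageTree = handleRegistry.lineage(decompUri);
const shallowTree = handleRegistry.lineage(decompUri, 1);
```

Each record stores only its immediate parents, so the full tree is computed on demand by the walk, in line with the "lineage is computed" choice stated later in this section.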
A spec's data.source is already a string. Add one URI scheme:
{
"data": { "source": "gdf://prod-session/handle-abc123" },
"layers": [{ "mark": "bar", "encoding": { "x": "hour", "y": "rides" } }]
}An analyst-agent's spec points at what a data-agent already materialised. No re-upload, no re-derivation.
| Tool | Replaces / extends | Approx tokens |
|---|---|---|
| glyph_publish(handle_id, scope?) | promotes a local handle | ~80 |
| glyph_subscribe(uri) | binds an external handle locally | ~80 |
| glyph_lineage(uri, depth?) | provenance walk | ~100 |
| glyph_handles() | list all visible handles in the session | ~60 |
Existing glyph_render / glyph_query / glyph_drill work unchanged because spec's data.source is just a string — they don't care if it's a file path or a gdf:// URI.
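Since data.source stays a plain string, the only new work is recognising the gdf:// scheme. A minimal sketch, assuming the gdf://<session>/<id> layout given in the DataHandle comment; `parseGdfUri` and the `GdfUri` type are hypothetical names.

```typescript
interface GdfUri {
  session: string;
  id: string;
}

// Returns the parsed parts for a gdf:// source, or null for anything else
// (file paths, table names), which the existing resolution path would handle.
function parseGdfUri(source: string): GdfUri | null {
  const match = /^gdf:\/\/([^\/]+)\/([^\/]+)$/.exec(source);
  if (match === null) return null;
  const session = match[1];
  const id = match[2];
  if (session === undefined || id === undefined) return null;
  return { session, id };
}

const parsedHandle = parseGdfUri("gdf://prod-session/handle-abc123");
const parsedTable = parseGdfUri("warehouse.customers");
```

Returning null instead of throwing for non-gdf sources keeps existing specs on their current path untouched, which is the "no API surface change" property this section is after.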
Total agent-graph protocol surface stays under 1,000 tokens.
Three deliberate choices:
- DuckDB is the substrate. It speaks Arrow IPC natively, supports ATTACH across files, has views as first-class objects. We don't add a layer; we expose what's there.
- Schema travels separately from bytes. Agents reason about handles cheaply (~200 bytes); they materialise only when they actually need to render or query.
- Lineage is computed, not stored elsewhere. Each handle records its immediate parent SQL; walking the DAG is read-only joins on a small in-engine table.
Order-of-magnitude estimates (single host, ~1 M-row table):
| Operation | Today (stringified content) | With GDF |
|---|---|---|
| Pass dataset between agents | ~12 MB JSON, 2–5 s | ~2 KB URI handle, < 10 ms |
| Derive a filtered view | re-materialize via SQL | O(predicate), same DuckDB plan |
| Display a chart in a downstream agent | re-parse JSON | direct view binding |
Token cost vs the "stringify the data into tool output" pattern: ~200× cheaper for a 1 k-row result, ~10,000× cheaper for 100 k rows.
- URIs > IDs. gdf://session-xyz/sales-by-region is self-describing; an LLM can infer scope from the path.
- Schema-first. Every handle has a schema the agent reads before requesting data: same pattern as glyph_describe, generalised.
- Errors are crisp. handle not found / schema drift / lineage broken are clear failure modes; recovery is mechanical.
- No new mental model. glyph_render(spec) just learned to accept a gdf:// source.
- Inspectable. glyph_lineage lets the agent (or the human) ask "where did this number come from?", so debugging an agent graph stops being a black box.
- State sharing is only via handles. Agents pass URIs, not blobs. The temptation to stuff data into tool arguments evaporates.
These would over-engineer the protocol:
- No federation of compute. GDF moves handles and small data; if you need cross-org joins, that's a warehouse problem (Iceberg / Trino).
- No CRDT / conflict resolution. Handles are immutable after publish; mutation is by derive (new handle). version monotonically increases.
- No mandatory networking. In-process is the fast path; local IPC is the multi-agent path; network is only when an agent literally lives on another host.
GDF makes the role distinction crisp and operational:
| Role | Publishes | Subscribes to | Key verbs |
|---|---|---|---|
| Data agent | Raw source handles (gdf://session/customers, gdf://session/events) | n/a | glyph_describe, glyph_publish |
| Transform agent | Derived handles (gdf://session/weekly-mrr) | Raw source handles | gdf.derive, glyph_publish |
| Explorer agent | — (read-only) | Derived handles; renders charts | glyph_render, gdf.peek |
| Diagnostician agent | Diagnostic handles | Derived handles | glyph_anomaly, glyph_drift, glyph_decompose, glyph_forecast |
| Narrator agent | Text bundles (insights, questions) | Any handle | glyph_explain |
| Decision agent | Action candidates | Diagnostic handles + narrator output | glyph_drill + reasoning |
| Operator agent | Action results | Decision-agent outputs | glyph_act |
| Orchestrator | Lineage DAG of the whole conversation | All of the above | glyph_handles, glyph_lineage |
The orchestrator holds the lineage DAG of all live handles. Asking "why did this decision happen?" walks the DAG back to source rows. Total auditability without log diving.
┌──────────────────────────────────────────────────────────────────┐
│ User: "MRR dropped 8% this week. Why? Who should we email?" │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Orchestrator │
│ → Data agent: publish customers, events, subscriptions │
│ → Transform agent: derive weekly_mrr (gdf://.../weekly-mrr) │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Diagnostician │
│ glyph_decompose(weekly-mrr, metric=mrr, factors=[region, tier]) │
│ → handle: gdf://.../decomp-abc (volume × rate × mix) │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Explorer │
│ glyph_render({ source: decomp-abc, ... interactive: ...}) │
│ Narrator: glyph_explain(handle) → "Drop driven by tier=ENT, │
│ region=US; 8 of 12 churned accounts share rep_id=42" │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Decision │
│ glyph_drill(handle, field=customer_id, in=[the 12]) → rows │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Operator │
│ glyph_act(handle, "email_risk_team", selection) │
│ → 12 emails sent · Linear issue OPS-1234 filed │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Orchestrator publishes the final lineage DAG; user sees the │
│ chart + narrative + the audit trail of the action. │
└──────────────────────────────────────────────────────────────────┘
Every arrow above is a gdf:// URI. Every box knows nothing about the others' internals — only schemas + URIs.
Each ships as one PR using the cycle PR1–10 used.
Tier A — GDF foundation (must land before everything else in Phase 3):
| PR | Scope | Approx LOC |
|---|---|---|
| 11 | DataHandle type (URI, version, lineage, provenance fields); promote QueryHandle → DataHandle non-breakingly | ~250 |
| 12 | MCP verbs: glyph_publish, glyph_subscribe, glyph_lineage, glyph_handles (in-process transport only) | ~350 |
| 13 | data.source: "gdf://..." URI resolution in the compiler + materializer | ~150 |
| 14 | Local IPC transport (shared DuckDB file via ATTACH; Arrow IPC over UNIX socket) | ~400 |
Tier B — Innovation gaps built on GDF:
| PR | Scope | Depends on |
|---|---|---|
| 15 | Gap §1: semantic / metric layer (glyph.metrics.yaml + metric: "mrr" encoding + glyph_metrics MCP verb) | 11–13 |
| 16 | Gap §2: glyph_explain MCP verb (top-line / compositional / anomaly / temporal pipelines) | 11–13 |
| 17 | Gap §3a: glyph_anomaly, glyph_drift | 11–13 |
| 18 | Gap §3b: glyph_decompose, glyph_forecast | 17 |
| 19 | Gap §4: actions[] on spec + glyph_act MCP verb | 11–13 |
| 20 | Gap §5: role-aware skills (explorer / diagnostician / narrator / operator / orchestrator) | 14, 16–19 |
| 21 | Gap §6: ~/.glyph/memory.duckdb + glyph_memory_* MCP verbs | 11–13 |
| 22 | Gap §7: per-mark provenance + <glyph-trust> overlay + glyph_lineage UI walk-through | 11–13 |
Tier C — Networked transport (demand-gated):
| PR | Scope |
|---|---|
| 23 | Arrow Flight gRPC transport for distributed agent graphs |
| 24 | Auth + per-handle ACLs (signed URIs, scoped tokens) |
Total: ~14 PRs across Phase 3. Each ~200–500 LOC, 30-min CI cycle, same six-cell matrix.
These dilute the wedge; explicit non-goals for Phase 3:
- A managed cloud — Glyph is local-first. Hosts run their own MCP servers.
- Federated joins across orgs — that's a warehouse problem (Trino, Iceberg, DataFusion). GDF moves handles, not federation.
- A workflow-engine product à la Airflow / Dagster — these are orchestrators of jobs; Phase 3 is for orchestration of agents reasoning about data, a different concern.
- Real-time / sub-second streaming — Perspective owns that lane; GDF's subscriptions are coarse (seconds-to-minutes).
- A no-code dashboard builder — the agent surface is the builder.
- Vector / embedding-based "semantic search" of metrics — overlaps with Gap §1 if mis-scoped. Stay declarative.
| Metric | End of Tier A (~week 18) | End of Tier B (~week 26) | Stretch (~9 mo) |
|---|---|---|---|
| GitHub stars | 12,000 | 20,000 | 35,000 |
| Weekly @glyph/core downloads | 15,000 | 50,000 | 200,000 |
| MCP installs (all 5 role-skills combined) | 5,000 | 15,000 | 50,000 |
| gdf:// handles published per active session (P50) | 3 | 8 | 15 |
| glyph_explain calls per render (P50) | n/a | 0.8 | 1.0 |
| glyph_act calls per session (P75) | n/a | 1 | 3 |
| Snapshot corpus | 50 | 75 | 100 |
| Lineage DAG depth per decision (P50) | 3 | 5 | 7 |
Tier A ships if:
- gdf:// URIs resolve transparently in glyph_render / glyph_query / glyph_drill, with no API surface change for spec writers.
- A two-process demo works: agent A publishes a handle; agent B subscribes and renders against it; round-trip < 50 ms on localhost.
- glyph_lineage(uri) returns a tree that walks back to a known source file for every published handle.
- Snapshot byte-identity still holds for non-interactive specs.
- Total MCP surface stays under 1,000 tokens.
Tier B ships if:
- The §9 typical run executes end-to-end against a real dataset (≥100 k rows) on a single host in < 5 s.
- Every diagnostic verb (anomaly / drift / decompose / forecast) has ≥5 snapshot tests + a deterministic explanation.
- The 5 role-skills are independently installable; combined token cost < 1,000 tokens.
- glyph_act invokes at least one upstream MCP tool (e.g. a stub email tool) end-to-end.
- glyph_memory_* round-trips a saved view across a server restart.
Tier C ships if:
- Arrow Flight benchmarks beat in-process JSON on a 1M-row handle by ≥100× for cross-host transfer.
- Signed URIs work in a 3-host agent graph with TLS and a per-handle ACL.
Phase 0 proved a chart-and-compute artifact. Phase 1/2 fill in the breadth of grammar and reach. Phase 3 is the layer that makes Glyph the substrate analysts, business users, and agent graphs reach for when the question is "what should we do?", not just "what does this look like?".
Every Phase 3 feature builds on the existing QueryHandle primitive. We don't add anything new at the bottom; we name and amplify what's already there.