Agent-Pattern-Labs
diff --git a/AGENT_NATIVE_UI.md b/AGENT_NATIVE_UI.md
Lines changed: 121 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 20 additions & 0 deletions
diff --git a/benchmarks/agent-native-methodology.md b/benchmarks/agent-native-methodology.md
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,121 @@
+# Agent-Native UI
+
+Geometra's agent-native layer makes the interface itself the protocol. A normal frontend can expose a DOM, screenshots, accessibility data, and backend APIs. Geometra exposes the computed UI frame directly: exact geometry, semantics, interaction targets, policy metadata, and replayable action history from the same declarative tree that renders pixels.
+
+## Contract
+
+Every rendered frame can produce a semantic geometry snapshot:
+
+```ts
+{
+  id: 'claims-review:frame:1',
+  route: 'claims-review',
+  rootBounds: { x: 0, y: 0, width: 1180, height: 760 },
+  nodes: [
+    {
+      id: 'approve-payout',
+      role: 'button',
+      name: 'Approve payout',
+      bounds: { x: 474, y: 512, width: 132, height: 62 },
+      hitTarget: { x: 474, y: 512, width: 132, height: 62 },
+      visible: true,
+      enabled: true,
+      focusable: true,
+      interactive: true,
+      actionId: 'approve-payout'
+    }
+  ],
+  actions: [
+    {
+      id: 'approve-payout',
+      kind: 'approve',
+      risk: 'write',
+      requiresConfirmation: true,
+      bounds: { x: 474, y: 512, width: 132, height: 62 }
+    }
+  ]
+}
+```
+
+Use `semantic.id` for stable UI ids. If omitted, Geometra falls back to `agentAction.id`, then `key`, then a path id like `node:0.2`.
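+
+The fallback order reads as a first-match chain. The sketch below is illustrative only: `NodeIdSources` and `resolveStableId` are hypothetical names, not exports of `@geometra/core`, and the path format is assumed from the `node:0.2` example.
+
+```ts
+// Hypothetical shape: the id-bearing fields a node might carry.
+interface NodeIdSources {
+  semantic?: { id?: string }
+  agentAction?: { id?: string }
+  key?: string
+}
+
+// Resolve an id in the documented order:
+// semantic.id, then agentAction.id, then key, then a path id.
+function resolveStableId(node: NodeIdSources, path: number[]): string {
+  return (
+    node.semantic?.id ??
+    node.agentAction?.id ??
+    node.key ??
+    `node:${path.join('.')}`
+  )
+}
+
+// An explicit semantic id wins over a key.
+console.log(resolveStableId({ semantic: { id: 'approve-payout' }, key: 'btn' }, [0, 2]))
+// With no ids at all, the tree path is used (here 'node:0.2').
+console.log(resolveStableId({}, [0, 2]))
+```
+
+Path ids change whenever the tree is reordered, so setting `semantic.id` explicitly is the safer choice for anything an agent will target repeatedly.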
+
+## Core APIs
+
+`@geometra/core` exports:
+
+- `collectSemanticGeometry(tree, layout)` for flat exact geometry plus role/name/state per node.
+- `createAgentGeometrySnapshot(tree, layout, options)` for auditable frame snapshots.
+- `createAgentRuntime(app, options)` for direct app-level commands: `inspect`, `snapshot`, `click`, `focus`, `type`, `key`, `getActionLog`, and `replay`.
+- `agentAction(contract, semantic)` and `collectAgentActions(tree, layout)` for business-level action contracts.
+- `createAgentGateway()` for policy, approval, execution, trace, and replay around those contracts.
+
+## Runtime Commands
+
+The app runtime operates by semantic geometry id instead of DOM selectors or guessed coordinates:
+
+```ts
+const runtime = createAgentRuntime(app, { route: 'claims-review' })
+
+const frame = runtime.inspect()
+runtime.click('approve-payout')
+runtime.type('agent-note', ' reviewed')
+const replay = runtime.replay(runtime.getActionLog())
+```
+
+Each command records before/after frame snapshots in the runtime action log. The log answers four questions: what the agent saw, which stable target it used, what exact geometry was active, and what changed afterward.
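+
+One way to read such a log entry is to diff its before and after frames by node id. This is a sketch under assumptions: the `Frame`, `FrameNode`, and `ActionLogEntry` shapes are reduced, hypothetical versions of the snapshot shown earlier, and `changedNodeIds` is not a Geometra API.
+
+```ts
+// Hypothetical, reduced shapes; a real snapshot carries more fields.
+interface Bounds { x: number; y: number; width: number; height: number }
+interface FrameNode { id: string; enabled: boolean; bounds: Bounds }
+interface Frame { id: string; nodes: FrameNode[] }
+interface ActionLogEntry { command: string; targetId: string; before: Frame; after: Frame }
+
+// Index a frame's nodes by id, serialized so they can be compared cheaply.
+// Assumes both frames serialize node fields in the same key order.
+function indexById(frame: Frame): Record<string, string> {
+  const index: Record<string, string> = {}
+  for (const node of frame.nodes) index[node.id] = JSON.stringify(node)
+  return index
+}
+
+// List ids of nodes that were added, removed, or changed by the command.
+function changedNodeIds(entry: ActionLogEntry): string[] {
+  const before = indexById(entry.before)
+  const after = indexById(entry.after)
+  const ids = Object.keys({ ...before, ...after })
+  return ids.filter((id) => before[id] !== after[id]).sort()
+}
+
+const button = { x: 474, y: 512, width: 132, height: 62 }
+const note = { x: 40, y: 600, width: 400, height: 80 }
+const entry: ActionLogEntry = {
+  command: 'click',
+  targetId: 'approve-payout',
+  before: { id: 'claims-review:frame:1', nodes: [
+    { id: 'approve-payout', enabled: true, bounds: button },
+    { id: 'agent-note', enabled: true, bounds: note },
+  ] },
+  after: { id: 'claims-review:frame:2', nodes: [
+    { id: 'approve-payout', enabled: false, bounds: button },
+    { id: 'agent-note', enabled: true, bounds: note },
+  ] },
+}
+
+// Only the clicked button changed between the two frames.
+console.log(changedNodeIds(entry))
+```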
+
+## Gateway And HTTP
+
+`@geometra/gateway` exposes the same frame-bound contract to external agents:
+
+- `GET /inspect` returns the latest frame, semantic geometry, current actions, and pending approvals.
+- `GET /actions` returns contracted business actions plus the latest frame.
+- `POST /actions/request` requests an action by id and frame id.
+- `POST /actions/approve` approves or denies a pending action.
+- `GET /trace` returns the append-only event trace.
+- `GET /replay` returns before/after frame snapshots and action outcomes.
+
+The MCP-style tool adapter mirrors this with:
+
+- `geometra_gateway_inspect_frame`
+- `geometra_gateway_list_actions`
+- `geometra_gateway_request_action`
+- `geometra_gateway_approve_action`
+- `geometra_gateway_get_trace`
+- `geometra_gateway_get_replay`
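+
+The "request an action by id and frame id" step can be sketched as a small client-side helper. The field names `actionId` and `frameId`, the reduced types, and both function names are assumptions for illustration; the gateway's actual request payload may differ.
+
+```ts
+// Hypothetical, reduced shapes based on the snapshot example above.
+interface ActionContract { id: string; risk: string; requiresConfirmation: boolean }
+interface InspectedFrame { id: string; actions: ActionContract[] }
+interface ActionRequest { actionId: string; frameId: string }
+
+// Build a request body that pins the action to the frame the agent inspected,
+// refusing actions that the frame does not offer.
+function buildActionRequest(frame: InspectedFrame, actionId: string): ActionRequest {
+  const action = frame.actions.filter((a) => a.id === actionId)[0]
+  if (!action) throw new Error(`Action ${actionId} is not offered in frame ${frame.id}`)
+  return { actionId: action.id, frameId: frame.id }
+}
+
+// Tell the agent whether to expect a pending approval before execution.
+function needsApproval(frame: InspectedFrame, actionId: string): boolean {
+  return frame.actions.some((a) => a.id === actionId && a.requiresConfirmation)
+}
+
+const frame: InspectedFrame = {
+  id: 'claims-review:frame:1',
+  actions: [{ id: 'approve-payout', risk: 'write', requiresConfirmation: true }],
+}
+
+console.log(buildActionRequest(frame, 'approve-payout'))
+console.log(needsApproval(frame, 'approve-payout'))
+```
+
+Binding the request to a frame id gives the gateway a way to detect that an agent acted on a stale view; the resulting body would be sent to `POST /actions/request`.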
+
+## Demo
+
+Run the claims workflow demo:
+
+```bash
+bun run --filter @geometra/demo-agent-native-ops dev
+```
+
+The demo shows:
+
+- a human-rendered Canvas UI
+- exact semantic geometry for the same UI
+- clicking `approve-payout` by stable id
+- typing into `agent-note` by stable id
+- policy-gated gateway actions
+- trace and replay panels with before/after frame geometry
+
+Run the external-agent HTTP flow:
+
+```bash
+bun run demo:agent-native:http
+```
+
+That script builds the core/gateway packages, starts a local gateway, calls `/inspect`, requests `approve-payout`, approves it, reads `/replay`, and writes `examples/replays/claims-review.json`.
+
+## Benchmark
+
+Run the deterministic value harness:
+
+```bash
+bun run benchmark:agent-native:assert
+```
+
+The harness compares Geometra-native operation against MCP/browser/vision-style inference on context bytes, tool calls, latency, success rate, security failures, replayability, and postcondition checks.
+See `benchmarks/agent-native-methodology.md` for assumptions and metric definitions.
@@ -12,6 +12,7 @@ See **`GEOMETRY_SNAPSHOT_TESTING.md`** for layout JSON / geometry regression pat
 See **`DEPLOYMENT.md`** for production deployment: process management, reverse proxy, auth, scaling, monitoring.
 See **`NATIVE_MCP_GUIDE.md`** for building native Geometra apps that AI agents drive via MCP.
 See **`MCP_COOKBOOK.md`** for MCP tool call recipes (proxy and native workflows).
+See **`AGENT_NATIVE_UI.md`** for semantic geometry snapshots, stable UI ids, app runtime commands, gateway inspect/actions, trace, and replay.
 
 ## Architecture
 
 
@@ -3,6 +3,7 @@
 **The geometry protocol for UI.** Server-computed `{ x, y, w, h }` — not component descriptions — streamed to humans and AI agents over the same socket.
 
 > **AI Agents:** See [`llms.txt`](llms.txt) for a structured overview of the entire framework — architecture, props, components, protocols, and APIs.
+> **Agent-native apps:** See [`AGENT_NATIVE_UI.md`](AGENT_NATIVE_UI.md) for exact semantic geometry, stable UI ids, runtime commands, gateway inspect/actions, trace, and replay.
 
 **[Live Demo](https://razroo.github.io/geometra)** | **[npm](https://www.npmjs.com/org/geometra)** | **[GitHub](https://github.com/razroo/geometra)** | **[Auth](https://github.com/razroo/geometra-auth)** | **[Token Registry](https://github.com/razroo/geometra-token-registry)**
 
@@ -194,6 +195,25 @@ To uninstall, remove the server entry from your client's MCP configuration.
 
 See [mcp/README.md](mcp/README.md) for tool details, examples, and source installs from this repo.
 
+## Agent-Native UI Protocol
+
+Geometra can make the UI itself the agent contract: exact semantic geometry, stable UI ids, action policy, before/after replay, and postcondition checks, all from the same tree that renders for humans.
+
+| Browser automation | Geometra-native UI |
+|---|---|
+| Infer state from DOM, screenshots, selectors, or OCR | Inspect `semantic.id`, role/name/state, exact bounds, and action contracts directly |
+| Click guessed selectors or coordinates | Click/focus/type by stable UI id |
+| Audit backend calls separately from what was visible | Replay frame-before/frame-after geometry plus action trace |
+| Add policy and approval as extra app logic | Carry risk, confirmation, input/output schemas, and postconditions with the UI action |
+
+Run the end-to-end external-agent flow:
+
+```bash
+bun run demo:agent-native:http
+```
+
+It starts a local gateway, calls `/inspect`, requests and approves `approve-payout`, reads `/replay`, and writes `examples/replays/claims-review.json`.
+
 ## Agent-native roadmap
 
 The next layer is explicit agent contracts on top of geometry: stable action ids, risk classes, policy gates, traces, and replay. Start with:
 
@@ -0,0 +1,54 @@
+# Agent-Native Benchmark Methodology
+
+`scripts/benchmark-agent-native-value.mjs` is a deterministic value harness. It is not a lab latency benchmark; it is a repeatable scenario model for comparing how much work an AI agent must do when the UI is a native protocol versus when the agent infers state from browser or vision surfaces.
+
+## Scenarios
+
+The scenario data lives in `benchmarks/agent-native-scenarios.json`.
+
+Each scenario describes an enterprise workflow where the agent must inspect a UI state, choose an action, respect policy or approval rules, execute the action, and prove what happened afterward.
+
+Current scenarios:
+
+- `claims-review`: review claim evidence, approve payout, and export audit evidence.
+- `compliance-queue`: classify evidence, attach a reason code, and escalate a sanctions hit.
+- `access-admin`: review privileged access, approve a temporary role, and export approval evidence.
+
+## Modes
+
+- `geometra-native`: the app exposes semantic geometry snapshots, stable node/action ids, policy metadata, and replayable before/after frames directly through Geometra.
+- `geometra-mcp`: the agent uses Geometra MCP/proxy semantics against a web surface. This is still structured, but the app is not itself the native protocol.
+- `playwright-mcp`: the agent uses browser automation primitives, DOM/a11y queries, selectors, and manual orchestration.
+- `vision-computer-use`: the agent uses screenshot or OCR-style inference and coordinate actions.
+
+## Metrics
+
+- `contextBytes`: approximate structured context the agent must inspect to complete the workflow.
+- `toolCalls`: round trips required to inspect, act, wait, verify, and export/replay.
+- `medianLatencyMs`: representative median flow latency for the modeled mode.
+- `successRate`: expected workflow completion rate under realistic UI variance.
+- `humanApprovals`: required human policy checkpoints.
+- `securityFailures`: modeled cases where the agent could act on the wrong target, stale state, or insufficiently audited surface.
+- `replayable`: whether before/after UI state is available as structured replay data.
+- `postconditionChecks`: explicit structured checks attached to the completed action.
+
+## Assertions
+
+`bun run benchmark:agent-native:assert` validates that:
+
+- every scenario contains all required modes and metrics
+- native Geometra uses no more context bytes or tool calls than any non-native baseline
+- native success rate is not lower than any baseline
+- native security failures remain `0`
+- native mode is replayable
+- native mode includes at least one postcondition check
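+
+The comparison rules above can be sketched as a single check over one scenario. This is not the code in `scripts/benchmark-agent-native-value.mjs`: the metric names come from the Metrics section, but the record layout, the numeric type of `postconditionChecks`, and the `checkScenario` function are assumptions.
+
+```ts
+// Metric names follow the Metrics section; the layout is an assumption.
+interface ModeMetrics {
+  contextBytes: number
+  toolCalls: number
+  successRate: number
+  securityFailures: number
+  replayable: boolean
+  postconditionChecks: number
+}
+type Scenario = Record<string, ModeMetrics>
+
+// Return a list of violated rules; an empty list means the scenario passes.
+function checkScenario(scenario: Scenario, nativeMode = 'geometra-native'): string[] {
+  const native = scenario[nativeMode]
+  if (!native) return [`missing mode ${nativeMode}`]
+  const failures: string[] = []
+  for (const mode of Object.keys(scenario)) {
+    if (mode === nativeMode) continue
+    const baseline = scenario[mode]
+    if (native.contextBytes > baseline.contextBytes) failures.push(`${mode}: contextBytes`)
+    if (native.toolCalls > baseline.toolCalls) failures.push(`${mode}: toolCalls`)
+    if (native.successRate < baseline.successRate) failures.push(`${mode}: successRate`)
+  }
+  if (native.securityFailures !== 0) failures.push('native securityFailures')
+  if (!native.replayable) failures.push('native not replayable')
+  if (native.postconditionChecks < 1) failures.push('native postconditionChecks')
+  return failures
+}
+
+// Made-up numbers for illustration only; these are not results from the harness.
+const example: Scenario = {
+  'geometra-native': { contextBytes: 100, toolCalls: 2, successRate: 1, securityFailures: 0, replayable: true, postconditionChecks: 1 },
+  'playwright-mcp': { contextBytes: 500, toolCalls: 6, successRate: 0.9, securityFailures: 1, replayable: false, postconditionChecks: 0 },
+}
+console.log(checkScenario(example))
+```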
+
+## Interpreting Results
+
+The most important comparison is not raw speed. The product claim is:
+
+> Browser automation infers what happened. Geometra-native apps expose what happened as the UI protocol.
+
+That shows up as fewer context bytes, fewer tool calls, fewer wrong-target/security failures, and a replay record that includes exact semantic geometry for the frame the agent acted on.
+
+Use the harness for product positioning and regression guardrails. Use separate live benchmarks when measuring actual transport, renderer, or network latency.