Commit 2db55db

feat: add agent-native UI protocol runtime
Expose exact semantic geometry snapshots, runtime commands, gateway inspect, replay artifacts, and demo/benchmark docs so external agents can operate Geometra apps without browser inference.

Made-with: Cursor
1 parent 98a36f5 commit 2db55db

22 files changed

Lines changed: 4227 additions & 5 deletions

AGENT_NATIVE_UI.md

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
1+
# Agent-Native UI
2+
3+
Geometra's agent-native layer makes the interface itself the protocol. A normal frontend can expose a DOM, screenshots, accessibility data, and backend APIs. Geometra exposes the computed UI frame directly: exact geometry, semantics, interaction targets, policy metadata, and replayable action history from the same declarative tree that renders pixels.
4+
5+
## Contract
6+
7+
Every rendered frame can produce a semantic geometry snapshot:
8+
9+
```ts
{
  id: 'claims-review:frame:1',
  route: 'claims-review',
  rootBounds: { x: 0, y: 0, width: 1180, height: 760 },
  nodes: [
    {
      id: 'approve-payout',
      role: 'button',
      name: 'Approve payout',
      bounds: { x: 474, y: 512, width: 132, height: 62 },
      hitTarget: { x: 474, y: 512, width: 132, height: 62 },
      visible: true,
      enabled: true,
      focusable: true,
      interactive: true,
      actionId: 'approve-payout'
    }
  ],
  actions: [
    {
      id: 'approve-payout',
      kind: 'approve',
      risk: 'write',
      requiresConfirmation: true,
      bounds: { x: 474, y: 512, width: 132, height: 62 }
    }
  ]
}
```
39+
40+
Use `semantic.id` for stable UI ids. If omitted, Geometra falls back to `agentAction.id`, then `key`, then a path id like `node:0.2`.
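
The fallback order above can be sketched as a small resolver. This is an illustration only, not Geometra's implementation: `NodeLike`, `resolveStableId`, and the idea that the path id is built from child indices are assumptions made for this example.

```ts
// Hypothetical node shape: only the fields named in the fallback rule.
interface NodeLike {
  semantic?: { id?: string }
  agentAction?: { id?: string }
  key?: string
}

// Resolve an id in the documented order:
// semantic.id, then agentAction.id, then key, then a path id such as "node:0.2".
// `path` is assumed to be the list of child indices from the root.
function resolveStableId(node: NodeLike, path: number[]): string {
  return (
    node.semantic?.id ??
    node.agentAction?.id ??
    node.key ??
    `node:${path.join('.')}`
  )
}
```

Note that `??` only falls through on `undefined` or `null`; whether Geometra treats an empty string as "omitted" is not stated here. Path ids change when the tree is reordered, which is why an explicit `semantic.id` is the stable choice.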
41+
42+
## Core APIs
43+
44+
`@geometra/core` exports:
45+
46+
- `collectSemanticGeometry(tree, layout)` for flat exact geometry plus role/name/state per node.
47+
- `createAgentGeometrySnapshot(tree, layout, options)` for auditable frame snapshots.
48+
- `createAgentRuntime(app, options)` for direct app-level commands: `inspect`, `snapshot`, `click`, `focus`, `type`, `key`, `getActionLog`, and `replay`.
49+
- `agentAction(contract, semantic)` and `collectAgentActions(tree, layout)` for business-level action contracts.
50+
- `createAgentGateway()` for policy, approval, execution, trace, and replay around those contracts.
51+
52+
## Runtime Commands
53+
54+
The app runtime operates by semantic geometry id instead of DOM selectors or guessed coordinates:
55+
56+
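```ts
import { createAgentRuntime } from '@geometra/core'

// `app` is the Geometra app instance, created elsewhere.
const runtime = createAgentRuntime(app, { route: 'claims-review' })

const frame = runtime.inspect() // current semantic geometry frame
runtime.click('approve-payout') // target by stable id, not by selector
runtime.type('agent-note', ' reviewed')
const replay = runtime.replay(runtime.getActionLog())
```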
64+
65+
Each command records before/after frame snapshots in the runtime action log. That answers: what did the agent see, which stable target did it use, what exact geometry was active, and what changed afterward.
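
To make the "what changed afterward" question concrete, here is a minimal sketch of diffing the before and after frames of one log entry. The `ActionLogEntry` and `FrameSnapshot` shapes and the `changedNodeIds` helper are assumptions for illustration; the actual log format returned by `getActionLog()` is not shown in this document.

```ts
// Hypothetical, reduced shapes; the real snapshot carries more fields.
interface FrameSnapshot {
  id: string
  nodes: { id: string; enabled: boolean }[]
}

interface ActionLogEntry {
  command: 'click' | 'focus' | 'type' | 'key'
  targetId: string
  before: FrameSnapshot
  after: FrameSnapshot
}

// Return the sorted ids of nodes that appeared, disappeared,
// or changed their `enabled` state between the two frames.
function changedNodeIds(entry: ActionLogEntry): string[] {
  const before: Record<string, boolean> = {}
  entry.before.nodes.forEach((n) => { before[n.id] = n.enabled })
  const after: Record<string, boolean> = {}
  entry.after.nodes.forEach((n) => { after[n.id] = n.enabled })
  const ids = Object.keys({ ...before, ...after })
  // A node missing from one side reads as undefined, so it counts as changed.
  return ids.filter((id) => before[id] !== after[id]).sort()
}
```

The same comparison could be extended to bounds or other state fields; only `enabled` is compared here to keep the sketch short.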
66+
67+
## Gateway And HTTP
68+
69+
`@geometra/gateway` exposes the same frame-bound contract to external agents:
70+
71+
- `GET /inspect` returns the latest frame, semantic geometry, current actions, and pending approvals.
72+
- `GET /actions` returns contracted business actions plus the latest frame.
73+
- `POST /actions/request` requests an action by id and frame id.
74+
- `POST /actions/approve` approves or denies a pending action.
75+
- `GET /trace` returns the append-only event trace.
76+
- `GET /replay` returns before/after frame snapshots and action outcomes.
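
The endpoints above could be driven from an external agent roughly as follows. This is a sketch under stated assumptions: the paths and HTTP methods come from the list, but the JSON body field names (`actionId`, `frameId`, `requestId`, `approved`) and the helper names are hypothetical and may not match the gateway's actual schema.

```ts
interface GatewayRequest {
  method: 'GET' | 'POST'
  path: string
  body?: Record<string, unknown>
}

// GET /inspect: latest frame, semantic geometry, actions, pending approvals.
function inspectRequest(): GatewayRequest {
  return { method: 'GET', path: '/inspect' }
}

// POST /actions/request: bind the action to the frame the agent inspected.
// Field names are assumptions.
function requestAction(actionId: string, frameId: string): GatewayRequest {
  return { method: 'POST', path: '/actions/request', body: { actionId, frameId } }
}

// POST /actions/approve: approve or deny a pending action.
// Field names are assumptions.
function approveAction(requestId: string, approved: boolean): GatewayRequest {
  return { method: 'POST', path: '/actions/approve', body: { requestId, approved } }
}

// Minimal fetch-compatible signature so the sender can be stubbed in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

// Send one request and return the parsed JSON response.
async function send(fetchFn: FetchLike, baseUrl: string, req: GatewayRequest): Promise<unknown> {
  const res = await fetchFn(baseUrl + req.path, {
    method: req.method,
    headers: { 'content-type': 'application/json' },
    body: req.body ? JSON.stringify(req.body) : undefined,
  })
  if (!res.ok) throw new Error(`${req.method} ${req.path} failed with ${res.status}`)
  return res.json()
}
```

Passing the frame id with the action request is what lets a gateway reject a request made against a stale frame, which is the point of a frame-bound contract.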
77+
78+
The MCP-style tool adapter mirrors this with:
79+
80+
- `geometra_gateway_inspect_frame`
81+
- `geometra_gateway_list_actions`
82+
- `geometra_gateway_request_action`
83+
- `geometra_gateway_approve_action`
84+
- `geometra_gateway_get_trace`
85+
- `geometra_gateway_get_replay`
86+
87+
## Demo
88+
89+
Run the claims workflow demo:
90+
91+
```bash
bun run --filter @geometra/demo-agent-native-ops dev
```
94+
95+
The demo shows:
96+
97+
- a human-rendered Canvas UI
98+
- exact semantic geometry for the same UI
99+
- clicking `approve-payout` by stable id
100+
- typing into `agent-note` by stable id
101+
- policy-gated gateway actions
102+
- trace and replay panels with before/after frame geometry
103+
104+
Run the external-agent HTTP flow:
105+
106+
```bash
bun run demo:agent-native:http
```
109+
110+
That script builds the core/gateway packages, starts a local gateway, calls `/inspect`, requests `approve-payout`, approves it, reads `/replay`, and writes `examples/replays/claims-review.json`.
111+
112+
## Benchmark
113+
114+
Run the deterministic value harness:
115+
116+
```bash
bun run benchmark:agent-native:assert
```
119+
120+
The harness compares Geometra-native operation against MCP/browser/vision-style inference on context bytes, tool calls, latency, success rate, security failures, replayability, and postcondition checks.
121+
See `benchmarks/agent-native-methodology.md` for assumptions and metric definitions.

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ See **`GEOMETRY_SNAPSHOT_TESTING.md`** for layout JSON / geometry regression pat
1212
See **`DEPLOYMENT.md`** for production deployment: process management, reverse proxy, auth, scaling, monitoring.
1313
See **`NATIVE_MCP_GUIDE.md`** for building native Geometra apps that AI agents drive via MCP.
1414
See **`MCP_COOKBOOK.md`** for MCP tool call recipes (proxy and native workflows).
15+
See **`AGENT_NATIVE_UI.md`** for semantic geometry snapshots, stable UI ids, app runtime commands, gateway inspect/actions, trace, and replay.
1516

1617
## Architecture
1718

README.md

Lines changed: 20 additions & 0 deletions
@@ -3,6 +3,7 @@
33
**The geometry protocol for UI.** Server-computed `{ x, y, w, h }` — not component descriptions — streamed to humans and AI agents over the same socket.
44

55
> **AI Agents:** See [`llms.txt`](llms.txt) for a structured overview of the entire framework — architecture, props, components, protocols, and APIs.
6+
> **Agent-native apps:** See [`AGENT_NATIVE_UI.md`](AGENT_NATIVE_UI.md) for exact semantic geometry, stable UI ids, runtime commands, gateway inspect/actions, trace, and replay.
67
78
**[Live Demo](https://razroo.github.io/geometra)** | **[npm](https://www.npmjs.com/org/geometra)** | **[GitHub](https://github.com/razroo/geometra)** | **[Auth](https://github.com/razroo/geometra-auth)** | **[Token Registry](https://github.com/razroo/geometra-token-registry)**
89

@@ -194,6 +195,25 @@ To uninstall, remove the server entry from your client's MCP configuration.
194195

195196
See [mcp/README.md](mcp/README.md) for tool details, examples, and source installs from this repo.
196197

198+
## Agent-Native UI Protocol
199+
200+
Geometra can make the UI itself the agent contract: exact semantic geometry, stable UI ids, action policy, before/after replay, and postcondition checks from the same tree that renders to humans.
201+
202+
| Browser automation | Geometra-native UI |
203+
|---|---|
204+
| Infer state from DOM, screenshots, selectors, or OCR | Inspect `semantic.id`, role/name/state, exact bounds, and action contracts directly |
205+
| Click guessed selectors or coordinates | Click/focus/type by stable UI id |
206+
| Audit backend calls separately from what was visible | Replay frame-before/frame-after geometry plus action trace |
207+
| Add policy and approval as extra app logic | Carry risk, confirmation, input/output schemas, and postconditions with the UI action |
208+
209+
Run the end-to-end external-agent flow:
210+
211+
```bash
bun run demo:agent-native:http
```
214+
215+
It starts a local gateway, calls `/inspect`, requests and approves `approve-payout`, reads `/replay`, and writes `examples/replays/claims-review.json`.
216+
197217
## Agent-native roadmap
198218

199219
The next layer is explicit agent contracts on top of geometry: stable action ids, risk classes, policy gates, traces, and replay. Start with:
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
# Agent-Native Benchmark Methodology
2+
3+
`scripts/benchmark-agent-native-value.mjs` is a deterministic value harness. It is not a lab latency benchmark; it is a repeatable scenario model for comparing how much work an AI agent must do when the UI is a native protocol versus when the agent infers state from browser or vision surfaces.
4+
5+
## Scenarios
6+
7+
The scenario data lives in `benchmarks/agent-native-scenarios.json`.
8+
9+
Each scenario describes an enterprise workflow where the agent must inspect a UI state, choose an action, respect policy or approval rules, execute the action, and prove what happened afterward.
10+
11+
Current scenarios:
12+
13+
- `claims-review`: review claim evidence, approve payout, and export audit evidence.
14+
- `compliance-queue`: classify evidence, attach a reason code, and escalate a sanctions hit.
15+
- `access-admin`: review privileged access, approve a temporary role, and export approval evidence.
16+
17+
## Modes
18+
19+
- `geometra-native`: the app exposes semantic geometry snapshots, stable node/action ids, policy metadata, and replayable before/after frames directly through Geometra.
20+
- `geometra-mcp`: the agent uses Geometra MCP/proxy semantics against a web surface. This is still structured, but the app is not itself the native protocol.
21+
- `playwright-mcp`: the agent uses browser automation primitives, DOM/a11y queries, selectors, and manual orchestration.
22+
- `vision-computer-use`: the agent uses screenshot or OCR-style inference and coordinate actions.
23+
24+
## Metrics
25+
26+
- `contextBytes`: approximate structured context the agent must inspect to complete the workflow.
27+
- `toolCalls`: round trips required to inspect, act, wait, verify, and export/replay.
28+
- `medianLatencyMs`: representative median flow latency for the modeled mode.
29+
- `successRate`: expected workflow completion rate under realistic UI variance.
30+
- `humanApprovals`: required human policy checkpoints.
31+
- `securityFailures`: modeled cases where the agent could act on the wrong target, stale state, or insufficiently audited surface.
32+
- `replayable`: whether before/after UI state is available as structured replay data.
33+
- `postconditionChecks`: explicit structured checks attached to the completed action.
34+
35+
## Assertions
36+
37+
`bun run benchmark:agent-native:assert` validates that:
38+
39+
- every scenario contains all required modes and metrics
40+
- native Geometra uses no more context bytes and no more tool calls than any non-native baseline
41+
- native success rate is not lower than any baseline
42+
- native security failures remain `0`
43+
- native mode is replayable
44+
- native mode includes at least one postcondition check
45+
46+
## Interpreting Results
47+
48+
The most important comparison is not raw speed. The product claim is:
49+
50+
> Browser automation infers what happened. Geometra-native apps expose what happened as the UI protocol.
51+
52+
That shows up as fewer context bytes, fewer tool calls, fewer wrong-target/security failures, and a replay record that includes exact semantic geometry for the frame the agent acted on.
53+
54+
Use the harness for product positioning and regression guardrails. Use separate live benchmarks when measuring actual transport, renderer, or network latency.
