@tangle-network/agent-app is the shared application-shell framework for Tangle agent products. The substrate packages are the engine; this is the shell that those products would otherwise fork and duplicate. A reference consumer relies 100% on agent-app for shell mechanism.
Does the capability make sense WITHOUT a specific app's tool side-channel / approval queue / chat route? YES → it's an ENGINE concern → it belongs in `@tangle-network/agent-eval` / `agent-runtime` / `agent-integrations` / `tcloud` / `sandbox`. If it's not there yet, contribute it down (additive export); do NOT reimplement it here. NO → it's app-shell → it belongs here.
Corollary: extend, never duplicate. Before writing anything that completes, scores, runs a loop, parses a tool name, or talks to a hub, check what the engines already export. We shipped a bug doing this: `/eval` reimplemented `verifyCompletion`/`weightedComposite`, which agent-eval already exports. It now re-exports them and keeps only the bridge (git log "de-duplicate against agent-eval").
- Engine = `peerDependency`, never a bundled dependency. The product pins the engine version → no BOM lock, no forced fleet bumps. agent-integrations + agent-eval are peers; the consumer installs them.
- Compose by seam, not by import. agent-app owns mechanism + control flow; the product supplies domain through typed config/callbacks (`AppToolHandlers`, `AppToolTaxonomy`, `verifyToken`, `streamTurn`, `executeToolCall`, `KeyProvisioner`/`WorkspaceKeyStore`/`KeyCrypto`, `apiKeyResolver`, `BrokerTokenMinter`). Never import product code. Never bake a domain value (a proposal type, a premium, a disclaimer, a rubric); it's a parameter.
- Structural over hard-dep where possible. `/tangle` and `/billing` take the tcloud client as a structural contract (no tcloud dep). Prefer that to a dep when the surface is small.
- Substrate-free is a feature. `/runtime`, `/web`, `/crypto`, `/redact` import nothing; they're pure mechanism behind callback seams. Keep them that way.
- Additive subpaths. New capability = new `./subpath` (entry in `tsup.config.ts` + `exports` in `package.json` + root barrel). Never a breaking change to an existing export.
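To make the seam and structural-contract rules concrete, here is a minimal sketch of a shell-side key manager that takes its tcloud-like dependency as a structural interface and its domain policy (the budget cap) as a callback. Every name in it (`KeyMinter`, `createKey`, `WorkspaceKeyConfig`, `createWorkspaceKeyManager`) is hypothetical and is not the real `/billing` API; the point is the shape: no product import, no tcloud import, no baked domain value.

```typescript
// HYPOTHETICAL names throughout; this is not agent-app's /billing API.

// Structural contract: describe only the slice of the client the shell calls,
// so the shell takes no dependency on the package that provides it.
interface KeyMinter {
  createKey(opts: { workspaceId: string; budgetUsd: number }): Promise<{ id: string; secret: string }>;
}

// Domain seam: the product supplies policy as typed config, never baked in.
interface WorkspaceKeyConfig {
  budgetFor(workspaceId: string): number; // product decides the cap
  minter: KeyMinter; // any object with the right shape
}

// Shell mechanism: control flow + caching, no product import, no domain constant.
function createWorkspaceKeyManager(config: WorkspaceKeyConfig) {
  const cache = new Map<string, { id: string; secret: string }>();
  return {
    async keyFor(workspaceId: string) {
      const cached = cache.get(workspaceId);
      if (cached) return cached; // mint at most once per workspace
      const key = await config.minter.createKey({
        workspaceId,
        budgetUsd: config.budgetFor(workspaceId),
      });
      cache.set(workspaceId, key);
      return key;
    },
  };
}

// Usage with a fake: any object literal satisfying KeyMinter type-checks.
const calls: Array<{ workspaceId: string; budgetUsd: number }> = [];
const fakeMinter: KeyMinter = {
  async createKey(opts) {
    calls.push(opts);
    return { id: `key-${opts.workspaceId}`, secret: "s3cret" };
  },
};
const manager = createWorkspaceKeyManager({ budgetFor: () => 25, minter: fakeMinter });
```

Because `KeyMinter` is structural, a test can pass a plain object literal as the fake; that same property is what lets a subpath avoid a hard dependency on the client package.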
| Subpath | Owns (app-shell) | Composes (peer/structural) |
|---|---|---|
| `/tools` | structured agent→app side channel (`submit_proposal`/`schedule_followup`/`render_ui`/`add_citation`): OpenAI defs, MCP-server builder (`buildHttpMcpServer`/`buildAppToolMcpServer`), HTTP route handler, runtime executor, capability auth, `ToolInputError` | none |
| `/runtime` | `streamAppToolLoop`/`runAppToolLoop` (bounded turn tool-loop) + `resolveTangleModelConfig` + `toLoopEvents`/`createOpenAICompatStreamTurn` (sandbox-free browser/edge copilot adapter: OpenAI-compat stream → `LoopEvents`, fragmented tool-calls assembled) | none (see `examples/browser-copilot.md`) |
| `/eval` | `producedFromToolEvents` (side-channel → `RuntimeEventLike` bridge) + `createTokenRecallChecker` | re-exports agent-eval's `verifyCompletion`/`extractProducedState`/`weightedComposite`/`createLlmCorrectnessChecker` |
| `/integrations` | hub `/exec` client + `resolveIntegrationAction` + `invokeIntegrationHub` (wiring) | peer-dep `@tangle-network/agent-integrations` (the engine/catalog) |
| `/tangle` | app-registration consent URL + cached broker-token provider | structural `TangleAppsClient` (from agent-integrations) |
| `/billing` | per-workspace budget-capped key manager (mint/rotate/rollover/usage) | structural tcloud provisioner + store + crypto seams |
| `/missions` | durable multi-step mission orchestration: guarded status/step machine + cursor + cost ledger over a `MissionStorePort` (CAS updates → typed conflict, opaque extras insert passthrough for product columns), idempotent plan engine (cached-done short-circuit, cursor reconciliation, retryable-vs-deterministic failure, detached-session polling), budget/classification/volume gates that park as `waiting_approval`, `:::mission` parser, client-safe live-event reducer + the canonical `StepAgentActivity` per-step delegated-run lane (`step.updated` snapshot, latest-wins) | none (substrate-free; product supplies storage, `SandboxDispatch`, approvals port, `classifyStep`) |
| `/delegation` | the agent-runtime driven-loop MCP server entry (opt-in) | none |
| `/trace` | flow observability: `FlowSpan`/`FlowTrace` + ASCII waterfall/histogram renderers; mission trace bridge (`createMissionTraceContext`/`childSpanContext`/`traceEnv`: 32-hex/16-hex ids + the `TRACE_ID`/`PARENT_SPAN_ID` env pair that agent-runtime's `readTraceContextFromEnv` inherits); delegation→`FlowSpan` converters (`delegationActivityToFlowSpans`, `loopTraceEventsToFlowSpans` over a structural `LoopTraceEventLike`, `composeMissionFlowTrace`, `stepActivityFlowTrace`) | none (pure data; id formats byte-match agent-runtime's OTLP export, no import) |
| `/web-react` | shared chat-shell + observability components: `ModelPicker`/`EffortPicker`/`ChatMessages`/`RunDrillIn`, `MissionActivityLane` (per-step delegated-run sub-rows → web waterfall), `AgentActivityPanel` (cross-context delegation surface over a `fetchActivity(cursor)` data port, `missionRef` link slot), `FlowWaterfall` + pure `waterfallLayout`/`mergeActivityPages` helpers | `react` peer; renders `/missions` lanes via `/trace` converters |
| `/crypto` `/web` `/redact` `/stream` | AES-GCM field crypto · web boundary utils (body/context/rate-limit/headers) · PII redaction · SSE normalization + turn identity | none |
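The `/runtime` row mentions assembling fragmented tool calls from an OpenAI-compatible stream. As a rough sketch of that technique (not agent-app's implementation), assuming the standard chat-completions streaming shape in which each `delta.tool_calls[]` entry carries an `index`, an optional `id`, and partial `function.name`/`function.arguments` strings, fragments can be merged by `index` and the arguments parsed only after the stream ends. `ToolCallDelta`, `AssembledToolCall`, and `assembleToolCalls` are hypothetical names.

```typescript
// Assumed shape of one streamed fragment (choices[0].delta.tool_calls[i])
// in an OpenAI-compatible chat.completions stream.
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: "function";
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text; parse only once the stream is done
}

// HYPOTHETICAL helper: merge fragments by index. The first fragment usually
// carries id + name; later fragments append pieces of the arguments string.
function assembleToolCalls(deltas: Iterable<ToolCallDelta>): AssembledToolCall[] {
  const byIndex = new Map<number, AssembledToolCall>();
  for (const d of deltas) {
    const call = byIndex.get(d.index) ?? { id: "", name: "", arguments: "" };
    if (d.id) call.id = d.id;
    if (d.function?.name) call.name = d.function.name;
    if (d.function?.arguments) call.arguments += d.function.arguments;
    byIndex.set(d.index, call);
  }
  // Emit calls in index order, regardless of the order fragments arrived in.
  return Array.from(byIndex.entries())
    .sort(([a], [b]) => a - b)
    .map(([, call]) => call);
}

// Usage: one call whose JSON arguments arrive split across three chunks.
const deltas: ToolCallDelta[] = [
  { index: 0, id: "call_1", type: "function", function: { name: "submit_proposal", arguments: "" } },
  { index: 0, function: { arguments: '{"title":' } },
  { index: 0, function: { arguments: '"Q3 plan"}' } },
];
const [call] = assembleToolCalls(deltas);
const args = JSON.parse(call.arguments) as { title: string }; // args.title === "Q3 plan"
```

Deferring `JSON.parse` until the stream ends matters because any single fragment is usually not valid JSON on its own.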
The sandbox runs full agent harnesses — skills, tools, sub-agents, MCP, bash, python — invoked through prompts. Products built on agent-app coordinate UI, durability, approvals, and billing around the agent. They never do the agent's work for it, and agent-app must never make it easy to.
- Intelligence and tooling live in the agent; durability and money live in the platform. Reasoning, tool selection, installation, evidence gathering, content production → a prompt to an agent session. Surviving restarts, gating spend, pausing for approval → platform code (product or this shell).
- Prompts state intents, never implementations. No shell commands, CLI flags, or install scripts inside system prompts, plan steps, or directives. Name the outcome and the evidence path; the executing agent chooses tools at execution time.
- No domain logic in execution infrastructure. Engines, dispatchers, and schedulers must not pattern-match intents or embed per-vertical scripts. Vertical knowledge belongs in prompt directives and product content, the layers the agent reads. (Shell corollary of the engine/shell rule: domain is a parameter, never baked.)
- Don't rebuild harness or platform primitives. The sandbox SDK already provides durable session execution: `dispatchPrompt({ detach: true })` runs the turn server-side after the caller disconnects, `findCompletedTurn(turnId)` is the idempotent completion check, `_sessionStatus`/`_sessionResult` poll lifecycle, and the session gateway mints read-only JWTs so browsers attach to live streams without the product worker. Autonomous/queue work must dispatch detached and poll; never hold an SSE stream open in a worker to learn that a session finished. What the SDK does NOT provide is multi-step orchestration (sequencing, gates, budgets, schedules); that is the legitimate product/shell layer.
- No text-block data channels. Agent writes (proposals, tasks, records, plans) go through schema-validated tools that fail loud back to the model, never through `:::block` text conventions scraped from output after the fact (regex parsing silently drops malformed data). `:::block`s may exist only as SYSTEM-authored render vocabulary: the platform writes them into persisted messages as UI card anchors, and no prompt teaches an agent to author one. (Fleet retirement: creative-agent #299–#301 is the canonical pattern: tool + fail-loud validation + byte-compatible rows + system-side anchor.)
- Gate actions, not mechanics. Approvals attach to what an action does (spend, publish, vault writes), classified from intent, not to literal commands.
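A minimal sketch of the fail-loud alternative to `:::block` scraping, assuming a hypothetical `submit_proposal` payload with `title` and `amountUsd` fields: validate the tool's JSON arguments, collect every issue, and hand the error back to the model as the tool result instead of dropping the write. `ToolInputSchemaError`, `parseProposal`, and `executeSubmitProposal` are illustrative names; agent-app's own `ToolInputError` and executor may look different, and a real tool would more likely validate against a JSON Schema than with hand-written checks.

```typescript
// HYPOTHETICAL error type standing in for a fail-loud tool input error.
class ToolInputSchemaError extends Error {
  readonly issues: string[];
  constructor(issues: string[]) {
    super(`invalid tool input: ${issues.join("; ")}`);
    this.issues = issues;
  }
}

interface Proposal {
  title: string;
  amountUsd: number;
}

// Validate the raw JSON arguments the model sent. Collect every problem so
// the model can fix them all in one retry, rather than failing on the first.
function parseProposal(raw: string): Proposal {
  let input: unknown;
  try {
    input = JSON.parse(raw);
  } catch {
    throw new ToolInputSchemaError(["arguments are not valid JSON"]);
  }
  const o = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;
  const issues: string[] = [];
  if (typeof o.title !== "string" || o.title.length === 0) {
    issues.push("title: non-empty string required");
  }
  if (typeof o.amountUsd !== "number" || !Number.isFinite(o.amountUsd)) {
    issues.push("amountUsd: finite number required");
  }
  if (issues.length > 0) throw new ToolInputSchemaError(issues);
  return { title: o.title as string, amountUsd: o.amountUsd as number };
}

// The executor turns a validation failure into a tool result the model reads,
// so malformed data is reported back instead of silently disappearing.
function executeSubmitProposal(
  raw: string,
): { ok: true; proposal: Proposal } | { ok: false; error: string } {
  try {
    return { ok: true, proposal: parseProposal(raw) };
  } catch (err) {
    if (err instanceof ToolInputSchemaError) return { ok: false, error: err.message };
    throw err; // unexpected failures still surface as real errors
  }
}
```

Compared with regex-scraping a text block, the model sees exactly which fields were wrong and can call the tool again.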
Three callers, three transports. Picking the wrong one is how durability bugs and overbuilt workers happen:

| Caller | Transport | Why |
|---|---|---|
| Interactive product turn (chat, copilot) | `streamPrompt` held open for the turn; session-gateway read JWT so the browser attaches directly | A user is watching; worker lifetime ≈ turn length. The gateway replays buffered events on reconnect, so a dropped tab or worker restart loses nothing. |
| Autonomous product work (missions, queues, crons, scheduled jobs) | `dispatchPrompt({ detach: true })` + poll (`findCompletedTurn` / `_sessionStatus`) from a durable driver (CF Workflows, DO alarm, queue consumer) | No consumer exists and workers die in minutes. The platform executes the turn server-side; deterministic session/turn ids make crash re-dispatch a lookup, not a second agent run. Never hold an SSE stream open in a worker to learn that a session finished. |
| Eval agent (agent-eval loops, self-improve, CI) | `streamPrompt` / `runLoop` in a long-lived process | The harness IS the consumer and outlives the run, so durability machinery adds nothing. Reproducibility comes from scenarios and seeds, and a failed run is re-run, not resumed. |
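For the autonomous row, one tick of a durable driver can be sketched as below. `dispatchPrompt({ detach: true })` and `findCompletedTurn(turnId)` are named above, but the request and return shapes in `SessionClient` are assumptions, as are the hypothetical helpers `turnIdFor` and `driveStep` and the `Set` standing in for durable storage. The driver checks for a completed turn first, dispatches at most once, and otherwise returns `pending` so the alarm or queue wakes it later; it never holds a stream open.

```typescript
// ASSUMED structural slice of a sandbox session client; real signatures may differ.
interface SessionClient {
  dispatchPrompt(req: { sessionId: string; turnId: string; prompt: string; detach: true }): Promise<void>;
  findCompletedTurn(turnId: string): Promise<{ output: string } | null>;
}

// HYPOTHETICAL: deterministic ids, so the same mission step always maps to
// the same turn id and a restarted driver re-derives it instead of minting a new one.
function turnIdFor(missionId: string, stepIndex: number): string {
  return `${missionId}:step-${stepIndex}`;
}

// HYPOTHETICAL: one invocation of a durable driver (e.g. a DO alarm or queue message).
async function driveStep(
  client: SessionClient,
  dispatched: Set<string>, // stands in for durable storage
  missionId: string,
  stepIndex: number,
  prompt: string,
): Promise<{ status: "done"; output: string } | { status: "pending" }> {
  const turnId = turnIdFor(missionId, stepIndex);
  // Idempotent lookup first: a crash after completion becomes a lookup, not a rerun.
  const done = await client.findCompletedTurn(turnId);
  if (done) return { status: "done", output: done.output };
  if (!dispatched.has(turnId)) {
    // Detached: the platform runs the turn server-side after we return.
    await client.dispatchPrompt({ sessionId: missionId, turnId, prompt, detach: true });
    // If the worker dies before this line, the next tick dispatches the same
    // turnId again; platform-side deduplication of that is an assumption here.
    dispatched.add(turnId);
  }
  return { status: "pending" }; // caller reschedules and polls on a later tick
}
```

Each tick is short and stateless apart from storage, which is what lets it run inside a worker that may be evicted at any time.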
agent-runtime stays durability-free on purpose: it must run identically in a local eval process, CI, and a sandbox. Durable session execution is the sandbox platform's job; durable orchestration (sequencing, gates, budgets, schedules) is the product/shell layer above it.
The test for new code: "Could the agent in the sandbox do this itself if we told it the intent?" If yes, write the prompt, not the wrapper.
Setup: `pnpm install`. Verify: `pnpm typecheck && pnpm test && pnpm build` (toolchain: tsup for ESM + d.ts, vitest, tsc). Every change keeps tests green. No Co-Authored-By / AI-attribution in commits (repo-wide). Commit identity is the global git config (Drew Stone <drewstone329@gmail.com>); never override it.
- Apply the rule above: confirm it's shell, not engine.
- Domain-seam it (typed config; no product import).
- Wire `tsup.config.ts` + `package.json` `exports` + `src/index.ts`.
- Real tests (the seam exercised with a fake; the engine path verified against the real engine where it composes one).
- Prove it on a reference consumer; it stays green.