Hecate is an open-source AI gateway and agent-task runtime. The Go gateway embeds the React operator UI, mediates OpenAI- and Anthropic-shaped client traffic to upstream LLM providers, runs Hecate Chat tools-on turns through visible agent_loop tasks, supervises external coding-agent adapters from Chats, runs queued agent_loop tasks with policy and approval gates, and emits OpenTelemetry traces for everything it does. Companion entrypoints such as hecate-acp handle protocols that need their own process lifecycle. Hecate is gateway-local, deny-by-default, runtime-aware, and storage-tiered (memory / sqlite). Every endpoint, config knob, and error message exists to answer five operator questions: what did the gateway just decide, why, what did it cost, what happens on the next failure, and where is the trace.
cmd/hecate/ hecate binary entry: gateway, embedded UI, MCP subcommand
cmd/hecate-acp/ ACP stdio bridge for editor agent panels
pkg/types/ public types (ChatRequest, Message, ContentBlock, ...)
— no internal/ imports
internal/api/ inbound HTTP shapes + handlers
OpenAIChatMessage, OpenAIMessageContent (uppercase)
internal/providers/ outbound HTTP per provider (openai, anthropic)
openAIChatMessage, openAIMessageContent (lowercase)
— same JSON shape as api/, deliberate duplication
internal/orchestrator/ task runtime (queue, runner, agent_loop, sandbox)
internal/sandbox/ per-call sh subprocess: policy validation,
env sanitisation, output cap, optional
bwrap/sandbox-exec wrapper
internal/taskstate/ task / run / step / artifact / approval persistence
internal/storage/ sqlite client wrappers
internal/retention/ retention worker (subsystems: traces, budget, audit,
provider_history, turn_events,
agent_chat_approvals)
internal/mcp/ stdio MCP server (read tools + write tools)
internal/agentadapters/ ACP/process adapters for Codex, Claude Code, Cursor
internal/agentchat/ Agent Chat transcript persistence and runtime linkage
internal/modelcaps/ model tool-capability merge logic and defaults
ui/ React/Vite operator UI (embedded via //go:embed ui/dist)
e2e/ binary-startup tests, build tag e2e (sub-tags: ollama, docker)
docs/ long-form references (canonical product/runtime docs)
.claude/ Claude Code adapter (slash commands, settings)
.cursor/ Cursor adapter (.mdc rule files)
docs-ai/ canonical, vendor-neutral agent instruction layer (this directory)
The codebase has three concentric rings; cross-ring imports go inward only:
pkg/types/— public types, nointernal/imports. The wire-shape contract.internal/api/— inbound HTTP shapes + handlers. Translates HTTP requests into internal types; never touches providers directly.internal/providers/— outbound HTTP per provider (OpenAI-compat, Anthropic). Translates internal types to provider wire shapes. Never importsinternal/api/.internal/orchestrator/— task runtime (queue, runner,agent_loop, sandbox boundary). Sits above providers, called by api.internal/<feature>/— gateway services (governor, router, retention, taskstate, mcp, …). Each owns one concern.
The api↔providers parallel-struct duplication (OpenAIChatMessage ↔ openAIChatMessage) is intentional. It keeps internal/providers/ free of internal/api/ imports and lets the wire shapes evolve independently. See ../skills/providers/SKILL.md for full reasoning.
Every backend-bound concern (taskstate, chatstate, agentchat, approvals, governor, retention history) ships with two tiers, mirrored exactly:
memory— in-process, default, perfect forgo testandjust dev.sqlite— single-file persistence viamodernc.org/sqlite(no CGO).
When adding a new persisted thing, mirror both. Add a <thing>_test.go that runs against memory and sqlite.
- Go: see
go.modfor the exact pinned version. CGO is not used;modernc.org/sqliteis the pure-Go sqlite driver. - Task runner: just. Use
just <recipe>for repo-level build/test/dev flows; do not add Makefile targets or documentmake ...commands. - UI / website package manager: Bun (pinned via
packageManagerinui/package.jsonandwebsite/package.json). Lockfiles arebun.lock; there is nopackage-lock.json. Usebun install,bun run <script>,bun add <pkg>,bun x <tool>. Do not introduce npm/yarn/pnpm lockfiles or workflow steps. - Native app toolchain: Rust + Cargo via rustup for Tauri work (
tauri/,just tauri-*). Backend and UI-only work should not require Cargo. - UI stack: React 19, TypeScript, Vite, Vitest + Testing Library + jsdom. Plain CSS with design tokens in
ui/src/styles.css— no CSS-in-JS, no utility-class framework. - Critical command distinction:
bun run test≠bun test. The latter skips the testing-library DOM setup and panics withdocument[isPrepared]errors. Alwaysbun run test(which dispatches to vitest).
These earn extra scrutiny; changes here are not drive-by territory.
- Sandbox boundary (
internal/sandbox/) — per-callshsubprocess spawned directly from the gateway after policy validation, env sanitisation, output cap, and a wall-clock timeout (Layer 1). On Linux withbwrapinstalled and on macOS, the call is additionally wrapped bybwrap/sandbox-execfor filesystem and network confinement (Layer 2 — auto-detected at startup viainternal/sandbox/wrapper.go, no opt-in flag). No separatesandboxddaemon — the safety properties run inline. CPU / FD / address-space caps are not applied per-call (setrlimitwould shrink the long-running gateway) — operators who need them run under systemd or in a container with--cpus/--memoryflags. New tool kinds follow the sameinternal/sandbox/shape. Seedocs/sandbox.mdfor the layer model anddocs/agent-runtime.mdfor the network-egress policy that sits on top. - Approval lifecycle (
internal/taskstate,awaiting_approval) — pre-execution and mid-loop approvals halt the run. New gates use the sameTaskApprovalshape. - Retention worker (
internal/retention) — high-cardinality history sweep. Subsystems:trace_snapshots,budget_events,audit_events,provider_history,turn_events,agent_chat_approvals. Persisted things must mirror. - Cost ledger — all money is
int64micro-USD (1_000_000=$1). Neverfloat64. - No auth layer. Every request is processed as the operator. The gateway binds to
127.0.0.1by default; bind elsewhere only behind a reverse proxy or firewall.
Long-form references live in docs/. Update them in the same change as the
code, not as a follow-up. Don't restate their content here — link and move on.
| Question | Doc |
|---|---|
| How does a request flow through the gateway? What are the storage tiers? | docs/architecture.md |
What agent_loop tools exist? What are the system prompt layers? Cost model? |
docs/agent-runtime.md |
| What are the task / run / step / approval HTTP endpoints? | docs/runtime-api.md |
| What does this SSE event payload look like? | docs/events.md |
| What OTel spans and metrics does the gateway emit? | docs/telemetry.md |
| How do I configure a provider? What providers are supported? | docs/providers.md |
| How do I configure MCP? What tools does the server expose? | docs/mcp.md |
| How do Hecate Chat segments and model capabilities work? | docs/chat-sessions.md, docs/rfcs/unified-chats-and-model-capabilities.md |
| How do external Agent Chat adapters work? | docs/external-agent-adapters.md |
| How does an editor ACP host connect to Hecate? | docs/acp.md |
| How do I deploy? What are the Compose profiles? | docs/deployment.md |
How do I build and test locally? What does [skip ci] mean? |
docs/development.md |
| What sandbox isolation layers are shipped? How do namespaces work? | docs/sandbox.md |