Open-source AI gateway and agent-task runtime. The gateway mediates
OpenAI- and Anthropic-shaped client traffic to upstream providers, runs
queued agent_loop tasks behind policy and approval gates, and emits
OpenTelemetry traces. Gateway-local, deny-by-default, storage-tiered
(memory / sqlite). Binds to 127.0.0.1 by default and is intended to run as
a local operator console. The React operator UI is embedded via
//go:embed ui/dist.
This file is the orientation entry — the codebase map, the runtime
invariants, and the gotchas that bite often. It is what an agent
(Claude Code, Codex, Cursor, or human) reaches for when starting work
on this repo. Conventions, workflow, verification, and longer-form
guidance live in docs-ai/.
| Surface | What it carries |
|---|---|
| `docs-ai/` | Canonical agent guidance: project context, conventions, workflow, verification, task shapes, area + posture skills |
| `AGENTS.md` (this file), `ui/AGENTS.md`, `internal/providers/AGENTS.md` | Codebase map per area |
| `CLAUDE.md` | Thin Claude Code adapter pointing to `docs-ai/` |
| `.cursor/rules/` | Thin Cursor adapter pointing to `docs-ai/` |
| `.claude/commands/` | Claude Code slash commands: `/race`, `/test-affected` |
| `docs/` | Long-form references (architecture, runtime API, events, telemetry) |
When in doubt: read docs-ai/core/project-context.md and docs-ai/core/workflow.md.
cmd/hecate/ hecate binary entry: gateway, embedded UI, MCP subcommand
cmd/hecate-acp/ ACP stdio bridge for editor agent panels
pkg/types/ public types (ChatRequest, Message, ContentBlock, ...)
— no internal/ imports
ui/ React/Vite operator UI, embedded via //go:embed ui/dist
tauri/ native desktop app (Tauri 2.x); wraps hecate as a sidecar,
webview loads http://127.0.0.1:{port}/ served by the gateway
scripts/
release.ts cut a release: pre-flight, goreleaser snapshot, Tauri
version stamp, tag, push (`bun scripts/release.ts vX.Y.Z`)
stamp-version.ts stamp Tauri version files to current git tag / TAURI_VERSION
e2e/ binary-startup tests; build tag e2e (sub-tags: ollama, docker)
docs/ long-form references (architecture, runtime API, events, ...)
docs-ai/ canonical agent guidance (this file points there for depth)
internal/
api/ inbound HTTP shapes + handlers (OpenAIChatMessage, uppercase)
providers/ outbound HTTP per provider (openAIChatMessage, lowercase)
— same JSON shape as api/, deliberate duplication
gateway/ top-level request orchestration: governor → router → provider
router/ provider/model selection, failover, retry, circuit
governor/ policy + budget + rate-limit decisions; local cost ledger
policy/ approval policy + provider/model allowlists
catalog/, models/ provider catalog + model registry
billing/ pricebook + invoice/usage rollups (cost tables live here)
orchestrator/ task runtime: queue, runner, agent_loop, sandbox boundary
sandbox/ per-call sh subprocess: policy validation, env sanitisation,
output cap + timeout, auto-detected bwrap/sandbox-exec wrapper
taskstate/ task / run / step / artifact / approval persistence
agentadapters/ ACP/process adapters for Codex, Claude Code, Cursor
agentchat/ Agent Chat transcript persistence (memory / sqlite)
chatstate/ chat-completion conversation persistence
storage/ sqlite client wrappers
retention/ retention worker (subsystems: traces, budget, audit, provider_history, turn_events)
mcp/ stdio MCP server (read tools + write tools)
controlplane/ providers, pricing, settings state
auth/ local operator principal request context
ratelimit/ per-key request limits
requestscope/ per-request principal + tracing context
config/, bootstrap/ env-driven config + startup wiring
secrets/ env-var and file-based secret resolution
telemetry/ OTel exporter wiring + span helpers
profiler/ pprof endpoints + runtime stats
version/ build-time version stamp
Architecture rings (cross-ring imports inward only):
pkg/types/ ← internal/api/ ← internal/providers/
↑
internal/orchestrator/ (sits above api, drives runs through providers)
The api↔providers parallel-struct duplication (OpenAIChatMessage ↔ openAIChatMessage) is intentional — it keeps internal/providers/ free of internal/api/ imports. Full rationale: docs-ai/skills/providers/SKILL.md.
Storage tier rule: every backend-bound concern mirrors two tiers —
memory (default) and sqlite (modernc.org/sqlite, no CGO).
When adding a new persisted thing, mirror both.
Non-negotiable rules of the system. Read them before writing code that touches request handling, persistence, or tool execution.
- Local operator boundary. Every request is processed as the operator. The gateway binds to `127.0.0.1` by default; bind elsewhere only behind a reverse proxy, firewall, or equivalent access control.
- Sandbox is per-call subprocess, applied inline. Shell, file, and git tool calls spawn a fresh `sh` from inside the gateway after policy validation + env sanitisation + output cap + wall-clock timeout. On Linux with `bwrap` installed and on macOS, the call is additionally wrapped by `bwrap`/`sandbox-exec` for filesystem and network confinement (auto-detected at startup). No separate sandbox daemon, and no per-call rlimits (those would shrink the limits of the long-running gateway process itself). New tools follow the same pattern.
- Approvals are blocking. Pre-execution and mid-loop approvals halt the run; the run record persists in `awaiting_approval` until resolved. New gates use the `TaskApproval` shape.
- Events are appended, not mutated. Every state transition writes a `run_event` with a monotonic sequence. The SSE stream replays from `after_sequence`. New event types must follow the event-protocol v1 taxonomy (`run.*`, `turn.*`, `tool.*`, `policy.*`, `gap.*`, `error.*`) and be documented in `docs/events.md`.
- Cost is `int64` micro-USD. Never `float64` for money: pricebook, budgets, and ledger all stay integer (`1_000_000` = $1).
- OTel is first-class. Every request gets a trace ID surfaced in the `X-Trace-Id` response header and persisted on the run record. New code paths add spans, not just log lines.
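The append-only event rule and `after_sequence` replay reduce to a small pattern: assign the next sequence number on append, never edit an existing event, and serve replays by filtering on sequence. The sketch below is an in-memory illustration under assumptions: `EventLog`, `RunEvent`, and the event names `run.started` / `turn.completed` are hypothetical, and the real log is persisted through `taskstate/`.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// RunEvent is a hypothetical, trimmed-down event record.
type RunEvent struct {
	Sequence int64
	Type     string
	Payload  string
}

// allowedFamilies mirrors the v1 taxonomy families: run, turn, tool, policy, gap, error.
var allowedFamilies = map[string]bool{
	"run": true, "turn": true, "tool": true, "policy": true, "gap": true, "error": true,
}

// EventLog is append-only: events are never edited in place, and each one
// gets the next monotonic sequence number.
type EventLog struct {
	mu     sync.Mutex
	next   int64
	events []RunEvent
}

func (l *EventLog) Append(typ, payload string) (RunEvent, error) {
	family, _, ok := strings.Cut(typ, ".")
	if !ok || !allowedFamilies[family] {
		return RunEvent{}, fmt.Errorf("event type %q is outside the v1 taxonomy", typ)
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	l.next++
	ev := RunEvent{Sequence: l.next, Type: typ, Payload: payload}
	l.events = append(l.events, ev)
	return ev, nil
}

// After returns every event with Sequence > afterSequence, which is what an
// SSE client resuming from after_sequence needs to replay.
func (l *EventLog) After(afterSequence int64) []RunEvent {
	l.mu.Lock()
	defer l.mu.Unlock()
	var out []RunEvent
	for _, ev := range l.events {
		if ev.Sequence > afterSequence {
			out = append(out, ev)
		}
	}
	return out
}
```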
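To make the integer-money rule concrete, here is a minimal sketch assuming a hypothetical `MicroUSD` type and per-million-token rates (the real pricebook's rate units and rounding policy may differ). All arithmetic stays in `int64`; the product `tokens * perMillion` must fit in an `int64` (about 9.2e18), which holds comfortably for realistic token counts and prices.

```go
package main

import "fmt"

// MicroUSD is money as an integer count of millionths of a dollar.
type MicroUSD int64

// USD is one dollar in micro-USD (1_000_000 = $1).
const USD MicroUSD = 1_000_000

// CostForTokens prices a token count against a per-million-token rate using
// integer math only. Division truncates toward zero, so fractional
// micro-dollars are dropped (an assumption made for this sketch).
func CostForTokens(tokens int64, perMillion MicroUSD) MicroUSD {
	return MicroUSD(tokens) * perMillion / 1_000_000
}

// String renders a MicroUSD amount as dollars with six decimal places.
func (m MicroUSD) String() string {
	sign := ""
	if m < 0 {
		sign = "-"
		m = -m
	}
	return fmt.Sprintf("%s$%d.%06d", sign, int64(m/USD), int64(m%USD))
}
```

For example, 1,000 tokens at $3 per million tokens is `CostForTokens(1_000, 3*USD)`, which yields 3,000 micro-USD and prints as `$0.003000`.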
Full standards: docs-ai/core/engineering-standards.md.
- Comments explain why, not what. State the trade-off.
- Pointer vs value for optional fields: pointer when zero is a valid distinct value (`Seed *int`, `ParallelToolCalls *bool`); value with `omitempty` when zero equals the API default (`PresencePenalty float64`). `json.RawMessage` for forward-compat passthrough fields.
- Test naming: `TestPackage_Behavior`. Table-driven where the variant set is obvious.
- No emojis, no plan/phase labels in commit messages or comments.
- Conventional Commits; `chore(agent):` for agent-doc-only changes. Don't auto-commit: propose a message and let the operator merge.
Full ladder: docs-ai/core/verification.md.
- Race suite is the floor for runtime/backend changes: `go test -race -timeout 10m ./...` (or `/race`). Race builds are large; if your default `$GOCACHE` is on a small volume, point it at the repo: `GOCACHE="$(pwd)/.gocache" go test -race ...`.
- Vet Go changes: run `go vet` on touched packages during iteration; use `go vet ./...` for broad backend changes or release prep.
- Iteration: `/test-affected` for narrow runs.
- E2E: `go test -tags e2e ./e2e/...`. The `e2e` build tag is always required; the `ollama` and `docker` sub-tags opt in (e.g. `-tags e2e,ollama`). `PROVIDER_FAKE_KIND=local` skips pricebook preflight on synthetic models.
- UI: `cd ui && bun run typecheck`, then `bun run test`. Never `bun test` (it skips the testing-library DOM setup).
| Task | Where |
|---|---|
| Add a passthrough wire field (the seven-step chain — most-redone task here) | docs-ai/skills/providers/SKILL.md |
| Add an MCP tool / persisted run-event type / test helper cheat-sheet | docs-ai/skills/backend/SKILL.md |
| UI recipes (SSE-driven state field, paired pickers, snapshot refresh) | docs-ai/skills/ui/SKILL.md |
| Native desktop app (sidecar lifecycle, bundling, Tauri commands) | docs-ai/skills/tauri/SKILL.md |
| Cut a release tag | `bun scripts/release.ts vX.Y.Z`: checks worktree, snapshot dry-run, stamps Tauri versions, tags, pushes. Full procedure: `docs-ai/tasks/release.md` |
| Stamp Tauri version files | `bun scripts/stamp-version.ts` (or `just tauri-version`): syncs `Cargo.toml`, `package.json`, `tauri.conf.json` to the current git tag |
- `bun run test` ≠ `bun test`. The latter skips the testing-library DOM setup and panics with `document[isPrepared]`. Always use `bun run test` (which dispatches to vitest).
- modernc/sqlite TIME-as-text format: the driver writes `time.Time` using Go's default `time.Time.String()` format, which doesn't lex-compare with RFC3339Nano cutoffs and silently breaks the retention sweep. Always write timestamps as `t.UTC().Format(time.RFC3339Nano)` when the column is TEXT (see `AppendRunEvent` in `internal/taskstate/sqlite.go`).
- OpenAI/openAI parallel structs are intentional: don't unify them. Mirror fields when adding on either side.
- Streaming `wireReq` plumbing: when adding a passthrough field, plumb it into BOTH the `Chat` and `ChatStream` `wireReq` constructions in `internal/providers/openai.go`. Forgetting one is the most common provider bug: non-stream tests pass, and the field silently drops in production for any client using `stream: true`.
- Capability-cache seeding for provider tests: seed `cachedCaps` and `capsExpiry` to skip the discovery path. Snippet in `docs-ai/skills/providers/SKILL.md`.
- Pricebook preflight in tests: set `PROVIDER_FAKE_KIND=local` for synthetic models in e2e.
- Mermaid `loop` is a reserved keyword: don't use it as a sequence-diagram participant name. Use `Agent` or similar.
- CodeQL CWE-190: don't compute `make([]T, 0, len(x)+N)` with arithmetic; use plain `len(x)` and let `append` grow.
- Env-PRECONFIGURED gate: `PROVIDER_<NAME>_API_KEY` / `_BASE_URL` only auto-import into the CP store when `PROVIDER_<NAME>_PRECONFIGURED=1` is also set. E2E helpers (`hecateServer`, `startHecateProcess`) funnel through `autoPreconfiguredEnv` to inject the gate; new e2e spawn helpers must do the same, or routed requests fail with HTTP 400 `no provider supports model …`.
- `:8765` collisions across launches: `just dev` / `just run` / `just serve` now run `just stop` first, so a stale `./hecate` from another shell never blocks a relaunch or a `docker run -p 8765:8765 …`. New scripts that spawn the binary should call `just stop` (or replicate the `lsof -ti:8765 | xargs kill` step).
- API response envelope: every Hecate-native `/hecate/v1/*` GET returns `{object, data}`. Compatibility endpoints (`/v1/models`, `/v1/chat/completions`, `/v1/messages`) keep provider-shaped contracts. Don't write a UI client that reads top-level fields; always read `payload.data.<field>` for Hecate-native endpoints, and make test fixtures mirror the real envelope.
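What a new e2e spawn helper has to do for the PRECONFIGURED gate can be sketched as a pure function over an environment slice. `withPreconfiguredGate` is a hypothetical stand-in for the repo's `autoPreconfiguredEnv`, whose real signature and behaviour may differ. This sketch adds `PROVIDER_<NAME>_PRECONFIGURED=1` for every provider that has an `_API_KEY` or `_BASE_URL` entry and leaves an explicitly set gate alone. It assumes Go 1.20+ for `strings.CutPrefix` / `CutSuffix`.

```go
package main

import (
	"sort"
	"strings"
)

// withPreconfiguredGate (hypothetical) appends PROVIDER_<NAME>_PRECONFIGURED=1
// for each provider that has PROVIDER_<NAME>_API_KEY or _BASE_URL set, unless
// the gate is already present (an explicit value, even "0", is respected).
func withPreconfiguredGate(env []string) []string {
	set := make(map[string]bool)
	providers := make(map[string]bool)
	for _, kv := range env {
		key, _, ok := strings.Cut(kv, "=")
		if !ok {
			continue
		}
		set[key] = true
		rest, isProvider := strings.CutPrefix(key, "PROVIDER_")
		if !isProvider {
			continue
		}
		for _, suffix := range []string{"_API_KEY", "_BASE_URL"} {
			if name, found := strings.CutSuffix(rest, suffix); found && name != "" {
				providers[name] = true
			}
		}
	}

	// Sort provider names so the appended entries are deterministic.
	names := make([]string, 0, len(providers))
	for name := range providers {
		names = append(names, name)
	}
	sort.Strings(names)

	out := append([]string(nil), env...)
	for _, name := range names {
		gate := "PROVIDER_" + name + "_PRECONFIGURED"
		if !set[gate] {
			out = append(out, gate+"=1")
		}
	}
	return out
}
```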
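The envelope rule applies to any client, including Go test helpers, not only the UI. A minimal sketch with a generic decoder, assuming a hypothetical `settingsSketch` payload and a hypothetical `"object":"settings"` value. It also shows why fixtures must mirror the envelope: a flat fixture decodes without error but yields a zero payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope models the {object, data} wrapper returned by Hecate-native
// /hecate/v1/* GETs. Data is generic so each endpoint keeps its own payload type.
type envelope[T any] struct {
	Object string `json:"object"`
	Data   T      `json:"data"`
}

// decodeEnvelope unwraps the payload; callers read fields from Data, never
// from the top level of the response body.
func decodeEnvelope[T any](body []byte) (T, error) {
	var env envelope[T]
	if err := json.Unmarshal(body, &env); err != nil {
		var zero T
		return zero, fmt.Errorf("decode envelope: %w", err)
	}
	return env.Data, nil
}

// settingsSketch is a hypothetical payload type for illustration.
type settingsSketch struct {
	RetentionDays int `json:"retention_days"`
}
```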
| Doc | Covers |
|---|---|
| `docs/architecture.md` | Request flow, lease semantics, storage tier matrix |
| `docs/agent-runtime.md` | `agent_loop` tools, system prompt layers, cost model, retry-from-turn |
| `docs/runtime-api.md` | Task / run / step / approval endpoints, queue + lease |
| `docs/events.md` | Every event type at `/hecate/v1/events` with payload shapes |
| `docs/telemetry.md` | OTel spans + metrics, OTLP wiring, status & gaps |
| `docs/security.md` | Local-first threat model, workspace safety, approvals, secrets, advisories |
| `docs/providers.md` | Provider catalog, configuration |
| `docs/mcp.md` | MCP server: tools, transport, configuration |
| `docs/external-agent-adapters.md` | Hecate as an ACP client/operator: Chats runs Codex, Claude Code, and Cursor Agent |
| `docs/acp.md` | Hecate as an ACP agent: `hecate-acp` bridge for editor agent panels |
| `docs/deployment.md` | Compose profiles, image pinning, lost-token recovery |
| `docs/development.md` | Local build, testing, screenshot tooling, `[skip ci]` convention |
| `docs/desktop-app.md` | Native Tauri 2.x app: distribution, current state, roadmap, footguns |