Centaur — Developer Guide

Quick Start

1. Clone and configure

git clone <repo-url>
cd centaur
brew install just

Centaur runs locally on Kubernetes through the Helm chart. Infra secrets are required as pre-created Kubernetes Secrets. For local development, just bootstrap-secrets creates them from your shell environment:

export OP_SERVICE_ACCOUNT_TOKEN=...
export OP_VAULT=...
export SLACK_BOT_TOKEN=...
export SLACK_SIGNING_SECRET=...
export SLACKBOT_API_KEY=...

Application-level LLM/tool secrets such as OpenAI and Anthropic tokens stay in 1Password and are loaded by the secrets service.

2. Boot the stack

just up

Database migrations

./scripts/dbmate new add_agent_leases
./scripts/dbmate --set overlay new add_org_tables
./scripts/dbmate status
./scripts/dbmate up

./scripts/dbmate creates the next numbered SQL file in services/api/db/migrations by default, or in services/api/db/migrations inside the mounted overlay when you pass --set overlay. up, migrate, and status run against both the core and overlay migration sets unless you pin a specific set. Each set has its own dbmate migrations table so overlay repos can extend the shared Postgres database without version collisions. If DATABASE_URL is not set in your shell, the wrapper reuses the API deployment's configured value through kubectl exec.

3. Test

From inside the API deployment (localhost bypass — no key needed):

THREAD_KEY=test-e2e-1

SPAWN=$(kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/spawn \
  -H "Content-Type: application/json" \
  -d "{\"thread_key\":\"${THREAD_KEY}\",\"harness\":\"amp\"}")
ASSIGNMENT_GENERATION=$(printf '%s' "$SPAWN" | jq -r '.assignment_generation')

kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/message \
  -H "Content-Type: application/json" \
  -d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"role\":\"user\",\"parts\":[{\"type\":\"text\",\"text\":\"Reply with exactly PONG and nothing else.\"}]}"

EXECUTE=$(kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/execute \
  -H "Content-Type: application/json" \
  -d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"harness\":\"amp\",\"delivery\":{\"platform\":\"dev\"}}")
EXECUTION_ID=$(printf '%s' "$EXECUTE" | jq -r '.execution_id')

kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://localhost:8000/agent/executions/${EXECUTION_ID}" | jq

Or create a DB-backed key for external use (see API Key Management).

Architecture

See the architecture diagram in the README.

End-to-End Request Flow

User mentions bot in Slack → webhook → slackbot → api
User mentions bot in Google Chat → webhook → chatbot → api
API spawns/reuses a Kubernetes sandbox pod (centaur-agent:latest) for that thread
Executes harness (amp/claude-code/codex) through the sandbox backend
Harness calls tools via curl back to API at http://api:8000 (REST, NOT MCP)
LLM API calls route through firewall proxy which injects real credentials
Results stream as JSON events → posted to Slack or Google Chat

Service Interface Contracts

Centaur is a modular service architecture. Each service communicates through well-defined interfaces. As long as you implement these interfaces, you can swap or extend any layer independently.

Client → API (durable control-plane protocol):

Clients (slackbot, CLI, external integrations) should stay thin. They persist input with spawn -> message -> execute, stream or replay output from the durable events endpoint, and only fall back to durable terminal state when the live stream is gone. The API owns runtime assignment, execution serialization, cancellation, and final-delivery recovery; Postgres is the source of truth.

Step 1: Assign or reuse a runtime (POST /agent/spawn)

Pins one warm runtime to the thread and returns the current assignment_generation.

POST /agent/spawn
{
  "thread_key": "slack:C0AJ07U8Z1N:1773364194.179929",
  "harness": "amp"
}

← {
    "thread_key": "slack:C0AJ07U8Z1N:1773364194.179929",
    "runtime_id": "rtm_123",
    "assignment_generation": 12,
    "state": "assigned_idle"
  }

Step 2: Persist the user turn (POST /agent/message)

Writes one durable transcript event. Inline base64 image/document blocks are extracted into attachments and rewritten to lightweight attachment_ref parts.

POST /agent/message
{
  "thread_key": "slack:C0AJ07U8Z1N:1773364194.179929",
  "assignment_generation": 12,
  "role": "user",
  "parts": [{"type": "text", "text": "analyze this"}],
  "user_id": "U123",
  "metadata": {"user_name": "alice", "platform": "slack"}
}

← {"ok": true, "message_id": "msg_123"}

Step 3: Enqueue execution (POST /agent/execute)

Creates a durable execution request plus final-delivery obligation. The worker drives the attached container; the response is just the execution handle.

POST /agent/execute
{
  "thread_key": "slack:C0AJ07U8Z1N:1773364194.179929",
  "assignment_generation": 12,
  "harness": "amp",
  "delivery": {"platform": "slack"}
}

← {"ok": true, "execution_id": "exe_123", "status": "queued"}

Step 4: Stream or replay output (GET /agent/threads/{thread_key}/events)

Consumers tail durable events for one execution. On disconnect, reconnect with the last seen event id. If the execution already finished and no more rows remain, the API emits the terminal execution_state snapshot.

GET /agent/threads/slack:C0AJ07U8Z1N:1773364194.179929/events?execution_id=exe_123&after_event_id=0

← SSE event: amp_raw_event
← data: {"type":"assistant","message":{...}}
← SSE event: turn.done
← data: {"type":"turn.done","result":"..."}
← SSE event: execution_state
← data: {"status":"completed","result_text":"..."}

Step 5: Release only when you really want to end the assignment (POST /agent/threads/{thread_key}/release)

Releases the thread-to-runtime pin and optionally cancels any non-terminal execution still tied to that assignment generation.

Inspect the active runtime for a thread (GET /agent/runtime?key={thread_key})

Returns {persona_id, persona, harness, engine, overlay: {loaded, mount_api, mount_sandbox, image}, available_personas, …}. Sandboxes call this through call agent runtime '?key='"$CENTAUR_THREAD_KEY"; clients can call it directly to confirm what persona/overlay an assignment is actually running.

Durable state written for one turn:

Table	What
`agent_runtime_assignments`	Thread-to-runtime pin and active assignment generation
`agent_message_requests`	Durable inbound transcript events
`attachments`	Extracted attachment bytes for inline multimodal content
`agent_execution_requests`	Queued/running/terminal execution row
`agent_execution_events`	Replayable raw + projected execution events
`agent_final_delivery_outbox`	Final-result delivery obligation for reconnect/retry paths

POST /agent/connect and POST /agent/reconnect are legacy endpoints now kept only as explicit 410 LEGACY_ENDPOINT_REMOVED stubs. Do not build new clients on them.

API → Sandbox (stdin/stdout, NDJSON):

The API communicates with sandbox Pods through the active sandbox backend's attach stream. The wire format is Anthropic message format — this is the canonical protocol between the API and all sandboxes, regardless of which harness runs inside.

→ stdin:  {"type":"turn.start","turn_id":1,"text":"analyze this"}
→ stdin:  {"type":"turn.start","turn_id":2,"content":[             // Anthropic content blocks
             {"type":"text","text":"what is this?"},
             {"type":"image","source":{"type":"base64","media_type":"image/png","data":"..."}}
           ]}
→ stdin:  {"type":"interrupt"}

← stdout: {"type":"system","subtype":"init","session_id":"T-..."}
← stdout: {"type":"assistant","message":{"role":"assistant","content":[...]}}
← stdout: {"type":"result","subtype":"success","result":"..."}
← stdout: {"type":"turn.done","turn_id":1,"result":"..."}

Sandbox harness adapter (services/sandbox/harness_session.py):

The sandbox's harness_session.py translates the standard Anthropic format into whatever each harness CLI actually accepts:

Harness	Translation
claude-code	Pass through directly (native Anthropic format)
amp	Materialize image/document blocks to files on disk, replace with `@/path` text mentions (Amp stdin only accepts text blocks)
codex / pi-mono	Extract text from content blocks, pass as CLI argument

This means clients and the API never need to know about harness-specific quirks. They speak Anthropic format; the sandbox adapter handles the rest.

Sandbox → API (REST over Kubernetes services):

Agents call tools via curl $CENTAUR_API_URL/tools/<tool>/<method> over the in-cluster service network. Auth is via CENTAUR_API_KEY injected when the sandbox Pod is created.

Network Isolation

The Helm chart installs deny-by-default NetworkPolicies, then explicitly allows the service paths the stack needs: Slackbot to API, API to Postgres/secrets/firewall/Kubernetes, sandbox Pods to API/firewall, DNS, and configured egress.

Directory Structure

centaur/
├── services/
│   ├── api/              # FastAPI control plane (standalone service)
│   │   ├── api/          # Python package
│   │   │   ├── routers/  # HTTP endpoints (agent, workflows, admin, health, …)
│   │   │   ├── sandbox/  # Sandbox backend abstraction (Kubernetes)
│   │   │   ├── workflows/# Built-in workflow handlers (agent_turn, slack_thread_turn)
│   │   │   ├── runtime_control.py   # Durable execution control-plane
│   │   │   ├── workflow_engine.py   # Durable workflow engine (checkpoint/replay)
│   │   │   ├── warm_pool.py         # Pre-warmed sandbox pool
│   │   │   ├── vm_metrics.py        # Push-based VictoriaMetrics metrics
│   │   │   └── observability.py     # Execution observation projections
│   │   ├── Dockerfile
│   │   └── tools.toml    # Tool plugin directory config
│   ├── secrets/          # Pluggable secrets manager (standalone service)
│   ├── firewall/         # mitmproxy addon — credential injection proxy
│   ├── sandbox/          # Agent container image (Ubuntu 24.04 + uv + gh + node + bun + amp)
│   ├── slackbot/         # TypeScript + Hono Slack event listener (pnpm/Bun)
│   ├── chatbot/          # TypeScript + Hono Google Chat event listener (pnpm/Bun)
│   ├── grafana/          # Grafana dashboards + provisioning
│   ├── fluentbit/        # Fluent Bit log shipping config
│   └── alloy/            # Grafana Alloy config
├── centaur_sdk/          # Standalone SDK (pip install centaur-sdk)
├── packages/             # Shared packages (api-client, harness-events)
├── tools/                # Open-source tool plugins (auto-discovered)
│   ├── alchemy/          # One directory per tool — each has client.py + pyproject.toml
│   ├── websearch/
│   ├── telegram/
│   └── …                 # 60+ tool plugins (crypto, research, productivity, infra, …)
├── workflows/            # External workflow definitions (auto-discovered)
│   ├── agent_loop.py     # Recurring agent polling/monitoring loop
│   └── multi_step_demo.py       # Demo: branching, loops, conditionals
├── scripts/              # Operational scripts
└── Justfile              # Local Helm/Kubernetes workflow

Terminology

Chat SDK always refers to the Vercel Chat SDK (~/github/vercel/chat). When you need to understand how the Chat SDK or @chat-adapter/* packages work, always read the source at ~/github/vercel/chat — never dig through node_modules.

Testing Before Pushing

NEVER push changes without testing them locally first. Testing means actually running the affected service and proving the change works end-to-end — not just linting or reasoning about it.

Build the affected service: just build-one <service>
Bring it up: just deploy
Make a real request that exercises the change and show the output
Only then commit and push

For tool changes: tools hot-reload, so just verify via curl -X POST http://localhost:8000/tools/<tool>/<method> from inside the API deployment. For Dockerfile/infra changes: rebuild, redeploy, and verify the binary/service is present and functional. For firewall changes: test from inside a sandbox pod through the proxy.

Local-First Testing — Never Touch the Deploy Box

All testing and E2E validation MUST happen on the local Kubernetes stack (just up on this machine). The deploy box is production. Changes reach it via git push → GitHub Actions auto-deploy. The only reasons to SSH into it are:

Checking logs (kubectl logs, VictoriaLogs queries) for debugging production issues
Emergency manual intervention — only when the user explicitly asks

For E2E testing, always:

just build-one <service> locally
just deploy locally
Run curl commands against localhost through kubectl exec -n centaur deploy/centaur-centaur-api -- curl ...
Verify results locally
Only then commit, push, and let CI/CD handle production

Code Conventions

Python 3.11+, uv for deps, ruff for lint/format (line-length=100)
services/slackbot uses pnpm only (single lockfile: pnpm-lock.yaml)
All imports at top of file, never inside functions
Absolute imports only: from api.X, from centaur_sdk.X
All secrets via env vars or secret manager, never hardcode
asyncpg for Postgres, pgvector for embeddings
Conventional commits: feat:, fix:, docs:, refactor:, test:, chore:

Lint & Test

Each service has its own pyproject.toml and ruff.toml. From the repo root:

uv run ruff check .          # lint
uv run ruff format .         # auto-fix
uv run pytest                # tests

Plugin System — Tools & Workflows

Centaur has two plugin types that are auto-discovered at startup and hot-reloaded on file changes — no core code changes required to extend the system.

Tool Plugins

Tools live in directories listed in tools.toml (plugin_dirs). Each tool is a directory with client.py (class + _client() factory), pyproject.toml, and optional cli.py. The API auto-discovers tools on startup, generates REST endpoints at /tools/{name}/{method}, and hot-reloads on file changes.

client.py: NO load_dotenv(). Secrets via secret() from centaur_sdk.tool_sdk.
cli.py: YES load_dotenv() at top. Thin typer wrapper for standalone use.
Methods starting with _ are excluded from registration.
Tool dependencies declared in pyproject.toml are installed at image build time.

Example:

# tools/my-tool/client.py
import httpx

class MyToolClient:
    def search(self, query: str, limit: int = 10) -> dict:
        """Search for something."""
        resp = httpx.get(f"https://api.example.com/search?q={query}&limit={limit}")
        return resp.json()

def _client():
    return MyToolClient()

Workflow Plugins

Workflows live in directories listed in the WORKFLOW_DIRS env var (colon-separated paths, bind-mounted into the API container). Each workflow is a single Python file exporting WORKFLOW_NAME, an async handler(params, ctx), and an optional Input dataclass. See Durable Workflows for the full programming model.

Built-in workflows ship in services/api/api/workflows/. External workflows (like those in the top-level workflows/ directory) are loaded identically — just point WORKFLOW_DIRS at them.

Ordered Overlays

Centaur supports a first-class ordered overlay model, so organizations can extend the base repo without forking or relying on filesystem overlayfs. A common deployment keeps the base repo and an external overlay checkout side by side:

your-deployment/
├── centaur/              # This repo
└── centaur-overlay/      # Org-specific tools, workflows, skills, personas, prompt overlay

The Helm chart supports ordered overlays by mounting an overlay image or prompt content at /app/overlay/org, including its tools/, workflows/, .agents/skills/, persona prompts, and services/sandbox/SYSTEM_PROMPT.md after the base repo content.

Later overlay entries win cleanly when names collide, so the base repo stays generic while deployments can layer in org-specific behavior from outside the checkout.

Durable Workflows

The workflow engine (workflow_engine.py) provides a checkpoint/replay model inspired by Cloudflare Workflows. The handler function IS the workflow — steps are runtime-discovered via ctx.step(name, fn) calls. The engine checkpoints each step result to Postgres. On resume after crash or suspension, the handler re-executes top-to-bottom but skips steps that already have checkpoints (returning the cached result instantly). Dynamic branching, loops, and conditional logic work naturally because it is just Python.

WorkflowContext API

Every handler receives (params, ctx) where ctx: WorkflowContext provides:

Primitive	Purpose
`ctx.step(name, fn)`	Execute fn exactly once; return cached result on replay. Supports `retry` (RetryPolicy) and `timeout`.
`ctx.sleep(name, duration)`	Suspend the run for duration; checkpoint + resume automatically.
`ctx.sleep_until(name, when)`	Suspend until a specific datetime.
`ctx.wait_for_event(name, event_type, correlation_id)`	Suspend until an external event arrives via `POST /workflows/events`.
`ctx.start_workflow(name, workflow_name, run_input)`	Create a child workflow run (returns immediately).
`ctx.wait_for_workflow(name, run_id)`	Suspend until a child workflow reaches terminal state.
`ctx.run_workflow(name, workflow_name, run_input)`	Start + wait in one call.
`ctx.start_agent(name, text=…)`	Shorthand: start a child `agent_turn` workflow.
`ctx.run_agent(name, text=…)`	Shorthand: start + wait for a child `agent_turn` workflow.
`ctx.log(msg, **kwargs)`	Structured log, suppressed during replay.

Writing a workflow

# workflows/my_workflow.py
from dataclasses import dataclass
from typing import Any
from api.workflow_engine import WorkflowContext

WORKFLOW_NAME = "my_workflow"

@dataclass
class Input:
    message: str = "hello"

async def handler(inp: Input, ctx: WorkflowContext) -> dict[str, Any]:
    greeting = await ctx.step("gather", lambda: {"msg": inp.message})
    await ctx.sleep("pause", timedelta(minutes=5))
    result = await ctx.run_agent("agent", text=f"Summarize: {greeting['msg']}")
    return {"greeting": greeting, "agent_result": result}

Workflow lifecycle

Runs go through: queued → running → sleeping/waiting → running → … → completed/failed/cancelled.

Worker pool: WORKFLOW_WORKER_CONCURRENCY workers (default 2) poll for claimable runs.
Lease-based fencing: Each worker holds a lease on its run, extended by a heartbeat. If the worker dies, the lease expires and another worker reclaims the run.
Schedules: Cron-based or interval-based schedules are configured in workflow_schedules table and ticked by the worker loop.
External events: POST /workflows/events delivers events that wake waiting runs.
Child workflows: Parent→child relationships are tracked; cancelling a parent cascels linked executions.

Workflow REST API

Endpoint	Purpose
`POST /workflows/runs`	Create a workflow run (`workflow_name`, `input`, optional `trigger_key` for idempotency, `eager_start`)
`GET /workflows/runs`	List runs (filter by `workflow_name`, `thread_key`, `status`, `parent_run_id`)
`GET /workflows/runs/{run_id}`	Get run details (status, checkpoints, waiting_on)
`GET /workflows/runs/{run_id}/children`	List child workflow runs
`GET /workflows/runs/{run_id}/checkpoints`	Inspect all checkpoints for a run
`POST /workflows/runs/{run_id}/cancel`	Cancel a run (idempotent for terminal runs)
`POST /workflows/events`	Deliver an external event (`event_type`, `correlation_id`, `payload`)

Built-in workflows

Workflow	Description
`agent_turn`	Single durable agent turn: spawn → message → execute → wait for terminal result.
`slack_thread_turn`	Same as `agent_turn` but requires a chat `thread_key`. Used by both the slackbot and the chatbot (Google Chat). Per-platform behavior (mention regex, persona-switch context note, prompt-switch release-id) is dispatched off `delivery.platform`; the workflow name is intentionally `slack_thread_turn` rather than a generic `comms_thread_turn` so durable in-flight runs and the `_workflow_request_hash` shortcut keep resolving.
`agent_loop`	Recurring agent loop: runs an agent turn every N seconds until the agent signals `{"done": true}`, max iterations, or deadline.

Durable state

Table	What
`workflow_runs`	Run metadata, status, input/output, parent/root hierarchy
`workflow_checkpoints`	Per-step cached results, linked execution/child-run IDs
`workflow_schedules`	Cron/interval schedule definitions with next_run_at tracking
`workflow_events`	External events for `wait_for_event` correlation

Agent Sandbox

Overview

1 conversation = 1 Kubernetes sandbox Pod. The API spawns Pods running harness CLIs (amp, claude-code, codex). Inside the Pod, the harness calls back to the API via curl over REST.

How the System Prompt Works

The sandbox image bakes services/sandbox/SYSTEM_PROMPT.md into ~/AGENTS.md at build time. On container startup, entrypoint.sh copies it into the workspace root as workspace/AGENTS.md — this is the file that AI harnesses (Amp, Claude Code, Codex) read as their system instructions.

The system prompt tells the agent:

Identity: it's running inside a Kubernetes sandbox pod, calling back to the API for tool access
Tools: three kinds — harness built-ins (Read, Bash, etc.), API tools via the call helper, and a headless browser
call helper (/usr/local/bin/call): a bash wrapper around curl that provides a concise syntax for API tool calls. call slack get_channel_history '{"channel":"general"}' instead of a full curl command. Returns TOON format for token efficiency.
Slack messaging: the agent's stdout IS the Slack reply — never call send_message on the active thread
Dashboard blocks: fenced code blocks with dashboard language tag render structured tables, charts, and KPI cards in compatible Centaur clients
Rules: never display secrets, show your work, lead with the answer

The call helper (services/sandbox/call.sh) handles routing:

call <tool> <method> [json] → POST /tools/<tool>/<method>
call discover <tool> → GET /tools/<tool>

Legacy call search / call sql shorthands were removed. Sandbox agents should call the concrete tool directly, for example call websearch search '{"query":"..."}' or another deployment-specific query method discovered via call discover <tool>.

Persona System

The entrypoint supports persona overlays via AGENT_PERSONA. Persona prompts are discovered from the loaded tool directories (including overlays such as ~/centaur-overlay) and appended after the base + org overlay system prompts at container startup.

Sandbox Pod Config

Runs under Kubernetes NetworkPolicies with API reachable through the in-cluster service URL
Entrypoint injects CENTAUR_API_URL and CENTAUR_API_KEY env vars
Stub API keys so harnesses init in API-key mode (not browser login)
HTTPS_PROXY routes LLM calls through the firewall
Resource limits: 4GB memory, 2 CPUs
Image tagged centaur-agent:latest
Labels identify Centaur-managed sandboxes and carry thread/harness metadata for discovery/recovery

Credential Injection (Firewall)

Sandbox Pods never see real API keys. The firewall (services/firewall/addon.py) intercepts HTTPS and injects credentials from the secrets service:

Target host	Header	Format
`api.anthropic.com`	`x-api-key`	raw
`api.openai.com`	`authorization`	bearer
`ampcode.com`	`authorization`	bearer
`api.github.com`	`authorization`	token
`github.com`	`authorization`	basic auth

Session Persistence

sandbox_sessions table: tracks sandbox ID, harness, engine, state, thread key, and thread title
chat_messages table: stores persisted user/assistant messages for Slackbot delivery and durable transcript surfaces
On API restart, sandbox ownership is re-read from sandbox_sessions; process-local queues and sockets are rebuilt lazily per sandbox
Pods are still discoverable via Kubernetes labels even if DB state needs reconciliation

Security Model

API auth: All callers authenticate with DB-backed API keys (aiv2_* prefix, stored in api_keys table). Local in-cluster service calls use the configured bypass paths where applicable.
Sandbox auth: Sandbox Pods get auto-issued HMAC-signed tokens (sbx1.* prefix) minted by the API. These are short-lived (2h TTL) and scoped to agent + tools:*.
Slack: HMAC-SHA256 signature verification on all webhooks
Google Chat: Service account app authentication + domain allowlisting on webhooks
Public edge: The Helm chart exposes public routes only when configured through Ingress, HTTPRoute, or service settings.
Sandbox isolation: Pods get stub keys only; real keys injected by firewall proxy in-flight
Filesystem: Host repos mounted read-only by default; only working repo is read-write
Kubernetes API: The API service account is scoped to the Pod, Secret, exec, attach, and log operations needed to manage sandboxes.

API Key Management

All API authentication uses DB-backed keys stored in the api_keys Postgres table. Keys are managed via the admin API (localhost-only, or requires admin scope).

Key types

Type	Prefix	Issued by	Used by	Scopes
DB keys	`aiv2_*`	Admin API	Slackbot, CLI, external callers	Per-key (e.g. `["*"]`, `["agent:execute"]`)
Sandbox tokens	`sbx1.*`	API (automatic)	Sandbox containers → API tool calls	`["agent", "tools:*"]`

How services get their keys

Slackbot: SLACK_BOT_TOKEN, SLACK_SIGNING_SECRET, and SLACKBOT_API_KEY are injected from the local infra Secret.
Chatbot: GOOGLE_SERVICE_ACCOUNT_JSON, CHATBOT_API_KEY are injected from the local infra Secret.
Sandbox containers: Auto-issued sbx1.* token injected as CENTAUR_API_KEY at container creation
Local testing: Use localhost bypass (no key needed from inside the API deployment), or create a key via admin API

Secrets

Tool credentials (e.g., ANTHROPIC_API_KEY, AMP_API_KEY) are never materialized inside sandboxes or the API service. Tools declare which keys they need in their pyproject.toml and call secret("KEY") to receive a placeholder. Outbound HTTPS traffic is MITM'd by iron-proxy, which substitutes the real credential based on the host/key injection map managed by firewall-manager. iron-proxy resolves op://... references directly against 1Password.

For local development, infra secrets are stored in Kubernetes Secrets created by just bootstrap-secrets; application secrets continue to come from 1Password.

Observability & Audit Logs

Architecture

All services write structured JSON logs to stdout. Kubernetes captures pod logs, and optional observability deployments can forward them to VictoriaLogs. VictoriaMetrics receives metrics via push from the API service when enabled.

Service → stdout (JSON) → Kubernetes pod logs → optional log collector → VictoriaLogs/Grafana

This design keeps the local Helm stack minimal while preserving structured logs for collectors.

Components

Component	Role	Config
VictoriaLogs	Optional log storage + query engine	External/overlay deployment
VictoriaMetrics	Optional metrics storage + query engine	Push-based when enabled
Grafana	Optional dashboards + log explorer	External/overlay deployment

Querying logs

Via Grafana: navigate to Explore → VictoriaLogs and use LogsQL.

Via CLI (from inside the Kubernetes network):

# All logs for a specific thread
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://victorialogs:9428/select/logsql/query" \
  --data-urlencode "query=thread_key:C042WDDP89Y" --data-urlencode "limit=50"

# API errors in the last hour
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://victorialogs:9428/select/logsql/query" \
  --data-urlencode "query=_stream:{service=\"api\"} AND level:error" --data-urlencode "limit=20"

# Firewall audit trail for a time range
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://victorialogs:9428/select/logsql/query" \
  --data-urlencode "query=_stream:{service=\"firewall\"} AND event:proxy_audit" \
  --data-urlencode "start=2026-03-10T00:00:00Z" --data-urlencode "end=2026-03-11T00:00:00Z"

Audit logging

The firewall emits a structured audit event for every outbound request from sandbox containers: method, host, path, status code, request/response bytes, duration, and source container IP. These are searchable via event:proxy_audit in VictoriaLogs.

The API logs tool calls (event:tool_call_started, event:tool_call_completed), session lifecycle (event:warm_container_claimed), and HTTP requests with thread context.

Logging contract

Services must write single-line JSON to stdout with these fields:

Field	Required	Description
`timestamp`	Yes	ISO 8601 timestamp
`level`	Yes	`debug`, `info`, `warning`, `error`
`service`	Yes	Service name (`api`, `firewall`, `secrets`)
`event`	Yes	Machine-readable event name
`msg`	No	Human-readable message
`thread_key`	No	Thread identifier (when applicable)

Never log secret values, auth headers, or raw tokens.

E2E Testing (without Slack)

1. Bring up the stack

just up

All E2E curl commands below use kubectl exec for localhost bypass (no API key needed). To test from outside the container, create a DB-backed key via the admin API.

2. Spawn a runtime assignment

THREAD_KEY=test-e2e-1

SPAWN=$(kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/spawn \
  -H "Content-Type: application/json" \
  -d "{\"thread_key\":\"${THREAD_KEY}\",\"harness\":\"amp\"}")
ASSIGNMENT_GENERATION=$(printf '%s' "$SPAWN" | jq -r '.assignment_generation')

3. Persist a message

kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/message \
  -H "Content-Type: application/json" \
  -d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"role\":\"user\",\"parts\":[{\"type\":\"text\",\"text\":\"Reply with exactly PONG and nothing else.\"}]}"

4. Enqueue execution

EXECUTE=$(kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/execute \
  -H "Content-Type: application/json" \
  -d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"harness\":\"amp\",\"delivery\":{\"platform\":\"dev\"}}")
EXECUTION_ID=$(printf '%s' "$EXECUTE" | jq -r '.execution_id')

5. Tail durable events (or reconnect later)

kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -N \
  "http://localhost:8000/agent/threads/${THREAD_KEY}/events?execution_id=${EXECUTION_ID}&after_event_id=0"

If this stream disconnects, reconnect with the last seen event_id as after_event_id. If the execution already finished, the endpoint emits the terminal execution_state snapshot.

6. Inspect or cancel

kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://localhost:8000/agent/executions/${EXECUTION_ID}" | jq

kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST \
  "http://localhost:8000/agent/executions/${EXECUTION_ID}/cancel" \
  -H "Content-Type: application/json" \
  -d '{}'

7. Release the assignment when finished

kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST "http://localhost:8000/agent/threads/${THREAD_KEY}/release" \
  -H "Content-Type: application/json" \
  -d '{"release_id":"rel-test-e2e-1","cancel_inflight":true}'

Debugging

kubectl get pods -n centaur -l centaur-agent=true
kubectl exec -n centaur <sandbox-pod> curl -s http://api:8000/health

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

Centaur — Developer Guide

Quick Start

1. Clone and configure

2. Boot the stack

Database migrations

3. Test

Architecture

End-to-End Request Flow

Service Interface Contracts

Network Isolation

Directory Structure

Terminology

Testing Before Pushing

Local-First Testing — Never Touch the Deploy Box

Code Conventions

Lint & Test

Plugin System — Tools & Workflows

Tool Plugins

Workflow Plugins

Ordered Overlays

Durable Workflows

WorkflowContext API

Writing a workflow

Workflow lifecycle

Workflow REST API

Built-in workflows

Durable state

Agent Sandbox

Overview

How the System Prompt Works

Persona System

Sandbox Pod Config

Credential Injection (Firewall)

Session Persistence

Security Model

API Key Management

Key types

How services get their keys

Secrets

Observability & Audit Logs

Architecture

Components

Querying logs

Audit logging

Logging contract

E2E Testing (without Slack)

1. Bring up the stack

2. Spawn a runtime assignment

3. Persist a message

4. Enqueue execution

5. Tail durable events (or reconnect later)

6. Inspect or cancel

7. Release the assignment when finished

Debugging