git clone <repo-url>
cd centaur
brew install just
Centaur runs locally on Kubernetes through the Helm chart. Infra secrets are required as pre-created Kubernetes Secrets. For local development, `just bootstrap-secrets` creates them from your shell environment:
export OP_SERVICE_ACCOUNT_TOKEN=...
export OP_VAULT=...
export SLACK_BOT_TOKEN=...
export SLACK_SIGNING_SECRET=...
export SLACKBOT_API_KEY=...
Application-level LLM/tool secrets such as OpenAI and Anthropic tokens stay in 1Password and are loaded by the secrets service.
just up
./scripts/dbmate new add_agent_leases
./scripts/dbmate --set overlay new add_org_tables
./scripts/dbmate status
./scripts/dbmate up
./scripts/dbmate creates the next numbered SQL file in services/api/db/migrations by default, or in services/api/db/migrations inside the mounted overlay when you pass --set overlay. up, migrate, and status run against both the core and overlay migration sets unless you pin a specific set. Each set has its own dbmate migrations table so overlay repos can extend the shared Postgres database without version collisions. If DATABASE_URL is not set in your shell, the wrapper reuses the API deployment's configured value through kubectl exec.
From inside the API deployment (localhost bypass — no key needed):
THREAD_KEY=test-e2e-1
SPAWN=$(kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/spawn \
-H "Content-Type: application/json" \
-d "{\"thread_key\":\"${THREAD_KEY}\",\"harness\":\"amp\"}")
ASSIGNMENT_GENERATION=$(printf '%s' "$SPAWN" | jq -r '.assignment_generation')
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/message \
-H "Content-Type: application/json" \
-d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"role\":\"user\",\"parts\":[{\"type\":\"text\",\"text\":\"Reply with exactly PONG and nothing else.\"}]}"
EXECUTE=$(kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/execute \
-H "Content-Type: application/json" \
-d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"harness\":\"amp\",\"delivery\":{\"platform\":\"dev\"}}")
EXECUTION_ID=$(printf '%s' "$EXECUTE" | jq -r '.execution_id')
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://localhost:8000/agent/executions/${EXECUTION_ID}" | jq
Or create a DB-backed key for external use (see API Key Management).
See the architecture diagram in the README.
- User mentions bot in Slack → webhook → slackbot → api
- User mentions bot in Google Chat → webhook → chatbot → api
- API spawns/reuses a Kubernetes sandbox pod (`centaur-agent:latest`) for that thread
- Executes harness (amp/claude-code/codex) through the sandbox backend
- Harness calls tools via `curl` back to API at `http://api:8000` (REST, NOT MCP)
- LLM API calls route through firewall proxy which injects real credentials
- Results stream as JSON events → posted to Slack or Google Chat
Centaur is a modular service architecture. Each service communicates through well-defined interfaces. As long as you implement these interfaces, you can swap or extend any layer independently.
Client → API (durable control-plane protocol):
Clients (slackbot, CLI, external integrations) should stay thin. They persist input with spawn -> message -> execute, stream or replay output from the durable events endpoint, and only fall back to durable terminal state when the live stream is gone. The API owns runtime assignment, execution serialization, cancellation, and final-delivery recovery; Postgres is the source of truth.
Step 1: Assign or reuse a runtime (POST /agent/spawn)
Pins one warm runtime to the thread and returns the current assignment_generation.
POST /agent/spawn
{
"thread_key": "slack:C0AJ07U8Z1N:1773364194.179929",
"harness": "amp"
}
← {
"thread_key": "slack:C0AJ07U8Z1N:1773364194.179929",
"runtime_id": "rtm_123",
"assignment_generation": 12,
"state": "assigned_idle"
}
Step 2: Persist the user turn (POST /agent/message)
Writes one durable transcript event. Inline base64 image/document blocks are extracted into attachments and rewritten to lightweight attachment_ref parts.
POST /agent/message
{
"thread_key": "slack:C0AJ07U8Z1N:1773364194.179929",
"assignment_generation": 12,
"role": "user",
"parts": [{"type": "text", "text": "analyze this"}],
"user_id": "U123",
"metadata": {"user_name": "alice", "platform": "slack"}
}
← {"ok": true, "message_id": "msg_123"}
Step 3: Enqueue execution (POST /agent/execute)
Creates a durable execution request plus final-delivery obligation. The worker drives the attached container; the response is just the execution handle.
POST /agent/execute
{
"thread_key": "slack:C0AJ07U8Z1N:1773364194.179929",
"assignment_generation": 12,
"harness": "amp",
"delivery": {"platform": "slack"}
}
← {"ok": true, "execution_id": "exe_123", "status": "queued"}
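Steps 1 through 3 can be sketched as a thin client helper. This is a minimal illustration, not the actual slackbot or CLI code: `run_turn` and the injectable `post` callable are hypothetical names, and only the request and response fields shown above are used.

```python
from typing import Any, Callable

# Hypothetical transport: takes (path, json_body) and returns the decoded JSON response.
Post = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_turn(post: Post, thread_key: str, harness: str, text: str, platform: str) -> str:
    """Persist one user turn with spawn -> message -> execute and return the execution id."""
    # Step 1: pin (or reuse) a runtime and learn the current assignment generation.
    spawn = post("/agent/spawn", {"thread_key": thread_key, "harness": harness})
    generation = spawn["assignment_generation"]

    # Step 2: write the user turn as a durable transcript event.
    post(
        "/agent/message",
        {
            "thread_key": thread_key,
            "assignment_generation": generation,
            "role": "user",
            "parts": [{"type": "text", "text": text}],
        },
    )

    # Step 3: enqueue the execution; the response is only a handle.
    execute = post(
        "/agent/execute",
        {
            "thread_key": thread_key,
            "assignment_generation": generation,
            "harness": harness,
            "delivery": {"platform": platform},
        },
    )
    return execute["execution_id"]
```

Passing the transport in keeps the helper testable without a cluster; a real client would wrap an HTTP library and attach its API key.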
Step 4: Stream or replay output (GET /agent/threads/{thread_key}/events)
Consumers tail durable events for one execution. On disconnect, reconnect with the last seen event id. If the execution already finished and no more rows remain, the API emits the terminal execution_state snapshot.
GET /agent/threads/slack:C0AJ07U8Z1N:1773364194.179929/events?execution_id=exe_123&after_event_id=0
← SSE event: amp_raw_event
← data: {"type":"assistant","message":{...}}
← SSE event: turn.done
← data: {"type":"turn.done","result":"..."}
← SSE event: execution_state
← data: {"status":"completed","result_text":"..."}
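A consumer of this stream mostly needs to split it into frames and stop at the terminal `execution_state`. The sketch below assumes standard Server-Sent Events framing (`event:` and `data:` lines, frames separated by a blank line) and JSON payloads as in the trace above; `parse_sse` and `final_state` are hypothetical helper names. Tracking the last seen event id for `after_event_id` is left out because the trace does not show where the id is carried.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (event_name, decoded_data) for each SSE frame in an iterable of lines."""
    event = "message"  # SSE default when no event: line is present
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current frame.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    if data_lines:
        # Flush a final frame that was not followed by a blank line.
        yield event, json.loads("\n".join(data_lines))


def final_state(lines: Iterable[str]) -> Optional[dict[str, Any]]:
    """Return the terminal execution_state payload, or None if the stream ended early."""
    for event, data in parse_sse(lines):
        if event == "execution_state":
            return data
    return None
```

If `final_state` returns `None`, the live stream ended before the terminal snapshot arrived, which is the case where a client reconnects or falls back to durable terminal state.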
Step 5: Release only when you really want to end the assignment (POST /agent/threads/{thread_key}/release)
Releases the thread-to-runtime pin and optionally cancels any non-terminal execution still tied to that assignment generation.
Inspect the active runtime for a thread (GET /agent/runtime?key={thread_key})
Returns {persona_id, persona, harness, engine, overlay: {loaded, mount_api, mount_sandbox, image}, available_personas, …}. Sandboxes call this through call agent runtime '?key='"$CENTAUR_THREAD_KEY"; clients can call it directly to confirm what persona/overlay an assignment is actually running.
Durable state written for one turn:
| Table | What |
|---|---|
| `agent_runtime_assignments` | Thread-to-runtime pin and active assignment generation |
| `agent_message_requests` | Durable inbound transcript events |
| `attachments` | Extracted attachment bytes for inline multimodal content |
| `agent_execution_requests` | Queued/running/terminal execution row |
| `agent_execution_events` | Replayable raw + projected execution events |
| `agent_final_delivery_outbox` | Final-result delivery obligation for reconnect/retry paths |
POST /agent/connect and POST /agent/reconnect are legacy endpoints now kept only as explicit 410 LEGACY_ENDPOINT_REMOVED stubs. Do not build new clients on them.
API → Sandbox (stdin/stdout, NDJSON):
The API communicates with sandbox Pods through the active sandbox backend's attach stream. The wire format is Anthropic message format — this is the canonical protocol between the API and all sandboxes, regardless of which harness runs inside.
→ stdin: {"type":"turn.start","turn_id":1,"text":"analyze this"}
→ stdin: {"type":"turn.start","turn_id":2,"content":[ // Anthropic content blocks
{"type":"text","text":"what is this?"},
{"type":"image","source":{"type":"base64","media_type":"image/png","data":"..."}}
]}
→ stdin: {"type":"interrupt"}
← stdout: {"type":"system","subtype":"init","session_id":"T-..."}
← stdout: {"type":"assistant","message":{"role":"assistant","content":[...]}}
← stdout: {"type":"result","subtype":"success","result":"..."}
← stdout: {"type":"turn.done","turn_id":1,"result":"..."}
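The framing above is newline-delimited JSON: one object per line in each direction. A minimal sketch of both sides follows, assuming only the message shapes shown in the trace; `encode_turn_start` and `read_until_turn_done` are hypothetical helper names, not functions from the API codebase.

```python
import json
from typing import Any, Iterable


def encode_turn_start(turn_id: int, text: str) -> str:
    """Serialize a text-only turn.start message as one NDJSON line for stdin."""
    return json.dumps({"type": "turn.start", "turn_id": turn_id, "text": text}) + "\n"


def read_until_turn_done(stdout_lines: Iterable[str], turn_id: int) -> list[dict[str, Any]]:
    """Collect stdout events up to and including the matching turn.done."""
    events: list[dict[str, Any]] = []
    for line in stdout_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        events.append(event)
        if event.get("type") == "turn.done" and event.get("turn_id") == turn_id:
            break
    return events
```

Matching on `turn_id` rather than on the first `turn.done` keeps a reader from confusing the end of one turn with the end of another.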
Sandbox harness adapter (services/sandbox/harness_session.py):
The sandbox's harness_session.py translates the standard Anthropic format into whatever each harness CLI actually accepts:
| Harness | Translation |
|---|---|
| claude-code | Pass through directly (native Anthropic format) |
| amp | Materialize image/document blocks to files on disk, replace with @/path text mentions (Amp stdin only accepts text blocks) |
| codex / pi-mono | Extract text from content blocks, pass as CLI argument |
This means clients and the API never need to know about harness-specific quirks. They speak Anthropic format; the sandbox adapter handles the rest.
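The two non-passthrough translations in the table can be illustrated with pure functions over Anthropic content blocks. This is a sketch under stated assumptions, not the code in `harness_session.py`: the function names and the `path_for` callback are hypothetical, and actually writing attachment bytes to disk is omitted.

```python
from typing import Any, Callable

Block = dict[str, Any]


def extract_text(content: list[Block]) -> str:
    """codex-style translation: keep only the text blocks, joined by newlines."""
    return "\n".join(block["text"] for block in content if block.get("type") == "text")


def to_amp_text_blocks(content: list[Block], path_for: Callable[[int, Block], str]) -> list[Block]:
    """amp-style translation: swap each non-text block for an @/path text mention.

    path_for is a hypothetical callback that returns the on-disk path where the
    block at the given index was (or will be) materialized.
    """
    translated: list[Block] = []
    for index, block in enumerate(content):
        if block.get("type") == "text":
            translated.append(block)
        else:
            translated.append({"type": "text", "text": f"@{path_for(index, block)}"})
    return translated
```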
Sandbox → API (REST over Kubernetes services):
Agents call tools via curl $CENTAUR_API_URL/tools/<tool>/<method> over the in-cluster service network. Auth is via CENTAUR_API_KEY injected when the sandbox Pod is created.
The Helm chart installs deny-by-default NetworkPolicies, then explicitly allows the service paths the stack needs: Slackbot to API, API to Postgres/secrets/firewall/Kubernetes, sandbox Pods to API/firewall, DNS, and configured egress.
centaur/
├── services/
│ ├── api/ # FastAPI control plane (standalone service)
│ │ ├── api/ # Python package
│ │ │ ├── routers/ # HTTP endpoints (agent, workflows, admin, health, …)
│ │ │ ├── sandbox/ # Sandbox backend abstraction (Kubernetes)
│ │ │ ├── workflows/# Built-in workflow handlers (agent_turn, slack_thread_turn)
│ │ │ ├── runtime_control.py # Durable execution control-plane
│ │ │ ├── workflow_engine.py # Durable workflow engine (checkpoint/replay)
│ │ │ ├── warm_pool.py # Pre-warmed sandbox pool
│ │ │ ├── vm_metrics.py # Push-based VictoriaMetrics metrics
│ │ │ └── observability.py # Execution observation projections
│ │ ├── Dockerfile
│ │ └── tools.toml # Tool plugin directory config
│ ├── secrets/ # Pluggable secrets manager (standalone service)
│ ├── firewall/ # mitmproxy addon — credential injection proxy
│ ├── sandbox/ # Agent container image (Ubuntu 24.04 + uv + gh + node + bun + amp)
│ ├── slackbot/ # TypeScript + Hono Slack event listener (pnpm/Bun)
│ ├── chatbot/ # TypeScript + Hono Google Chat event listener (pnpm/Bun)
│ ├── grafana/ # Grafana dashboards + provisioning
│ ├── fluentbit/ # Fluent Bit log shipping config
│ └── alloy/ # Grafana Alloy config
├── centaur_sdk/ # Standalone SDK (pip install centaur-sdk)
├── packages/ # Shared packages (api-client, harness-events)
├── tools/ # Open-source tool plugins (auto-discovered)
│ ├── alchemy/ # One directory per tool — each has client.py + pyproject.toml
│ ├── websearch/
│ ├── telegram/
│ └── … # 60+ tool plugins (crypto, research, productivity, infra, …)
├── workflows/ # External workflow definitions (auto-discovered)
│ ├── agent_loop.py # Recurring agent polling/monitoring loop
│ └── multi_step_demo.py # Demo: branching, loops, conditionals
├── scripts/ # Operational scripts
└── Justfile # Local Helm/Kubernetes workflow
- Chat SDK always refers to the Vercel Chat SDK (`~/github/vercel/chat`). When you need to understand how the Chat SDK or `@chat-adapter/*` packages work, always read the source at `~/github/vercel/chat`; never dig through `node_modules`.
NEVER push changes without testing them locally first. Testing means actually running the affected service and proving the change works end-to-end — not just linting or reasoning about it.
- Build the affected service: `just build-one <service>`
- Bring it up: `just deploy`
- Make a real request that exercises the change and show the output
- Only then commit and push
For tool changes: tools hot-reload, so just verify via curl -X POST http://localhost:8000/tools/<tool>/<method> from inside the API deployment. For Dockerfile/infra changes: rebuild, redeploy, and verify the binary/service is present and functional. For firewall changes: test from inside a sandbox pod through the proxy.
All testing and E2E validation MUST happen on the local Kubernetes stack (just up on this machine).
The deploy box is production. Changes reach it via git push → GitHub Actions auto-deploy. The only reasons to SSH into it are:
- Checking logs (`kubectl logs`, VictoriaLogs queries) for debugging production issues
- Emergency manual intervention, only when the user explicitly asks
For E2E testing, always:
- `just build-one <service>` locally
- `just deploy` locally
- Run curl commands against `localhost` through `kubectl exec -n centaur deploy/centaur-centaur-api -- curl ...`
- Verify results locally
- Only then commit, push, and let CI/CD handle production
- Python 3.11+, `uv` for deps, `ruff` for lint/format (line-length=100)
- `services/slackbot` uses `pnpm` only (single lockfile: `pnpm-lock.yaml`)
- All imports at top of file, never inside functions
- Absolute imports only: `from api.X`, `from centaur_sdk.X`
- All secrets via env vars or secret manager, never hardcode
- `asyncpg` for Postgres, `pgvector` for embeddings
- Conventional commits: `feat:`, `fix:`, `docs:`, `refactor:`, `test:`, `chore:`
Each service has its own pyproject.toml and ruff.toml. From the repo root:
uv run ruff check . # lint
uv run ruff format . # auto-fix
uv run pytest # tests
Centaur has two plugin types that are auto-discovered at startup and hot-reloaded on file changes, so extending the system requires no core code changes.
Tools live in directories listed in tools.toml (plugin_dirs). Each tool is a directory with client.py (class + _client() factory), pyproject.toml, and optional cli.py. The API auto-discovers tools on startup, generates REST endpoints at /tools/{name}/{method}, and hot-reloads on file changes.
- `client.py`: NO `load_dotenv()`. Secrets via `secret()` from `centaur_sdk.tool_sdk`.
- `cli.py`: YES `load_dotenv()` at top. Thin typer wrapper for standalone use.
- Methods starting with `_` are excluded from registration.
- Tool dependencies declared in `pyproject.toml` are installed at image build time.
Example:
# tools/my-tool/client.py
import httpx


class MyToolClient:
    def search(self, query: str, limit: int = 10) -> dict:
        """Search for something."""
        # Pass query parameters via `params` so httpx URL-encodes them.
        resp = httpx.get("https://api.example.com/search", params={"q": query, "limit": limit})
        resp.raise_for_status()
        return resp.json()


def _client():
    return MyToolClient()
Workflows live in directories listed in the WORKFLOW_DIRS env var (colon-separated paths, bind-mounted into the API container). Each workflow is a single Python file exporting WORKFLOW_NAME, an async handler(params, ctx), and an optional Input dataclass. See Durable Workflows for the full programming model.
Built-in workflows ship in services/api/api/workflows/. External workflows (like those in the top-level workflows/ directory) are loaded identically — just point WORKFLOW_DIRS at them.
Centaur supports a first-class ordered overlay model, so organizations can extend the base repo without forking or relying on filesystem overlayfs. A common deployment keeps the base repo and an external overlay checkout side by side:
your-deployment/
├── centaur/ # This repo
└── centaur-overlay/ # Org-specific tools, workflows, skills, personas, prompt overlay
The Helm chart supports ordered overlays by mounting an overlay image or prompt content at /app/overlay/org, including its tools/, workflows/, .agents/skills/, persona prompts, and services/sandbox/SYSTEM_PROMPT.md after the base repo content.
Later overlay entries win cleanly when names collide, so the base repo stays generic while deployments can layer in org-specific behavior from outside the checkout.
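The "later entries win" rule amounts to an ordered merge keyed by name. A minimal sketch, assuming each layer can be represented as a mapping from plugin name to its source path; `resolve_layers` and the example paths are hypothetical and do not describe how the chart or the loader is actually implemented.

```python
def resolve_layers(layers: list[dict[str, str]]) -> dict[str, str]:
    """Merge ordered layers of {name: source_path}; later layers override earlier ones."""
    resolved: dict[str, str] = {}
    for layer in layers:
        resolved.update(layer)  # a colliding name is replaced by the later layer's entry
    return resolved


base = {"websearch": "/app/tools/websearch", "telegram": "/app/tools/telegram"}
org = {"websearch": "/app/overlay/org/tools/websearch"}
resolved = resolve_layers([base, org])
# websearch now resolves to the overlay copy; telegram still comes from the base repo.
```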
The workflow engine (workflow_engine.py) provides a checkpoint/replay model inspired by Cloudflare Workflows. The handler function IS the workflow — steps are runtime-discovered via ctx.step(name, fn) calls. The engine checkpoints each step result to Postgres. On resume after crash or suspension, the handler re-executes top-to-bottom but skips steps that already have checkpoints (returning the cached result instantly). Dynamic branching, loops, and conditional logic work naturally because it is just Python.
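The checkpoint/replay idea can be shown with a toy, in-memory context. This is not the real `WorkflowContext`: `ToyContext` is a hypothetical stand-in whose dict plays the role of the Postgres checkpoint table, and it ignores retries, timeouts, and suspension.

```python
import asyncio
import inspect
from typing import Any, Callable


class ToyContext:
    """In-memory stand-in for a workflow context; the dict replaces durable checkpoints."""

    def __init__(self, checkpoints: dict[str, Any]) -> None:
        self.checkpoints = checkpoints

    async def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.checkpoints:
            # Replay: the step already ran, so return the stored result without calling fn.
            return self.checkpoints[name]
        result = fn()
        if inspect.isawaitable(result):
            result = await result
        self.checkpoints[name] = result  # checkpoint before returning to the handler
        return result


def run_twice() -> tuple[int, list[str]]:
    """Run the same handler twice against one checkpoint store and report what executed."""
    calls: list[str] = []
    store: dict[str, Any] = {}

    def fetch() -> int:
        calls.append("fetch")
        return 21

    async def handler(ctx: ToyContext) -> int:
        value = await ctx.step("fetch", fetch)
        return await ctx.step("double", lambda: value * 2)

    asyncio.run(handler(ToyContext(store)))           # first run executes both steps
    result = asyncio.run(handler(ToyContext(store)))  # second run replays from checkpoints
    return result, calls
```

In this toy, `fetch` runs once even though the handler body executes twice, which is why step names need to be stable and unique within a handler.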
Every handler receives (params, ctx) where ctx: WorkflowContext provides:
| Primitive | Purpose |
|---|---|
| `ctx.step(name, fn)` | Execute fn exactly once; return cached result on replay. Supports retry (RetryPolicy) and timeout. |
| `ctx.sleep(name, duration)` | Suspend the run for duration; checkpoint + resume automatically. |
| `ctx.sleep_until(name, when)` | Suspend until a specific datetime. |
| `ctx.wait_for_event(name, event_type, correlation_id)` | Suspend until an external event arrives via POST /workflows/events. |
| `ctx.start_workflow(name, workflow_name, run_input)` | Create a child workflow run (returns immediately). |
| `ctx.wait_for_workflow(name, run_id)` | Suspend until a child workflow reaches terminal state. |
| `ctx.run_workflow(name, workflow_name, run_input)` | Start + wait in one call. |
| `ctx.start_agent(name, text=…)` | Shorthand: start a child agent_turn workflow. |
| `ctx.run_agent(name, text=…)` | Shorthand: start + wait for a child agent_turn workflow. |
| `ctx.log(msg, **kwargs)` | Structured log, suppressed during replay. |
# workflows/my_workflow.py
from dataclasses import dataclass
from datetime import timedelta
from typing import Any

from api.workflow_engine import WorkflowContext

WORKFLOW_NAME = "my_workflow"


@dataclass
class Input:
    message: str = "hello"


async def handler(inp: Input, ctx: WorkflowContext) -> dict[str, Any]:
    greeting = await ctx.step("gather", lambda: {"msg": inp.message})
    await ctx.sleep("pause", timedelta(minutes=5))
    result = await ctx.run_agent("agent", text=f"Summarize: {greeting['msg']}")
    return {"greeting": greeting, "agent_result": result}
Runs go through: queued → running → sleeping/waiting → running → … → completed/failed/cancelled.
- Worker pool: `WORKFLOW_WORKER_CONCURRENCY` workers (default 2) poll for claimable runs.
- Lease-based fencing: Each worker holds a lease on its run, extended by a heartbeat. If the worker dies, the lease expires and another worker reclaims the run.
- Schedules: Cron-based or interval-based schedules are configured in the `workflow_schedules` table and ticked by the worker loop.
- External events: `POST /workflows/events` delivers events that wake waiting runs.
- Child workflows: Parent→child relationships are tracked; cancelling a parent cancels linked executions.
| Endpoint | Purpose |
|---|---|
| `POST /workflows/runs` | Create a workflow run (workflow_name, input, optional trigger_key for idempotency, eager_start) |
| `GET /workflows/runs` | List runs (filter by workflow_name, thread_key, status, parent_run_id) |
| `GET /workflows/runs/{run_id}` | Get run details (status, checkpoints, waiting_on) |
| `GET /workflows/runs/{run_id}/children` | List child workflow runs |
| `GET /workflows/runs/{run_id}/checkpoints` | Inspect all checkpoints for a run |
| `POST /workflows/runs/{run_id}/cancel` | Cancel a run (idempotent for terminal runs) |
| `POST /workflows/events` | Deliver an external event (event_type, correlation_id, payload) |
| Workflow | Description |
|---|---|
| `agent_turn` | Single durable agent turn: spawn → message → execute → wait for terminal result. |
| `slack_thread_turn` | Same as agent_turn but requires a chat thread_key. Used by both the slackbot and the chatbot (Google Chat). Per-platform behavior (mention regex, persona-switch context note, prompt-switch release-id) is dispatched off delivery.platform; the workflow name is intentionally slack_thread_turn rather than a generic comms_thread_turn so durable in-flight runs and the _workflow_request_hash shortcut keep resolving. |
| `agent_loop` | Recurring agent loop: runs an agent turn every N seconds until the agent signals {"done": true}, max iterations, or deadline. |
| Table | What |
|---|---|
| `workflow_runs` | Run metadata, status, input/output, parent/root hierarchy |
| `workflow_checkpoints` | Per-step cached results, linked execution/child-run IDs |
| `workflow_schedules` | Cron/interval schedule definitions with next_run_at tracking |
| `workflow_events` | External events for wait_for_event correlation |
1 conversation = 1 Kubernetes sandbox Pod. The API spawns Pods running harness CLIs (amp, claude-code, codex). Inside the Pod, the harness calls back to the API via curl over REST.
The sandbox image bakes services/sandbox/SYSTEM_PROMPT.md into ~/AGENTS.md at build time. On container startup, entrypoint.sh copies it into the workspace root as workspace/AGENTS.md — this is the file that AI harnesses (Amp, Claude Code, Codex) read as their system instructions.
The system prompt tells the agent:
- Identity: it's running inside a Kubernetes sandbox pod, calling back to the API for tool access
- Tools: three kinds (harness built-ins such as Read and Bash, API tools via the `call` helper, and a headless browser)
- `call` helper (`/usr/local/bin/call`): a bash wrapper around `curl` that provides a concise syntax for API tool calls. `call slack get_channel_history '{"channel":"general"}'` instead of a full curl command. Returns TOON format for token efficiency.
- Slack messaging: the agent's stdout IS the Slack reply; never call `send_message` on the active thread
- Dashboard blocks: fenced code blocks with `dashboard` language tag render structured tables, charts, and KPI cards in compatible Centaur clients
- Rules: never display secrets, show your work, lead with the answer
The call helper (services/sandbox/call.sh) handles routing:
- `call <tool> <method> [json]` → `POST /tools/<tool>/<method>`
- `call discover <tool>` → `GET /tools/<tool>`
Legacy call search / call sql shorthands were removed. Sandbox agents should call the concrete tool directly, for example call websearch search '{"query":"..."}' or another deployment-specific query method discovered via call discover <tool>.
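The routing rules of the `call` helper are small enough to restate as a pure function. The real helper is a bash script (`services/sandbox/call.sh`); the Python below is only a sketch of the two documented routes, with `route_call` as a hypothetical name and no claim about how the script handles a missing JSON body.

```python
from typing import Optional


def route_call(args: list[str]) -> tuple[str, str, Optional[str]]:
    """Map `call` arguments to (http_method, path, json_body)."""
    if len(args) == 2 and args[0] == "discover":
        # call discover <tool> -> GET /tools/<tool>
        return ("GET", f"/tools/{args[1]}", None)
    if len(args) in (2, 3):
        # call <tool> <method> [json] -> POST /tools/<tool>/<method>
        tool, method = args[0], args[1]
        body = args[2] if len(args) == 3 else None
        return ("POST", f"/tools/{tool}/{method}", body)
    raise ValueError("usage: call <tool> <method> [json] | call discover <tool>")
```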
The entrypoint supports persona overlays via AGENT_PERSONA. Persona prompts are discovered from the loaded tool directories (including overlays such as ~/centaur-overlay) and appended after the base + org overlay system prompts at container startup.
- Runs under Kubernetes NetworkPolicies with API reachable through the in-cluster service URL
- Entrypoint injects `CENTAUR_API_URL` and `CENTAUR_API_KEY` env vars
- Stub API keys so harnesses init in API-key mode (not browser login)
- `HTTPS_PROXY` routes LLM calls through the firewall
- Resource limits: 4GB memory, 2 CPUs
- Image tagged `centaur-agent:latest`
- Labels identify Centaur-managed sandboxes and carry thread/harness metadata for discovery/recovery
Sandbox Pods never see real API keys. The firewall (services/firewall/addon.py) intercepts HTTPS and injects credentials from the secrets service:
| Target host | Header | Format |
|---|---|---|
| `api.anthropic.com` | `x-api-key` | raw |
| `api.openai.com` | `authorization` | bearer |
| `ampcode.com` | `authorization` | bearer |
| `api.github.com` | `authorization` | token |
| `github.com` | `authorization` | basic auth |
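The table can be read as a host-to-header mapping plus a formatting rule per row. The sketch below restates it in Python; it is not the addon's code. `INJECTION_MAP` and `injected_header` are hypothetical names, and the Basic-auth username (`x-access-token`) is an assumption, since the table does not say which username is used.

```python
import base64

# host -> (header name, format), copied from the table above
INJECTION_MAP: dict[str, tuple[str, str]] = {
    "api.anthropic.com": ("x-api-key", "raw"),
    "api.openai.com": ("authorization", "bearer"),
    "ampcode.com": ("authorization", "bearer"),
    "api.github.com": ("authorization", "token"),
    "github.com": ("authorization", "basic"),
}


def injected_header(host: str, secret: str, basic_user: str = "x-access-token") -> tuple[str, str]:
    """Return the (header, value) pair that would carry the real credential for a host."""
    header, fmt = INJECTION_MAP[host]
    if fmt == "raw":
        value = secret
    elif fmt == "bearer":
        value = f"Bearer {secret}"
    elif fmt == "token":
        value = f"token {secret}"
    else:
        # basic: base64("user:secret"); the username here is an assumed placeholder
        encoded = base64.b64encode(f"{basic_user}:{secret}".encode()).decode()
        value = f"Basic {encoded}"
    return header, value
```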
- `sandbox_sessions` table: tracks sandbox ID, harness, engine, state, thread key, and thread title
- `chat_messages` table: stores persisted user/assistant messages for Slackbot delivery and durable transcript surfaces
- On API restart, sandbox ownership is re-read from `sandbox_sessions`; process-local queues and sockets are rebuilt lazily per sandbox
- Pods are still discoverable via Kubernetes labels even if DB state needs reconciliation
- API auth: All callers authenticate with DB-backed API keys (`aiv2_*` prefix, stored in `api_keys` table). Local in-cluster service calls use the configured bypass paths where applicable.
- Sandbox auth: Sandbox Pods get auto-issued HMAC-signed tokens (`sbx1.*` prefix) minted by the API. These are short-lived (2h TTL) and scoped to `agent` + `tools:*`.
- Slack: HMAC-SHA256 signature verification on all webhooks
- Google Chat: Service account app authentication + domain allowlisting on webhooks
- Public edge: The Helm chart exposes public routes only when configured through Ingress, HTTPRoute, or service settings.
- Sandbox isolation: Pods get stub keys only; real keys injected by firewall proxy in-flight
- Filesystem: Host repos mounted read-only by default; only working repo is read-write
- Kubernetes API: The API service account is scoped to the Pod, Secret, exec, attach, and log operations needed to manage sandboxes.
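For the Slack item above, the check follows Slack's published v0 signing scheme: HMAC-SHA256 over `v0:{timestamp}:{body}` with the signing secret, compared against the `X-Slack-Signature` header. The sketch below shows that scheme with hypothetical function names; how the slackbot service wires it in is not shown here.

```python
import hashlib
import hmac


def slack_signature(signing_secret: str, timestamp: str, body: bytes) -> str:
    """Compute the v0 signature Slack sends in X-Slack-Signature."""
    base = b"v0:" + timestamp.encode() + b":" + body
    digest = hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    return f"v0={digest}"


def verify_slack_request(
    signing_secret: str,
    timestamp: str,
    body: bytes,
    signature: str,
    now: float,
    max_age: int = 300,
) -> bool:
    """Reject stale or tampered webhooks; timestamp comes from X-Slack-Request-Timestamp."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if abs(now - sent_at) > max_age:
        return False  # replay protection: ignore requests older than five minutes
    expected = slack_signature(signing_secret, timestamp, body)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```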
All API authentication uses DB-backed keys stored in the api_keys Postgres table. Keys are managed via the admin API (localhost-only, or requires admin scope).
| Type | Prefix | Issued by | Used by | Scopes |
|---|---|---|---|---|
| DB keys | `aiv2_*` | Admin API | Slackbot, CLI, external callers | Per-key (e.g. `["*"]`, `["agent:execute"]`) |
| Sandbox tokens | `sbx1.*` | API (automatic) | Sandbox containers → API tool calls | `["agent", "tools:*"]` |
- Slackbot: `SLACK_BOT_TOKEN`, `SLACK_SIGNING_SECRET`, and `SLACKBOT_API_KEY` are injected from the local infra Secret.
- Chatbot: `GOOGLE_SERVICE_ACCOUNT_JSON` and `CHATBOT_API_KEY` are injected from the local infra Secret.
- Sandbox containers: Auto-issued `sbx1.*` token injected as `CENTAUR_API_KEY` at container creation
- Local testing: Use localhost bypass (no key needed from inside the API deployment), or create a key via admin API
Tool credentials (e.g., ANTHROPIC_API_KEY, AMP_API_KEY) are never materialized inside sandboxes or the API service. Tools declare which keys they need in their pyproject.toml and call secret("KEY") to receive a placeholder. Outbound HTTPS traffic is MITM'd by iron-proxy, which substitutes the real credential based on the host/key injection map managed by firewall-manager. iron-proxy resolves op://... references directly against 1Password.
For local development, infra secrets are stored in Kubernetes Secrets created by just bootstrap-secrets; application secrets continue to come from 1Password.
All services write structured JSON logs to stdout. Kubernetes captures pod logs, and optional observability deployments can forward them to VictoriaLogs. VictoriaMetrics receives metrics via push from the API service when enabled.
Service → stdout (JSON) → Kubernetes pod logs → optional log collector → VictoriaLogs/Grafana
This design keeps the local Helm stack minimal while preserving structured logs for collectors.
| Component | Role | Config |
|---|---|---|
| VictoriaLogs | Optional log storage + query engine | External/overlay deployment |
| VictoriaMetrics | Optional metrics storage + query engine | Push-based when enabled |
| Grafana | Optional dashboards + log explorer | External/overlay deployment |
Via Grafana: navigate to Explore → VictoriaLogs and use LogsQL.
Via CLI (from inside the Kubernetes network):
# All logs for a specific thread
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://victorialogs:9428/select/logsql/query" \
--data-urlencode "query=thread_key:C042WDDP89Y" --data-urlencode "limit=50"
# API errors in the last hour
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://victorialogs:9428/select/logsql/query" \
--data-urlencode "query=_stream:{service=\"api\"} AND level:error" --data-urlencode "limit=20"
# Firewall audit trail for a time range
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://victorialogs:9428/select/logsql/query" \
--data-urlencode "query=_stream:{service=\"firewall\"} AND event:proxy_audit" \
--data-urlencode "start=2026-03-10T00:00:00Z" --data-urlencode "end=2026-03-11T00:00:00Z"
The firewall emits a structured audit event for every outbound request from sandbox containers: method, host, path, status code, request/response bytes, duration, and source container IP. These are searchable via event:proxy_audit in VictoriaLogs.
The API logs tool calls (event:tool_call_started, event:tool_call_completed), session lifecycle (event:warm_container_claimed), and HTTP requests with thread context.
Services must write single-line JSON to stdout with these fields:
| Field | Required | Description |
|---|---|---|
| `timestamp` | Yes | ISO 8601 timestamp |
| `level` | Yes | `debug`, `info`, `warning`, `error` |
| `service` | Yes | Service name (`api`, `firewall`, `secrets`) |
| `event` | Yes | Machine-readable event name |
| `msg` | No | Human-readable message |
| `thread_key` | No | Thread identifier (when applicable) |
Never log secret values, auth headers, or raw tokens.
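A record that satisfies this contract can be produced with the standard library alone. The helper below is a minimal sketch, not the logging code the services use; `log_line` and its `now` parameter (injectable for tests) are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional

LEVELS = {"debug", "info", "warning", "error"}


def log_line(
    service: str,
    level: str,
    event: str,
    msg: Optional[str] = None,
    thread_key: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Build one single-line JSON log record with the required fields."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    record = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "level": level,
        "service": service,
        "event": event,
    }
    # Optional fields are omitted rather than emitted as null.
    if msg is not None:
        record["msg"] = msg
    if thread_key is not None:
        record["thread_key"] = thread_key
    # json.dumps without indent stays on one line; newlines inside strings are escaped.
    return json.dumps(record)
```

A service would `print(log_line(...), flush=True)` so the record reaches stdout as one line.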
just up
All E2E curl commands below use kubectl exec for localhost bypass (no API key needed).
To test from outside the container, create a DB-backed key via the admin API.
THREAD_KEY=test-e2e-1
SPAWN=$(kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/spawn \
-H "Content-Type: application/json" \
-d "{\"thread_key\":\"${THREAD_KEY}\",\"harness\":\"amp\"}")
ASSIGNMENT_GENERATION=$(printf '%s' "$SPAWN" | jq -r '.assignment_generation')
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/message \
-H "Content-Type: application/json" \
-d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"role\":\"user\",\"parts\":[{\"type\":\"text\",\"text\":\"Reply with exactly PONG and nothing else.\"}]}"
EXECUTE=$(kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST http://localhost:8000/agent/execute \
-H "Content-Type: application/json" \
-d "{\"thread_key\":\"${THREAD_KEY}\",\"assignment_generation\":${ASSIGNMENT_GENERATION},\"harness\":\"amp\",\"delivery\":{\"platform\":\"dev\"}}")
EXECUTION_ID=$(printf '%s' "$EXECUTE" | jq -r '.execution_id')
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -N \
"http://localhost:8000/agent/threads/${THREAD_KEY}/events?execution_id=${EXECUTION_ID}&after_event_id=0"
If this stream disconnects, reconnect with the last seen event_id as after_event_id. If the execution already finished, the endpoint emits the terminal execution_state snapshot.
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s "http://localhost:8000/agent/executions/${EXECUTION_ID}" | jq
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST \
"http://localhost:8000/agent/executions/${EXECUTION_ID}/cancel" \
-H "Content-Type: application/json" \
-d '{}'
kubectl exec -n centaur deploy/centaur-centaur-api -- curl -s -X POST "http://localhost:8000/agent/threads/${THREAD_KEY}/release" \
-H "Content-Type: application/json" \
-d '{"release_id":"rel-test-e2e-1","cancel_inflight":true}'
kubectl get pods -n centaur -l centaur-agent=true
kubectl exec -n centaur <sandbox-pod> -- curl -s http://api:8000/health