OpenClaw O11y is an observability stack for OpenClaw agents.
It combines:
- Agent: a lightweight Go agent that watches local OpenClaw state and hosts a local OTLP proxy
- Server: a central Go server that receives both native OpenClaw events and OTLP telemetry, stores them, aggregates data, and serves dashboards
- Web: a React frontend focused on AI-agent business views instead of raw telemetry plumbing
- Plugin: an optional OpenClaw plugin (openclaw-otel-plugin) that emits rich OpenTelemetry traces, metrics, and logs from inside the agent runtime
The goal is not just to show metrics/logs/traces. The goal is to answer questions like:
- Which run burned the most money?
- Which model is expensive or unstable?
- Which tool is slow or failing?
- Did a session start looping and expanding context?
- Did the agent execute risky shell or filesystem operations?
OpenClaw O11y currently provides two layers of observability:
- Runtime event views
  - overview, tokens, sessions, cron, workspace, and live logs
- OTel-native agent observability
  - trace call trees
  - cost dashboard
  - metrics dashboard
  - observability health
  - security timeline
  - context bloat candidates
- Cost Dashboard
  - total spend, calls, tokens, average cost per call
  - provider-stacked daily cost trend
  - model cost breakdown
- Top Expensive Runs
  - surfaces the highest-cost sessions directly from root trace summaries
- Cost Flame Graph
  - renders a selected trace in cost mode or token mode
- Context Bloat Alert
  - flags sessions whose prompt-token growth suggests runaway loops or exploding context
- Deep Trace View
  - root run summaries
  - nested LLM / tool / subagent spans
  - duration, model, provider, cost, tokens, errors, params, outputs
- Tool Reliability Matrix
  - calls, errors, error rate, average latency, P95 latency, max latency
- Observability Health
  - root recreation
  - orphan events
  - idle-timeout closures
  - lifecycle anomalies emitted by the plugin itself
- High-Risk Operation Timeline
  - classifies risky tool calls into categories such as shell, code execution, filesystem mutation, and network access
  - shows risk class, reason, params preview, duration, and error state
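The columns of the Tool Reliability Matrix can be illustrated with a small aggregation. This is a minimal sketch, not the server's actual aggregation code: the `(tool, duration_ms, is_error)` tuple shape, the function names, and the nearest-rank P95 definition are all assumptions made for the example.

```python
import math
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list (assumed definition)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def tool_reliability(spans):
    """Aggregate per-tool stats from hypothetical (tool, duration_ms, is_error) tuples."""
    by_tool = defaultdict(list)
    for tool, duration_ms, is_error in spans:
        by_tool[tool].append((duration_ms, is_error))

    matrix = {}
    for tool, rows in by_tool.items():
        durations = [d for d, _ in rows]
        errors = sum(1 for _, failed in rows if failed)
        matrix[tool] = {
            "calls": len(rows),
            "errors": errors,
            "error_rate": errors / len(rows),
            "avg_ms": sum(durations) / len(durations),
            "p95_ms": p95(durations),
            "max_ms": max(durations),
        }
    return matrix
```

With only a handful of calls per tool, the nearest-rank P95 collapses to the maximum, so a P95 column is mostly informative for tools with a reasonable number of samples.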
The stack has four moving parts:
- OpenClaw runtime
  - emits native runtime events such as llm_input, llm_output, before_tool_call, after_tool_call, subagent_*, agent_end
- OpenClaw OTEL plugin
  - converts runtime events into OTLP traces, metrics, and logs
  - sends them to the local OTLP proxy on the worker host
- Go agent
  - watches local OpenClaw files and logs
  - forwards OTLP payloads to the central server
  - collects node metrics
- Go server + React web
  - receives OTLP and event data
  - stores it in SQLite
  - serves business-oriented dashboards and WebSocket updates
High-level flow:
OpenClaw Runtime
-> openclaw-otel-plugin
-> local OTLP proxy on clawo11y-agent
-> central clawo11y server
-> SQLite + aggregations
-> React dashboards
- services/agent: worker-side Go process
- services/server: central API server, OTLP receiver, aggregation layer
- services/web: React frontend
- openclaw-otel-plugin: OpenClaw plugin that emits traces, metrics, and logs
- scripts/o11y-agent.service: systemd example for the worker agent
- scripts/o11y-server.service: systemd example for the central server
Choose the mode that fits your setup.
Use this mode when you want the fastest single-host experience for evaluation or demos.
git clone https://github.com/danl5/clawo11y.git
cd clawo11y
docker compose up -d
This starts:
- the central server + web UI
- the worker-side Go agent
Important notes:
- docker-compose.yml mounts ~/.openclaw into the agent container
  - if your OpenClaw data lives elsewhere, change the left-hand side of that bind mount
- this gets you the runtime-event views immediately
- OTEL-native views still require the OpenClaw plugin to be installed and configured in your OpenClaw environment
Run everything on one machine:
git clone https://github.com/danl5/clawo11y.git
cd clawo11y
chmod +x start.sh
./start.sh
The script builds:
- the React frontend
- the Go server
- the Go agent
and starts the local stack.
Recommended for real worker/server separation.
mkdir -p <clawo11y-dir>/bin <clawo11y-dir>/data
# Build frontend assets and the Go server binary
cd <clawo11y-dir>
make build-web build-server
# Start the server directly
./bin/clawo11y-server
# Build or download the worker agent binary
cd <clawo11y-dir>
make build-agent
# Start the worker directly
O11Y_SERVER_URL=http://<server-host>:<server-port> ./bin/clawo11y-agent
If you want the server or agent to be managed by systemctl, use the sample service files in scripts/:
# Review and adjust the service file first if ExecStart, WorkingDirectory,
# User, Environment, or network/port settings differ in your deployment.
sudo cp <clawo11y-dir>/scripts/o11y-server.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now o11y-server
# Review and adjust the service file first if ExecStart, WorkingDirectory,
# User, Environment, or endpoint settings differ in your deployment.
sudo cp <clawo11y-dir>/scripts/o11y-agent.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now o11y-agent
Install the plugin from the local path:
openclaw plugins install <clawo11y-dir>/openclaw-otel-plugin
Then configure it in ~/.openclaw/openclaw.json:
{
  "plugins": {
    "entries": {
      "@clawo11y/openclaw-otel-plugin": {
        "enabled": true,
        "config": {
          "endpoint": "http://localhost:4318",
          "metric_interval_ms": 30000,
          "export_timeout_ms": 10000,
          "root_idle_timeout_ms": 300000,
          "pricing": {
            "qwen-max": { "prompt": 1.5, "completion": 4.5 },
            "claude-3-opus": { "prompt": 15.0, "completion": 75.0 },
            "MiniMax-M2.7": { "input": 0.3, "output": 1.2 },
            "MiniMax-M2.7-highspeed": { "input": 0.3, "output": 1.2 }
          }
        }
      }
    }
  }
}
# restart OpenClaw gateway
openclaw gateway restart
If you change the agent-side OTLP proxy address, make sure the plugin config.endpoint matches it.
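To make the role of the pricing table concrete, here is a minimal sketch of how a per-call cost could be derived from it. It is not the plugin's implementation. Two things are assumptions: that prices are expressed per million tokens (the README does not state the unit), and that the `prompt`/`completion` and `input`/`output` key pairs seen in the config are interchangeable. The function name `call_cost` is hypothetical.

```python
# Two entries copied from the config above; key naming differs per model.
PRICING = {
    "qwen-max": {"prompt": 1.5, "completion": 4.5},
    "MiniMax-M2.7": {"input": 0.3, "output": 1.2},
}

# Assumption: prices are quoted per one million tokens.
TOKENS_PER_UNIT = 1_000_000


def call_cost(model, prompt_tokens, completion_tokens, pricing=PRICING):
    """Return the estimated cost of one LLM call, or None for an unpriced model."""
    entry = pricing.get(model)
    if entry is None:
        return None  # no price configured, so no cost can be attributed
    # Assumption: "input"/"output" are aliases for "prompt"/"completion".
    prompt_rate = entry.get("prompt", entry.get("input", 0.0))
    completion_rate = entry.get("completion", entry.get("output", 0.0))
    total = prompt_tokens * prompt_rate + completion_tokens * completion_rate
    return total / TOKENS_PER_UNIT
```

Returning `None` for an unpriced model, rather than zero, keeps "free" and "unknown" distinguishable when costs are summed across a run.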
What you get depends on whether the plugin is installed.
Without the plugin, you still get the runtime-event experience from the Go agent:
- Overview
- Tokens
- Sessions
- Cron
- Workspace
- live event/log streaming
- node/system metrics
This mode depends on:
- local OpenClaw files and logs
- agent-side parsing and event forwarding
This mode does not provide the full OTEL-native product views.
With the plugin installed, you additionally get the OTEL-native agent observability layer:
- Trace: deep call tree, span waterfall, run summaries
- Cost: cost dashboard, top expensive runs, cost/token flame graph, context bloat candidates
- Metrics: run / llm / tool / subagent / health metrics
- Security: high-risk operation timeline
- observability self-health
  - root recreation
  - orphan events
  - idle-timeout closures
This mode depends on:
- the plugin being installed in OpenClaw
- plugin config.endpoint pointing to the worker-local OTLP proxy
- OpenClaw runtime events including token/usage data when available
In short:
- If you only deploy server + web + agent, you get the classic runtime-event dashboards.
- If you also install the plugin, you unlock the full OTEL-native observability product.

Agent environment variables:

| Variable | Default | Description |
|---|---|---|
| O11Y_SERVER_URL | http://127.0.0.1:8000 | Central server base URL. |
| O11Y_OTLP_PROXY_ADDR | 127.0.0.1:4318 | Listen address for the worker-local OTLP proxy. |
| O11Y_METRICS_INTERVAL_SEC | 60 | System metric collection interval. |
| O11Y_REQUEST_CONCURRENCY | 3 | Max concurrent outbound requests to the server. |
| O11Y_CLIENT_TIMEOUT_SEC | 10 | HTTP timeout for agent-to-server calls. |
| O11Y_CLIENT_RETRY_COUNT | 3 | Retry count for agent-to-server calls. |
| O11Y_CLIENT_RETRY_WAIT_MS | 1000 | Retry wait time in milliseconds. |
| O11Y_OTLP_PROXY_QUEUE_SIZE | 5000 | OTLP forwarding queue capacity. |
| O11Y_OTLP_PROXY_RETRY_INTERVAL_SEC | 5 | Retry interval when OTLP forwarding fails. |
| OPENCLAW_BASE_DIR | ~/.openclaw | OpenClaw root directory on the worker host. |
| GATEWAY_LOG_DIR | <OPENCLAW_BASE_DIR>/logs | Optional explicit log directory override. |

Server environment variables:

| Variable | Default | Description |
|---|---|---|
| O11Y_SERVER_ADDR | 0.0.0.0:8000 | Listen address for the central server. |
| O11Y_DB_URL | sqlite:///./o11y_server.db | Telemetry database connection string. |
| O11Y_DATA_RETENTION_DAYS | 7 | Data retention window for background cleanup. |
| O11Y_SERVER_SHUTDOWN_TIMEOUT_SEC | 5 | Graceful shutdown timeout. |
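
As an illustration of what a retention window such as O11Y_DATA_RETENTION_DAYS implies, the sketch below computes a cleanup cutoff. It does not reflect the server's actual cleanup job: the function names are hypothetical, and it assumes records carry a timezone-aware timestamp and that anything strictly older than the cutoff is eligible for deletion.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now, retention_days=7):
    """Timestamp before which records fall outside the retention window."""
    return now - timedelta(days=retention_days)


def is_expired(record_time, now, retention_days=7):
    """True if a record is strictly older than the cutoff (assumed semantics)."""
    return record_time < retention_cutoff(now, retention_days)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("records older than", retention_cutoff(now).isoformat(), "would be cleaned up")
```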
The bundled plugin emits:
- Traces
  - root command.process
  - LLM spans
  - tool spans
  - subagent spans
- Metrics
  - run, llm, tool, subagent, security, and observability-health metrics
- Logs
  - lifecycle logs such as run.started, llm.finished, tool.failed, subagent.finished
  - anomaly logs such as root recreation, orphan events, and idle-timeout closures
  - security logs such as security.high_risk_tool
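
A security log such as security.high_risk_tool implies some step that maps a tool call to a risk class. The following is a rough sketch of one way that could work, using simple substring rules on the tool name. The rule table, the tool names in it, and the returned record shape are hypothetical; the plugin's real classification logic may look quite different.

```python
# Hypothetical rules: (risk class, substrings matched against the tool name).
# The risk classes mirror the categories named in this README.
RISK_RULES = [
    ("shell", ("bash", "shell", "exec")),
    ("code_execution", ("python", "eval", "run_code")),
    ("filesystem_mutation", ("write", "delete", "edit")),
    ("network_access", ("fetch", "http", "curl")),
]


def classify_tool_call(tool_name):
    """Return a log-like record for a risky tool call, or None if no rule matches."""
    name = tool_name.lower()
    for risk_class, needles in RISK_RULES:
        if any(needle in name for needle in needles):
            return {
                "event": "security.high_risk_tool",
                "tool": tool_name,
                "risk_class": risk_class,
                "reason": f"tool name matches a {risk_class} rule",
            }
    return None  # first matching rule wins; unmatched tools are not flagged
```

Name-based matching is cheap but coarse. A more careful classifier would also inspect the call parameters, for example the command string passed to a shell tool.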
There are two different token paths in the product:
- Runtime event path
  - used by the live Tokens and session/event views
  - depends on parsed OpenClaw runtime events
- OTEL path
  - used by Trace, Cost, Metrics, and Context Bloat
  - depends on llm_output.usage being present and forwarded by the plugin
If your provider does not expose usage consistently, cost and token views on the OTEL side may be incomplete.
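The effect of missing usage data can be illustrated with a small context-bloat check over a session's prompt-token counts. This is only a sketch under stated assumptions: the function names, the `min_calls` and `ratio_threshold` defaults, and the first-to-last growth ratio are hypothetical choices, not the product's actual detection rule. Calls without usage are represented as `None`.

```python
def prompt_growth(prompt_tokens):
    """Return (first, last, ratio) over calls that reported usage, else None."""
    known = [t for t in prompt_tokens if t is not None]
    if len(known) < 2 or known[0] <= 0:
        return None
    return known[0], known[-1], known[-1] / known[0]


def is_context_bloat(prompt_tokens, min_calls=4, ratio_threshold=3.0):
    """Flag a session whose prompt size grew sharply across its LLM calls.

    Calls with missing usage (None) are skipped, so a session with too few
    reported values is never flagged, which is one way incomplete usage
    data leads to incomplete views.
    """
    known = [t for t in prompt_tokens if t is not None]
    if len(known) < min_calls:
        return False  # not enough usage data to judge
    growth = prompt_growth(prompt_tokens)
    return growth is not None and growth[2] >= ratio_threshold
```

For example, a session reporting 1000, 2000, 4000, and 8000 prompt tokens would be flagged under these defaults, while the same session with only two reported values would not.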
# Terminal 1: central server
cd services/server
go run .
# Terminal 2: frontend
cd services/web
npm install
npm run dev
# Terminal 3: worker agent
cd services/agent
go run .
For plugin development:
cd openclaw-otel-plugin
npm install
npm run build
After rebuilding the plugin, restart OpenClaw / gateway so the new plugin code is loaded.
OpenClaw O11y is no longer just a local log viewer. The product direction is:
- FinOps for AI
  - cost attribution
  - expensive-run analysis
  - context bloat detection
- Agent Debugging
  - deep trace drill-down
  - tool reliability
  - run lifecycle health
- Security & Compliance
  - high-risk operation audit timeline
  - structured risk classification
The long-term goal is to become an observability platform for AI workforces, not only a trace viewer.
Happy observing. May your cache hit rates be high and your hallucinations be low.
