Commit f144071

feat: add sdk observability layer
1 parent d14534b commit f144071

65 files changed

Lines changed: 7576 additions & 49 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
````diff
@@ -32,3 +32,4 @@ tests/.tmp/
 *.log
 *.txt
 .kode/
+.kode-observability-http/
````

README.md

Lines changed: 43 additions & 0 deletions
````diff
@@ -90,6 +90,48 @@ export OPEN_SANDBOX_ENDPOINT=http://127.0.0.1:8080 # optional
 export OPEN_SANDBOX_IMAGE=ubuntu # optional
 ```
 
+## Observability
+
+KODE keeps observability as an SDK-facing capability first:
+
+- runtime metrics via `agent.getMetricsSnapshot()`
+- runtime observations via `agent.getObservationReader()` / `agent.subscribeObservations()`
+- optional OTEL bridge via `observability.otel`
+- optional persisted observation query via `observability.persistence`
+
+Minimal persisted-observation example:
+
+```typescript
+import {
+  Agent,
+  JSONStore,
+  JSONStoreObservationBackend,
+  createStoreBackedObservationReader,
+} from '@shareai-lab/kode-sdk';
+
+const storeDir = './.kode';
+const observationBackend = new JSONStoreObservationBackend(storeDir);
+
+const agent = await Agent.create({
+  templateId: 'assistant',
+  observability: {
+    persistence: {
+      backend: observationBackend,
+    },
+  },
+}, deps); // deps: your SDK dependencies object, assumed to be defined elsewhere
+
+const runtimeSnapshot = agent.getMetricsSnapshot();
+const runtimeObservations = agent.getObservationReader().listObservations();
+
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+const persistedObservations = await persistedReader.listObservations({ limit: 50 });
+```
+
+If you want to expose these metrics or observations over HTTP, do it in your application on top of readers/backends, not inside `Agent` itself. `examples/08-observability-http.ts` is an application-layer example, not an SDK-owned HTTP feature.
+
+Run the full example locally with `npm run example:observability-http`.
+
 ## Architecture for Scale
 
 For production deployments serving many users, we recommend the **Worker Microservice Pattern**:
@@ -150,6 +192,7 @@ See [docs/en/guides/architecture.md](./docs/en/guides/architecture.md) for detai
 | [Concepts](./docs/en/getting-started/concepts.md) | Core concepts explained |
 | **Guides** | |
 | [Events](./docs/en/guides/events.md) | Three-channel event system |
+| [Observability](./docs/en/guides/observability.md) | Metrics, observations, persistence, and app-layer exposure |
 | [Tools](./docs/en/guides/tools.md) | Built-in tools & custom tools |
 | [E2B Sandbox](./docs/en/guides/e2b-sandbox.md) | E2B cloud sandbox integration |
 | [OpenSandbox](./docs/en/guides/opensandbox-sandbox.md) | OpenSandbox self-hosted sandbox integration |
````

README.zh-CN.md

Lines changed: 43 additions & 0 deletions
````diff
@@ -90,6 +90,48 @@ export OPEN_SANDBOX_ENDPOINT=http://127.0.0.1:8080 # optional
 export OPEN_SANDBOX_IMAGE=ubuntu # optional
 ```
 
+## Observability
+
+KODE exposes observability primarily as an SDK capability:
+
+- Runtime metrics: `agent.getMetricsSnapshot()`
+- Runtime observations: `agent.getObservationReader()` / `agent.subscribeObservations()`
+- Optional OTEL bridge: `observability.otel`
+- Optional persisted observation queries: `observability.persistence`
+
+Minimal persisted-observation example:
+
+```typescript
+import {
+  Agent,
+  JSONStore,
+  JSONStoreObservationBackend,
+  createStoreBackedObservationReader,
+} from '@shareai-lab/kode-sdk';
+
+const storeDir = './.kode';
+const observationBackend = new JSONStoreObservationBackend(storeDir);
+
+const agent = await Agent.create({
+  templateId: 'assistant',
+  observability: {
+    persistence: {
+      backend: observationBackend,
+    },
+  },
+}, deps); // deps: your SDK dependencies object, assumed to be defined elsewhere
+
+const runtimeSnapshot = agent.getMetricsSnapshot();
+const runtimeObservations = agent.getObservationReader().listObservations();
+
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+const persistedObservations = await persistedReader.listObservations({ limit: 50 });
+```
+
+If you want to expose these metrics or observations over HTTP, wrap the readers/backends in your application layer rather than having `Agent` listen on a port itself. `examples/08-observability-http.ts` is only an application-layer example, not a built-in SDK HTTP capability.
+
+Run the full example locally with `npm run example:observability-http`.
+
 ## Supported Providers
 
 | Provider | Streaming | Tool Calls | Reasoning | Files |
@@ -110,6 +152,7 @@ export OPEN_SANDBOX_IMAGE=ubuntu # optional
 | [Core Concepts](./docs/zh-CN/getting-started/concepts.md) | Core concepts explained |
 | **Guides** | |
 | [Event System](./docs/zh-CN/guides/events.md) | Three-channel event system |
+| [Observability](./docs/zh-CN/guides/observability.md) | Metrics, observations, persistence, and app-layer exposure |
 | [Tool System](./docs/zh-CN/guides/tools.md) | Built-in and custom tools |
 | [E2B Sandbox](./docs/zh-CN/guides/e2b-sandbox.md) | E2B cloud sandbox integration |
 | [OpenSandbox Sandbox](./docs/zh-CN/guides/opensandbox-sandbox.md) | OpenSandbox self-hosted sandbox integration |
````

docs/en/examples/playbooks.md

Lines changed: 29 additions & 2 deletions
````diff
@@ -153,7 +153,33 @@ const stats = await store.aggregateStats(agent.agentId);
 
 ---
 
-## 6. Combined: Approval + Collaboration + Scheduling
+## 6. Observability Readers + Application HTTP Wrapper
+
+- **Goal**: Read runtime/persisted observations from the SDK and optionally expose them through your own app-layer HTTP service.
+- **Example**: `examples/08-observability-http.ts`
+- **Run**: `npm run example:observability-http`
+- **Key Steps**:
+  1. Read point-in-time metrics with `agent.getMetricsSnapshot()`.
+  2. Read live in-memory observations with `agent.getObservationReader()` or `agent.subscribeObservations()`.
+  3. Configure `observability.persistence.backend` and query history with `createStoreBackedObservationReader(...)`.
+  4. Map your own routes, auth, tenant checks, and response shaping in application code.
+- **Considerations**:
+  - Prefer the runtime reader for "what is happening now" and the persisted reader for audit/history views.
+  - Treat `metadata.__debug` as internal/debug-only data; do not expose it blindly to external consumers.
+  - Keep HTTP, auth, rate limiting, and dashboard concerns outside SDK core.
+
+```typescript
+const metrics = agent.getMetricsSnapshot();
+const runtimeReader = agent.getObservationReader();
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+
+const runtime = runtimeReader.listObservations({ limit: 20 });
+const persisted = await persistedReader.listObservations({ agentIds: [agent.agentId], limit: 50 });
+```
+
+---
+
+## 7. Combined: Approval + Collaboration + Scheduling
 
 - **Scenario**: Code review bot, Planner splits tasks and assigns to Specialists, tool operations need approval, scheduled reminders ensure SLA.
 - **Implementation**:
@@ -184,12 +210,13 @@ const stats = await store.aggregateStats(agent.agentId);
 
 - [Getting Started](../getting-started/quickstart.md)
 - [Events Guide](../guides/events.md)
+- [Observability Guide](../guides/observability.md)
 - [Multi-Agent Systems](../advanced/multi-agent.md)
 - [Database Guide](../guides/database.md)
 
 ---
 
-## 7. CLI Agent Application
+## 8. CLI Agent Application
 
 Build command-line AI assistants like Claude Code or Cursor.
 
````

docs/en/guides/observability.md

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@

# Observability Guide

KODE exposes observability as SDK capabilities first, not as an application server.

That means the SDK gives you structured metrics, observations, persistence hooks, and OTEL bridging. Your application decides whether to expose them through HTTP, dashboards, alerting, or internal admin tools.

---

## What KODE Includes

- Runtime metrics via `agent.getMetricsSnapshot()`
- Runtime observation reads via `agent.getObservationReader()`
- Runtime observation streaming via `agent.subscribeObservations()`
- Optional persisted observation queries via `observability.persistence`
- Optional OTEL export via `observability.otel`

## What KODE Deliberately Does Not Include

- Built-in HTTP server lifecycle
- Built-in auth, tenant isolation, or rate limiting
- Built-in observability dashboard UI
- Opinionated public API contracts for app delivery

Those concerns belong in your application layer.

---

## Runtime Metrics and Observations

Use runtime readers when you want to inspect the current agent process without waiting for external exports.

```typescript
const metrics = agent.getMetricsSnapshot();
const reader = agent.getObservationReader();

const latest = reader.listObservations({
  kinds: ['generation', 'tool'],
  limit: 20,
});

for await (const envelope of agent.subscribeObservations({ runId: metrics.currentRunId })) {
  console.log(envelope.observation.kind, envelope.observation.name);
}
```

Typical runtime uses:

- show "live now" generation/tool activity in an admin panel
- inspect approval waits, tool errors, and compression events
- derive counters without polling raw event buses
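
The last use, deriving counters, amounts to a fold over the observation stream. The sketch below is self-contained and hedged: `ObservationEnvelope`, `countByKind`, and `sampleObservations` are hypothetical names, and the envelope type models only the `observation.kind` and `observation.name` fields read in the example above. In an application you would pass `agent.subscribeObservations(...)` where the stub generator is used here.

```typescript
// Hypothetical minimal envelope shape: only the fields read in the example above.
// This is not the SDK's exported type.
interface ObservationEnvelope {
  observation: { kind: string; name: string };
}

// Count observations per kind from any async stream of envelopes.
// `maxEvents` lets the caller stop early, because a live subscription is not
// guaranteed to finish on its own.
async function countByKind(
  stream: AsyncIterable<ObservationEnvelope>,
  maxEvents: number = Number.POSITIVE_INFINITY,
): Promise<Map<string, number>> {
  const counts = new Map<string, number>();
  let seen = 0;
  for await (const envelope of stream) {
    const kind = envelope.observation.kind;
    counts.set(kind, (counts.get(kind) ?? 0) + 1);
    seen += 1;
    if (seen >= maxEvents) {
      break; // leaving a for-await loop calls return() on the iterator
    }
  }
  return counts;
}

// Stub stream standing in for agent.subscribeObservations(...) in this sketch.
async function* sampleObservations(): AsyncGenerator<ObservationEnvelope> {
  yield { observation: { kind: 'generation', name: 'model-call' } };
  yield { observation: { kind: 'tool', name: 'read-file' } };
  yield { observation: { kind: 'tool', name: 'run-command' } };
}

countByKind(sampleObservations()).then((counts) => {
  console.log(counts.get('generation'), counts.get('tool')); // 1 2
});
```

Counters kept in an in-memory `Map` like this reset when the process restarts; for history that survives restarts, use the persisted readers.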

---

## Persisted Observations

Use persisted readers when you need history, audit views, or process-restart durability.

```typescript
import {
  Agent,
  JSONStoreObservationBackend,
  createStoreBackedObservationReader,
} from '@shareai-lab/kode-sdk';

const observationBackend = new JSONStoreObservationBackend('./.kode-observability');

const agent = await Agent.create({
  templateId: 'assistant',
  observability: {
    persistence: {
      backend: observationBackend,
    },
  },
}, deps); // deps: your SDK dependencies object, assumed to be defined elsewhere

const persistedReader = createStoreBackedObservationReader(observationBackend);
const history = await persistedReader.listObservations({
  agentIds: [agent.agentId],
  kinds: ['agent_run', 'generation', 'tool'],
  limit: 50,
});
```

Use persisted storage for:

- audit timelines
- run replay pages
- offline analytics jobs
- debugging after process restart

---

## OTEL Bridge

If your platform already standardizes on OpenTelemetry, enable the bridge so KODE's observations are translated into spans and shipped to your collector.

```typescript
const agent = await Agent.create({
  templateId: 'assistant',
  observability: {
    otel: {
      enabled: true,
      serviceName: 'kode-agent',
      exporter: {
        protocol: 'http/json',
        endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT!, // must be set at runtime
      },
    },
  },
}, deps);
```

Keep KODE's native observation model as your source of truth. OTEL is best treated as an interoperability/export path.
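
The OTEL example above uses a non-null assertion (`!`) on `OTEL_EXPORTER_OTLP_ENDPOINT`. That assertion only silences the type checker, so an unset variable still reaches the exporter as `undefined`. A small helper can fail fast instead. This is a sketch: `OtelOptions` is a hypothetical local mirror of the fields shown in the example, not a type exported by the SDK, and `buildOtelOptions` takes the environment as a parameter so it can be tested without touching `process.env`.

```typescript
// Hypothetical local mirror of the `observability.otel` fields used above.
interface OtelOptions {
  enabled: boolean;
  serviceName: string;
  exporter: { protocol: 'http/json'; endpoint: string };
}

type Env = Record<string, string | undefined>;

// Build the OTEL options, or throw if the endpoint is missing or blank,
// instead of silently passing `undefined` through a non-null assertion.
function buildOtelOptions(env: Env, serviceName: string = 'kode-agent'): OtelOptions {
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT?.trim();
  if (!endpoint) {
    throw new Error('OTEL_EXPORTER_OTLP_ENDPOINT must be set to enable the OTEL bridge');
  }
  return {
    enabled: true,
    serviceName,
    exporter: { protocol: 'http/json', endpoint },
  };
}
```

In application code you would call `buildOtelOptions(process.env)` once at startup and pass the result as `observability.otel`, so a misconfigured deployment fails at boot rather than at first export.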

---

## Data Safety and Capture Boundaries

KODE supports configurable capture levels through `observability.capture`:

- `off`
- `summary`
- `full`
- `redacted`

Prefer `summary` or `redacted` for production unless you have a clear compliance reason to store more detail.
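
One way to apply this guidance is to pick the capture level from deployment context instead of hard-coding it. A minimal sketch, assuming `observability.capture` accepts one of the four level names above as a plain string; the exact option shape is not shown in this guide, so check the SDK's types. `chooseCaptureLevel` and `CapturePolicy` are hypothetical, and keeping `full` in local development is an illustrative choice, not a KODE default.

```typescript
// The four capture levels listed above.
// Assumption: `observability.capture` takes one of these as a plain string.
type CaptureLevel = 'off' | 'summary' | 'full' | 'redacted';

interface CapturePolicy {
  environment: 'development' | 'staging' | 'production';
  // Set only when a documented compliance requirement justifies full capture.
  fullCaptureApproved?: boolean;
  // Conservative level used outside development; defaults to 'redacted'.
  conservativeDefault?: Extract<CaptureLevel, 'summary' | 'redacted'>;
}

// Hypothetical helper: full detail locally, conservative levels elsewhere.
function chooseCaptureLevel(policy: CapturePolicy): CaptureLevel {
  // Illustrative choice: local development keeps full detail for debugging.
  if (policy.environment === 'development') {
    return 'full';
  }
  // Outside development, store full detail only with an explicit sign-off.
  if (policy.fullCaptureApproved) {
    return 'full';
  }
  return policy.conservativeDefault ?? 'redacted';
}
```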

Also note:

- provider-specific raw payloads are not part of the public observation schema
- debug-only extensions may appear under `metadata.__debug`
- `metadata.__debug` should be treated as internal/private and filtered out before external exposure

This keeps the public observation model safer and more stable.
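
Filtering `metadata.__debug` can be done with a small projection applied in the application layer right before observations leave the process. This sketch assumes observations are plain objects with an optional `metadata` record; `ObservationLike`, `toPublicObservation`, and `toPublicObservations` are hypothetical names, not SDK exports.

```typescript
// Hypothetical minimal observation shape; only `metadata` matters here.
interface ObservationLike {
  kind: string;
  name: string;
  metadata?: Record<string, unknown>;
}

// Return a shallow copy without `metadata.__debug`; the input is not mutated.
function toPublicObservation(observation: ObservationLike): ObservationLike {
  if (observation.metadata === undefined) {
    return { ...observation };
  }
  const publicMetadata: Record<string, unknown> = { ...observation.metadata };
  delete publicMetadata.__debug;
  return { ...observation, metadata: publicMetadata };
}

// Apply the same filter to a page of observations before returning it externally.
function toPublicObservations(observations: readonly ObservationLike[]): ObservationLike[] {
  return observations.map(toPublicObservation);
}
```

The copy is shallow: nested metadata values are still shared with the original. That is fine for read-only serialization, but not if you mutate those values afterwards.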

---

## Exposing Observability over HTTP

If you need HTTP endpoints, build them in your app on top of the SDK readers/backends.

Reference example:

- `examples/08-observability-http.ts`
- run with `npm run example:observability-http`

That example demonstrates:

- a normal app-owned HTTP server
- `POST /agents/demo/send` to drive an agent run
- `GET /api/observability/.../metrics` for runtime metrics
- `GET /api/observability/.../observations/runtime` for live observation reads
- `GET /api/observability/.../observations/persisted` for persisted history

This boundary is intentional: the SDK provides observability primitives, while the app owns transport, auth, and presentation.
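
The app-owned side of that boundary can be sketched as a transport-agnostic handler that your HTTP server calls once per request. This is not the code in `examples/08-observability-http.ts`. The route shape (`/admin/agents/<agentId>/metrics` and `/admin/agents/<agentId>/observations`), the tenant check, the limit defaults, and the `ObservabilitySource` interface are all hypothetical. `ObservabilitySource` stands in for what your app would obtain from `agent.getMetricsSnapshot()` and an observation reader.

```typescript
// Hypothetical stand-in for per-agent data the app obtains from the SDK,
// e.g. agent.getMetricsSnapshot() and an observation reader.
interface ObservabilitySource {
  tenantId: string;
  metrics(): unknown;
  observations(limit: number): unknown[];
}

interface AppRequest {
  method: string;
  url: string; // path plus optional query, e.g. "/admin/agents/a1/observations?limit=5"
  tenantId?: string; // resolved by your own auth layer before routing
}

interface AppResponse {
  status: number;
  body: unknown;
}

const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

// Route a request, check tenant ownership, and clamp the page size.
// Transport (node:http, a web framework, etc.) stays outside this function.
function handleObservabilityRequest(
  req: AppRequest,
  agents: ReadonlyMap<string, ObservabilitySource>,
): AppResponse {
  if (req.method !== 'GET') {
    return { status: 405, body: { error: 'method not allowed' } };
  }

  const [path, query = ''] = req.url.split('?', 2);
  const match = /^\/admin\/agents\/([^/]+)\/(metrics|observations)$/.exec(path);
  if (match === null) {
    return { status: 404, body: { error: 'not found' } };
  }

  const agentId = match[1];
  const resource = match[2];
  const source = agents.get(agentId);
  // Unknown agents and other tenants' agents get the same answer, so IDs do not leak.
  if (source === undefined || source.tenantId !== req.tenantId) {
    return { status: 404, body: { error: 'not found' } };
  }

  if (resource === 'metrics') {
    return { status: 200, body: source.metrics() };
  }

  const limitParam = /(?:^|&)limit=(\d+)(?:&|$)/.exec(query);
  const requested = limitParam === null ? DEFAULT_LIMIT : Number(limitParam[1]);
  const limit = Math.min(Math.max(requested, 1), MAX_LIMIT);
  return { status: 200, body: source.observations(limit) };
}
```

With Node's built-in `http` module, a server would read `req.method` and `req.url`, resolve the tenant from its own auth, call this handler, and write `JSON.stringify(body)` with the returned status. The `observations` callback is also the natural place to drop `metadata.__debug` before anything is serialized.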

---

## Recommended Rollout

1. Start with runtime metrics and runtime observation readers.
2. Add persisted observation storage for auditability.
3. Add OTEL export only if your platform needs centralized telemetry.
4. Add app-layer HTTP or UI only after the data model and filtering policy are clear.

This order keeps the SDK integration stable and avoids prematurely coupling KODE to one delivery surface.

docs/zh-CN/examples/playbooks.md

Lines changed: 28 additions & 1 deletion
````diff
@@ -153,7 +153,33 @@ const stats = await store.aggregateStats(agent.agentId);
 
 ---
 
-## 6. Combined: Approval + Collaboration + Scheduling
+## 6. Observability Reads and Application-Layer HTTP Wrapping
+
+- **Goal**: Read runtime/persisted observations from the SDK, and decide, based on your own application boundaries, whether to expose them over HTTP.
+- **Example**: `examples/08-observability-http.ts`
+- **Run**: `npm run example:observability-http`
+- **Key Steps**:
+  1. Read the current metrics snapshot via `agent.getMetricsSnapshot()`.
+  2. Read runtime observations via `agent.getObservationReader()` or `agent.subscribeObservations()`.
+  3. Configure the backend with `observability.persistence.backend`, and query history with `createStoreBackedObservationReader(...)`.
+  4. Define your own routes, auth, tenant isolation, and response trimming in application code.
+- **Considerations**:
+  - The runtime reader is better suited to "what is happening now"; the persisted reader is better suited to audit and history views.
+  - Treat `metadata.__debug` strictly as internal debug data; do not expose it externally as-is.
+  - HTTP, auth, rate limiting, and dashboards should all stay outside the SDK.
+
+```typescript
+const metrics = agent.getMetricsSnapshot();
+const runtimeReader = agent.getObservationReader();
+const persistedReader = createStoreBackedObservationReader(observationBackend);
+
+const runtime = runtimeReader.listObservations({ limit: 20 });
+const persisted = await persistedReader.listObservations({ agentIds: [agent.agentId], limit: 50 });
+```
+
+---
+
+## 7. Combined: Approval + Collaboration + Scheduling
 
 - **Scenario**: A code review bot: the Planner splits tasks and assigns them to different Specialists, tool operations require approval, and scheduled reminders ensure the SLA is met.
 - **Implementation**:
@@ -184,5 +210,6 @@ const stats = await store.aggregateStats(agent.agentId);
 
 - [Quick Start](../getting-started/quickstart.md)
 - [Events Guide](../guides/events.md)
+- [Observability Guide](../guides/observability.md)
 - [Multi-Agent Systems](../advanced/multi-agent.md)
 - [Database Guide](../guides/database.md)
````
