1- # Flight — Agent Observability Platform
1+ # Flight — Agent Verification & Trust Layer
22
33## What This Is
44
5- An agent observability platform that provides structured tracing, audit, and replay for AI agent systems. Flight supports multiple ingestion paths: TypeScript SDK (direct file I/O), Python SDK (HTTP client), HTTP collector, MCP stdio proxy, and Claude Code hooks. All paths write to the same JSONL format at ` ~/.flight/logs/ ` .
5+ An agent verification and trust layer that evaluates whether what AI agents do is correct, expected, and safe. Flight captures causally-linked traces of agent runs and applies four core capabilities:
6+
7+ 1. **Causal trace attribution**: every event is linked back to what caused it (tool results to tool calls, tool calls to LLM outputs, LLM calls to agent decisions)
8+ 2. **Inline behavioral assertions**: declarative YAML rules evaluated post-event (non-blocking)
9+ 3. **Comparative experiment harness**: YAML spec + `flight experiment run` across variants and repetitions
10+ 4. **Structured trust records**: unsigned JSON summaries of every run, content-hashed for integrity
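
As a rough illustration of the second capability, the sketch below evaluates two rule kinds, `threshold` and `sequence`, against events that have already been recorded, so evaluation never blocks the agent. The rule dictionaries stand in for parsed YAML. Their keys (`id`, `kind`, `event_kind`, `max`, `order`) and the result shape are hypothetical and are not Flight's actual rule schema.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]
Rule = Dict[str, Any]  # hypothetical shape, standing in for a parsed YAML rule


def evaluate_threshold(rule: Rule, events: List[Event]) -> Dict[str, Any]:
    """Pass when at most rule['max'] events of rule['event_kind'] were recorded."""
    observed = sum(1 for e in events if e.get("kind") == rule["event_kind"])
    return {"rule_id": rule["id"], "passed": observed <= rule["max"], "observed": observed}


def evaluate_sequence(rule: Rule, events: List[Event]) -> Dict[str, Any]:
    """Pass when the kinds in rule['order'] occur in that order (gaps allowed)."""
    kinds = iter(e.get("kind") for e in events)
    # Each any() call consumes the shared iterator up to its match, so the
    # next wanted kind can only match at a later position.
    passed = all(any(k == wanted for k in kinds) for wanted in rule["order"])
    return {"rule_id": rule["id"], "passed": passed}


EVALUATORS = {"threshold": evaluate_threshold, "sequence": evaluate_sequence}


def evaluate_rules(rules: List[Rule], events: List[Event]) -> List[Dict[str, Any]]:
    """Run every rule over the recorded events and collect the results."""
    return [EVALUATORS[rule["kind"]](rule, events) for rule in rules]


recorded = [{"kind": "llm.result"}, {"kind": "tool_call"}, {"kind": "tool_result"}]
example_rules = [
    {"id": "max-tool-calls", "kind": "threshold", "event_kind": "tool_call", "max": 3},
    {"id": "llm-before-tool", "kind": "sequence", "order": ["llm.result", "tool_call"]},
]
for result in evaluate_rules(example_rules, recorded):
    print(result)
```

Because the evaluators only read the recorded event list, they can run after each append without delaying the agent's next step.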
11+
12+ All ingress paths emit Trace v2. Data lives in `~/.flight/traces/` and `~/.flight/trust/`.
613
714```
8- Agents (TS SDK, Python SDK, HTTP, MCP Proxy, Hooks)
15+ Agents (TS SDK, Python SDK, MCP Proxy, Claude Code Hooks)
16+ ↓
17+ TraceSession (causal attribution, event append)
18+ ↓
19+ ~/.flight/traces/<run_id>.trace.json
920 ↓
10- ~/.flight/logs/ (session JSONL + alerts )
21+ ~/.flight/trust/<run_id>.trust.json (built at run end )
1122```
1223
1324## Stack
@@ -18,92 +29,128 @@ Python SDK: stdlib only (no external deps), Python 3.9+
1829## Project Structure
1930
2031```
21- src/
22- cli.ts — Commander CLI (subcommands: status, proxy, serve, setup, sprites, session, annotate, log, claude, hook)
23- proxy.ts — stdio proxy: spawn upstream, bidirectional JSON-RPC forwarding
24- json-rpc.ts — streaming JSON-RPC parser (readline + JSON.parse per line)
25- logger.ts — session logger with async write queue and alert detection
26- sdk.ts — TypeScript SDK: createFlightClient() for programmatic logging (schema-first)
27- ingest.ts — HTTP collector server (flight serve)
28- query.ts — SQLite-backed query layer (FlightDB, indexing, aggregation, trends)
29- progressive-disclosure.ts — PD handler: phase logic, usage tracking, tool filtering
30- pd-schema.ts — pure schema compression utilities
31- file-lock.ts — advisory file locking (O_CREAT|O_EXCL)
32- retry.ts — automatic retry manager for transient MCP errors
33- hooks.ts — Claude Code hook installation/removal (SessionStart/End, UserPromptSubmit, PostToolUse)
34- init.ts — Claude/Claude Code config file management (wraps mcpServers)
35- setup.ts — interactive setup wizard (wraps servers + installs hooks)
36- shared.ts — shared constants (DEFAULT_LOG_DIR, C colors, McpServerEntry type)
37- summary.ts — session summary computation
38- stats.ts — usage statistics
39- lifecycle.ts — log compression and garbage collection
40- export.ts — CSV/JSONL export
41- replay.ts — tool call replay from logs
42- log-commands.ts — CLI subcommands for log inspection (list, tail, view, filter, inspect, audit, verbose)
43- index.ts — public API re-exports
32+ packages/flight-proxy/src/
33+ schema/
34+ trace-v2.ts — Trace v2 TS types (Trace, Event, CausalLink, AssertionResult, TrustRecord)
35+ trace-v2.schema.json — JSON Schema (canonical; used by Python SDK + tests)
36+ write.ts — TraceSession: append-event + finalize-trace writers
37+ read.ts — load + ajv-validate Trace v2 files
38+ causal.ts — CausalContext: cause-id derivation + parent span tracking
39+ proxy.ts — MCP stdio proxy emitting Trace v2 events
40+ json-rpc.ts — streaming JSON-RPC parser
41+ ingest.ts — HTTP collector accepting Trace v2 events
42+ sdk.ts — TS SDK (Trace v2 native)
43+ hooks.ts — Claude Code hooks emitting Trace v2 lifecycle events
44+ init.ts — Claude config wrapping
45+ assert/
46+ load.ts — YAML rule loader + validator
47+ rules.ts — built-in rule kinds (sequence, threshold, precondition, regex)
48+ evaluator.ts — post-event async evaluator; appends AssertionResult to trace
49+ cli.ts — `flight assert check|watch` handlers
50+ trust/
51+ record.ts — TrustRecord builder: aggregates events + assertion outcomes + anomalies
52+ cli.ts — `flight trust show|list` handlers
53+ experiment/
54+ spec.ts — Experiment YAML schema + loader
55+ runner.ts — run a spec across N variants × M repetitions
56+ compare.ts — pairwise + N-way trust-record comparison
57+ cli.ts — `flight experiment run|compare` handlers
58+ trace/
59+ cli.ts — `flight trace show|ls` (causal-tree pretty printer)
60+ cli.ts — top-level Commander wiring (delegates to subcommand cli.ts files)
61+ shared.ts — DEFAULT_TRACE_DIR, DEFAULT_TRUST_DIR, color constants
62+ index.ts — public API re-exports (createFlightClient + Trace v2 types)
4463
4564sdk/python/
4665 flight_sdk/
47- __init__.py — Package exports (FlightClient, LogEntry, ModelConfig)
48- client.py — Buffered HTTP client for flight serve
49- types.py — LogEntry, ModelConfig dataclasses
66+ __init__.py — Package exports
67+ client.py — Buffered HTTP client emitting Trace v2 events
68+ trace.py — Python dataclasses mirroring trace-v2.schema.json
69+ causal.py — Python-side cause-id helpers
5070 tests/
51- test_client.py — Integration tests (starts flight serve, posts events, verifies JSONL)
52- pyproject.toml — Package config (flight-sdk, Python 3.9+)
71+ test_client.py — Integration tests (starts flight serve, posts events, verifies v2)
72+ pyproject.toml — Package config (flight-sdk, Python 3.9+)
73+
74+ examples/verified-agent/ — End-to-end example
75+ agent.mjs — Runnable agent exercising direct vs probe strategies
76+ flight.assertions.yaml — Behavioral rules (sequence, threshold, regex)
77+ experiment.yaml — 2-strategy comparison spec
78+ expected-trust.json — Reference trust record shape
5379```
5480
5581## CLI Structure
5682
5783``` bash
58- # Top-level commands
59- flight serve [--port 4242] [--log-dir] # HTTP collector
60- flight proxy --cmd < server> -- < args> # MCP stdio proxy
61- flight status # One-line summary of active sessions
62- flight setup # Interactive configuration wizard
63- flight session start| end # Explicit session lifecycle
64- flight annotate < target-id> --label < l> # Attach annotation to a run/session/turn/tool_call
65-
66- # Log commands
67- flight log list| tail| view| filter| inspect| alerts| summary| tools| audit| verbose
68- flight log stats| export| replay| replay-call| gc| prune| query
69-
70- # Claude Code integration
71- flight claude setup # Interactive wizard
72- flight claude hooks install| remove # Hook management
73- flight claude init desktop| code # MCP server wrapping
74-
75- # Internal (used by hooks)
84+ # Ingress
85+ flight proxy --cmd <server> -- <args> # MCP stdio proxy → Trace v2
86+ flight serve [--port 4242] # HTTP collector → Trace v2 (Python SDK ingest)
87+
88+ # Assertions
89+ flight assert check <trace> # Run YAML assertions against a recorded trace
90+ flight assert watch # Live-evaluate assertions on incoming events
91+
92+ # Trust records
93+ flight trust show <run_id> # Print the trust record for a run
94+ flight trust list # List trust records
95+
96+ # Experiments
97+ flight experiment run <spec.yaml> # Run an experiment spec; emit comparison record
98+ flight experiment compare <run_a> <run_b> # Pairwise comparison of two trust records
99+
100+ # Trace inspection
101+ flight trace show <trace> # Inspect a Trace v2 file (causal tree)
102+ flight trace ls # List traces under ~/.flight/traces/
103+
104+ # Claude Code integration (internal + management)
76105flight hook session-start|session-end|user-prompt-submit|post-tool-use
106+ flight claude install|uninstall # Hook + slash command management
77107```
78108
79- ## Log Schema
109+ ## Trace v2 Schema
110+
111+ The canonical schema lives at `packages/flight-proxy/src/schema/trace-v2.schema.json`. All ingress paths must emit documents that validate against it.
112+
113+ **Top-level `Trace` document:**
114+
115+ | Field | Required | Description |
116+ |-------|:--------:|-------------|
117+ | `schema_version` | yes | `"2.0"` |
118+ | `trace_id` | yes | ULID, unique per trace file |
119+ | `run_id` | yes | ULID, groups related traces (e.g., experiment repetitions) |
120+ | `started_at` | yes | ISO 8601 |
121+ | `ended_at` | no | set by `finalize()` |
122+ | `input_context` | yes | prompt or task context |
123+ | `events` | yes | ordered `Event[]` |
124+ | `assertions` | yes | `AssertionResult[]` appended post-evaluation |
125+ | `anomalies` | yes | `Anomaly[]` (loop, error_recovery, assertion_fail, schema_violation) |
126+ | `metadata` | yes | model, agent_id, provider, token_counts, cost_usd |
80127
81- Required fields: ` session_id ` , ` timestamp ` , ` event_type `
82- Event types: ` tool_call ` , ` tool_result ` , ` agent_action ` , ` evaluation ` , ` lifecycle `
83- Optional fields: ` run_id ` , ` agent_id ` , ` model_config ` , ` chosen_action ` , ` execution_outcome ` , ` evaluator_score ` , ` labels ` , ` metadata ` , ` call_id ` , ` direction ` , ` method ` , ` tool_name ` , ` payload ` , ` error ` , ` latency_ms ` , ` error_recovery_anomaly ` , ` pd_active ` , ` schema_tokens_saved `
128+ **`Event` fields:** `event_id`, `span_id`, `parent_span_id?`, `kind`, `timestamp`, `causal_link?`, `payload`, `latency_ms?`, `error?`
129+
130+ **Event kinds:** `lifecycle.run_start`, `lifecycle.run_end`, `llm.call`, `llm.result`, `tool_call`, `tool_result`, `agent.decision`, `assertion.evaluated`
131+
132+ **`CausalLink`:** `caused_by_event_id`, `reason` (`tool_result_consumed | llm_output_emitted | user_input | scheduled | explicit`), `notes?`
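
To make the field lists above concrete, here is a minimal sketch of how the `Event` and `CausalLink` shapes might be mirrored as Python dataclasses. It is not the contents of `sdk/python/flight_sdk/trace.py`. The field types, the choice to omit unset optional fields instead of writing `null`, the placeholder IDs, and the pairing of `llm_output_emitted` with a `tool_call` are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional

# Reasons listed for CausalLink in the schema summary above.
CAUSAL_REASONS = {
    "tool_result_consumed",
    "llm_output_emitted",
    "user_input",
    "scheduled",
    "explicit",
}


@dataclass
class CausalLink:
    caused_by_event_id: str
    reason: str
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        if self.reason not in CAUSAL_REASONS:
            raise ValueError(f"unknown causal reason: {self.reason}")


@dataclass
class Event:
    event_id: str
    span_id: str
    kind: str
    timestamp: str  # ISO 8601 string (assumed)
    payload: Dict[str, Any] = field(default_factory=dict)
    parent_span_id: Optional[str] = None
    causal_link: Optional[CausalLink] = None
    latency_ms: Optional[float] = None
    error: Optional[Any] = None  # error shape is not specified above

    def to_json_dict(self) -> Dict[str, Any]:
        """Serialize, omitting optional fields that were never set (assumed convention)."""
        out = {k: v for k, v in asdict(self).items() if v is not None}
        if isinstance(out.get("causal_link"), dict):
            out["causal_link"] = {
                k: v for k, v in out["causal_link"].items() if v is not None
            }
        return out


# Placeholder IDs; real IDs would come from the writer.
llm_result = Event("evt-1", "span-1", "llm.result", "2025-01-01T00:00:00Z")
tool_call = Event(
    "evt-2",
    "span-2",
    "tool_call",
    "2025-01-01T00:00:01Z",
    payload={"tool_name": "search"},
    parent_span_id="span-1",
    causal_link=CausalLink(llm_result.event_id, "llm_output_emitted"),
)
print(json.dumps(tool_call.to_json_dict(), indent=2))
```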
133+
134+ **Data locations:**
135+ - `~/.flight/traces/<run_id>.trace.json`: Trace v2 files
136+ - `~/.flight/trust/<run_id>.trust.json`: trust records
84137
85138## Claude Code Integration
86139
87140### Hooks (always active)
88- Installed in ` ~/.claude/settings.json ` by ` flight claude setup ` :
89- - ** SessionStart** → ` flight hook session-start ` — creates active session marker
90- - ** SessionEnd** → ` flight hook session-end ` — outputs summary, triggers compression/GC
91- - ** UserPromptSubmit** → ` flight hook user-prompt-submit ` — logs user prompt submissions
92- - ** PostToolUse** → ` flight hook post-tool-use ` — logs tool calls to ` <session>_tools.jsonl `
141+ Installed in `~/.claude/settings.json` by `flight claude install`:
142+ - **SessionStart** → `flight hook session-start`: opens a TraceSession with a `lifecycle.run_start` event
143+ - **SessionEnd** → `flight hook session-end`: finalizes the trace and builds the trust record
144+ - **UserPromptSubmit** → `flight hook user-prompt-submit`: records user input as a `user_input` event
145+ - **PostToolUse** → `flight hook post-tool-use`: records `tool_call` + `tool_result` events
93146
94147### MCP Proxy Wrapping (optional)
95- ` flight claude init code --apply ` rewrites ` ~/.claude.json ` mcpServers .
148+ `flight proxy --cmd <server> -- <args>` wraps any MCP server transparently.
96149
97150### Slash Commands
98- Installed in ` ~/.claude/commands/ ` by ` flight claude setup ` :
99- - ** ` /flight ` ** — quick session audit (runs ` flight log audit ` )
100- - ** ` /flight-log ` ** — comprehensive view (runs ` flight log verbose ` )
101-
102- ### Data Locations
103- - ` ~/.flight/logs/session_*.jsonl ` — session recordings
104- - ` ~/.flight/logs/<session>_tools.jsonl ` — tool call metadata from hooks
105- - ` ~/.flight/alerts.jsonl ` — hallucination hints, loops, errors
106- - ` ~/.flight/usage/ ` — token usage statistics
151+ Installed in `~/.claude/commands/` by `flight claude install`:
152+ - **`/flight`**: runs `flight trace show` for the current session
153+ - **`/flight-log`**: runs `flight trust show` for the current run
107154
108155## Commands
109156
@@ -119,20 +166,18 @@ Python SDK tests: `cd sdk/python && python3 -m pytest tests/ -v`
119166
120167## Key Patterns
121168
122- - ** Handler result objects** — ` PDResponseResult ` carries rewritten responses, log metadata, and status messages in one return value.
123- - ** Async write queue** — Logger batches writes with a flush timer; ` closeSync() ` drains synchronously for signal handlers.
124- - ** SDK uses logEntry** — ` createFlightClient() ` constructs ` LogEntry ` objects directly via ` logger.logEntry() ` (schema-first, no fake JSON-RPC).
125- - ** HTTP collector** — ` startCollector() ` uses Node built-in ` http.createServer ` , validates entries, batches writes per session.
126- - ** Python SDK buffering** — entries buffer in memory, flush every 1s or 100 entries via ` urllib.request ` POST to ` /ingest ` .
127- - ** JSON-RPC streaming** — ` parseJsonRpcStream ` is a newline-delimited JSON parser on Node readable streams.
128- - ** Progressive disclosure** — Phase 1 (observation), Phase 2 (schema compression), Phase 3 (compression + filtering).
129- - ** SQLite query layer** — ` FlightDB ` indexes JSONL files into SQLite for cross-session queries, aggregation by tool, and daily trends.
130- - ** Alert detection** — Error-recovery anomalies (different tool called after error), loop detection (same tool 5x in 60s).
169+ - **TraceSession**: unified API wrapping the writer + `CausalContext`; call `recordEvent()` and get back a fully linked `Event` with `causal_link` filled in automatically.
170+ - **Causal attribution rules**: `tool_result` links to the matching `tool_call` by `call_id`; `tool_call` links to the most recent `llm.result`; `llm.call` links to the most recent `agent.decision` or `lifecycle.run_start`.
171+ - **Async post-event assertion evaluation**: the evaluator runs on a microtask queue; results are appended to `trace.assertions`; failures also append an `Anomaly` of kind `assertion_fail`.
172+ - **Content-hashed trust records**: `buildTrustRecord(trace)` produces a `TrustRecord` with `content_hash = sha256(canonical-JSON(trace))`; unsigned in v2 (attestation deferred).
173+ - **YAML-driven experiments**: `experiment.yaml` specifies variants + repetitions; the runner spawns child processes, collects trust records, and `compareResults()` produces per-metric min/max/mean/stddev.
174+ - **JSON-RPC streaming**: `parseJsonRpcStream` is a newline-delimited JSON parser on Node readable streams (unchanged from v1).
175+ - **HTTP collector**: `startCollector()` uses Node built-in `http.createServer`; accepts `POST /ingest` batches of v2 events; `POST /finalize` closes the run.
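
The causal attribution rules in the list above can be sketched as a single lookup over the events recorded so far. This is an illustrative Python version and not the `CausalContext` implementation. It assumes events are plain dicts, that `call_id` lives inside `payload`, and that "most recent" means the latest matching event by append order. The helper names `derive_cause_id` and `_latest` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


def _latest(prior: List[Event], matches: Callable[[Event], bool]) -> Optional[str]:
    """Return the event_id of the most recently appended event that matches."""
    for event in reversed(prior):
        if matches(event):
            return event["event_id"]
    return None


def derive_cause_id(event: Event, prior: List[Event]) -> Optional[str]:
    """Apply the three attribution rules; return None when no rule applies."""
    kind = event["kind"]
    if kind == "tool_result":
        # Assumption: call_id is carried in the payload of both events.
        call_id = event.get("payload", {}).get("call_id")
        if call_id is None:
            return None
        return _latest(
            prior,
            lambda e: e["kind"] == "tool_call"
            and e.get("payload", {}).get("call_id") == call_id,
        )
    if kind == "tool_call":
        return _latest(prior, lambda e: e["kind"] == "llm.result")
    if kind == "llm.call":
        return _latest(
            prior, lambda e: e["kind"] in ("agent.decision", "lifecycle.run_start")
        )
    return None


history = [
    {"event_id": "e1", "kind": "lifecycle.run_start"},
    {"event_id": "e2", "kind": "llm.call"},
    {"event_id": "e3", "kind": "llm.result"},
    {"event_id": "e4", "kind": "tool_call", "payload": {"call_id": "c1"}},
]
incoming = {"event_id": "e5", "kind": "tool_result", "payload": {"call_id": "c1"}}
print(derive_cause_id(incoming, history))
```

Scanning backwards keeps the sketch short. A long-running session would more likely index open tool calls by `call_id` to avoid the linear walk.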
131176
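
The `content_hash = sha256(canonical-JSON(trace))` pattern listed under Key Patterns can be illustrated with the standard library alone. The source does not say how canonical JSON is produced, so this sketch assumes sorted keys, compact separators, and UTF-8 encoding. The real `buildTrustRecord` may use a different canonicalization (for example RFC 8785), in which case hashes would not match this sketch. The `verify` helper and the two-field record are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_json(doc: Dict[str, Any]) -> bytes:
    """Serialize deterministically: sorted keys, no extra whitespace (assumed rules)."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_hash(trace: Dict[str, Any]) -> str:
    """Hex SHA-256 digest of the canonical serialization of a trace."""
    return hashlib.sha256(canonical_json(trace)).hexdigest()


def verify(trace: Dict[str, Any], trust_record: Dict[str, Any]) -> bool:
    """Check that a trust record still matches the trace it summarizes."""
    return trust_record.get("content_hash") == content_hash(trace)


trace = {"schema_version": "2.0", "run_id": "run-placeholder", "events": []}
record = {"run_id": trace["run_id"], "content_hash": content_hash(trace)}
print(verify(trace, record))

# Any edit to the trace after the record was built is detectable.
trace["events"].append({"event_id": "e1", "kind": "lifecycle.run_start"})
print(verify(trace, record))
```

Since the records are unsigned, the hash detects accidental or after-the-fact modification of a trace but does not by itself prove who produced it.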
132177## Testing
133178
134- - Tests live in ` test/ ` alongside source
135- - Mock MCP server pattern: spawn a test server, connect via proxy, assert on JSON-RPC messages
179+ - Tests live in `packages/flight-proxy/test/` alongside source
180+ - Mock MCP server pattern: spawn a test server, connect via proxy, assert on Trace v2 output
136181- ` test/simulate/ ` contains validation harnesses for Claude API compatibility
137182- ` sdk/python/tests/ ` — Python integration tests (require built CLI for ` flight serve ` )
138183- Run ` npm run test ` — all tests should pass before any PR