Commit 59ee5bf

docs: correct CLAUDE.md and README.md against actual CLI

- Fix project structure: collector.ts → ingest.ts, remove experiments.ts
- Fix CLI subcommand list (was wrong: run/show/logs/experiment never existed)
- Replace 'flight logs' (plural) with 'flight log' (actual command name)
- Add replay-call to log subcommand list alongside replay
- Add user-prompt-submit to hook registrations (was missing everywhere)
- Remove stale 'renamed to flight logs in 1.5.0' note (code never changed)
- Remove non-existent flight run/show/experiment commands from README
- Remove Experiment Registry section (feature not implemented)
- Fix slash command list: only /flight and /flight-log are installed
1 parent 1464018 commit 59ee5bf

2 files changed

Lines changed: 37 additions & 107 deletions
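Most of the corrections listed above are command-name drift: `flight logs` versus the actual `flight log`, plus `run`, `show`, and `experiment` subcommands that never existed. A small docs lint could catch this class of error before it lands. The sketch below is hypothetical and not part of Flight: `stale_commands` and `KNOWN_SUBCOMMANDS` are illustrative names, the subcommand set is copied by hand from the corrected `cli.ts` entry in CLAUDE.md (the Commander definitions in `src/cli.ts` remain the real source of truth), and only the first word after `flight` is checked. It uses only the standard library, in keeping with the Python SDK.

```python
import re

# Hypothetical allow-list: top-level subcommands copied by hand from the
# corrected cli.ts entry in CLAUDE.md. src/cli.ts is the real source of truth.
KNOWN_SUBCOMMANDS = {
    "status", "proxy", "serve", "setup", "sprites",
    "session", "annotate", "log", "claude", "hook",
}

# Lowercase "flight" as a whole word, then whitespace, then a
# subcommand-like token. Paths such as ~/.flight/logs and slash commands
# such as /flight-log do not match because no whitespace follows "flight".
FLIGHT_CMD = re.compile(r"\bflight\s+([a-z][a-z-]*)")


def stale_commands(doc_text: str) -> list[tuple[int, str]]:
    """Return (line_number, subcommand) pairs for unknown top-level commands."""
    findings = []
    for lineno, line in enumerate(doc_text.splitlines(), start=1):
        for match in FLIGHT_CMD.finditer(line):
            sub = match.group(1)
            if sub not in KNOWN_SUBCOMMANDS:
                findings.append((lineno, sub))
    return findings
```

Run over the pre-commit README, it would flag lines such as `flight logs tail` and `flight run --agent ...`. It is a heuristic, so ordinary prose containing a lowercase "flight" followed by a word can also be reported.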

CLAUDE.md

Lines changed: 13 additions & 26 deletions
@@ -19,18 +19,18 @@ Python SDK: stdlib only (no external deps), Python 3.9+
 
 ```
 src/
-cli.ts — Commander CLI (subcommands: serve, proxy, run, show, logs, claude, hook)
+cli.ts — Commander CLI (subcommands: status, proxy, serve, setup, sprites, session, annotate, log, claude, hook)
 proxy.ts — stdio proxy: spawn upstream, bidirectional JSON-RPC forwarding
 json-rpc.ts — streaming JSON-RPC parser (readline + JSON.parse per line)
 logger.ts — session logger with async write queue and alert detection
 sdk.ts — TypeScript SDK: createFlightClient() for programmatic logging (schema-first)
-collector.ts — HTTP collector server (flight serve)
+ingest.ts — HTTP collector server (flight serve)
 query.ts — SQLite-backed query layer (FlightDB, indexing, aggregation, trends)
 progressive-disclosure.ts — PD handler: phase logic, usage tracking, tool filtering
 pd-schema.ts — pure schema compression utilities
 file-lock.ts — advisory file locking (O_CREAT|O_EXCL)
 retry.ts — automatic retry manager for transient MCP errors
-hooks.ts — Claude Code hook installation/removal (SessionStart/End, PostToolUse)
+hooks.ts — Claude Code hook installation/removal (SessionStart/End, UserPromptSubmit, PostToolUse)
 init.ts — Claude/Claude Code config file management (wraps mcpServers)
 setup.ts — interactive setup wizard (wraps servers + installs hooks)
 shared.ts — shared constants (DEFAULT_LOG_DIR, C colors, McpServerEntry type)
@@ -40,7 +40,6 @@ src/
 export.ts — CSV/JSONL export
 replay.ts — tool call replay from logs
 log-commands.ts — CLI subcommands for log inspection (list, tail, view, filter, inspect, audit, verbose)
-experiments.ts — Experiment registry: per-file JSON store at ~/.flight/experiments/<name>.json
 index.ts — public API re-exports
 
 sdk/python/
@@ -59,32 +58,24 @@ sdk/python/
 # Top-level commands
 flight serve [--port 4242] [--log-dir] # HTTP collector
 flight proxy --cmd <server> -- <args> # MCP stdio proxy
-flight run --agent <agent> [--experiment <id>] [--model <name>] # Start a run
-flight show <session-id> # View a recorded session
-flight logs # List sessions (same as flight logs list)
+flight status # One-line summary of active sessions
+flight setup # Interactive configuration wizard
+flight session start|end # Explicit session lifecycle
+flight annotate <target-id> --label <l> # Attach annotation to a run/session/turn/tool_call
 
 # Log commands
-flight logs list|tail|view|filter|inspect|alerts|summary|tools|audit|verbose
-flight logs stats|export|replay|gc|prune|query
-
-# Experiment registry
-flight experiment new <name> # Register/update experiment
-flight experiment list # Table with run counts
-flight experiment show <name> # Metadata + runs
-flight experiment diff <a> <b> # Cross-experiment comparison
-flight experiment export <name> # Research JSONL to stdout
+flight log list|tail|view|filter|inspect|alerts|summary|tools|audit|verbose
+flight log stats|export|replay|replay-call|gc|prune|query
 
 # Claude Code integration
 flight claude setup # Interactive wizard
 flight claude hooks install|remove # Hook management
 flight claude init desktop|code # MCP server wrapping
 
 # Internal (used by hooks)
-flight hook session-start|session-end|post-tool-use
+flight hook session-start|session-end|user-prompt-submit|post-tool-use
 ```
 
-Note: `flight log` (singular) was renamed to `flight logs` (plural) in 1.5.0 — update any scripts accordingly. There is no deprecation shim.
-
 ## Log Schema
 
 Required fields: `session_id`, `timestamp`, `event_type`
@@ -97,21 +88,18 @@ Optional fields: `run_id`, `agent_id`, `model_config`, `chosen_action`, `executi
 Installed in `~/.claude/settings.json` by `flight claude setup`:
 - **SessionStart**`flight hook session-start` — creates active session marker
 - **SessionEnd**`flight hook session-end` — outputs summary, triggers compression/GC
+- **UserPromptSubmit**`flight hook user-prompt-submit` — logs user prompt submissions
 - **PostToolUse**`flight hook post-tool-use` — logs tool calls to `<session>_tools.jsonl`
 
 ### MCP Proxy Wrapping (optional)
 `flight claude init code --apply` rewrites `~/.claude.json` mcpServers.
 
 ### Slash Commands
 Installed in `~/.claude/commands/` by `flight claude setup`:
-- **`/flight`** — quick session audit (runs `flight logs audit`)
-- **`/flight-log`** — comprehensive view (runs `flight logs verbose`)
-- **`/flight-review`** — annotates a session for retries, errors, tool overuse, and good decisions (runs `flight show` + `flight logs verbose`)
-- **`/flight-compare`** — diffs two experiments with a 3-bullet summary: winner, biggest delta, next test (runs `flight experiment diff`)
-- **`/flight-annotate`** — labels each turn and emits `flight annotate` shell commands to persist labels (runs `flight logs verbose`)
+- **`/flight`** — quick session audit (runs `flight log audit`)
+- **`/flight-log`** — comprehensive view (runs `flight log verbose`)
 
 ### Data Locations
-- `~/.flight/experiments/<name>.json` — experiment registry (one JSON file per experiment)
 - `~/.flight/logs/session_*.jsonl` — session recordings
 - `~/.flight/logs/<session>_tools.jsonl` — tool call metadata from hooks
 - `~/.flight/alerts.jsonl` — hallucination hints, loops, errors
@@ -140,7 +128,6 @@ Python SDK tests: `cd sdk/python && python3 -m pytest tests/ -v`
 - **Progressive disclosure** — Phase 1 (observation), Phase 2 (schema compression), Phase 3 (compression + filtering).
 - **SQLite query layer**`FlightDB` indexes JSONL files into SQLite for cross-session queries, aggregation by tool, and daily trends.
 - **Alert detection** — Error-recovery anomalies (different tool called after error), loop detection (same tool 5x in 60s).
-- **Experiment registry**`src/experiments.ts` stores one JSON file per experiment at `~/.flight/experiments/<name>.json`. Race-safe creation via O_EXCL (`flag: "wx"`); `createOrUpdateExperiment` merges patches (arrays replace). `flight run --experiment` auto-registers and prints a hint on first use.
 
 ## Testing
 
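The commit adds UserPromptSubmit to the hooks that `flight claude setup` is documented to install in `~/.claude/settings.json`. To check whether an existing install registers all four, a sketch like the following could be used. It assumes Claude Code's hooks layout, in which `settings["hooks"][<event>]` is a list of matcher groups, each with a `hooks` list of `{"type": "command", "command": ...}` entries; `missing_hooks` and `EXPECTED_HOOKS` are hypothetical names, not a Flight API.

```python
# Hook event -> `flight hook` subcommand it should run, per the corrected
# CLAUDE.md hook list.
EXPECTED_HOOKS = {
    "SessionStart": "flight hook session-start",
    "SessionEnd": "flight hook session-end",
    "UserPromptSubmit": "flight hook user-prompt-submit",
    "PostToolUse": "flight hook post-tool-use",
}


def missing_hooks(settings: dict) -> list[str]:
    """Return hook events with no matching `flight hook` command registered.

    Assumed layout (Claude Code settings):
    settings["hooks"][event] -> [{"matcher": ..., "hooks": [{"type": "command", "command": ...}]}]
    """
    registered = settings.get("hooks", {})
    missing = []
    for event, expected in EXPECTED_HOOKS.items():
        # Flatten every command string registered for this event.
        commands = [
            hook.get("command", "")
            for group in registered.get(event, [])
            for hook in group.get("hooks", [])
        ]
        # Substring match tolerates an absolute path to the binary or trailing arguments.
        if not any(expected in command for command in commands):
            missing.append(event)
    return missing
```

Pass it the parsed file, for example `missing_hooks(json.loads(Path.home().joinpath(".claude", "settings.json").read_text()))`; an empty list means every expected `flight hook` command was found.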
README.md

Lines changed: 24 additions & 81 deletions
@@ -125,7 +125,7 @@ flight claude init code --apply # Wrap MCP servers for full traffic record
 
 # Start a Claude Code session — Flight records automatically
 # Then inspect what happened:
-flight logs tail
+flight log tail
 ```
 
 **Slash commands** (installed by `flight claude setup`):
@@ -193,48 +193,36 @@ flight serve [--port 4242] [--log-dir ~/.flight/logs]
 flight proxy --cmd <server> -- <args>
 flight proxy --cmd <server> --pd # With progressive disclosure
 
-# Happy-path commands
-flight run --agent <agent> [--experiment <id>] [--model <name>] # Start a run
-flight show <session-id> # View a recorded session
-flight logs # List all sessions (same as flight logs list)
-
 # Session lifecycle + annotation
 flight session start --agent <agent> [--run <run-id>]
 flight session end [--session <id>] [--status completed|failed]
 flight annotate <target-id> --type run|session|turn|tool_call --label <label>
 
 # Log inspection and analysis
-flight logs list # List all sessions
-flight logs tail [--session <id>] # Live stream a session
-flight logs view <session> # Full timeline with summary
-flight logs filter --tool <name> # Filter by tool name
-flight logs filter --errors # Show only failed calls
-flight logs filter --anomalies # Show error-recovery anomalies
-flight logs inspect <call-id> # Full request/response payload
-flight logs alerts # Anomaly/loop/error alerts
-flight logs summary [--session <id>] # Session summary statistics
-flight logs tools # Tool call frequency breakdown
-flight logs compare --run-id <id> # Compare sessions/models within a run
-flight logs stats [session] # Usage statistics across sessions
-flight logs export [session] --format research|raw|csv|jsonl
-flight logs replay <call-id> --cmd <server> -- <args>
-flight logs gc # Compress old sessions, collect garbage
-flight logs prune --before <date> # Delete sessions before a date
-flight logs prune --keep <n> # Keep only N most recent sessions
+flight log list # List all sessions
+flight log tail [--session <id>] # Live stream a session
+flight log view <session> # Full timeline with summary
+flight log filter --tool <name> # Filter by tool name
+flight log filter --errors # Show only failed calls
+flight log filter --anomalies # Show error-recovery anomalies
+flight log inspect <call-id> # Full request/response payload
+flight log alerts # Anomaly/loop/error alerts
+flight log summary [--session <id>] # Session summary statistics
+flight log tools # Tool call frequency breakdown
+flight log compare --run-id <id> # Compare sessions/models within a run
+flight log stats [session] # Usage statistics across sessions
+flight log export [session] --format research|raw|csv|jsonl
+flight log replay-call <call-id> --cmd <server> -- <args>
+flight log gc # Compress old sessions, collect garbage
+flight log prune --before <date> # Delete sessions before a date
+flight log prune --keep <n> # Keep only N most recent sessions
 
 # Cross-session queries (SQLite-backed)
-flight logs query --aggregate # Error rates + latency percentiles by tool
-flight logs query --trend # Daily trend (totals, errors, anomalies)
-flight logs query --tool <name> # Filter by tool name
-flight logs query --anomalies # Show only error-recovery anomalies
-flight logs query --after <date> # Filter by time range
-
-# Experiment registry
-flight experiment new <name> [--description <desc>] [--tags <csv>] [--baseline <run-id>] [--model <name>] [--notes <text>]
-flight experiment list # Table of all experiments + run counts
-flight experiment show <name> # Metadata + recent runs for an experiment
-flight experiment diff <name1> <name2> # Compare runs across two experiments
-flight experiment export <name> # Stream all runs as research JSONL to stdout
+flight log query --aggregate # Error rates + latency percentiles by tool
+flight log query --trend # Daily trend (totals, errors, anomalies)
+flight log query --tool <name> # Filter by tool name
+flight log query --anomalies # Show only error-recovery anomalies
+flight log query --after <date> # Filter by time range
 
 # Claude Code integration
 flight claude setup # Interactive setup wizard
@@ -277,51 +265,6 @@ flight hook session-start|session-end|user-prompt-submit|post-tool-use
 
 ---
 
-## Experiment Registry
-
-The experiment registry provides a lightweight, file-per-experiment store at `~/.flight/experiments/<name>.json`. It lets you group and compare runs across multiple sessions.
-
-### Schema
-
-```json
-{
-"name": "bench-a",
-"created_at": "2026-04-17T12:00:00.000Z",
-"description": "Baseline throughput test",
-"tags": ["fast", "cheap"],
-"baseline_run_id": "run_1713355200_abcd1234",
-"model_config": { "model": "claude-sonnet-4-20250514" },
-"notes": "Compare against bench-b with streaming enabled"
-}
-```
-
-### Workflow
-
-```bash
-# Register an experiment with metadata
-flight experiment new bench-a --description "Baseline" --tags fast,cheap --model claude-sonnet-4
-
-# Start runs that belong to this experiment
-flight run --agent my-agent --experiment bench-a
-flight run --agent my-agent --experiment bench-b
-
-# List all experiments with run counts
-flight experiment list
-
-# Inspect a specific experiment and its runs
-flight experiment show bench-a
-
-# Compare two experiments head-to-head
-flight experiment diff bench-a bench-b
-
-# Export all runs as research JSONL (for offline analysis)
-flight experiment export bench-a | jq .
-```
-
-Unknown experiments are **auto-registered** on first `flight run --experiment <name>`, with a one-line stderr hint pointing to `flight experiment new` for adding metadata. The registry files are plain JSON and fully human-editable.
-
----
-
 ## Performance
 
 - **<5ms** added latency per tool call (streaming NDJSON, fire-and-forget log writes)
@@ -337,7 +280,7 @@ Unknown experiments are **auto-registered** on first `flight run --experiment <n
 - **One file per session**, append-only
 - **Auto-compression:** sessions older than 24h are gzip-compressed (`.jsonl.gz`)
 - **Garbage collection:** configurable max sessions (100) and max size (2 GB)
-- **Pruning:** `flight logs prune --before <date>` or `--keep <n>`
+- **Pruning:** `flight log prune --before <date>` or `--keep <n>`
 
 ---
 