docs: correct CLAUDE.md and README.md against actual CLI

lewisnsmith · lewisnsmith · commit 59ee5bf1b8fe · 2026-04-25T15:27:31.000-07:00
- Fix project structure: collector.ts → ingest.ts, remove experiments.ts
- Fix CLI subcommand list (the listed run, show, logs, and experiment subcommands never existed)
- Replace 'flight logs' (plural) with 'flight log' (actual command name)
- Add replay-call to log subcommand list alongside replay
- Add user-prompt-submit to hook registrations in CLAUDE.md (was missing from the file listing, command list, and hook docs)
- Remove stale 'renamed to flight logs in 1.5.0' note (the command was never renamed in code)
- Remove non-existent flight run/show/experiment commands from README
- Remove Experiment Registry section (feature not implemented)
- Fix slash command list: only /flight and /flight-log are installed
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -19,18 +19,18 @@ Python SDK: stdlib only (no external deps), Python 3.9+
 
 ```
 src/
-  cli.ts               — Commander CLI (subcommands: serve, proxy, run, show, logs, claude, hook)
+  cli.ts               — Commander CLI (subcommands: status, proxy, serve, setup, sprites, session, annotate, log, claude, hook)
   proxy.ts             — stdio proxy: spawn upstream, bidirectional JSON-RPC forwarding
   json-rpc.ts          — streaming JSON-RPC parser (readline + JSON.parse per line)
   logger.ts            — session logger with async write queue and alert detection
   sdk.ts               — TypeScript SDK: createFlightClient() for programmatic logging (schema-first)
-  collector.ts         — HTTP collector server (flight serve)
+  ingest.ts            — HTTP collector server (flight serve)
   query.ts             — SQLite-backed query layer (FlightDB, indexing, aggregation, trends)
   progressive-disclosure.ts — PD handler: phase logic, usage tracking, tool filtering
   pd-schema.ts         — pure schema compression utilities
   file-lock.ts         — advisory file locking (O_CREAT|O_EXCL)
   retry.ts             — automatic retry manager for transient MCP errors
-  hooks.ts             — Claude Code hook installation/removal (SessionStart/End, PostToolUse)
+  hooks.ts             — Claude Code hook installation/removal (SessionStart/End, UserPromptSubmit, PostToolUse)
   init.ts              — Claude/Claude Code config file management (wraps mcpServers)
   setup.ts             — interactive setup wizard (wraps servers + installs hooks)
   shared.ts            — shared constants (DEFAULT_LOG_DIR, C colors, McpServerEntry type)
@@ -40,7 +40,6 @@ src/
   export.ts            — CSV/JSONL export
   replay.ts            — tool call replay from logs
   log-commands.ts      — CLI subcommands for log inspection (list, tail, view, filter, inspect, audit, verbose)
-  experiments.ts       — Experiment registry: per-file JSON store at ~/.flight/experiments/<name>.json
   index.ts             — public API re-exports
 
 sdk/python/
@@ -59,32 +58,24 @@ sdk/python/
 # Top-level commands
 flight serve [--port 4242] [--log-dir]   # HTTP collector
 flight proxy --cmd <server> -- <args>     # MCP stdio proxy
-flight run --agent <agent> [--experiment <id>] [--model <name>]  # Start a run
-flight show <session-id>                  # View a recorded session
-flight logs                               # List sessions (same as flight logs list)
+flight status                             # One-line summary of active sessions
+flight setup                              # Interactive configuration wizard
+flight session start|end                  # Explicit session lifecycle
+flight annotate <target-id> --label <l>   # Attach annotation to a run/session/turn/tool_call
 
 # Log commands
-flight logs list|tail|view|filter|inspect|alerts|summary|tools|audit|verbose
-flight logs stats|export|replay|gc|prune|query
-
-# Experiment registry
-flight experiment new <name>         # Register/update experiment
-flight experiment list               # Table with run counts
-flight experiment show <name>        # Metadata + runs
-flight experiment diff <a> <b>       # Cross-experiment comparison
-flight experiment export <name>      # Research JSONL to stdout
+flight log list|tail|view|filter|inspect|alerts|summary|tools|audit|verbose
+flight log stats|export|replay|replay-call|gc|prune|query
 
 # Claude Code integration
 flight claude setup                       # Interactive wizard
 flight claude hooks install|remove        # Hook management
 flight claude init desktop|code           # MCP server wrapping
 
 # Internal (used by hooks)
-flight hook session-start|session-end|post-tool-use
+flight hook session-start|session-end|user-prompt-submit|post-tool-use
 ```
 
-Note: `flight log` (singular) was renamed to `flight logs` (plural) in 1.5.0 — update any scripts accordingly. There is no deprecation shim.
-
 ## Log Schema
 
 Required fields: `session_id`, `timestamp`, `event_type`
@@ -97,21 +88,18 @@ Optional fields: `run_id`, `agent_id`, `model_config`, `chosen_action`, `executi
 Installed in `~/.claude/settings.json` by `flight claude setup`:
 - **SessionStart** → `flight hook session-start` — creates active session marker
 - **SessionEnd** → `flight hook session-end` — outputs summary, triggers compression/GC
+- **UserPromptSubmit** → `flight hook user-prompt-submit` — logs user prompt submissions
 - **PostToolUse** → `flight hook post-tool-use` — logs tool calls to `<session>_tools.jsonl`
 
 ### MCP Proxy Wrapping (optional)
 `flight claude init code --apply` rewrites `~/.claude.json` mcpServers.
 
 ### Slash Commands
 Installed in `~/.claude/commands/` by `flight claude setup`:
-- **`/flight`** — quick session audit (runs `flight logs audit`)
-- **`/flight-log`** — comprehensive view (runs `flight logs verbose`)
-- **`/flight-review`** — annotates a session for retries, errors, tool overuse, and good decisions (runs `flight show` + `flight logs verbose`)
-- **`/flight-compare`** — diffs two experiments with a 3-bullet summary: winner, biggest delta, next test (runs `flight experiment diff`)
-- **`/flight-annotate`** — labels each turn and emits `flight annotate` shell commands to persist labels (runs `flight logs verbose`)
+- **`/flight`** — quick session audit (runs `flight log audit`)
+- **`/flight-log`** — comprehensive view (runs `flight log verbose`)
 
 ### Data Locations
-- `~/.flight/experiments/<name>.json` — experiment registry (one JSON file per experiment)
 - `~/.flight/logs/session_*.jsonl` — session recordings
 - `~/.flight/logs/<session>_tools.jsonl` — tool call metadata from hooks
 - `~/.flight/alerts.jsonl` — hallucination hints, loops, errors
@@ -140,7 +128,6 @@ Python SDK tests: `cd sdk/python && python3 -m pytest tests/ -v`
 - **Progressive disclosure** — Phase 1 (observation), Phase 2 (schema compression), Phase 3 (compression + filtering).
 - **SQLite query layer** — `FlightDB` indexes JSONL files into SQLite for cross-session queries, aggregation by tool, and daily trends.
 - **Alert detection** — Error-recovery anomalies (different tool called after error), loop detection (same tool 5x in 60s).
-- **Experiment registry** — `src/experiments.ts` stores one JSON file per experiment at `~/.flight/experiments/<name>.json`. Race-safe creation via O_EXCL (`flag: "wx"`); `createOrUpdateExperiment` merges patches (arrays replace). `flight run --experiment` auto-registers and prints a hint on first use.
 
 ## Testing
 
diff --git a/README.md b/README.md
@@ -125,7 +125,7 @@ flight claude init code --apply       # Wrap MCP servers for full traffic record
 
 # Start a Claude Code session — Flight records automatically
 # Then inspect what happened:
-flight logs tail
+flight log tail
 ```
 
 **Slash commands** (installed by `flight claude setup`):
@@ -193,48 +193,36 @@ flight serve [--port 4242] [--log-dir ~/.flight/logs]
 flight proxy --cmd <server> -- <args>
 flight proxy --cmd <server> --pd           # With progressive disclosure
 
-# Happy-path commands
-flight run --agent <agent> [--experiment <id>] [--model <name>]  # Start a run
-flight show <session-id>            # View a recorded session
-flight logs                         # List all sessions (same as flight logs list)
-
 # Session lifecycle + annotation
 flight session start --agent <agent> [--run <run-id>]
 flight session end [--session <id>] [--status completed|failed]
 flight annotate <target-id> --type run|session|turn|tool_call --label <label>
 
 # Log inspection and analysis
-flight logs list                     # List all sessions
-flight logs tail [--session <id>]    # Live stream a session
-flight logs view <session>           # Full timeline with summary
-flight logs filter --tool <name>     # Filter by tool name
-flight logs filter --errors          # Show only failed calls
-flight logs filter --anomalies       # Show error-recovery anomalies
-flight logs inspect <call-id>        # Full request/response payload
-flight logs alerts                   # Anomaly/loop/error alerts
-flight logs summary [--session <id>] # Session summary statistics
-flight logs tools                    # Tool call frequency breakdown
-flight logs compare --run-id <id>    # Compare sessions/models within a run
-flight logs stats [session]          # Usage statistics across sessions
-flight logs export [session] --format research|raw|csv|jsonl
-flight logs replay <call-id> --cmd <server> -- <args>
-flight logs gc                       # Compress old sessions, collect garbage
-flight logs prune --before <date>    # Delete sessions before a date
-flight logs prune --keep <n>         # Keep only N most recent sessions
+flight log list                     # List all sessions
+flight log tail [--session <id>]    # Live stream a session
+flight log view <session>           # Full timeline with summary
+flight log filter --tool <name>     # Filter by tool name
+flight log filter --errors          # Show only failed calls
+flight log filter --anomalies       # Show error-recovery anomalies
+flight log inspect <call-id>        # Full request/response payload
+flight log alerts                   # Anomaly/loop/error alerts
+flight log summary [--session <id>] # Session summary statistics
+flight log tools                    # Tool call frequency breakdown
+flight log compare --run-id <id>    # Compare sessions/models within a run
+flight log stats [session]          # Usage statistics across sessions
+flight log export [session] --format research|raw|csv|jsonl
+flight log replay-call <call-id> --cmd <server> -- <args>
+flight log gc                       # Compress old sessions, collect garbage
+flight log prune --before <date>    # Delete sessions before a date
+flight log prune --keep <n>         # Keep only N most recent sessions
 
 # Cross-session queries (SQLite-backed)
-flight logs query --aggregate        # Error rates + latency percentiles by tool
-flight logs query --trend            # Daily trend (totals, errors, anomalies)
-flight logs query --tool <name>      # Filter by tool name
-flight logs query --anomalies        # Show only error-recovery anomalies
-flight logs query --after <date>     # Filter by time range
-
-# Experiment registry
-flight experiment new <name> [--description <desc>] [--tags <csv>] [--baseline <run-id>] [--model <name>] [--notes <text>]
-flight experiment list               # Table of all experiments + run counts
-flight experiment show <name>        # Metadata + recent runs for an experiment
-flight experiment diff <name1> <name2>  # Compare runs across two experiments
-flight experiment export <name>      # Stream all runs as research JSONL to stdout
+flight log query --aggregate        # Error rates + latency percentiles by tool
+flight log query --trend            # Daily trend (totals, errors, anomalies)
+flight log query --tool <name>      # Filter by tool name
+flight log query --anomalies        # Show only error-recovery anomalies
+flight log query --after <date>     # Filter by time range
 
 # Claude Code integration
 flight claude setup                 # Interactive setup wizard
@@ -277,51 +265,6 @@ flight hook session-start|session-end|user-prompt-submit|post-tool-use
 
 ---
 
-## Experiment Registry
-
-The experiment registry provides a lightweight, file-per-experiment store at `~/.flight/experiments/<name>.json`. It lets you group and compare runs across multiple sessions.
-
-### Schema
-
-```json
-{
-  "name": "bench-a",
-  "created_at": "2026-04-17T12:00:00.000Z",
-  "description": "Baseline throughput test",
-  "tags": ["fast", "cheap"],
-  "baseline_run_id": "run_1713355200_abcd1234",
-  "model_config": { "model": "claude-sonnet-4-20250514" },
-  "notes": "Compare against bench-b with streaming enabled"
-}
-```
-
-### Workflow
-
-```bash
-# Register an experiment with metadata
-flight experiment new bench-a --description "Baseline" --tags fast,cheap --model claude-sonnet-4
-
-# Start runs that belong to this experiment
-flight run --agent my-agent --experiment bench-a
-flight run --agent my-agent --experiment bench-b
-
-# List all experiments with run counts
-flight experiment list
-
-# Inspect a specific experiment and its runs
-flight experiment show bench-a
-
-# Compare two experiments head-to-head
-flight experiment diff bench-a bench-b
-
-# Export all runs as research JSONL (for offline analysis)
-flight experiment export bench-a | jq .
-```
-
-Unknown experiments are **auto-registered** on first `flight run --experiment <name>`, with a one-line stderr hint pointing to `flight experiment new` for adding metadata. The registry files are plain JSON and fully human-editable.
-
----
-
 ## Performance
 
 - **<5ms** added latency per tool call (streaming NDJSON, fire-and-forget log writes)
@@ -337,7 +280,7 @@ Unknown experiments are **auto-registered** on first `flight run --experiment <n
 - **One file per session**, append-only
 - **Auto-compression:** sessions older than 24h are gzip-compressed (`.jsonl.gz`)
 - **Garbage collection:** configurable max sessions (100) and max size (2 GB)
-- **Pruning:** `flight logs prune --before <date>` or `--keep <n>`
+- **Pruning:** `flight log prune --before <date>` or `--keep <n>`
 
 ---