.claude/commands/flight-log.md: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-Run `flight log audit` to display a full audit of all tool calls from the current session.
+Run `flight logs audit` to display a full audit of all tool calls from the current session.

 Read the output carefully. Present a concise summary to the user:

@@ -9,4 +9,4 @@ Read the output carefully. Present a concise summary to the user:

 If there are errors or suspicious patterns, offer to investigate the specific tool calls or help fix the underlying issues.

-If the user asks about a specific tool call, you can run `flight log tools` with `--tool <name>` to filter, or read the session's `_tools.jsonl` file directly from `~/.flight/logs/` for full details.
+If the user asks about a specific tool call, you can run `flight logs tools` with `--tool <name>` to filter, or read the session's `_tools.jsonl` file directly from `~/.flight/logs/` for full details.
The experiment registry (`src/experiments.ts`) provides a lightweight, file-per-experiment store at `~/.flight/experiments/<name>.json`. Each file is a JSON object conforming to `ExperimentEntry`:

```ts
type ExperimentEntry = {
  name: string;
  created_at: string;
  description?: string;
  tags: string[];
  baseline_run_id?: string;
  model_config?: Record<string, unknown>;
  notes?: string;
}
```

### Key properties

- **Race-safe creation**: `ensureExperimentRegistered` writes with `{ flag: "wx" }` (O_EXCL). Concurrent callers are safe; exactly one wins `created: true`.
- **Idempotent**: calling `ensureExperimentRegistered` a second time returns `{ created: false }` without rewriting the file.
- **Merge semantics**: `createOrUpdateExperiment` merges patch fields (arrays replace, not append) while preserving `created_at`.
- **Graceful reads**: `listExperiments` skips files with invalid JSON or wrong shape, logging a warning per skipped file.
- **Auto-creation**: `flight run --experiment <name>` calls `ensureExperimentRegistered` before `runSessionStart`; on `{ created: true }` it prints a one-line stderr hint.
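
The race-safe, idempotent creation described above can be sketched as follows. This is a minimal illustration, not the code in `src/experiments.ts`: the `(dir, name)` signature and the default field values are assumptions, since the diff only shows the function name, the `wx` flag, and the `{ created }` result.

```ts
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type ExperimentEntry = {
  name: string;
  created_at: string;
  description?: string;
  tags: string[];
  baseline_run_id?: string;
  model_config?: Record<string, unknown>;
  notes?: string;
};

// Hypothetical signature: the real function's parameters are not shown in the diff.
function ensureExperimentRegistered(dir: string, name: string): { created: boolean } {
  fs.mkdirSync(dir, { recursive: true });
  const entry: ExperimentEntry = {
    name,
    created_at: new Date().toISOString(),
    tags: [],
  };
  try {
    // "wx" opens with O_CREAT | O_EXCL, so the write fails with EEXIST
    // if another caller already created the file.
    fs.writeFileSync(path.join(dir, `${name}.json`), JSON.stringify(entry, null, 2), {
      flag: "wx",
    });
    return { created: true };
  } catch (err) {
    // Losing the race is not an error: report it and leave the file untouched.
    if ((err as { code?: string }).code === "EEXIST") return { created: false };
    throw err;
  }
}

// Usage against a throwaway directory rather than ~/.flight/experiments.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "flight-exp-"));
const first = ensureExperimentRegistered(demoDir, "bench-a");
const second = ensureExperimentRegistered(demoDir, "bench-a");
console.log(first, second); // { created: true } { created: false }
```

Because the existence check and the create happen in a single `open` call, there is no window between "does it exist?" and "write it" for a second process to slip into.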

### CLI verbs

| Command | Description |
|---------|-------------|
| `flight experiment new <name>` | Register or update an experiment (description, tags, baseline, model, notes) |
| `flight experiment list` | Table of all experiments with run counts from SQLite |
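
Since `flight experiment new` both registers and updates, the merge rule from the key properties (patch fields win, arrays replace, `created_at` is preserved) may be easier to see in code. The sketch below is a pure function under assumed names: `mergeExperiment` and `ExperimentPatch` are hypothetical and do not appear in the diff.

```ts
type ExperimentEntry = {
  name: string;
  created_at: string;
  description?: string;
  tags: string[];
  baseline_run_id?: string;
  model_config?: Record<string, unknown>;
  notes?: string;
};

// Hypothetical patch type: every field except the identity fields may be updated.
type ExperimentPatch = Partial<Omit<ExperimentEntry, "name" | "created_at">>;

// Hypothetical helper illustrating the documented merge semantics.
function mergeExperiment(
  existing: ExperimentEntry | undefined,
  name: string,
  patch: ExperimentPatch,
  now: string = new Date().toISOString(),
): ExperimentEntry {
  // A new entry gets `now`; an existing entry keeps its original created_at.
  const merged: ExperimentEntry = existing
    ? { ...existing, tags: [...existing.tags] }
    : { name, created_at: now, tags: [] };

  // Only fields present in the patch are applied; arrays replace rather than append.
  if (patch.description !== undefined) merged.description = patch.description;
  if (patch.tags !== undefined) merged.tags = [...patch.tags];
  if (patch.baseline_run_id !== undefined) merged.baseline_run_id = patch.baseline_run_id;
  if (patch.model_config !== undefined) merged.model_config = patch.model_config;
  if (patch.notes !== undefined) merged.notes = patch.notes;
  return merged;
}

const v1 = mergeExperiment(
  undefined,
  "bench-a",
  { description: "Baseline", tags: ["fast", "cheap"] },
  "2025-01-01T00:00:00.000Z",
);
const v2 = mergeExperiment(v1, "bench-a", { tags: ["slow"], notes: "rerun" }, "2025-06-01T00:00:00.000Z");
console.log(v2);
```

After the second call, `tags` holds only `"slow"`, `description` is still `"Baseline"`, and `created_at` is still the first timestamp.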
CHANGELOG.md: 15 additions & 3 deletions
@@ -7,6 +7,18 @@
 - `/flight-compare` slash command: 3-bullet experiment diff (winner, biggest delta, suggested next test) via `flight experiment diff`.
 - `/flight-annotate` slash command: per-turn labelling with strict one-command-per-turn output for persisting annotations via `flight annotate`.

+## 1.5.0
+
+### Breaking
+- **`flight log` renamed to `flight logs`** (plural). All subcommands are unchanged. There is no deprecation shim, so update any scripts that call `flight log <subcommand>`. Re-run `flight claude setup` to update installed slash commands.
+
+### Added
+- `flight run --agent <agent> [--experiment <id>] [--model <name>]`: start a run with human-friendly output (`Started run <runId> session <sessionId>`). Mirrors `session start` options.
+- `flight show <session-id>`: view a recorded session (alias for `flight logs view`).
+- `flight logs` bare invocation (no subcommand) now behaves like `flight logs list`.
+- `flight experiment {new,list,show,diff,export}`: experiment registry and cross-run analysis commands backed by `~/.flight/experiments/<name>.json`.
+- Auto-registration: `flight run --experiment <name>` creates the experiment registry file on first use and prints a one-line stderr hint to add description/tags.
+
 ## 1.4.0

 ### Removed
@@ -27,7 +39,7 @@
 ## 1.2.0

 ### Added
-- `flight log audit`: rich audit view of tool calls for the current session (powers `/flight-log` slash command)
+- `flight logs audit`: rich audit view of tool calls for the current session (powers `/flight-log` slash command)
 - `/flight-log` slash command installed by `flight setup`
 - Active session marker (`~/.flight/logs/.active_session`) for hook-aware session resolution
 - `mergeSessionUsage()` exported for programmatic usage tracking in progressive disclosure
 Old command paths (`flight setup`, `flight hooks`, `flight init`, `flight stats`, `flight export`, `flight replay`) are deprecated aliases that print a warning and delegate.
+Note: `flight log` (singular) was renamed to `flight logs` (plural) in 1.5.0, so update any scripts accordingly. There is no deprecation shim.

 ## Log Schema

@@ -100,6 +111,7 @@ Installed in `~/.claude/commands/` by `flight claude setup`:
 - **`/flight-annotate`**: labels each turn and emits `flight annotate` shell commands to persist labels (runs `flight logs verbose`)

 ### Data Locations
+- `~/.flight/experiments/<name>.json`: experiment registry (one JSON file per experiment)
 - **SQLite query layer**: `FlightDB` indexes JSONL files into SQLite for cross-session queries, aggregation by tool, and daily trends.
 - **Alert detection**: error-recovery anomalies (different tool called after error), loop detection (same tool 5x in 60s).
+- **Experiment registry**: `src/experiments.ts` stores one JSON file per experiment at `~/.flight/experiments/<name>.json`. Race-safe creation via O_EXCL (`flag: "wx"`); `createOrUpdateExperiment` merges patches (arrays replace). `flight run --experiment` auto-registers and prints a hint on first use.
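
The loop-detection rule in the list above (same tool 5x in 60s) amounts to a sliding-window count per tool. The sketch below shows one way to express it; the `ToolCall` shape, the `detectLoops` name, and the choice to treat a call exactly 60 s old as outside the window are all assumptions, not Flight's actual implementation.

```ts
// Hypothetical record shape: Flight's real log schema is not shown in this diff.
type ToolCall = { tool: string; timestampMs: number };

// Returns the names of tools called at least `threshold` times within any `windowMs` span.
function detectLoops(calls: ToolCall[], threshold = 5, windowMs = 60000): Set<string> {
  const recent = new Map<string, number[]>();
  const flagged = new Set<string>();
  const ordered = [...calls].sort((a, b) => a.timestampMs - b.timestampMs);

  for (const call of ordered) {
    const times = recent.get(call.tool) ?? [];
    times.push(call.timestampMs);
    // Drop timestamps that have fallen out of the window ending at this call.
    while (times.length > 0 && call.timestampMs - times[0] >= windowMs) {
      times.shift();
    }
    recent.set(call.tool, times);
    if (times.length >= threshold) flagged.add(call.tool);
  }
  return flagged;
}

// Five calls ten seconds apart trip the detector; five calls twenty seconds apart do not.
const tight = detectLoops([0, 10, 20, 30, 40].map((s) => ({ tool: "Read", timestampMs: s * 1000 })));
const spread = detectLoops([0, 20, 40, 60, 80].map((s) => ({ tool: "Read", timestampMs: s * 1000 })));
console.log([...tight], [...spread]);
```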
FAQ.md: 4 additions & 4 deletions
@@ -20,7 +20,7 @@ All session logs are stored locally at:
 ~/.flight/logs/<session_id>.jsonl
```

-Each session produces one append-only JSONL file. You can list sessions with `flight log list` and inspect them with `flight log view <session>` or `flight log inspect <call-id>`.
+Each session produces one append-only JSONL file. You can list sessions with `flight logs list` and inspect them with `flight logs view <session>` or `flight logs inspect <call-id>`.

 ## How do I set it up with Claude Desktop?

@@ -52,8 +52,8 @@ Progressive Disclosure is a token optimization feature. Instead of sending full
 Use the export command to extract session data in CSV or JSONL format:
 You can also work with the raw JSONL files directly using `jq`, Python, or any tool that reads newline-delimited JSON.
@@ -67,7 +67,7 @@ Yes. All data stays on your local machine. Flight never sends data to any extern
 Flight includes a heuristic hallucination hint detector. It flags cases where the client proceeds after a server error without retrying, a pattern that often indicates the agent is operating on assumptions rather than real data. View flagged entries with:

```bash
-flight log filter --hallucinations
+flight logs filter --hallucinations
```

 These hints are investigative leads, not definitive verdicts. They tell you where to look, not what happened.
The experiment registry provides a lightweight, file-per-experiment store at `~/.flight/experiments/<name>.json`. It lets you group and compare runs across multiple sessions.
"notes": "Compare against bench-b with streaming enabled"
}
```

### Workflow

```bash
# Register an experiment with metadata
flight experiment new bench-a --description "Baseline" --tags fast,cheap --model claude-sonnet-4

# Start runs that belong to this experiment
flight run --agent my-agent --experiment bench-a
flight run --agent my-agent --experiment bench-b

# List all experiments with run counts
flight experiment list

# Inspect a specific experiment and its runs
flight experiment show bench-a

# Compare two experiments head-to-head
flight experiment diff bench-a bench-b

# Export all runs as research JSONL (for offline analysis)
flight experiment export bench-a | jq .
```

Unknown experiments are **auto-registered** on first `flight run --experiment <name>`, with a one-line stderr hint pointing to `flight experiment new` for adding metadata. The registry files are plain JSON and fully human-editable.
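
Because the registry files can be edited by hand, a reader has to tolerate files that no longer parse or no longer match the expected shape. The following sketch shows one way a script could load the directory defensively. The type guard, the `dir` parameter, and the warning text are assumptions for illustration; only the skip-and-warn behaviour of `listExperiments` is stated in this PR.

```ts
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type ExperimentEntry = {
  name: string;
  created_at: string;
  description?: string;
  tags: string[];
  baseline_run_id?: string;
  model_config?: Record<string, unknown>;
  notes?: string;
};

// Hypothetical guard: checks only the required fields of ExperimentEntry.
function isExperimentEntry(value: unknown): value is ExperimentEntry {
  if (typeof value !== "object" || value === null) return false;
  const obj = value as Record<string, unknown>;
  const tags = obj["tags"];
  return (
    typeof obj["name"] === "string" &&
    typeof obj["created_at"] === "string" &&
    Array.isArray(tags) &&
    tags.every((tag) => typeof tag === "string")
  );
}

// Hypothetical signature: reads every *.json file in `dir`, skipping unusable ones.
function listExperiments(dir: string): ExperimentEntry[] {
  if (!fs.existsSync(dir)) return [];
  const entries: ExperimentEntry[] = [];
  for (const file of fs.readdirSync(dir).filter((f) => f.endsWith(".json")).sort()) {
    try {
      const parsed: unknown = JSON.parse(fs.readFileSync(path.join(dir, file), "utf8"));
      if (isExperimentEntry(parsed)) entries.push(parsed);
      else console.warn(`skipping ${file}: unexpected shape`);
    } catch {
      console.warn(`skipping ${file}: invalid JSON`);
    }
  }
  return entries;
}

// Usage with one valid file and two damaged ones in a throwaway directory.
const registryDir = fs.mkdtempSync(path.join(os.tmpdir(), "flight-registry-"));
fs.writeFileSync(
  path.join(registryDir, "bench-a.json"),
  JSON.stringify({ name: "bench-a", created_at: "2025-01-01T00:00:00.000Z", tags: ["fast"] }),
);
fs.writeFileSync(path.join(registryDir, "broken.json"), "{ not json");
fs.writeFileSync(path.join(registryDir, "wrong-shape.json"), JSON.stringify({ name: 42 }));

const experiments = listExperiments(registryDir);
console.log(experiments.map((e) => e.name));
```

Here only `bench-a` is returned; the other two files each produce a warning on stderr instead of aborting the listing.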