Commit 75332b2

Merge pull request #7 from lewisnsmith/feat/slash-commands
feat: add /flight-review, /flight-compare, /flight-annotate slash commands
2 parents 4807161 + 3ab0d9c commit 75332b2

6 files changed

Lines changed: 200 additions & 4 deletions

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -1,5 +1,12 @@
 # Changelog
 
+## Unreleased
+
+### Added
+- `/flight-review` slash command: structured session critique (retries, errors, tool overuse, good decisions) via `flight show` and `flight logs verbose`.
+- `/flight-compare` slash command: 3-bullet experiment diff (winner, biggest delta, suggested next test) via `flight experiment diff`.
+- `/flight-annotate` slash command: per-turn labelling with strict one-command-per-turn output for persisting annotations via `flight annotate`.
+
 ## 1.5.0
 
 ### Breaking

CLAUDE.md

Lines changed: 3 additions & 0 deletions
@@ -106,6 +106,9 @@ Installed in `~/.claude/settings.json` by `flight claude setup`:
 Installed in `~/.claude/commands/` by `flight claude setup`:
 - **`/flight`** — quick session audit (runs `flight logs audit`)
 - **`/flight-log`** — comprehensive view (runs `flight logs verbose`)
+- **`/flight-review`** — annotates a session for retries, errors, tool overuse, and good decisions (runs `flight show` + `flight logs verbose`)
+- **`/flight-compare`** — diffs two experiments with a 3-bullet summary: winner, biggest delta, next test (runs `flight experiment diff`)
+- **`/flight-annotate`** — labels each turn and emits `flight annotate` shell commands to persist labels (runs `flight logs verbose`)
 
 ### Data Locations
 - `~/.flight/experiments/<name>.json` — experiment registry (one JSON file per experiment)

README.md

Lines changed: 10 additions & 0 deletions
@@ -126,6 +126,16 @@ flight claude init code --apply # Wrap MCP servers for full traffic record
 flight logs tail
 ```
 
+**Slash commands** (installed by `flight claude setup`):
+
+| Command | What it does |
+|---|---|
+| `/flight` | Quick session audit — overview, tool breakdown, issues |
+| `/flight-log` | Full session view with complete input/output payloads |
+| `/flight-review` | Structured critique: retries, errors, tool overuse, good decisions |
+| `/flight-compare` | 3-bullet experiment diff: winner, biggest delta, suggested next test |
+| `/flight-annotate` | Label each turn and emit `flight annotate` commands to persist labels |
+
 ```
 ● Tailing session_20260315_142201
 

src/cli.ts

Lines changed: 1 addition & 1 deletion
@@ -1051,7 +1051,7 @@ claude
       console.log(`\x1b[32m✓\x1b[0m Restored original MCP config from backup`);
     }
     if (result.slashCommandRemoved) {
-      console.log(`\x1b[32m✓\x1b[0m Removed /flight and /flight-log slash commands`);
+      console.log(`\x1b[32m✓\x1b[0m Removed Flight slash commands`);
     }
     return;
   }
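
Review note: the generic "Removed Flight slash commands" message no longer lists command names, so it stays accurate as commands are added. A minimal sketch of how a removal step could derive a `slashCommandRemoved`-style flag from one shared filename list follows. `COMMAND_FILES`, `installCommands`, and `removeCommands` are hypothetical names used for illustration only; they are not the functions in `src/setup.ts`, and the file bodies are placeholders.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical single source of truth for the installed command files.
const COMMAND_FILES = [
  "flight.md",
  "flight-log.md",
  "flight-review.md",
  "flight-compare.md",
  "flight-annotate.md",
] as const;

// Write one placeholder file per command into commandsDir.
function installCommands(commandsDir: string): void {
  mkdirSync(commandsDir, { recursive: true });
  for (const name of COMMAND_FILES) {
    writeFileSync(join(commandsDir, name), `placeholder for ${name}\n`);
  }
}

// Delete every known command file; report whether anything was removed.
function removeCommands(commandsDir: string): boolean {
  let removedAny = false;
  for (const name of COMMAND_FILES) {
    const filePath = join(commandsDir, name);
    if (existsSync(filePath)) {
      rmSync(filePath);
      removedAny = true;
    }
  }
  return removedAny;
}

// Usage: install into a throwaway directory, then remove twice.
const sketchDir = join(mkdtempSync(join(tmpdir(), "flight-sketch-")), ".claude", "commands");
installCommands(sketchDir);
console.log(removeCommands(sketchDir)); // true: files existed and were deleted
console.log(removeCommands(sketchDir)); // false: nothing left to remove
```

Because install and remove iterate over the same list, a newly added command cannot be installed without also being covered by removal.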

src/setup.ts

Lines changed: 56 additions & 2 deletions
@@ -38,6 +38,60 @@ Read the output carefully and present it to the user. This is the detailed view
 If the output is very long, focus on errors and notable calls first, then offer to walk through specific sections.
 
 For a quick summary instead, the user can run \`/flight\`.
+`,
+  },
+  {
+    filename: "flight-review.md",
+    content: `Run \`flight show $ARGUMENTS\` to load the session. If you need more detail (full payloads, turn-by-turn breakdown), also run \`flight logs verbose $ARGUMENTS\`.
+
+Analyse the session output and produce a structured critique:
+
+## Session Review
+
+**Retries** — identify any tool calls that were retried; note the original failure reason and whether the retry succeeded.
+
+**Errors** — list every error with: tool name, timestamp, error message, and a diagnosis (transient vs. logic bug vs. permission issue).
+
+**Tool overuse** — flag any tool called more than 3 times in a row for the same purpose, or any redundant read→read sequences where the file did not change between calls.
+
+**Good decisions** — call out at least one thing the agent did well (e.g. correct tool selection, efficient batching, clean error recovery).
+
+**Overall verdict** — one sentence: was this session efficient, acceptable, or problematic?
+
+Be specific. Reference turn IDs and tool names from the output, not vague generalities.
+`,
+  },
+  {
+    filename: "flight-compare.md",
+    content: `Run \`flight experiment diff $ARGUMENTS\` where $ARGUMENTS is two experiment names separated by a space (e.g. \`bench-a bench-b\`).
+
+Read the diff output and produce a 3-bullet summary:
+
+- **Winner** — which experiment performed better overall, and on which primary metric (e.g. total tokens, error rate, latency).
+- **Biggest delta** — the single metric with the largest absolute or relative difference between the two experiments; include the numbers.
+- **Suggested next test** — one concrete follow-up experiment to run, based on what the diff reveals (e.g. "isolate the model change", "test with a smaller tool set", "re-run with stricter system prompt").
+
+Be specific about which metrics differ. Do not produce prose summaries — use the three bullets only.
+`,
+  },
+  {
+    filename: "flight-annotate.md",
+    content: `Run \`flight logs verbose $ARGUMENTS\` to load the full session.
+
+For each turn in the output, assign exactly one label from: \`good\`, \`bad\`, \`redundant_call\`, \`hallucination\`, \`correct_tool\`.
+
+Emit exactly one shell command per turn, in this format:
+\`\`\`
+flight annotate <turn-id> --label <label> --type turn
+\`\`\`
+
+Rules:
+- No prose between the shell commands.
+- Do not skip any turn — every turn gets exactly one command.
+- Use the turn ID from the verbose output (e.g. \`turn_001\`).
+- Choose the most specific label: prefer \`hallucination\` or \`redundant_call\` over \`bad\` when they apply.
+
+After emitting all commands, say: "Run the above commands to persist labels."
 `,
   },
 ] as const;
@@ -319,9 +373,9 @@ export async function runSetupWizard(
 
   if (features.slashCommands) {
     if (result.slashCommandInstalled) {
-      console.log(`${C.green}${C.reset} Installed /flight and /flight-log slash commands`);
+      console.log(`${C.green}${C.reset} Installed Flight slash commands`);
     } else {
-      console.log(`${C.yellow} !${C.reset} /flight and /flight-log slash commands already installed`);
+      console.log(`${C.yellow} !${C.reset} Flight slash commands already installed`);
     }
   } else {
     console.log(`${C.dim} - Slash commands: skipped${C.reset}`);
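
Review note: the `flight-annotate.md` prompt in this file asks for exactly one `flight annotate` command per turn, with a label drawn from a fixed set, followed by one closing sentence. Below is a hedged sketch of how a caller might check model output against those rules before running anything. `parseAnnotateOutput` and the `Annotation` type are hypothetical and not part of this commit. The sketch also assumes turn IDs and labels contain no whitespace and tolerates fence lines around the commands.

```typescript
// Labels copied from the flight-annotate.md prompt.
const LABELS = ["good", "bad", "redundant_call", "hallucination", "correct_tool"] as const;
type Label = (typeof LABELS)[number];

interface Annotation {
  turnId: string;
  label: Label;
}

// Command shape copied from the prompt: flight annotate <turn-id> --label <label> --type turn
const COMMAND_RE = /^flight annotate (\S+) --label (\S+) --type turn$/;
const CLOSING_LINE = "Run the above commands to persist labels.";

function parseAnnotateOutput(output: string, turnIds: string[]): Annotation[] {
  const allowed = new Set<string>(LABELS);
  const annotations: Annotation[] = [];

  for (const raw of output.split("\n")) {
    const line = raw.trim();
    // Skip blank lines, fence markers, and the closing sentence the prompt requests.
    if (line === "" || line.startsWith("```") || line === CLOSING_LINE) continue;

    const match = COMMAND_RE.exec(line);
    if (!match) throw new Error(`unexpected line: ${line}`);

    const turnId = match[1] as string;
    const label = match[2] as string;
    if (!allowed.has(label)) throw new Error(`unknown label: ${label}`);
    annotations.push({ turnId, label: label as Label });
  }

  // "Every turn gets exactly one command": compare sorted ID lists.
  const seen = annotations.map((a) => a.turnId).sort();
  const expected = turnIds.slice().sort();
  const sameTurns =
    seen.length === expected.length && seen.every((id, i) => id === expected[i]);
  if (!sameTurns) throw new Error("commands do not cover each turn exactly once");

  return annotations;
}
```

A check like this would reject stray prose, skipped or duplicated turns, and labels outside the five listed in the prompt, all of which the prompt forbids but cannot enforce on its own.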

test/setup.test.ts

Lines changed: 123 additions & 1 deletion
@@ -1,5 +1,5 @@
 import { describe, it, expect, afterEach } from "vitest";
-import { writeFile, mkdir, rm, readFile } from "node:fs/promises";
+import { writeFile, mkdir, rm, readFile, access } from "node:fs/promises";
 import { join } from "node:path";
 import { tmpdir } from "node:os";
 import { runSetup, runRemove } from "../src/setup.js";
@@ -90,3 +90,125 @@ describe("runRemove", () => {
     expect(JSON.parse(restored).mcpServers.myserver.command).toBe("my-mcp");
   });
 });
+
+describe("slash commands", () => {
+  const testDir = join(tmpdir(), `flight-slash-${Date.now()}`);
+  const commandsDir = join(testDir, ".claude", "commands");
+  const expectedFiles = [
+    "flight.md",
+    "flight-log.md",
+    "flight-review.md",
+    "flight-compare.md",
+    "flight-annotate.md",
+  ];
+
+  afterEach(async () => {
+    try { await rm(testDir, { recursive: true }); } catch { /* ignore */ }
+  });
+
+  it("installs all five slash command files when slashCommands: true", async () => {
+    const claudeDir = join(testDir, ".claude");
+    await mkdir(claudeDir, { recursive: true });
+    await writeFile(join(claudeDir, "settings.json"), JSON.stringify({}));
+
+    await runSetup(
+      {
+        homeDir: testDir,
+        settingsPath: join(claudeDir, "settings.json"),
+        claudeCodeConfigPath: join(testDir, ".claude.json"),
+      },
+      { hooks: false, proxy: false, pd: false, slashCommands: true, banner: true },
+    );
+
+    for (const filename of expectedFiles) {
+      const filePath = join(commandsDir, filename);
+      await expect(access(filePath)).resolves.toBeUndefined();
+    }
+  });
+
+  it("each command body references flight logs (not flight log)", async () => {
+    const claudeDir = join(testDir, ".claude");
+    await mkdir(claudeDir, { recursive: true });
+    await writeFile(join(claudeDir, "settings.json"), JSON.stringify({}));
+
+    await runSetup(
+      {
+        homeDir: testDir,
+        settingsPath: join(claudeDir, "settings.json"),
+        claudeCodeConfigPath: join(testDir, ".claude.json"),
+      },
+      { hooks: false, proxy: false, pd: false, slashCommands: true, banner: true },
+    );
+
+    for (const filename of expectedFiles) {
+      const body = await readFile(join(commandsDir, filename), "utf-8");
+      // No body should reference the old "flight log " pattern (singular, with trailing space)
+      expect(body).not.toMatch(/`flight log /);
+    }
+  });
+
+  it("command bodies reference the correct CLI verbs", async () => {
+    const claudeDir = join(testDir, ".claude");
+    await mkdir(claudeDir, { recursive: true });
+    await writeFile(join(claudeDir, "settings.json"), JSON.stringify({}));
+
+    await runSetup(
+      {
+        homeDir: testDir,
+        settingsPath: join(claudeDir, "settings.json"),
+        claudeCodeConfigPath: join(testDir, ".claude.json"),
+      },
+      { hooks: false, proxy: false, pd: false, slashCommands: true, banner: true },
+    );
+
+    const flightBody = await readFile(join(commandsDir, "flight.md"), "utf-8");
+    expect(flightBody).toContain("flight logs audit");
+
+    const flightLogBody = await readFile(join(commandsDir, "flight-log.md"), "utf-8");
+    expect(flightLogBody).toContain("flight logs verbose");
+
+    const reviewBody = await readFile(join(commandsDir, "flight-review.md"), "utf-8");
+    expect(reviewBody).toContain("flight show $ARGUMENTS");
+    expect(reviewBody).toContain("flight logs verbose $ARGUMENTS");
+
+    const compareBody = await readFile(join(commandsDir, "flight-compare.md"), "utf-8");
+    expect(compareBody).toContain("flight experiment diff $ARGUMENTS");
+
+    const annotateBody = await readFile(join(commandsDir, "flight-annotate.md"), "utf-8");
+    expect(annotateBody).toContain("flight logs verbose $ARGUMENTS");
+    expect(annotateBody).toContain("flight annotate <turn-id> --label <label> --type turn");
+  });
+
+  it("removes all five slash command files on runRemove", async () => {
+    const claudeDir = join(testDir, ".claude");
+    await mkdir(claudeDir, { recursive: true });
+    await writeFile(join(claudeDir, "settings.json"), JSON.stringify({}));
+
+    // Install first
+    await runSetup(
+      {
+        homeDir: testDir,
+        settingsPath: join(claudeDir, "settings.json"),
+        claudeCodeConfigPath: join(testDir, ".claude.json"),
+      },
+      { hooks: false, proxy: false, pd: false, slashCommands: true, banner: true },
+    );
+
+    // Verify they exist
+    for (const filename of expectedFiles) {
+      await expect(access(join(commandsDir, filename))).resolves.toBeUndefined();
+    }
+
+    // Remove
+    await runRemove({
+      homeDir: testDir,
+      settingsPath: join(claudeDir, "settings.json"),
+      claudeCodeConfigPath: join(testDir, ".claude.json"),
+    });
+
+    // Verify they're gone
+    for (const filename of expectedFiles) {
+      await expect(access(join(commandsDir, filename))).rejects.toThrow();
+    }
+  });
+});
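
Review note: the second test is titled "references flight logs (not flight log)", but its assertion only checks that the singular form is absent. A body such as `flight-compare.md`, which mentions neither form, passes as well. The small sketch below shows what the pattern `/`flight log /` does and does not match. `referencesOldVerb` is a hypothetical helper name, and the sample strings are illustrative rather than taken from the installed files.

```typescript
// Same pattern as the test: a backtick, then "flight log" followed by a space.
const OLD_SINGULAR = /`flight log /;

function referencesOldVerb(body: string): boolean {
  return OLD_SINGULAR.test(body);
}

// Plural form: the character after "log" is "s", not a space, so no match.
console.log(referencesOldVerb("Run `flight logs verbose $ARGUMENTS`")); // false

// Singular form inside backticks: matches.
console.log(referencesOldVerb("Run `flight log verbose $ARGUMENTS`")); // true

// Singular form without the leading backtick: not caught by this pattern.
console.log(referencesOldVerb("Run flight log verbose")); // false
```

If the intent is to catch the old verb wherever it appears, the leading backtick in the pattern may be worth revisiting; as written it only guards backtick-quoted commands.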
