Merged
16 commits
- `92a7dc2` feat: add bmad-checkpoint skill for guided human change review (alexeyv, Mar 27, 2026)
- `c7ec042` chore: rename bmad-checkpoint to bmad-checkpoint-preview (alexeyv, Mar 27, 2026)
- `c9c257d` refactor(checkpoint): inline workflow into SKILL.md and add global st… (alexeyv, Apr 1, 2026)
- `12b9a1e` refactor(checkpoint): reference global step rules from SKILL.md in st… (alexeyv, Apr 1, 2026)
- `1ba1db3` refactor(checkpoint): deduplicate step rules against global step rules (alexeyv, Apr 1, 2026)
- `5eb8131` fix(checkpoint): move main_config out of SKILL.md frontmatter (alexeyv, Apr 1, 2026)
- `b6ae62e` docs(checkpoint): update skill description and trigger phrases (alexeyv, Apr 1, 2026)
- `beeb2e1` fix(checkpoint): align trail format with global step rules and add to… (alexeyv, Apr 1, 2026)
- `1f2962a` refactor(checkpoint): rewrite FIND THE CHANGE as numbered priority ca… (alexeyv, Apr 1, 2026)
- `4424fff` fix(checkpoint): clarify review_mode and terse-commit instructions in… (alexeyv, Apr 1, 2026)
- `bd450d0` fix(checkpoint): make review_mode a numbered cascade, not independent… (alexeyv, Apr 1, 2026)
- `fd0d64d` fix(checkpoint): simplify change_type from table to one-liner (alexeyv, Apr 1, 2026)
- `f927bb3` fix(checkpoint): make link-to-source conditional on source existing (alexeyv, Apr 1, 2026)
- `26e415f` fix(checkpoint): make surface area stats best-effort with baseline ca… (alexeyv, Apr 1, 2026)
- `0d50525` refactor(checkpoint): extract fallback trail generation into generate… (alexeyv, Apr 1, 2026)
- `c45cc35` fix(checkpoint): add early-exit routing and wrap-up step (alexeyv, Apr 1, 2026)
28 changes: 28 additions & 0 deletions src/bmm-skills/4-implementation/bmad-checkpoint-preview/SKILL.md
@@ -0,0 +1,28 @@
---
name: bmad-checkpoint-preview
description: 'LLM-assisted human-in-the-loop review. Make sense of a change, focus attention where it matters, test. Use when the user says "checkpoint", "human review", or "walk me through this change".'
---

# Checkpoint Review Workflow

**Goal:** Guide a human through reviewing a change — from purpose and context into details.

You are assisting the user in reviewing a change.

## Global Step Rules (apply to every step)

- **Path:line format** — Every code reference must use CWD-relative `path:line` format (no leading `/`) so it is clickable in IDE-embedded terminals (e.g., `src/auth/middleware.ts:42`).
- **Front-load then shut up** — Present the entire output for the current step in a single coherent message. Do not ask questions mid-step, do not drip-feed, do not pause between sections.
- **Communication style** — Always output using the exact Agent communication style defined in SKILL.md and the loaded config.

## INITIALIZATION

Load and read full config from `{project-root}/_bmad/bmm/config.yaml` and resolve:

- `implementation_artifacts`
- `planning_artifacts`
- `communication_language`
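
The three keys resolved above might look like this in `{project-root}/_bmad/bmm/config.yaml`. The key names come from the list above; the values are hypothetical placeholders, not values from this PR:

```yaml
# Hypothetical example of the keys this skill resolves.
# Actual values are project-specific.
implementation_artifacts: _bmad/output/implementation
planning_artifacts: _bmad/output/planning
communication_language: en
```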

## FIRST STEP

Read fully and follow `./step-01-orientation.md` to begin.
@@ -0,0 +1,38 @@
# Generate Review Trail

Generate a review trail from the diff and codebase context. A generated trail is lower quality than an author-produced one, but far better than none.

## Follow Global Step Rules in SKILL.md

## INSTRUCTIONS

1. Get the full diff against the appropriate baseline (same rules as Surface Area Stats in step-01).
2. Read changed files in full — not just diff hunks. Surrounding code reveals intent that hunks alone miss. If total file content exceeds ~50k tokens, read only the files with the largest diff hunks in full and use hunks for the rest.
3. If a spec exists, use its Intent section to anchor concern identification.
4. Identify 2–5 concerns: cohesive design intents that each explain *why* behind a cluster of changes. Prefer functional groupings and architectural boundaries over file-level splits. A single-concern change is fine — don't invent groupings.
5. For each concern, select 1–4 `path:line` stops — locations where the concern is most visible. Prefer entry points, decision points, and boundary crossings over mechanical changes.
6. Lead with the entry point — the highest-leverage stop a reviewer should see first. Inside each concern, order stops so each builds on the previous. End with peripherals (tests, config, types).
7. Format each stop using `path:line` per the global step rules:

```
**{Concern name}**

- {one-line framing, ≤15 words}
`src/path/to/file.ts:42`
```

When there is only one concern, omit the bold label — just list the stops directly.
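
The read-budget rule in instruction 2 can be sketched as a pure selection function: rank files by diff-hunk size and read the largest in full until the ~50k-token budget is spent. The function name, inputs, and token estimates are illustrative assumptions; only the threshold and the largest-hunks-first rule come from the text:

```python
def plan_reads(files, budget_tokens=50_000):
    """files: dict of path -> (hunk_lines, est_file_tokens).

    Returns (read_in_full, hunks_only): which files to read fully
    versus which to cover via diff hunks, per the ~50k-token budget.
    """
    # Largest diff hunks first: they carry the most intent.
    ranked = sorted(files, key=lambda p: files[p][0], reverse=True)
    read_full, hunks_only, spent = [], [], 0
    for path in ranked:
        _, tokens = files[path]
        if spent + tokens <= budget_tokens:
            read_full.append(path)
            spent += tokens
        else:
            hunks_only.append(path)
    return read_full, hunks_only
```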

## PRESENT

Output after the orientation:

```
I built a review trail for this {change_type} (no author-produced trail was found):

{generated trail}
```

Set review mode to `full-trail`. The generated trail is the Suggested Review Order for subsequent steps.

If git is unavailable or the diff cannot be retrieved, return to step-01 with: "Could not generate trail — git unavailable."
@@ -0,0 +1,103 @@
# Step 1: Orientation

Display: `[Orientation] → Walkthrough → Detail Pass → Testing`

## Follow Global Step Rules in SKILL.md

## FIND THE CHANGE

The conversation context before this skill was triggered IS your starting point — not a blank slate. Check in this order — stop as soon as the change is identified:

1. **Explicit argument**
Did the user pass a PR, commit SHA, branch, or spec file this message?
- PR reference → resolve to branch/commit via `gh pr view`. If resolution fails, ask for a SHA or branch.
- Spec file, commit, or branch → use directly.

2. **Recent conversation**
Do the last few messages reveal what change the user wants reviewed? Look for spec paths, commit refs, branches, PRs, or descriptions of a change. Use the same routing as above.

3. **Sprint tracking**
Check for a sprint status file (`*sprint-status*`) in `{implementation_artifacts}` or `{planning_artifacts}`. If found, scan for stories with status `review`:
- Exactly one → suggest it and confirm with the user.
- Multiple → present as numbered options.
- None → fall through.

4. **Current git state**
Check current branch and HEAD. Confirm: "I see HEAD is `<short-sha>` on `<branch>` — is this the change you want to review?"

5. **Ask**
If none of the above identified a change, ask:
- What changed and why?
- Which commit, branch, or PR should I look at?
- Do you have a spec, bug report, or anything else that explains what this change is supposed to do?

If after 3 exchanges you still can't identify a change, HALT.

Never ask extra questions beyond what the cascade prescribes. If a step above already identified the change, skip the remaining steps.

## ENRICH

Once a change is identified from any source above, fill in the complementary artifact:

- If you have a spec, look for `baseline_commit` in its frontmatter to determine the diff baseline.
- If you have a commit or branch, check `{implementation_artifacts}` for a spec whose `baseline_commit` is an ancestor of that commit/branch (i.e., the spec describes work done on top of that baseline).
- If you found both a spec and a commit/branch, use both.

## DETERMINE WHAT YOU HAVE

Set `change_type` to match how the user referred to the change — `PR`, `commit`, `branch`, or their own words (e.g. `auth refactor`). Default to `change` if ambiguous.

Set `review_mode` — pick the first match:

1. **`full-trail`** — ENRICH found a spec with a `## Suggested Review Order` section. Intent source: spec's Intent section.
2. **`spec-only`** — ENRICH found a spec but it has no Suggested Review Order. Intent source: spec's Intent section.
3. **`bare-commit`** — no spec found. Intent source: commit message. If the commit message is terse (under 10 words), scan the diff for the primary change pattern and draft a one-sentence intent. Confirm with the user before proceeding.
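
The cascade above reduces to a first-match function. A minimal sketch, assuming the spec has already been parsed into a set of its `##` section headings (the helper name and input shape are hypothetical):

```python
def pick_review_mode(spec_sections):
    """spec_sections: set of H2 headings found in the spec, or None if no spec."""
    if spec_sections is None:
        return "bare-commit"   # intent comes from the commit message
    if "Suggested Review Order" in spec_sections:
        return "full-trail"    # spec supplies the trail
    return "spec-only"         # spec intent, trail must be generated
```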

## PRODUCE ORIENTATION

### Intent Summary

- If intent comes from a spec's Intent section, display it verbatim regardless of length — it's already written to be concise.
- For other sources (commit messages, bug reports, user description): if ≤200 tokens, display verbatim. If longer, distill to ≤200 tokens. Link to the full source when one exists (e.g. a file path or URL).
- Format: `> **Intent:** {summary}`

### Surface Area Stats

Best-effort stats from `git diff --stat`. Try these baselines in order:

1. `baseline_commit` from the spec's frontmatter.
2. Branch merge-base against `main` (or the default branch).
3. `HEAD~1..HEAD` (latest commit only — tell the user).
4. If git is unavailable or all of the above fail, skip stats and note: "Could not compute stats."
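
The baseline cascade is first-available selection. A sketch where each candidate has already been resolved (or not) by separate git lookups; the function and parameter names are assumptions, not part of the skill:

```python
def pick_baseline(spec_baseline, merge_base, head_parent):
    """Return (ref, note) for the first available baseline, or (None, skip-note)."""
    if spec_baseline:
        return spec_baseline, "baseline_commit from spec frontmatter"
    if merge_base:
        return merge_base, "merge-base against the default branch"
    if head_parent:
        return head_parent, "HEAD~1..HEAD (latest commit only)"
    return None, "Could not compute stats."
```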

Display as:

```
N files changed · M modules touched · ~L lines of logic · B boundary crossings · P new public interfaces
```

- **Files changed**: from `git diff --stat`.
- **Modules touched**: distinct top-level directories with changes.
- **Lines of logic**: added/modified lines excluding blanks, imports, formatting. `~` because approximate.
- **Boundary crossings**: changes spanning more than one top-level module. `0` if single module.
- **New public interfaces**: new exports, endpoints, public methods. `0` if none.

Omit any metric you cannot compute rather than guessing.
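
Files changed, modules touched, and boundary crossings can be derived mechanically from `git diff --numstat` output. A minimal sketch over pre-parsed entries; counting crossings as "modules beyond the first" is one possible reading of the rule above, and lines-of-logic and public-interface counts need content inspection, so they are omitted here:

```python
def surface_stats(numstat):
    """numstat: list of (added, removed, path) tuples from `git diff --numstat`."""
    files = len(numstat)
    # Module = top-level directory of the path ("src/a/b.ts" -> "src").
    modules = {path.split("/")[0] for _, _, path in numstat}
    # Boundary crossing: the change spans more than one top-level module.
    crossings = len(modules) - 1 if len(modules) > 1 else 0
    return f"{files} files changed · {len(modules)} modules touched · {crossings} boundary crossings"
```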

### Present

```
[Orientation] → Walkthrough → Detail Pass → Testing

> **Intent:** {intent_summary}

{stats line}
```

## FALLBACK TRAIL GENERATION

If review mode is not `full-trail`, read fully and follow `./generate-trail.md` to build one from the diff. Then return here and continue to NEXT.

## NEXT

Read fully and follow `./step-02-walkthrough.md`
@@ -0,0 +1,89 @@
# Step 2: Walkthrough

Display: `Orientation → [Walkthrough] → Detail Pass → Testing`

## Follow Global Step Rules in SKILL.md

- Organize by **concern**, not by file. A concern is a cohesive design intent — e.g., "input validation," "state management," "API contract." One file may appear under multiple concerns; one concern may span multiple files.
- The walkthrough activates **design judgment**, not correctness checking. Frame each concern as "here's what this change does and why" — the human evaluates whether it's the right approach for the system.

## BUILD THE WALKTHROUGH

### Identify Concerns

**With Suggested Review Order** (`full-trail` mode):

1. Read the Suggested Review Order stops from the spec (or from conversation context if generated by step-01 fallback).
2. Resolve each stop to a file in the current repo. Output in `path:line` format per the standing rule.
3. Read the diff to understand what each stop actually does.
4. Group stops by concern. Stops that share a design intent belong together even if they're in different files. A stop may appear under multiple concerns if it serves multiple purposes.

**Without Suggested Review Order** (`spec-only` or `bare-commit` mode):

1. Get the diff against the appropriate baseline (same rules as step 1).
2. Identify concerns by reading the diff for cohesive design intents:
- Functional groupings — what user-facing behavior does each cluster of changes support?
- Architectural layers — does the change cross boundaries (API → service → data)?
- Design decisions — where did the author choose between alternatives?
3. For each concern, identify the key code locations as `path:line` stops.

### Order for Comprehension

Sequence concerns top-down: start with the highest-level intent (the "what and why"), then drill into supporting implementation. Within each concern, order stops so each one builds on the previous. The reader should never encounter a reference to something they haven't seen yet.

If the change has a natural entry point (e.g., a new public API, a config change, a UI entry point), lead with it.

### Write Each Concern

For each concern, produce:

1. **Heading** — a short phrase naming the design intent (not a file name, not a module name).
2. **Why** — 1–2 sentences: what problem this concern addresses, why this approach was chosen over alternatives. If the spec documents rejected alternatives, reference them here.
3. **Stops** — each stop on its own line: `path:line` followed by a brief phrase (not a sentence) describing what this location does for the concern. Keep framing under 15 words per stop.

Target 2–5 concerns for a typical change. A single-concern change is fine — don't invent groupings. A change with more than 7 concerns is a signal the scope may be too large, but present it anyway.

## PRESENT

Output the full walkthrough as a single message with this structure:

```
Orientation → [Walkthrough] → Detail Pass → Testing
```

Then each concern group using this format:

```
### {Concern Heading}

{Why — 1–2 sentences}

- `path:line` — {brief framing}
- `path:line` — {brief framing}
- ...
```

End the message with:

```
---

Take your time — click through the stops, read the diff, trace the logic. While you are reviewing, you can:
- "run advanced elicitation on the error handling"
- "party mode on whether this schema migration is safe"
- or just ask anything

When you're ready, say **next** and I'll surface the highest-risk spots.
```

## EARLY EXIT

If at any point the human signals they want to make a decision about this {change_type} (e.g., "let's ship it", "this needs a rethink", "I'm done reviewing", or anything suggesting they're ready to decide), confirm their intent:

- If they want to **approve and ship** → read fully and follow `./step-05-wrapup.md`
- If they want to **reject and rework** → read fully and follow `./step-05-wrapup.md`
- If you misread them → acknowledge and continue the current step.

## NEXT

Default: read fully and follow `./step-03-detail-pass.md`
@@ -0,0 +1,106 @@
# Step 3: Detail Pass

Display: `Orientation → Walkthrough → [Detail Pass] → Testing`

## Follow Global Step Rules in SKILL.md

- The detail pass surfaces what the human should **think about**, not what the code got wrong. Machine hardening already handled correctness. This activates risk awareness.
- The LLM detects risk category by pattern. The human judges significance. Do not assign severity scores or numeric rankings — ordering by blast radius (below) is sequencing for readability, not a severity judgment.
- If no high-risk spots exist, say so explicitly. Do not invent findings.

## IDENTIFY RISK SPOTS

Scan the diff for changes touching risk-sensitive patterns. Look for 2–5 spots where a mistake would have the highest blast radius — not the most complex code, but the code where being wrong costs the most.

Risk categories to detect:

- `[auth]` — authentication, authorization, session, token, permission, access control
- `[public API]` — new/changed endpoints, exports, public methods, interface contracts
- `[schema]` — database migrations, schema changes, data model modifications, serialization
- `[billing]` — payment, pricing, subscription, metering, usage tracking
- `[infra]` — deployment, CI/CD, environment variables, config files, infrastructure
- `[security]` — input validation, sanitization, crypto, secrets, CORS, CSP
- `[config]` — feature flags, environment-dependent behavior, defaults
- `[other]` — anything risk-sensitive that doesn't fit the above (e.g., concurrency, data privacy, backwards compatibility). Use a descriptive tag.

Sequence spots so the highest blast radius comes first (how much breaks if this is wrong), not by diff order or file order. If more than 5 spots qualify, show the top 5 and note: "N additional spots omitted — ask if you want the full list."

If the change has no spots matching these patterns, state: "No high-risk spots found in this change — the diff speaks for itself." Do not force findings.
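
Category detection can be approximated with keyword patterns over changed paths and hunk text. A rough sketch: the keyword lists are illustrative, not exhaustive, first match wins, and the human still judges significance:

```python
# Illustrative keyword lists; real detection would be richer.
RISK_PATTERNS = {
    "auth": ("auth", "session", "token", "permission"),
    "schema": ("migration", "schema", ".sql"),
    "billing": ("billing", "payment", "subscription", "metering"),
    "security": ("sanitize", "crypto", "secret", "cors", "csp"),
    "config": ("feature_flag", ".env", "config"),
}

def tag_risk(path, hunk_text=""):
    """Return the first matching risk tag for a changed location, or None."""
    haystack = (path + " " + hunk_text).lower()
    for tag, keywords in RISK_PATTERNS.items():
        if any(k in haystack for k in keywords):
            return tag
    return None
```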

## SURFACE MACHINE HARDENING FINDINGS

Check whether the spec has a `## Spec Change Log` section with entries (populated by adversarial review loops).

- **If entries exist:** Read them. Surface findings that are instructive for the human reviewer — not bugs that were already fixed, but decisions the review loop flagged that the human should be aware of. Format: brief summary of what was flagged and what was decided.
- **If no entries or no spec:** Skip this section entirely. Do not mention it.

## PRESENT

Output as a single message:

```
Orientation → Walkthrough → [Detail Pass] → Testing
```

### Risk Spots

For each spot, one line:

```
- `path:line` — [tag] reason-phrase
```

Example:

```
- `src/auth/middleware.ts:42` — [auth] New token validation bypasses rate limiter
- `migrations/003_add_index.sql:7` — [schema] Index on high-write table, check lock behavior
- `api/routes/billing.ts:118` — [billing] Metering calculation changed, verify idempotency
```

### Machine Hardening (only if findings exist)

```
### Machine Hardening

- Finding summary — what was flagged, what was decided
- ...
```

### Closing menu

End the message with:

```
---

You've seen the design and the risk landscape. From here:
- **"dig into [area]"** — I'll deep-dive that specific area with correctness focus
- **"next"** — I'll suggest how to observe the behavior
```

## EARLY EXIT

If at any point the human signals they want to make a decision about this {change_type} (e.g., "let's ship it", "this needs a rethink", "I'm done reviewing", or anything suggesting they're ready to decide), confirm their intent:

- If they want to **approve and ship** → read fully and follow `./step-05-wrapup.md`
- If they want to **reject and rework** → read fully and follow `./step-05-wrapup.md`
- If you misread them → acknowledge and continue the current step.

## TARGETED RE-REVIEW

When the human says "dig into [area]" (e.g., "dig into the auth changes", "dig into the schema migration"):

1. If the specified area does not map to any code in the diff, say so: "I don't see [area] in this change — did you mean something else?" Return to the closing menu.
2. Identify all code locations in the diff relevant to the specified area.
3. Read each location in full context (not just the diff hunk — read surrounding code).
4. Shift to **correctness mode**: trace edge cases, check boundary conditions, verify error handling, look for off-by-one errors, race conditions, resource leaks.
5. Present findings as a compact list — each finding is `path:line` + what you found + why it matters.
6. If nothing concerning is found, say so: "Looked closely at [area] — nothing concerning. The implementation is solid."
7. After presenting, show only the closing menu (not the full risk spots list again).

The human can trigger multiple targeted re-reviews. Each time, present new findings and the closing menu only.

## NEXT

Read fully and follow `./step-04-testing.md`