Purpose: [One-sentence description of what this runbook accomplishes end-to-end.]
Audience: AI coding agents first, humans second. This document is written to reduce ambiguity, prevent scope drift, and raise code quality without depending on a more capable model.
How to use: Work through milestones sequentially. Before starting any milestone, read its full section and the Global Execution Rules. After completing it, follow the Global Exit Rules. Never skip ahead. Never silently widen scope.
Prerequisite reading: ARCHITECTURE.md, README.md, [relevant design docs]
- Runbook ID: [short-id]
- Prefix for test files and lessons files: [prefix]
- Primary stack: [e.g. Rust + Tauri + React + TypeScript]
- Primary package/app names: [package names]
- Default test commands:
  - Backend: [command]
  - Frontend: [command]
  - E2E backend: [command]
  - E2E frontend: [command]
  - Build/boot: [command]
- Allowed new dependencies by default: none
- Schema/config migration allowed by default: no
- Public interfaces that must remain stable unless explicitly listed otherwise:
  - [API/command/event/state file/UI route/public type]
  - [API/command/event/state file/UI route/public type]
Update this table as each milestone is completed. This is the single source of truth for progress.
| # | Milestone | Status | Started | Completed | Lessons File | Completion Summary |
|---|---|---|---|---|---|---|
| 1 | [Milestone 1 title] | not_started | | | | |
| 2 | [Milestone 2 title] | not_started | | | | |
| 3 | [Milestone 3 title] | not_started | | | | |
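The tracker rules (milestones run in order, status moves `not_started` to `in_progress` to `done`, dates are recorded at each transition, and the lessons and completion paths are filled in at the end) can be made mechanical. The TypeScript below is a hypothetical sketch: `TrackerRow`, `startMilestone`, and `completeMilestone` are illustrative names, not part of any existing tool; only the `docs/slo/...` path conventions come from this runbook.

```typescript
// Hypothetical model of one Milestone Tracker row; not an existing API.
type Status = "not_started" | "in_progress" | "done";

interface TrackerRow {
  index: number;
  title: string;
  status: Status;
  started?: string; // date recorded by the Global Entry Rules
  completed?: string; // date recorded by the Global Exit Rules
  lessonsFile?: string;
  completionSummary?: string;
}

// Start milestone n only if every earlier milestone is done ("Never skip ahead").
function startMilestone(rows: TrackerRow[], n: number, today: string): TrackerRow[] {
  const target = rows.find((r) => r.index === n);
  if (!target) throw new Error(`No milestone ${n} in tracker`);
  if (target.status !== "not_started") {
    throw new Error(`Milestone ${n} is ${target.status}, expected not_started`);
  }
  const blocker = rows.find((r) => r.index < n && r.status !== "done");
  if (blocker) throw new Error(`Milestone ${blocker.index} is not done yet`);
  return rows.map((r): TrackerRow =>
    r.index === n ? { ...r, status: "in_progress", started: today } : r,
  );
}

// Complete milestone n and record the lessons and completion-summary paths.
function completeMilestone(rows: TrackerRow[], n: number, prefix: string, today: string): TrackerRow[] {
  const target = rows.find((r) => r.index === n);
  if (!target || target.status !== "in_progress") {
    throw new Error(`Milestone ${n} must be in_progress before it can be completed`);
  }
  return rows.map((r): TrackerRow =>
    r.index === n
      ? {
          ...r,
          status: "done",
          completed: today,
          lessonsFile: `docs/slo/lessons/${prefix}-m${n}.md`,
          completionSummary: `docs/slo/completion/${prefix}-m${n}.md`,
        }
      : r,
  );
}
```

Keeping the transitions pure (each returns a new array) makes them easy to unit-test in the Given/When/Then style this runbook requires.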
Provide a complete architecture diagram of the proposed end state after all milestones are complete. This diagram should be understandable at a glance and serve as the north star for every milestone.
- Show all major components, services, and actors.
- Show data flow direction between components with labeled arrows.
- Show persistence boundaries (databases, file systems, caches).
- Show trust boundaries and external integration points.
- Show IPC, API, and event boundaries.
- Distinguish between what exists today (solid lines) and what will be built (dashed lines).
- Include a legend explaining symbols and line styles.
```text
┌─────────────────────────────────────────────────────────────────────┐
│ [System Name]                                                       │
│                                                                     │
│  ┌──────────┐    ┌──────────────┐    ┌───────────────┐              │
│  │ [Actor]  │───▶│ [Component]  │───▶│ [Component]   │              │
│  └──────────┘    └──────────────┘    └───────────────┘              │
│                         │                    │                      │
│                         ▼                    ▼                      │
│                  ┌──────────────┐    ┌───────────────┐              │
│                  │ [Component]  │    │ [Persistence] │              │
│                  └──────────────┘    └───────────────┘              │
│                                                                     │
│  Legend:                                                            │
│  ─── existing   - - - new   ═══ external   ▶ data flow              │
└─────────────────────────────────────────────────────────────────────┘
```
[Replace the above with the actual architecture diagram for this runbook. Use ASCII art or Mermaid syntax. Include all components that any milestone touches.]
| Component | Responsibility | Milestone Introduced/Changed | Key Interfaces |
|---|---|---|---|
| [Component name] | [What it does] | M[N] | [APIs, events, commands] |
| [Component name] | [What it does] | M[N] | [APIs, events, commands] |
| [Component name] | [What it does] | M[N] | [APIs, events, commands] |
| Flow | From | To | Protocol/Mechanism | Milestone |
|---|---|---|---|---|
| [Flow name] | [Source component] | [Target component] | [IPC/HTTP/event/file] | M[N] |
| [Flow name] | [Source component] | [Target component] | [IPC/HTTP/event/file] | M[N] |
This section captures the system's abstract behavior as a protocol/state-machine design suitable for TLA+ modeling. It focuses on correctness-critical concurrency, state, and failure modes — not implementation details.
When to fill this section: Before starting milestone implementation if the system involves concurrent actors, distributed state, ordering guarantees, resource ownership, or failure recovery. For simple CRUD systems with no concurrency concerns, mark N/A with a brief justification.
Design guidance: Omit low-level code, APIs, schemas, retries, logging, metrics, and deployment details. Avoid timestamps, UUIDs, large payloads, or unbounded queues unless correctness depends on them. Reduce the design to the smallest set of states and transitions that captures the real correctness risks.
[One paragraph: what must the system do, focusing on correctness-critical aspects.]
For each component/actor, list only what matters for correctness.
| Component | Responsibility | Key State (durable / volatile) | Visible Actions |
|---|---|---|---|
| [Name] | [Protocol-level role] | [State that matters for correctness] | [Messages, events, transitions] |
| [Name] | [Role] | [State] | [Actions] |
The minimum set of state variables needed to capture correctness. Flag anything likely to cause state explosion.
| Variable | Type (abstract) | Why Necessary | Explosion Risk |
|---|---|---|---|
| [var] | [e.g., function Node → Status] | [Which property depends on this] | [low / medium / high] |
| [var] | [e.g., bounded sequence of Request] | [Why needed] | [risk] |
| Action | Preconditions | State Updates | Failure / Interleaving Notes |
|---|---|---|---|
| [Action name] | [What must be true] | [What changes] | [Can it fail partway? Concurrency-relevant?] |
| [Action name] | [Preconditions] | [Updates] | [Notes] |
Invariants that TLC should check exhaustively. Write in crisp English translatable to TLA+.
- [Property]: [e.g., "No two nodes hold the lock simultaneously"]
- [Property]: [Invariant statement]
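To make "checked exhaustively" concrete: TLC enumerates every reachable state and evaluates each invariant in every one of them. The TypeScript sketch below imitates that on a toy two-node lock protocol, chosen only because it matches the example invariant above; it is not part of any real design. It runs a breadth-first search over all interleavings of hypothetical `request`, `acquire`, and `release` actions and checks "no two nodes hold the lock simultaneously" in each state. It is a teaching aid, not a substitute for a TLA+ specification.

```typescript
// Toy model: n nodes competing for one lock. Hypothetical, for illustration only.
type NodeState = "idle" | "waiting" | "holding";
interface State {
  nodes: NodeState[];
  lockFree: boolean;
}
// An action returns the successor state, or null when its precondition is false.
type Action = (s: State, i: number) => State | null;

function withNode(nodes: NodeState[], i: number, v: NodeState): NodeState[] {
  return nodes.map((x, j) => (j === i ? v : x));
}

const request: Action = (s, i) =>
  s.nodes[i] === "idle" ? { ...s, nodes: withNode(s.nodes, i, "waiting") } : null;

// Correct acquire: the precondition includes lockFree.
const acquire: Action = (s, i) =>
  s.nodes[i] === "waiting" && s.lockFree
    ? { nodes: withNode(s.nodes, i, "holding"), lockFree: false }
    : null;

// Buggy acquire: forgets to check lockFree, the kind of bug exhaustive search exposes.
const acquireBuggy: Action = (s, i) =>
  s.nodes[i] === "waiting" ? { nodes: withNode(s.nodes, i, "holding"), lockFree: false } : null;

const release: Action = (s, i) =>
  s.nodes[i] === "holding" ? { nodes: withNode(s.nodes, i, "idle"), lockFree: true } : null;

// Safety property: no two nodes hold the lock simultaneously.
const mutualExclusion = (s: State): boolean =>
  s.nodes.filter((n) => n === "holding").length <= 1;

// Breadth-first search over every reachable state, checking the invariant in each one.
function check(
  n: number,
  actions: Action[],
  invariant: (s: State) => boolean,
): { explored: number; violation: State | null } {
  const init: State = { nodes: Array<NodeState>(n).fill("idle"), lockFree: true };
  const key = (s: State) => JSON.stringify(s);
  const seen = new Set<string>([key(init)]);
  const queue: State[] = [init];
  while (queue.length > 0) {
    const s = queue.shift()!;
    if (!invariant(s)) return { explored: seen.size, violation: s };
    for (const act of actions) {
      for (let i = 0; i < n; i++) {
        const t = act(s, i);
        if (t !== null && !seen.has(key(t))) {
          seen.add(key(t));
          queue.push(t);
        }
      }
    }
  }
  return { explored: seen.size, violation: null };
}
```

Swapping `acquire` for `acquireBuggy` shows the point of the exercise: the search reaches the double-holder state that a happy-path unit test would likely miss.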
Progress properties and their fairness requirements.
- [Property]: [e.g., "Every submitted request is eventually processed or rejected"] — Fairness: [weak/strong on which actions]
- [Property]: [Statement] — Fairness: [assumption]
| What Was Simplified | Why It Still Covers the Bugs We Care About |
|---|---|
| [Detail removed or bounded] | [Justification] |
| [Detail removed or bounded] | [Justification] |
These rules apply to every milestone without exception.
- Only change files listed in the current milestone unless a listed step explicitly requires one additional file.
- Do not refactor unrelated code.
- Do not rename public APIs, commands, routes, events, persisted state shapes, or config keys unless the milestone explicitly says so.
- Do not introduce a new dependency unless the milestone explicitly allows it.
- Do not change database schema, file formats, or migration behavior unless the milestone explicitly includes migration work and migration tests.
- Write BDD tests before production code.
- Write E2E runtime validation stubs before production code.
- Confirm new tests fail for the right reason before implementing.
- A milestone is not done when code compiles. It is done when the declared contract is satisfied and evidence is recorded.
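The first rule above (change only listed files) can be checked before commit by comparing the changed paths with the milestone's "Files allowed to change" entry. A minimal sketch, assuming the changed paths arrive one per line, as `git diff --name-only` prints them; `findDisallowedChanges` is a hypothetical helper name.

```typescript
// Normalize separators and a leading "./" so "src/a.ts" and "./src/a.ts" compare equal.
function normalize(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

// Return changed paths that are on neither the allow-list nor the new-files list.
// `diffOutput` is assumed to be newline-separated paths.
function findDisallowedChanges(
  diffOutput: string,
  allowed: string[],
  newFilesAllowed: string[] = [],
): string[] {
  const permitted = new Set([...allowed, ...newFilesAllowed].map(normalize));
  return diffOutput
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map(normalize)
    .filter((p) => !permitted.has(p));
}
```

Untracked new files do not appear in `git diff --name-only`, so new files need a separate check, for example against `git status` output.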
The following are not allowed unless explicitly permitted in the milestone:
- TODO or placeholder logic in production code
- silent fallbacks that hide errors
- swallowed errors without structured logging or user-visible handling
- fake implementations left in place after tests pass
- commented-out dead code
- temporary mocks in production paths
- hard-coded secrets, test keys, or unsafe defaults
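The difference between a silent fallback and a visible one is easiest to see side by side. In this hypothetical TypeScript sketch, `loadThemeSilently`, `loadTheme`, and the `Result` type are illustrative names. The second function returns a typed result that the caller must handle and that carries the reason for the failure, instead of quietly substituting a default.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Forbidden pattern: the parse failure disappears and the caller cannot tell.
function loadThemeSilently(raw: string): string {
  try {
    return JSON.parse(raw).theme ?? "light";
  } catch {
    return "light";
  }
}

// Allowed pattern: the failure is returned as data the caller must handle, log, or show.
function loadTheme(raw: string): Result<string> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: `settings are not valid JSON: ${(e as Error).message}` };
  }
  const theme = (parsed as { theme?: unknown } | null)?.theme;
  if (typeof theme !== "string") {
    return { ok: false, error: "settings.theme is missing or not a string" };
  }
  return { ok: true, value: theme };
}
```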
Every milestone must explicitly verify that previously working user flows, commands, routes, persisted state, and public interfaces still work unless the milestone explicitly replaces them.
- Prefer narrow, local modifications over broad rewrites.
- Prefer extending existing patterns over inventing new abstractions.
- Prefer deleting complexity over adding new layers.
- If a refactor is required, keep it minimal and directly justified by the milestone goal.
All meaningful checks must be recorded in the milestone Evidence Log:
- command run
- relevant file or test
- expected result
- actual result
- pass/fail
- notes
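When the Evidence Log is kept in working notes before being copied into the milestone, a typed row keeps every required field present. The shape below is a hypothetical sketch that mirrors the fields listed above and renders a Markdown row in the Evidence Log table's column order; `EvidenceRow` and `toMarkdownRow` are illustrative names.

```typescript
interface EvidenceRow {
  step: string;
  command: string; // command run, or the file/test that was checked
  expected: string;
  actual: string;
  pass: boolean;
  notes?: string;
}

// Escape pipes and flatten newlines so free-form text cannot break the Markdown table.
const cell = (s: string): string => s.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");

// Refuse to render a row whose actual result was never recorded.
function toMarkdownRow(r: EvidenceRow): string {
  if (r.actual.trim() === "") {
    throw new Error(`Evidence for "${r.step}" has no actual result; record it before marking pass/fail`);
  }
  const cells = [r.step, r.command, r.expected, r.actual, r.pass ? "Pass" : "Fail", r.notes ?? ""];
  return `| ${cells.map(cell).join(" | ")} |`;
}
```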
- If a milestone introduces new build outputs, generated files, test fixtures, scratch directories, or tool-specific caches, add matching patterns to `.gitignore` before committing.
- Review `.gitignore` at the end of every milestone for staleness; remove patterns that no longer apply.
- Never commit test output data, temporary fixtures, scratch files, or generated artifacts to source control.
- Every test that creates files on disk must clean up after itself (use `tempdir`, `tempfile`, `afterEach` cleanup, or equivalent). Tests must not leave residual data in the working tree.
- Record the `.gitignore` review in the Evidence Log.
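The `git status` check can be automated. In `git status --porcelain` output, each untracked path appears on its own line prefixed with `??`. The sketch below (function names hypothetical) extracts those paths so a script or CI step can fail when tests leave residue behind; it takes the command output as a string so it stays testable without a repository.

```typescript
// Parse `git status --porcelain` (v1) output and return untracked paths ("?? path").
// Git quotes paths containing unusual characters; this sketch does not unquote them.
function untrackedPaths(porcelain: string): string[] {
  return porcelain
    .split("\n")
    .filter((line) => line.startsWith("?? "))
    .map((line) => line.slice(3));
}

// Throw if any untracked path remains, listing them so the failure can be recorded.
function assertCleanOfTestArtifacts(porcelain: string): void {
  const leftovers = untrackedPaths(porcelain);
  if (leftovers.length > 0) {
    throw new Error(`Untracked files after test run:\n${leftovers.join("\n")}`);
  }
}
```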
Do this before every milestone.
- Read the lessons file from the previous milestone, if one exists. Apply any design corrections, naming rules, test strategy improvements, and failure-mode coverage it calls for before writing new code.
- Read the current milestone fully: goal, context, contract block, out-of-scope block, file list, BDD scenarios, regression tests, E2E tests, smoke tests, and definition of done.
- Run the full existing test suite and confirm it passes. Record the baseline in the Evidence Log.
  If any tests fail before you start, stop and fix the baseline first. Do not begin a milestone on a red baseline.
  Commands: `[backend test command]`, `[frontend test command]`
- Read the files listed in "Files Allowed To Change" and "Files To Read Before Changing Anything". Understand their current shape before editing.
- Update the Milestone Tracker in this file: set the current milestone status to `in_progress` and record the Started date.
- Create BDD test files first.
- Create E2E runtime validation test stubs first.
- Re-state the milestone constraints in your own words before coding:
- goal
- allowed files
- forbidden changes
- compatibility requirements
- tests that must pass
Do this after every milestone.
- Run the full test suite. Every pre-existing test must still pass. Every new BDD scenario must pass.
  Commands: `[backend test command]`, `[frontend test command]`
- Run the milestone E2E runtime validation tests.
  Commands: `[backend E2E test command]`, `[frontend E2E test command]`
- Verify the app builds and boots to a usable state.
  Commands: `[build/boot commands]`
- Run the smoke tests listed in the milestone. Check off each item in the runbook.
- Verify backward compatibility for all items listed in the milestone Compatibility Checklist.
- Complete the Self-Review Gate.
- Clean up test artifacts: Verify no test output files, temporary fixtures, or generated data remain in the working tree. Run `git status` and confirm no untracked test artifacts exist.
- Review .gitignore: Ensure any new build outputs, generated files, or tool caches introduced in this milestone have matching `.gitignore` patterns. Remove stale patterns that no longer apply.
- Update ARCHITECTURE.md following the Documentation Update Table.
- Update README.md if user-facing capabilities changed.
- Write a lessons-learned file at `docs/slo/lessons/<prefix>-m<N>.md`.
- Write a completion summary at `docs/slo/completion/<prefix>-m<N>.md`.
- Update the Milestone Tracker in this file: set status to `done`, record the Completed date, and fill in the lessons and completion summary paths.
- Re-read the next milestone with fresh eyes and record any assumption changes in the lessons file.
[Describe the current state of the system. What exists today? What works? List major subsystems and their capabilities. Be specific — reference file paths, module names, major entry points, and concrete data where relevant.]
[List the specific gaps this runbook addresses. Number each gap and describe it concretely — reference specific code, UI behavior, test gaps, and user impact. Avoid vague generalities.]
- [Gap title]: [Description referencing concrete code and behavior.]
- [Gap title]: [Description.]
[ASCII diagram or description of the target end state after all milestones are complete.
Show major components, data flow, boundaries, persistence, and integration points.]
These are system-wide rules the AI agent must follow when making implementation decisions.
- [Principle name]: [Explanation.]
- [Principle name]: [Explanation.]
- [Principle name]: [Explanation.]
Explicitly list existing subsystems, patterns, and code that must not be changed or broken.
- [Subsystem / module / pattern to preserve]
- [Subsystem / module / pattern to preserve]
List the specific files, modules, or behaviors that will be modified across milestones.
- [File or module] — [summary of change]
- [File or module] — [summary of change]
These are forbidden unless explicitly overridden inside a milestone.
- No unrelated refactors
- No new dependencies
- No schema migrations
- No config key renames
- No public API/event/route renames
- No production placeholders
- No silent error swallowing
- No secrets in source control
- No test output data committed to source control
Optional section. Existing runbooks without this section remain valid; `/slo-execute` Step 1.5 falls back to a live `gh issue list --label retro-derived` query. Authors of new runbooks SHOULD include this section once `/slo-retro` files at least one retro-derived issue against this runbook's prefix.

What this section is: a table of open prior-retro issues (filed by `/slo-retro` for this runbook's prefix) that should be considered as scope candidates at each milestone start. Each row has a suggested lane so small follow-ups stay small and large follow-ups do not silently widen scope.

What this section is NOT: an auto-extension of any milestone's allow-list. The user decides each milestone's bounds. Carry-forward is informational input to that decision, not a substitute for it.
| Issue | Title | Suggested lane | Suggested milestone | Status |
|---|---|---|---|---|
| (e.g., #42) | (one-line summary) | micro \| milestone \| fresh-runbook | (M3 \| M4 \| next runbook) | (open \| closed-via-PR-pending \| transferred) |
- `micro`: safe, bounded follow-up. Can be folded into the current or immediate next milestone without widening scope (typical: doc polish, small test gap, naming-convention drift).
- `milestone`: real milestone-sized work. Warrants its own milestone in this runbook or the next; do not bolt onto an unrelated milestone.
- `fresh-runbook`: material scope or risk shift. Do NOT widen the current runbook silently; spin a separate runbook (typical: new architecture work, regulated-domain question, multi-week effort).
`/slo-execute M<N>` pre-flight Step 1.5 prefers rows from this section over a live `gh` query when the rows are fresh. Rows whose status is `closed-via-PR-pending` or `transferred` are surfaced with an annotation; the user decides whether to track them. Inline output is capped at the top 3 items.

`/slo-resume` reads the milestone tracker plus this section to emit one next action with a lane. Inline output is capped at the top 3 items; the remainder is summarized as `... N more`.

Runbooks without this section continue to work: `/slo-execute` falls back to the live `gh` query, and `/slo-resume` falls back to tracker-only orientation.
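The display behavior described above (annotate rows that are not open, show at most three inline, summarize the rest) is simple enough to sketch. This is not the `/slo-execute` or `/slo-resume` implementation, whose internals this runbook does not describe; the `CarryForwardRow` type and `summarize` function are hypothetical and assume the rows are already in priority order.

```typescript
type Lane = "micro" | "milestone" | "fresh-runbook";
type RowStatus = "open" | "closed-via-PR-pending" | "transferred";

interface CarryForwardRow {
  issue: string;
  title: string;
  lane: Lane;
  milestone: string;
  status: RowStatus;
}

// Render at most `cap` rows inline; non-open rows are annotated, the remainder is counted.
function summarize(rows: CarryForwardRow[], cap = 3): string[] {
  const lines = rows.slice(0, cap).map((r) => {
    const note = r.status === "open" ? "" : ` (${r.status})`;
    return `${r.issue} [${r.lane}] ${r.title}${note}`;
  });
  const rest = rows.length - cap;
  if (rest > 0) lines.push(`... ${rest} more`);
  return lines;
}
```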
Every milestone follows these rules.
For each milestone:
- Read the BDD acceptance table.
- Create the test file(s) first.
- Confirm the tests fail for the expected reason.
- Write production code to make the tests pass.
- Re-run tests after any refactor.
Every milestone must explicitly cover the categories that apply:
- happy path
- invalid input
- empty state / first-run state
- dependency failure / partial failure
- retry or rollback behavior if relevant
- concurrency or race behavior if relevant
- persistence / restore behavior if relevant
- backward compatibility behavior
If a category does not apply, state why.
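One way to enforce "if a category does not apply, state why" is to treat coverage as a map from every category to either the scenarios that cover it or an explicit reason it does not apply. The TypeScript below is a hypothetical sketch (names and shape are illustrative, category labels shortened from the list above) that reports every category with neither.

```typescript
const CATEGORIES = [
  "happy path",
  "invalid input",
  "empty state",
  "dependency failure",
  "retry or rollback",
  "concurrency",
  "persistence",
  "backward compatibility",
] as const;
type Category = (typeof CATEGORIES)[number];

// Each category is either covered by named scenarios or explicitly marked not applicable.
type Coverage = Partial<Record<Category, { scenarios: string[] } | { notApplicable: string }>>;

// A category is uncovered if it is absent, has no scenarios, or has an empty justification.
function uncoveredCategories(c: Coverage): Category[] {
  return CATEGORIES.filter((cat) => {
    const entry = c[cat];
    if (entry === undefined) return true;
    if ("scenarios" in entry) return entry.scenarios.length === 0;
    return entry.notApplicable.trim() === "";
  });
}
```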
Every BDD scenario uses Given/When/Then:
```rust
#[test]
fn descriptive_test_name() {
    // Given: [precondition]
    // When: [action]
    // Then: [expected outcome]
}
```

```typescript
it("descriptive test name", () => {
  // Given: [precondition]
  // When: [action]
  // Then: [expected outcome]
});
```

| Layer | Convention | Location |
|---|---|---|
| Backend unit tests | `#[cfg(test)] mod tests` inside the source file | Same file as production code |
| Backend integration/BDD tests | `tests/<prefix>_<feature>.rs` | `src-tauri/tests/` (or equivalent) |
| Frontend unit tests | `<module>.test.ts` | Co-located with source file |
| Frontend page tests | `<Page>.test.tsx` | Co-located with component |
| Scenario/e2e tests | `tests/scenarios/<prefix>_scenario_<name>.rs` | `src-tauri/tests/scenarios/` (or equivalent) |
| E2E runtime validation (backend) | `tests/e2e_<prefix>_m<N>.rs` | `src-tauri/tests/` (or equivalent) |
| E2E runtime validation (frontend) | `e2e/<feature>.e2e.test.tsx` | `src/e2e/` |
Every test that creates files, directories, or temporary data on disk must follow these rules:
- Use temporary directories: Prefer `tempdir()`, `tempfile::TempDir`, `tmp` from the test framework, or OS-provided temp locations. Never write test output into the source tree.
- Clean up on completion and failure: Use RAII patterns (Rust `Drop`), `afterEach`/`afterAll` hooks (JS/TS), or `defer` statements to ensure cleanup runs even when tests fail.
- No residual state: After the full test suite runs, `git status` must show no untracked files from test execution.
- Dedicated output directories: If a test must write to a project-relative path (e.g., `output/`), that directory must be in `.gitignore` and tests must clean it between runs.
- CI parity: Test cleanup behavior must be identical locally and in CI. Do not rely on CI ephemeral filesystems as an excuse to skip cleanup.
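For TS tests that run under Node, the same rules can be followed without a framework-specific helper. A minimal sketch, assuming Node's built-in `fs`, `os`, and `path` modules (`fs.rmSync` needs Node 14.14 or later); `withTempDir` is a hypothetical helper name. The `finally` block is what guarantees cleanup when the test body throws.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Create a unique directory under the OS temp location, run `body`, then always delete it.
function withTempDir<T>(prefix: string, body: (dir: string) => T): T {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  try {
    return body(dir);
  } finally {
    // Runs on success and on failure, so nothing is left behind.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

For async test bodies, the same shape works as an `async` function with `return await body(dir)` inside the `try`.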
Every milestone must include E2E tests that go beyond compilation and verify that the system works correctly at runtime. These tests prove:
- the app boots without errors
- runtime contracts are met across IPC/API boundaries
- BDD scenarios work at runtime, not just in isolation
- there are no runtime panics, unhandled rejections, or silent failures
- degraded states behave safely and visibly
- Test runtime behavior, not just types.
- Test the full stack where possible.
- Test degraded and failure states, not just the happy path.
- Assert against observable behavior.
- Prefer at least one test that crosses the backend/frontend boundary when both layers changed.
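Because TypeScript types are erased at runtime, a payload crossing an IPC or API boundary can type-check and still be wrong, so a runtime test should assert on the value actually received. The sketch below is hypothetical: the `SaveResult` shape and the `parseSaveResult` guard are illustrative, not an existing contract. A real E2E test would apply the guard to the response from the running backend rather than to literals.

```typescript
// Hypothetical contract for a response crossing the backend/frontend boundary.
type SaveResult =
  | { status: "saved"; id: string }
  | { status: "degraded"; reason: string }; // degraded must be visible and carry a reason

// Validate an untyped payload at runtime; reject anything outside the contract.
function parseSaveResult(payload: unknown): SaveResult {
  if (typeof payload !== "object" || payload === null) {
    throw new Error(`contract violation: expected object, got ${JSON.stringify(payload)}`);
  }
  const p = payload as Record<string, unknown>;
  if (p.status === "saved" && typeof p.id === "string" && p.id.length > 0) {
    return { status: "saved", id: p.id };
  }
  if (p.status === "degraded" && typeof p.reason === "string" && p.reason.length > 0) {
    return { status: "degraded", reason: p.reason };
  }
  throw new Error(`contract violation: ${JSON.stringify(payload)}`);
}
```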
A new dependency is allowed only if the milestone explicitly includes:
- package/crate name
- why existing dependencies are insufficient
- security and maintenance rationale
- build/runtime cost rationale
- tests covering the new integration
Any schema, config, or persisted-state change requires:
- migration plan
- backward compatibility strategy
- migration tests
- rollback strategy if relevant
- documentation updates
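A small example of what a migration plan and its migration tests protect: code that reads both the old and new persisted shapes, upgrades old ones in memory, and refuses versions it does not understand instead of guessing. The `version` field, the `SettingsV1`/`SettingsV2` shapes, and `migrate` are hypothetical; field-level validation of the current version is omitted for brevity.

```typescript
interface SettingsV1 { version: 1; darkMode: boolean; }
interface SettingsV2 { version: 2; theme: "light" | "dark"; }

// Upgrade any supported persisted shape to the current one (V2).
// Unknown versions are rejected loudly rather than silently defaulted.
function migrate(raw: unknown): SettingsV2 {
  const v = (raw as { version?: unknown } | null)?.version;
  if (v === 2) return raw as SettingsV2;
  if (v === 1) {
    const old = raw as SettingsV1;
    return { version: 2, theme: old.darkMode ? "dark" : "light" };
  }
  throw new Error(`unsupported settings version: ${String(v)}`);
}
```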
Each milestone must state one of the following:
- No refactor permitted beyond direct implementation
- Minimal local refactor permitted in listed files only
- Targeted refactor permitted for [specific reason]
Copy this table into each milestone section and fill it in during execution.
| Step | Command / Check | Expected Result | Actual Result | Pass/Fail | Notes |
|---|---|---|---|---|---|
| Baseline tests | [command] | all pre-existing tests green | | | |
| BDD tests created | [files] | compile or fail for expected reason | | | |
| E2E stubs created | [files] | compile or fail for expected reason | | | |
| Implementation | [summary] | contract satisfied | | | |
| Full tests | [command] | green | | | |
| E2E runtime | [command] | green | | | |
| Build/boot | [command] | boots cleanly | | | |
| Smoke tests | [steps] | all checked | | | |
| Test artifact cleanup | `git status` | no untracked test artifacts | | | |
| .gitignore review | review `.gitignore` | patterns current, no stale entries | | | |
| Compatibility checks | [checks] | no regressions | | | |
Before marking a milestone done, answer every question.
- Did I change only allowed files?
- Did I avoid unrelated refactors?
- Did I preserve all listed public interfaces and compatibility requirements?
- Did I add tests for failure modes, not just happy paths?
- Did I remove temporary debug code, mocks, placeholders, and commented-out dead code?
- Did I update documentation to match the implementation?
- Is every assumption either verified or explicitly documented as unresolved?
- Do all tests clean up their output artifacts? Does `git status` show a clean working tree?
- Is `.gitignore` up to date with any new generated files or build outputs?
- Is the milestone truly done according to its Definition of Done?
If any answer is "no", the milestone is not complete.
Path: `docs/slo/lessons/<prefix>-m<N>.md`

```markdown
# Lessons Learned — <prefix> Milestone <N>
## What changed
- [summary]
## Design decisions and why
- [decision] — [reason]
## Mistakes made
- [mistake]
## Root causes
- [root cause]
## What was harder than expected
- [note]
## Naming conventions established
- [types, files, tests, events, commands]
## Test patterns that worked well
- [pattern]
## Missing tests that should exist now
- [test]
## Rules for the next milestone
- [rule]
## Template improvements suggested
- [improvement]
```

Path: `docs/slo/completion/<prefix>-m<N>.md`

```markdown
# Completion Summary — <prefix> Milestone <N>
## Goal completed
- [what capability now exists]
## Files changed
- [file]
- [file]
## Tests added
- [test file]
- [test file]
## Runtime validations added
- [e2e file]
## Compatibility checks performed
- [check]
## Documentation updated
- [doc and section]
## .gitignore changes
- [patterns added or removed]
## Test artifact cleanup verified
- [confirmation that git status is clean after test run]
## Deferred follow-ups
- [follow-up]
## Known non-blocking limitations
- [limitation]
```

Goal: [One-sentence description of what this milestone accomplishes. What capability exists at the end that did not exist before?]
Context: [2–4 sentences describing the current state relevant to this milestone. Reference specific files, comments, interfaces, and why this change is needed.]
Important design rule: [One key design decision that must guide implementation.]
Refactor budget: [No refactor permitted beyond direct implementation | Minimal local refactor permitted in listed files only | Targeted refactor permitted for ...]
| Field | Value |
|---|---|
| Inputs | [user input, command input, event input, state input] |
| Outputs | [UI state, return values, persisted state, events] |
| Interfaces touched | [commands, APIs, routes, events, structs, files] |
| Files allowed to change | [explicit list] |
| Files to read before changing anything | [explicit list] |
| New files allowed | [explicit list or none] |
| New dependencies allowed | [explicit list or none] |
| Migration allowed | [yes or no] |
| Compatibility commitments | [what must still work] |
| Forbidden shortcuts | [mocks in prod, TODOs, silent fallbacks, broad refactor, etc.] |
- [Explicit non-goal]
- [Explicit non-goal]
- [Explicit non-goal]
- Complete the Global Entry Rules.
- Read `docs/slo/lessons/<prefix>-m<N-1>.md` and apply relevant corrections.
- Read the allowed files before editing.
- Copy the Evidence Log template into this milestone section or working notes.
- Re-state the milestone constraints before coding.
| File | Planned Change |
|---|---|
| [existing file path] | [summary of change] |
| [new file path if allowed] | NEW: [what this file does] |
| `.gitignore` | Add patterns for any new generated files, build outputs, or test artifacts introduced in this milestone |
- Write BDD test stubs first for all scenarios below.
- Write E2E runtime validation stubs first for all tests below.
- Implement the smallest safe change that satisfies the contract.
- Make all BDD tests pass.
- Run the full test suite.
- Run E2E runtime validation.
- Verify test artifact cleanup: Run `git status` and confirm no untracked test output remains.
- Update .gitignore: Add patterns for any new generated files or build outputs. Remove stale patterns.
- Run smoke tests.
- Complete the Self-Review Gate.
Feature: [feature name]
| Scenario | Category | Given | When | Then |
|---|---|---|---|---|
| [Scenario name] | happy path | [Precondition] | [Action] | [Expected outcome] |
| [Scenario name] | invalid input | [Precondition] | [Action] | [Expected outcome] |
| [Scenario name] | empty state | [Precondition] | [Action] | [Expected outcome] |
| [Scenario name] | partial failure | [Precondition] | [Action] | [Expected outcome] |
Add more rows as needed. If a category does not apply, state why under Notes.
- [Existing test suite or feature that must still pass]
- [Specific edge case to verify]
- [Backward compatibility check]
- [Persistence/config/state compatibility check if relevant]
- [Public API/command still behaves the same]
- [Existing route/page still renders correctly]
- [Persisted state remains readable]
- [Existing tests for related features still pass]
File: [backend E2E test file path]
| E2E Test | What It Proves | Pass Criteria |
|---|---|---|
| [test_function_name] | [Runtime behavior validated] | [Specific assertion criteria] |
| [test_function_name] | [Runtime behavior validated] | [Specific assertion criteria] |
File: [frontend E2E test file path]
| E2E Test | What It Proves | Pass Criteria |
|---|---|---|
| [test name] | [Runtime behavior validated] | [Specific assertion criteria] |
| [test name] | [Runtime behavior validated] | [Specific assertion criteria] |
- [Manual verification step — what to do and what to observe]
- [Manual verification step]
- `[test command]` passes
- App launches without errors
- `git status` shows no untracked test artifacts
- `.gitignore` covers all new generated files and build outputs
| Step | Command / Check | Expected Result | Actual Result | Pass/Fail | Notes |
|---|---|---|---|---|---|
| Baseline tests | [command] | all green | | | |
| BDD tests created | [files] | fail for expected reason | | | |
| E2E stubs created | [files] | fail for expected reason | | | |
| Implementation | [summary] | contract satisfied | | | |
| Full tests | [command] | green | | | |
| E2E runtime | [command] | green | | | |
| Build/boot | [command] | boots cleanly | | | |
| Smoke tests | [steps] | all checked | | | |
| Test artifact cleanup | `git status` | no untracked test artifacts | | | |
| .gitignore review | review `.gitignore` | patterns current, no stale entries | | | |
| Compatibility checks | [checks] | no regressions | | | |
The milestone is done only when all of the following are true:
- all listed BDD scenarios pass
- all listed E2E runtime validations pass
- full existing test suite remains green
- smoke tests are checked off
- compatibility checklist is complete
- no forbidden shortcuts remain in production code
- all tests clean up their output artifacts: `git status` is clean
- `.gitignore` is up to date with any new generated files or build outputs
- docs are updated to match implementation
- lessons file is written
- completion summary is written
- Milestone Tracker is updated
Complete the Global Exit Rules above. Key documentation updates:
- ARCHITECTURE.md: [What to document]
- README.md: [What to update]
- Other docs: [What to update]
- [Why certain coverage categories do not apply]
- [Any explicit deferred work for future milestone]
Track which documents need updating per milestone.
| Milestone | ARCHITECTURE.md Update | README.md Update | .gitignore Update | Other Docs |
|---|---|---|---|---|
| 1 | [Section to add/update] | [Section to add/update] | [Patterns to add/remove] | [Section/file] |
| 2 | [Section to add/update] | [Section to add/update] | [Patterns to add/remove] | [Section/file] |
| 3 | [Section to add/update] | [Section to add/update] | [Patterns to add/remove] | [Section/file] |
Use this before writing production code:
Restate the milestone goal, allowed files, forbidden changes, compatibility requirements, tests that must be written first, and the exact Definition of Done. Then list the smallest implementation approach that satisfies the contract without widening scope.