Purpose: [One-sentence description of what this runbook accomplishes end-to-end.]

Audience: AI coding agents first, humans second. The template is designed to reduce ambiguity, suppress scope drift, and hold every capable agent to the same code-quality discipline.

Core philosophy:
- Prefer automated guardrails over developer intention.
- Prefer direct inspection over guessing.
- Prefer executable assumptions over comments.
- Prefer bounded design over silent growth.
- Prefer evidence over claims.

How to use: Work through milestones sequentially. Before each milestone, complete the Global Entry Protocol; after each, complete the Global Exit Protocol. Never skip ahead. Never silently widen scope. Treat this document as an execution contract, not as guidance that can be loosely interpreted.

Prerequisite reading: ARCHITECTURE.md, README.md, docs/LOOPS-ENGINEERING.md, docs/LOOPS-BUSINESS.md, [relevant design docs].
What's new in v4 vs v3: explicit Carmack-style reliability rules (debugger-first inspection, mandatory static analysis, assertion-driven invariants, bounded resource design, "make invalid states unrepresentable"); an extended Contract Block with resource bounds, invariants, debugger expectations, and static-analysis gates; and richer Lessons / Completion / Self-Review templates that capture assumptions, invariants, and resource-bound evidence. v3's "Carry-forward from prior retros" section is preserved verbatim.
- Fill out Runbook Metadata, Architecture, and Milestone Plan before implementation starts.
- Work milestones sequentially.
- Before each milestone, complete the Global Entry Protocol.
- During implementation, follow Section 4 (Carmack-Style Development Best Practices) and the milestone Contract Block literally.
- After each milestone, complete the Global Exit Protocol and fill the Evidence Log.
- Do not mark a milestone done until the Definition of Done is objectively satisfied.
| Field | Value |
|---|---|
| Runbook ID | [short-id] |
| Project name | [project] |
| Primary stack | [e.g., Rust + Tauri + React + TypeScript] |
| Primary package/app names | [package names] |
| Prefix for tests and lesson files | [prefix] |
| Default unit test command | [command] |
| Default integration/BDD test command | [command] |
| Default E2E/runtime validation command | [command] |
| Default build/boot command | [command] |
| Default formatter command | [command] |
| Default static analysis / lint command | [command] |
| Default dependency / security audit command | [command] |
| Default debugger or state-inspection tool | [debugger / IDE / command] |
| Allowed new dependencies by default | none |
| Schema/config migration allowed by default | no |
| Public interfaces stable by default | yes |
Public interfaces that must remain stable:
- [API / command / event / route / public type / state file / config key]
- [API / command / event / route / public type / state file / config key]
This is the single source of truth for progress. Update as each milestone completes.
| # | Milestone | Status | Started | Completed | Lessons File | Completion Summary |
|---|---|---|---|---|---|---|
| 1 | [Milestone title] | not_started | | | | |
| 2 | [Milestone title] | not_started | | | | |
| 3 | [Milestone title] | not_started | | | | |
Provide a complete architecture diagram of the proposed end state after all milestones are complete. This diagram should be understandable at a glance and serve as the north star for every milestone.
- Show all major actors, components, services, and processes.
- Show data flow direction with labeled arrows.
- Show persistence boundaries (databases, file systems, caches).
- Show trust boundaries and external integration points.
- Show API, IPC, event, queue, and file boundaries.
- Distinguish between what exists today (solid lines) and what will be built (dashed lines).
- Include a legend.
┌─────────────────────────────────────────────────────────────────────┐
│ [System Name] │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ [Actor] │───▶│ [Component] │───▶│ [Component] │ │
│ └──────────┘ └──────────────┘ └───────────────┘ │
│ │
│ Legend: ─── existing - - - new ═══ external ▶ data flow │
│ ║ trust boundary │
└─────────────────────────────────────────────────────────────────────┘
[Replace the above with the actual architecture diagram for this runbook. Use ASCII art or Mermaid syntax.]
| Component | Responsibility | Existing/New/Changed | Milestone | Key Interfaces |
|---|---|---|---|---|
| [Component name] | [What it does] | [existing/new/changed] | M[N] | [APIs, events, commands] |
| [Component name] | [What it does] | [existing/new/changed] | M[N] | [APIs, events, commands] |
| Flow | From | To | Protocol/Mechanism | Bounded? | Failure Mode | Milestone |
|---|---|---|---|---|---|---|
| [Flow name] | [Source] | [Target] | [IPC/HTTP/event/file] | [yes/no] | [behavior on failure] | M[N] |
These rules apply to every language and every milestone. They are how we get the same code-quality discipline from every capable agent.
Logging is useful for production observability, but it is not a substitute for interactive debugging and state inspection.
| Requirement | Project-Specific Tool/Command | Evidence Required |
|---|---|---|
| Interactive debugger available | [debugger / IDE / command] | [how verified] |
| Breakpoints can be set in changed code | [how] | [note if needed] |
| Runtime state can be inspected | [how] | [what was inspected] |
| Tests can be debugged | [how] | [test/debug command] |
Agent rules:
- If a failure is not explained by compiler, test assertion, or stack trace, use a debugger or equivalent state-inspection tool before making speculative changes.
- Do not add permanent print/debug statements to production paths.
- If logging is added, it must be structured, intentional, and useful in production.
- Remove temporary debug output before completing the milestone.
Every milestone must run the project's static-analysis and lint tools. Treat tool findings as design feedback, not personal criticism.
| Check | Command | Required Level | Notes |
|---|---|---|---|
| Formatter | [formatter command] | must pass | No style-only churn outside changed files unless allowed |
| Type check / compile check | [typecheck command] | must pass | Must include all changed targets |
| Static analyzer / linter | [lint command] | must pass | Warnings fail CI unless explicitly waived |
| Security/dependency audit | [audit command] | must pass or documented exception | Required if dependency graph changes |
Waiver rule: a static-analysis waiver must be local, minimal, and justified in code or the Evidence Log. Global disables are forbidden unless explicitly approved in the milestone contract.
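As an illustration of a local, justified waiver, here is a minimal Rust sketch that assumes Clippy is the project's linter. The function `len_to_u32` is hypothetical; the waiver is scoped to that one small function, and its justification sits directly beside it.

```rust
// Hypothetical example: `len_to_u32` is an illustrative name, not a project API.

/// Narrows a length that callers have already bounded to fit in `u32`.
///
/// Waiver scope: this function only. Justification: the `assert!` below proves the
/// cast cannot truncate, so `clippy::cast_possible_truncation` is a false positive here.
#[allow(clippy::cast_possible_truncation)]
pub fn len_to_u32(len: usize) -> u32 {
    // Runtime assertion backing the waiver: fail loudly instead of truncating silently.
    assert!(len <= u32::MAX as usize, "length {len} does not fit in u32");
    len as u32
}
```

Where an idiomatic alternative exists, it is usually better than any waiver: `u32::try_from(len)` returns a `Result` and needs no `#[allow]` at all.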
Assertions document and enforce assumptions. Use them to catch incorrect mental models early.
Use assertions for:
- internal invariants
- unreachable states that should be impossible by design
- size and capacity assumptions
- ordering assumptions
- preconditions inside internal APIs
- postconditions after transformations
Do not use assertions for:
- normal user-input validation
- expected network, filesystem, or external service failures
- recoverable business-rule failures
Assertion policy:
| Assertion Type | Use For | Production Behavior |
|---|---|---|
| Development-only assertion | Expensive or diagnostic invariant checks | Disabled or lower-cost in production if the language supports it |
| Runtime assertion | Invariants that must never be violated | Active in production |
| Contract validation | Public boundary checks | Return structured errors, not crashes |
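The three rows of the policy table map naturally onto Rust's `debug_assert!`, `assert!`, and `Result`. The sketch below is hypothetical (`accept_batch`, `sort_dedup`, `MAX_BATCH`, and `BatchError` are illustrative names): the public function returns structured errors for bad input, while the internal helper asserts what it assumes.

```rust
// Hypothetical example: all names below are illustrative, not a project API.

/// Structured error returned at the public boundary instead of panicking.
#[derive(Debug, PartialEq, Eq)]
pub enum BatchError {
    Empty,
    TooLarge { len: usize, max: usize },
}

pub const MAX_BATCH: usize = 64;

/// Contract validation: caller input is checked and rejected with a structured error.
pub fn accept_batch(items: &[u32]) -> Result<Vec<u32>, BatchError> {
    if items.is_empty() {
        return Err(BatchError::Empty);
    }
    if items.len() > MAX_BATCH {
        return Err(BatchError::TooLarge { len: items.len(), max: MAX_BATCH });
    }
    Ok(sort_dedup(items))
}

/// Internal transformation with asserted precondition and postcondition.
fn sort_dedup(items: &[u32]) -> Vec<u32> {
    // Runtime assertion: the public boundary guarantees this, so a violation means an
    // internal caller bypassed validation. Stays active in release builds.
    assert!(items.len() <= MAX_BATCH, "internal caller bypassed batch validation");
    let mut out = items.to_vec();
    out.sort_unstable();
    out.dedup();
    // Development-only assertion: an O(n) postcondition check compiled out of release builds.
    debug_assert!(out.windows(2).all(|w| w[0] < w[1]), "output must be strictly increasing");
    out
}
```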
Unbounded collections, queues, retries, caches, recursion, and concurrency hide architectural failures until production. Every milestone must identify newly introduced or modified resource growth.
| Resource | Expected Bound | Hard Limit | Behavior At Limit | Evidence/Test |
|---|---|---|---|---|
| [queue/cache/list/etc.] | [N] | [N] | [reject/backpressure/error] | [test] |
Rules:
- If a maximum is known, encode it.
- If a maximum is not known, document why and add observability around growth.
- Dynamic collections must have explicit expected-size assumptions in tests or assertions.
- Retries must be bounded.
- Queues must have backpressure, rejection, or shedding behavior.
- Caches must have eviction or explicit lifecycle rules.
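A minimal Rust sketch of two of these rules, using only the standard library: a queue whose hard limit rejects new work and hands the item back to the caller, and a retry helper with a fixed attempt budget. `BoundedQueue`, `PushError`, and `retry_bounded` are hypothetical names; a production retry policy would normally add backoff and jitter, which are omitted here.

```rust
// Hypothetical example: names below are illustrative, not a project API.
use std::collections::VecDeque;

/// Behavior at limit: the rejected item is returned so the caller can decide what to do.
#[derive(Debug, PartialEq, Eq)]
pub enum PushError<T> {
    Full(T),
}

pub struct BoundedQueue<T> {
    items: VecDeque<T>,
    hard_limit: usize,
}

impl<T> BoundedQueue<T> {
    pub fn new(hard_limit: usize) -> Self {
        assert!(hard_limit > 0, "a zero-capacity queue can never accept work");
        Self { items: VecDeque::with_capacity(hard_limit), hard_limit }
    }

    /// Rejects instead of growing once the hard limit is reached.
    pub fn try_push(&mut self, item: T) -> Result<(), PushError<T>> {
        if self.items.len() >= self.hard_limit {
            return Err(PushError::Full(item));
        }
        self.items.push_back(item);
        debug_assert!(self.items.len() <= self.hard_limit);
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

/// Calls `op` (passing the 1-based attempt number) at most `max_attempts` times and
/// returns the last error once the budget is exhausted.
pub fn retry_bounded<T, E>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```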
Use the language's strongest available mechanisms to encode domain constraints.
| Concept | Prefer | Avoid |
|---|---|---|
| Domain IDs | dedicated ID type / value object | raw string/int everywhere |
| State machines | enum / sum type / tagged union / classes with restricted transitions | loose string states |
| Optional data | explicit optional / maybe type | sentinel values |
| Validated strings | constrained constructor | free-form string reuse |
| Units | unit-specific type | raw numbers without unit |
| Protocol messages | schema-validated typed messages | ad hoc maps/dictionaries |
Agent rule: before implementing a feature, identify at least one invalid state the design should prevent. If none exists, state why.
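The sketch below shows several rows of the table in Rust. All type names (`OrderId`, `UserId`, `Cents`, `Email`, `Order`) are hypothetical, and `Email::parse` is deliberately simplistic. The invalid state it prevents is "shipping an order that was never paid": `ship` only succeeds from `Paid`, and a `Shipped` value cannot exist without a tracking string.

```rust
// Hypothetical example: all types below are illustrative, not a project API.

/// Dedicated ID types: an `OrderId` cannot be passed where a `UserId` is expected.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct OrderId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct UserId(pub u64);

/// Unit-specific type instead of a bare number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cents(pub u64);

/// Validated string: the private field means `parse` is the only constructor.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Email(String);

impl Email {
    /// Deliberately simplistic check, for illustration only.
    pub fn parse(raw: &str) -> Result<Self, String> {
        let trimmed = raw.trim();
        match trimmed.split_once('@') {
            Some((local, domain)) if !local.is_empty() && domain.contains('.') => {
                Ok(Self(trimmed.to_owned()))
            }
            _ => Err(format!("not an email address: {raw:?}")),
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// State machine as a sum type: each state carries only the data valid in that state.
#[derive(Debug, PartialEq, Eq)]
pub enum Order {
    Draft { id: OrderId, owner: UserId },
    Paid { id: OrderId, owner: UserId, amount: Cents },
    Shipped { id: OrderId, owner: UserId, tracking: String },
}

impl Order {
    /// Restricted transition: only a paid order can ship; anything else is handed back.
    pub fn ship(self, tracking: String) -> Result<Order, Order> {
        match self {
            Order::Paid { id, owner, .. } => Ok(Order::Shipped { id, owner, tracking }),
            other => Err(other),
        }
    }
}
```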
Compatibility checks are part of correctness. Every milestone must verify:
- public APIs
- CLI / commands / events / routes
- persisted state and migration behavior
- configuration keys and defaults
- user-facing behavior
- integration contracts
A milestone may break compatibility only if the contract block explicitly says so and includes migration, documentation, and tests.
Optimize for the minimal safe change.
- Change only allowed files.
- Prefer extending existing patterns over inventing new abstractions.
- Do not rewrite subsystems unless the contract explicitly permits it.
- Do not rename public symbols for style reasons.
- Do not combine refactor and feature work unless the refactor is required and listed.
The following are forbidden in production paths unless explicitly permitted:
- swallowed exceptions / errors
- silent fallbacks that hide broken behavior
- default values that mask corruption
- fake implementations after tests pass
- temporary mocks in real code paths
- TODO / placeholder logic
- commented-out dead code
- hard-coded secrets or unsafe defaults
All failure modes must be visible through one or more of:
- returned structured error
- user-visible error state
- structured log / event / metric
- retry with bounded policy
- explicit degraded-mode behavior
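A minimal Rust sketch of the first option, a returned structured error, in place of a tempting silent fallback such as `.unwrap_or(8080)` that would mask a corrupted value. `read_port`, the `port` key, and `ConfigError` are hypothetical; the lookup closure stands in for whatever configuration source the project actually uses.

```rust
// Hypothetical example: names below are illustrative, not a project API.
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug)]
pub enum ConfigError {
    Missing { key: &'static str },
    Invalid { key: &'static str, source: ParseIntError },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing { key } => write!(f, "missing config key `{key}`"),
            ConfigError::Invalid { key, source } => {
                write!(f, "invalid value for config key `{key}`: {source}")
            }
        }
    }
}

impl Error for ConfigError {
    // Preserve the underlying cause so callers and logs can see it.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ConfigError::Missing { .. } => None,
            ConfigError::Invalid { source, .. } => Some(source),
        }
    }
}

/// Missing and malformed values are both reported; neither is replaced by a default.
pub fn read_port(lookup: impl Fn(&str) -> Option<String>) -> Result<u16, ConfigError> {
    const KEY: &str = "port";
    let raw = lookup(KEY).ok_or(ConfigError::Missing { key: KEY })?;
    raw.trim()
        .parse::<u16>()
        .map_err(|source| ConfigError::Invalid { key: KEY, source })
}
```

If a default really is the intended behavior, it belongs in the milestone contract as explicit degraded-mode behavior, with a visible signal when it is used.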
Fill this section before implementation when the system includes concurrency, distributed state, resource ownership, ordering guarantees, retries, queues, idempotency, persistence recovery, or irreversible actions.
For simple CRUD with no meaningful concurrency or failure-recovery risk, mark N/A and explain why. TLA+ is one option for formal verification (/slo-tla is the SLO skill that drives it); state-machine modeling, property-based testing, and contract tests are equally valid substitutes for simpler systems.
[One paragraph: correctness-focused goal, not implementation detail.]
| Component | Protocol Role | Key State (durable / volatile) | Visible Actions |
|---|---|---|---|
| [component] | [role] | [state] | [actions] |
The minimum set of state variables needed to capture correctness. Flag anything likely to cause state explosion.
| Variable | Abstract Type | Why Needed | Bound | Explosion Risk |
|---|---|---|---|---|
| [var] | [type] | [property] | [N] | [low/medium/high] |
| Action | Preconditions | State Updates | Failure / Interleaving Notes |
|---|---|---|---|
| [action] | [preconditions] | [updates] | [notes] |
Safety invariants (must hold in every reachable state):
- No duplicate ownership: [specific invariant]
- No lost accepted work: [specific invariant]
- No invalid persisted state: [specific invariant]
- Bound never exceeded silently: [specific invariant]

Liveness properties (must eventually happen, under the stated fairness assumptions):
- Eventual completion or visible rejection: [fairness assumption]
- Bounded retry exhaustion: [fairness assumption]
| Simplification | Why It Still Catches Relevant Bugs |
|---|---|
| [simplification] | [reason] |
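For a very small protocol, an explicit-state search can stand in for a model checker. The Rust sketch below models a hypothetical two-worker job lease (it is not a TLA+ translation) and enumerates every reachable state breadth-first, checking the "No duplicate ownership" invariant. With `atomic_acquire = false` it models a check-then-act race and should find a violation; with `true` it models compare-and-set and should not. Running both variants mirrors the anti-vacuity habit required for Kani obligations.

```rust
// Hypothetical example: a toy lease protocol, not a project component.
use std::collections::{HashSet, VecDeque};

const WORKERS: usize = 2;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct State {
    owner: Option<usize>,     // shared lease record
    checked: [bool; WORKERS], // worker saw the lease as free (racy protocol only)
    held: [bool; WORKERS],    // worker believes it owns the job
}

/// Safety invariant: at most one worker believes it owns the job.
fn no_duplicate_ownership(s: &State) -> bool {
    s.held.iter().filter(|&&h| h).count() <= 1
}

/// Every state reachable in one step from `s`.
fn successors(s: &State, atomic_acquire: bool) -> Vec<State> {
    let mut next = Vec::new();
    for w in 0..WORKERS {
        if atomic_acquire {
            // Compare-and-set: check and claim happen in one indivisible step.
            if s.owner.is_none() {
                let mut t = *s;
                t.owner = Some(w);
                t.held[w] = true;
                next.push(t);
            }
        } else {
            // Racy protocol, step 1: observe the lease as free.
            if s.owner.is_none() && !s.checked[w] && !s.held[w] {
                let mut t = *s;
                t.checked[w] = true;
                next.push(t);
            }
            // Step 2: claim it later, without re-checking.
            if s.checked[w] {
                let mut t = *s;
                t.checked[w] = false;
                t.owner = Some(w);
                t.held[w] = true;
                next.push(t);
            }
        }
        // Release by the holder.
        if s.held[w] {
            let mut t = *s;
            t.held[w] = false;
            t.owner = None;
            next.push(t);
        }
    }
    next
}

/// Returns the number of reachable states, or a description of the first violation.
pub fn check(atomic_acquire: bool) -> Result<usize, String> {
    let init = State { owner: None, checked: [false; WORKERS], held: [false; WORKERS] };
    let mut seen = HashSet::from([init]);
    let mut queue = VecDeque::from([init]);
    while let Some(s) = queue.pop_front() {
        if !no_duplicate_ownership(&s) {
            return Err(format!("invariant violated in reachable state {s:?}"));
        }
        for t in successors(&s, atomic_acquire) {
            if seen.insert(t) {
                queue.push_back(t);
            }
        }
    }
    Ok(seen.len())
}
```

This brute-force approach only scales to tiny state spaces; the State Variables table's "Explosion Risk" column is where that limit should be recorded.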
Additive to the design-level modeling above (it does not replace it). Fill this sub-block when the stack is Rust and /slo-architect set kani_required: true; otherwise mark N/A — <reason> (e.g. N/A — no Rust kernels). /slo-kani is the SLO skill that drives it. Where the design also has a TLA+ spec, each Kani obligation should correspond to a TLA+ atomic action (refinement pairing: action → Rust fn → Kani harness) — TLA+ proves the protocol, Kani proves the kernel; Kani never claims concurrency/interleavings.
| # | Target fn | Property | Bound / assumptions | Expected pre-fix | Expected post-fix |
|---|---|---|---|---|---|
| K1 | [fn] | [no panic / no UB / invariant / postcondition] | [`#[kani::unwind(N)]`, sizes, `kani::assume(...)`] | [FAILED: why] | SUCCESSFUL |
A green Kani run is proved within the stated harness, assumptions, and bounds — never "whole system proved." Each obligation's pre-fix variant must fail first (anti-vacuity). Bounds, assumptions, stubs, and contracts are recorded in docs/slo/verify/<slug>-kani.md.
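A hedged sketch of what one row of the obligations table could look like. `next_slot` is a hypothetical kernel; the harness uses Kani's `#[kani::proof]`, `kani::any()`, and `kani::assume()`, and sits behind `#[cfg(kani)]` so ordinary builds and tests ignore it. The property would be proved only under the stated assumptions (`capacity > 0`, `head < capacity`), not for the surrounding system.

```rust
// Hypothetical example: `next_slot` is an illustrative kernel, not a project function.

/// Advances a ring-buffer index by one slot.
/// Precondition: `capacity > 0` and `head < capacity`.
/// Postcondition: the result is `< capacity`.
pub fn next_slot(head: usize, capacity: usize) -> usize {
    debug_assert!(head < capacity, "head {head} out of range for capacity {capacity}");
    // `head < capacity <= usize::MAX`, so `head + 1` cannot overflow.
    if head + 1 == capacity { 0 } else { head + 1 }
}

// Compiled only under `cargo kani`; ordinary builds skip this module.
#[cfg(kani)]
mod proofs {
    use super::next_slot;

    /// K1: no panic, and the postcondition holds for every input allowed by the
    /// assumptions. A pre-fix variant that returns `head + 1` without wrapping should
    /// fail this harness at `head == capacity - 1` (anti-vacuity check).
    #[kani::proof]
    fn next_slot_stays_in_bounds() {
        let capacity: usize = kani::any();
        let head: usize = kani::any();
        kani::assume(capacity > 0);
        kani::assume(head < capacity);
        assert!(next_slot(head, capacity) < capacity);
    }
}
```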
Optional section. Legacy runbooks without this section remain valid (same backward-compat posture as §10 Carry-forward). But `/slo-plan` REQUIRES it for any value-bearing feature (one that introduces or changes user-facing capability; internal refactor / docs-only / test-only work is exempt). It is the planning-time half of the feature-performance loop: it makes telemetry a contracted deliverable, not a best-effort afterthought, so post-release questions ("did this work? what do we change next?") are answerable.
Fill this for value-bearing features. Carry the per-feature inputs forward from the idea doc's ## Success thesis (/slo-ideate) and the /slo-product metrics feature measurement spec (feature_measurement_spec: true).
| Field | Meaning |
|---|---|
| Value hypothesis | The change in user behaviour / outcome we expect this feature to cause |
| Review windows | When we read results — e.g. 24h / 7d / 28d (or equivalent for the cadence) |
| Primary leading metric | The first behavioural signal the feature created value, observable within the first window |
| Primary lagging metric | The durable user / business outcome that should eventually move |
| Guardrails | What must NOT regress (core conversion, error rate, latency, support load) + each guardrail's owner |
| Telemetry deliverables | Named behavioural events + runtime/reliability metrics + saved queries/dashboards the milestones must ship; failure paths emit a visible signal |
| Rollout plan | Flags, cohorts, staged release |
| Diagnosis plan | For each likely drop-off, the question to ask (technical / pricing / confusing UX / weak demand) and the evidence that distinguishes them |
| Experiment plan | The first iteration to run if the baseline misses |
| Privacy controls | Pseudonymised event identifiers + masking + data minimisation by default; consent for non-essential cookies/tracking (route to /slo-legal triage for PECR); DPIA trigger for behaviour/geolocation tracking |
Each value-bearing milestone then names its slice of these telemetry deliverables in the Measurement deliverables row of its Contract Block, and `/slo-verify`'s measurement pass checks that they fire, are masked/pseudonymised, and emit on failure paths. `/slo-retro` records actual-vs-thesis movement.
For non-value-bearing runbooks (pure refactor / docs / tooling), mark this section N/A — not a value-bearing feature, see <reason>.
Optional section. Legacy runbooks without this section remain valid (same backward-compat posture as §5A Measurement Contract and §10 Carry-forward). But `/slo-plan` REQUIRES it for any value-bearing OR security-relevant milestone. Security-relevant means the work touches identity, secrets, PII, payment, cloud accounts, AI agents, public/network boundaries, CI/CD, or infrastructure. It is the planning-time half of the Secure Value Loop (see `docs/SECURE-VALUE-LOOP.md`): it makes operator readiness, security tests, and finding dispositions contracted deliverables rather than best-effort afterthoughts. It reuses the shipped security machinery (the `/slo-architect` threat model, `/slo-verify` Pass 4/5, and the `/slo-retro` lanes); it does not rebuild it.
Fill this for value-bearing/security-relevant milestones. For pure refactor / docs / tooling with no security-relevant surface, mark this section N/A — not value-bearing or security-relevant, see <reason>.
| Field | Value |
|---|---|
| Value hypothesis | [what user/business/security outcome changes] |
| Smallest valuable wedge | [smallest slice that proves value without becoming useless] |
| User-visible proof of value | [how a user experiences the value] |
| Security-visible proof of safety | [how safety is demonstrable] |
| What would make this wedge too small to matter? | [the decision rule] |
The Operator Readiness Gate is enforced by `/slo-execute`'s Global Entry from the M3 release of the Secure Value Loop onward. If `safe_to_continue_without_blockers` is `false`, the milestone MUST NOT start (fail closed). `validation` must be an executable proof, not a self-asserted checkbox.
| Prerequisite | Owner (human / agent / upstream) | Needed by | Validation (executable proof) | Status (ready / partially_ready / blocked) |
|---|---|---|---|---|
| | | M[N] | | |
safe_to_continue_without_blockers: true | false
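One way to make the gate fail closed in code, sketched in Rust under stated assumptions: a prerequisite counts as satisfied only when its status is ready and its executable validation actually ran and passed. Treating `partially_ready` as blocking is a conservative assumption of this sketch, not a rule stated above, and the type and function names are hypothetical.

```rust
// Hypothetical example: names below are illustrative, not the /slo-execute implementation.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Readiness {
    Ready,
    PartiallyReady,
    Blocked,
}

pub struct Prerequisite {
    pub name: &'static str,
    pub status: Readiness,
    /// Outcome of the executable validation; `None` means it never ran.
    pub validation_passed: Option<bool>,
}

/// Fail closed: any prerequisite that is not both ready and proven ready is a blocker.
pub fn safe_to_continue_without_blockers(prereqs: &[Prerequisite]) -> Result<(), Vec<String>> {
    let blockers: Vec<String> = prereqs
        .iter()
        .filter_map(|p| match (p.status, p.validation_passed) {
            (Readiness::Ready, Some(true)) => None,
            (status, validation) => Some(format!(
                "{}: status={status:?}, validation_passed={validation:?}",
                p.name
            )),
        })
        .collect();
    if blockers.is_empty() { Ok(()) } else { Err(blockers) }
}
```

Returning the blocker list, rather than a bare `false`, gives the Evidence Log a concrete reason for the refusal.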
Populated from the existing `/slo-architect` threat model (`docs/slo/design/<slug>-threat-model.md` + `.slo.json`); cite it, do not re-derive it. Cite abuse cases by their frozen `tm-<slug>-abuse-N` IDs.
| Area | Summary |
|---|---|
| Assets | |
| Actors | |
| Trust boundaries | |
| Entry points | |
| Abuse cases | [tm-<slug>-abuse-N: <description>] |
| Required controls | |
| Residual risks | [owner + review-by date] |
Reference the security-test Bundle(s) the milestone's surface triggers (Bundle A docs / B app / C backend-API / D cloud-IaC / E AI-LLM / F mobile; see `docs/SECURE-VALUE-LOOP.md` §6). Each row resolves to `pass`, `not_applicable`, or `waived_with_reason` at `/slo-verify` time; it is never left blank. SBOM/provenance is conditional: `not_applicable` unless the milestone builds a released artifact.
| Test | Required? | Command/tool | Evidence path | Waiver if not applicable |
|---|---|---|---|---|
| SAST | ||||
| SCA/dependency audit | ||||
| Secrets scan | ||||
| IaC scan | ||||
| Container/image scan | ||||
| DAST/API security | ||||
| Authn/authz negative tests | ||||
| Abuse-case tests | ||||
| Privacy/telemetry tests | ||||
| Fuzz/property/formal tests | | | | |
Every finding discovered during execution gets exactly one disposition and may never end as merely "observed". The five dispositions route to existing `/slo-retro` lanes; they introduce no new lane verb (see `docs/SECURE-VALUE-LOOP.md` §4). `/slo-execute` refuses to mark the milestone `done` while any row is undisposed. Dispositions: `fix_now`, `file_github_issue`, `operator_action`, `upstream_feedback`, `accepted_risk`.
| ID | Finding | Severity | Disposition | Owner | Evidence/link | Due |
|---|---|---|---|---|---|---|
| DW-001 | | | | | | |
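Encoding the dispositions as an enum gives the "exactly one disposition" rule for free: a finding holds an `Option<Disposition>`, so it can have zero or one, never two. The Rust sketch below is hypothetical (it is not the `/slo-execute` implementation); "observed" deliberately fails to parse, and `undisposed` lists the findings that would block `done`.

```rust
// Hypothetical example: names below are illustrative, not a project API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Disposition {
    FixNow,
    FileGithubIssue,
    OperatorAction,
    UpstreamFeedback,
    AcceptedRisk,
}

impl Disposition {
    /// Parses the five snake_case disposition names used in the table.
    pub fn parse(raw: &str) -> Option<Self> {
        match raw {
            "fix_now" => Some(Self::FixNow),
            "file_github_issue" => Some(Self::FileGithubIssue),
            "operator_action" => Some(Self::OperatorAction),
            "upstream_feedback" => Some(Self::UpstreamFeedback),
            "accepted_risk" => Some(Self::AcceptedRisk),
            // Includes "observed", which is never a valid end state.
            _ => None,
        }
    }
}

pub struct Finding {
    pub id: &'static str,
    pub disposition: Option<Disposition>,
}

/// IDs of findings that still block marking the milestone `done`.
pub fn undisposed(findings: &[Finding]) -> Vec<&'static str> {
    findings.iter().filter(|f| f.disposition.is_none()).map(|f| f.id).collect()
}
```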
Optional section. Legacy runbooks without this section remain valid (same backward-compat posture as §5A Measurement Contract and §5B Secure Value Contract). But `/slo-plan` REQUIRES it for any value-bearing milestone (one that introduces or changes user-facing capability; internal refactor / docs-only / test-only work is exempt). It is the planning-time half of Outcome First Engineering: it makes the promised user outcome a contracted, testable deliverable and the primary Definition of Done, not an afterthought. Code completion alone is insufficient: a milestone is done only when the promised user outcome exists AND existing important outcomes still exist.
Fill this for every value-bearing milestone. For pure refactor / docs / tooling with no user-facing outcome, mark this section N/A — not value-bearing, see <reason>.
| Field | Meaning |
|---|---|
| Outcome | The promised user value in one sentence — the outcome this milestone makes exist. |
| Success Criteria | Bulleted, each independently observable (e.g. discovered / classified / visible in UI / remediation shown / appears in history / survives restart). |
| Front-to-End Validation | The proof path, authored per layer; each step is `applicable` or `not_applicable(reason)`: seed test data → run → verify backend result → verify persisted record → verify API/IPC response → verify UI display. At least one real cross-layer assertion is required (e.g. backend→persisted) even when the UI layer is `not_applicable`; a single-layer or mock-only assertion does NOT satisfy this row. |
| Regression Requirements | Which existing critical capabilities must still work (ties to the §17 Core Capability Regression Matrix). |
User-provided strings rendered into any generated security/threat artifact are wrapped in a ~~~text fence (the load-bearing /slo-architect rule). Authored outcome text is descriptive Markdown only and never selects control fields (oc-/cuj-/tm- ids, resolution verbs, or gate outcomes).
Each value-bearing milestone names its Outcome Scenarios (oc-<slug>-N) and Critical User Journeys (cuj-<slug>-N) in §17, and /slo-verify's Outcome Validation pass runs them front-to-end at runtime as the highest-authority gate (§6.12, §11.8).
These rules apply to every milestone without exception.
- Only change files listed in the current milestone unless a listed step explicitly requires one additional file.
- Do not refactor unrelated code.
- Do not rename public APIs, commands, routes, events, persisted-state shapes, or config keys unless the milestone explicitly says so.
- Do not introduce a new dependency unless the milestone explicitly allows it.
- Do not change database schema, file formats, or migration behavior unless the milestone explicitly includes migration work and migration tests.
- Write BDD tests before production code.
- Write E2E runtime validation stubs before production code.
- Confirm new tests fail for the right reason before implementing.
- A milestone is not done when code compiles. It is done when the declared contract is satisfied and evidence is recorded.
- Every milestone that introduces or modifies internal invariants, ordering assumptions, preconditions, or postconditions must encode them as assertions per §4.3.
- Every milestone must list the invariants/assertions added or strengthened in its Contract Block and in the lessons file.
- Every milestone that introduces or modifies a queue, cache, list, retry policy, recursive call, or concurrent-task pool must declare expected bound, hard limit, and behavior-at-limit per §4.4.
- Unbounded growth is allowed only if explicitly justified in the Contract Block and observability is added in the same milestone.
- Every milestone must run formatter, typecheck/compile check, static analyzer/linter, and (if dependencies changed) security/dependency audit before close-out.
- Waivers must be local, minimal, and justified in code or the Evidence Log.
- If a failure is not fully explained by compiler, test assertion, or stack trace, use the project's debugger (Section 4.1) before making speculative changes.
- Document non-obvious state inspections in the lessons file under "Debugging/inspection notes".
The following are not allowed unless explicitly permitted in the milestone:
- TODO or placeholder logic in production code
- silent fallbacks that hide errors
- swallowed errors without structured logging or user-visible handling
- fake implementations left in place after tests pass
- commented-out dead code
- temporary mocks in production paths
- hard-coded secrets, test keys, or unsafe defaults
Every milestone must explicitly verify that previously working user flows, commands, routes, persisted state, and public interfaces still work unless the milestone explicitly replaces them.
- Prefer narrow, local modifications over broad rewrites.
- Prefer extending existing patterns over inventing new abstractions.
- Prefer deleting complexity over adding new layers.
- If a refactor is required, keep it minimal and directly justified by the milestone goal.
All meaningful checks must be recorded in the milestone Evidence Log:
- command run
- relevant file or test
- expected result
- actual result
- pass/fail
- notes
Never claim a command passed unless it ran or the limitation is explicitly stated.
- If a milestone introduces new build outputs, generated files, test fixtures, scratch directories, or tool-specific caches, add matching patterns to `.gitignore` before committing.
- Review `.gitignore` at the end of every milestone for staleness; remove patterns that no longer apply.
- Never commit test output data, temporary fixtures, scratch files, or generated artifacts to source control.
- Every test that creates files on disk must clean up after itself (`tempdir`, `tempfile`, `afterEach`, or equivalent). Tests must not leave residual data in the working tree.
- Record the `.gitignore` review in the Evidence Log.
Code completion alone is insufficient. A value-bearing milestone is done only when the promised user outcome exists (its Outcome Scenarios + Critical User Journeys pass front-to-end) AND existing important outcomes still exist (the §17 Core Capability Regression Matrix has no required-row failures). A failing Outcome Scenario, Critical User Journey, or required Regression-Matrix row blocks milestone completion regardless of how many unit / integration tests pass. This is the highest-authority gate (§11.8), enforced at runtime by /slo-verify's Outcome Validation pass and at close-out by /slo-retro's refusal gate. Non-value-bearing milestones (pure refactor / docs / tooling) are exempt with N/A — not value-bearing.
Do this before every milestone.
- Read the lessons file from the previous milestone, if one exists. Apply any design corrections, naming rules, test-strategy improvements, and failure-mode coverage it calls for before writing new code.
- (v4 carry-forward) If `/slo-execute` is the driver, run pre-flight Step 1.5: read open prior-retro issues filtered by this runbook's prefix and surface them as scope candidates with a suggested lane (`micro`, `milestone`, or `fresh-runbook`). Carry-forward never auto-extends the allow-list.
- Read the current milestone fully: goal, context, contract block, out-of-scope, file list, BDD scenarios, regression tests, E2E tests, smoke tests, definition of done.
- Run the full existing test suite and confirm it passes. Record the baseline in the Evidence Log. If any tests fail before you start, stop and fix the baseline first. Do not begin a milestone on a red baseline. Baseline commands: `[unit test command]`, `[integration/BDD test command]`, `[E2E test command]`.
- Read the files listed in "Files Allowed To Change" and "Files To Read Before Changing Anything". Understand their current shape before editing.
- Update the Milestone Tracker: set the current milestone status to `in_progress` and record the Started date.
- Create BDD test files first.
- Create E2E runtime validation test stubs first.
- Copy the milestone's Evidence Log template into working notes and begin filling it as work happens.
- Re-state the milestone constraints in your own words before coding:
- goal
- allowed files
- forbidden changes
- compatibility requirements
- dependency / migration rules
- resource bounds (per §4.4)
- invariants/assertions required (per §4.3)
- static-analysis gates (per §4.2)
- tests that must pass
- Definition of Done
Do this after every milestone.
- Run formatter.
- Run typecheck / build check.
- Run static analyzer / linter (warnings fail unless waived per §4.2).
- If the dependency graph changed, run the security/dependency audit.
- Run the full test suite. Every pre-existing test must still pass. Every new BDD scenario must pass.
- Run the milestone's E2E runtime validation tests.
- Verify the app builds and boots to a usable state.
- Run the smoke tests listed in the milestone. Check off each item in the runbook.
- Verify backward compatibility for all items listed in the milestone Compatibility Checklist.
- Verify resource bounds (§4.4) and assertion/invariant additions (§4.3) are encoded as documented.
- Complete the Self-Review Gate (Section 14).
- Remove temporary debug code, mocks, placeholders, and commented-out dead code.
- Clean up test artifacts: run `git status` and confirm there are no untracked test artifacts.
- Review `.gitignore`: ensure new outputs have patterns; remove stale entries.
- Update ARCHITECTURE.md following the Documentation Update Table.
- Update README.md if user-facing capabilities changed.
- Write a lessons-learned file at `docs/slo/lessons/<prefix>-m<N>.md`.
- Write a completion summary at `docs/slo/completion/<prefix>-m<N>.md`.
- Update the Milestone Tracker: set status to `done`, record the Completed date, and fill in the lessons and completion summary paths.
- (v4 lessons loop) If `/slo-retro` is the driver, run the issue-filing flow per `skills/slo-retro/references/issue-filing-discipline.md`. Always write the lessons file first; issue filing is strictly additive.
- Re-read the next milestone with fresh eyes and record any assumption changes in the lessons file.
[Describe the current state of the system. What exists today? What works? List major subsystems and their capabilities. Be specific — reference file paths, module names, major entry points, and concrete data where relevant.]
[List the specific gaps this runbook addresses. Number each gap and describe it concretely — reference specific code, UI behavior, test gaps, and user impact. Avoid vague generalities.]
- [Gap title]: [Description referencing concrete code and behavior.]
- [Gap title]: [Description.]
[ASCII diagram or description of the target end state after all milestones are complete.
Show major components, data flow, boundaries, persistence, and integration points.]
These are system-wide rules the AI agent must follow when making implementation decisions.
- [Principle name]: [Explanation.]
- [Principle name]: [Explanation.]
- [Principle name]: [Explanation.]
Explicitly list existing subsystems, patterns, and code that must not be changed or broken.
- [Subsystem / module / pattern to preserve]
- [Subsystem / module / pattern to preserve]
List the specific files, modules, or behaviors that will be modified across milestones.
- [File or module] — [summary of change]
- [File or module] — [summary of change]
These are forbidden unless explicitly overridden inside a milestone.
- No unrelated refactors
- No new dependencies
- No schema migrations
- No config key renames
- No public API/event/route renames
- No production placeholders
- No silent error swallowing
- No secrets in source control
- No test output data committed to source control
- No unbounded resource growth without justification (§4.4)
- No new public boundary without input validation and structured error returns (§4.8)
Optional section. Existing runbooks without this section remain valid; `/slo-execute` Step 1.5 falls back to a live `gh issue list --label retro-derived` query. Authors of new runbooks SHOULD include this section once `/slo-retro` files at least one retro-derived issue against this runbook's prefix.

What this section is: a table of open prior-retro issues (filed by `/slo-retro` for this runbook's prefix) that should be considered as scope candidates at each milestone start. Each row has a suggested lane so small follow-ups stay small and large follow-ups do not silently widen scope.

What this section is NOT: an auto-extension of any milestone's allow-list. The user decides each milestone's bounds. Carry-forward is informational input to that decision, not a substitute for it.
| Issue | Title | Suggested lane | Suggested milestone | Status |
|---|---|---|---|---|
| (e.g., #42) | (one-line summary) | micro / milestone / fresh-runbook | (M3 / M4 / next runbook) | (open / closed-via-PR-pending / transferred) |
- `micro`: safe, bounded follow-up. Can be folded into the current or immediate next milestone without widening scope (typical: doc polish, small test gap, naming-convention drift).
- `milestone`: real milestone-sized work. Warrants its own milestone in this runbook or the next; do not bolt it onto an unrelated milestone.
- `fresh-runbook`: material scope or risk shift. Do NOT widen the current runbook silently; spin up a separate runbook (typical: new architecture work, regulated-domain question, multi-week effort).
`/slo-execute M<N>` pre-flight Step 1.5 prefers rows from this section over a live `gh` query when the rows are fresh. Rows with status `closed-via-PR-pending` or `transferred` are surfaced with an annotation; the user decides whether to keep tracking them. Inline output is capped at the top 3 items.
/slo-resume reads the milestone tracker plus this section to emit one next action with a lane. Top-3 inline cap; remainder summarized as ... N more.
Runbooks without this section continue to work; /slo-execute and /slo-resume fall back to the live gh query and the tracker-only orientation respectively.
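The lanes and the inline cap can be illustrated with a small Rust sketch. `Lane`, `CarryForwardRow`, `INLINE_CAP`, and `render_inline` are hypothetical names, not the actual `/slo-resume` or `/slo-execute` code; the sketch only shows how an enum keeps a row's lane restricted to the three legal values and how output is capped at three items with the remainder summarized.

```rust
/// The three legal lanes; an enum makes any other value unrepresentable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lane {
    Micro,
    Milestone,
    FreshRunbook,
}

impl Lane {
    fn as_str(self) -> &'static str {
        match self {
            Lane::Micro => "micro",
            Lane::Milestone => "milestone",
            Lane::FreshRunbook => "fresh-runbook",
        }
    }
}

/// Hypothetical carry-forward row: issue number, one-line title, suggested lane.
struct CarryForwardRow {
    issue: u32,
    title: &'static str,
    lane: Lane,
}

/// Inline cap from the text: show at most 3 rows.
const INLINE_CAP: usize = 3;

/// Renders the top rows inline and summarizes the rest as "... N more".
fn render_inline(rows: &[CarryForwardRow]) -> Vec<String> {
    let mut lines: Vec<String> = rows
        .iter()
        .take(INLINE_CAP)
        .map(|r| format!("#{} [{}] {}", r.issue, r.lane.as_str(), r.title))
        .collect();
    let remaining = rows.len().saturating_sub(INLINE_CAP);
    if remaining > 0 {
        lines.push(format!("... {} more", remaining));
    }
    lines
}
```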
Every milestone follows these rules.
For each milestone:
- Read the BDD acceptance table.
- Create the test file(s) first.
- Confirm the tests fail for the expected reason.
- Write production code to make the tests pass.
- Re-run tests after any refactor.
Every milestone must explicitly cover the categories that apply:
- happy path
- invalid input
- empty / first-run state
- dependency failure / partial failure
- retry or rollback behavior
- concurrency / race behavior
- resource-limit behavior (§4.4)
- assertion/invariant violation (§4.3)
- persistence / restore behavior
- backward compatibility behavior
- abuse case (security-relevant milestones — see threat model)
If a category does not apply, state why.
```gherkin
Scenario: [name]
  Given [precondition]
  When [action]
  Then [observable outcome]
  And [failure/resource/compatibility expectation if relevant]
```

In code:

```rust
#[test]
fn descriptive_test_name() {
    // Given: [precondition]
    // When: [action]
    // Then: [expected outcome]
}
```

```ts
it("descriptive test name", () => {
  // Given: [precondition]
  // When: [action]
  // Then: [expected outcome]
});
```

| Layer | Convention | Location |
|---|---|---|
| Backend unit tests | `#[cfg(test)] mod tests` inside the source file | Same file as production code |
| Backend integration/BDD tests | `tests/<prefix>_<feature>.rs` | `src-tauri/tests/` (or equivalent) |
| Frontend unit tests | `<module>.test.ts` | Co-located with source file |
| Frontend page tests | `<Page>.test.tsx` | Co-located with component |
| Scenario / e2e tests | `tests/scenarios/<prefix>_scenario_<name>.rs` | `src-tauri/tests/scenarios/` (or equivalent) |
| E2E runtime validation (backend) | `tests/e2e_<prefix>_m<N>.rs` | `src-tauri/tests/` (or equivalent) |
| E2E runtime validation (frontend) | `e2e/<feature>.e2e.test.tsx` | `src/e2e/` |
| Outcome (front-to-end, highest authority) | `tests/outcome/<prefix>_outcome_<journey>.<ext>` (backend) / `outcome/<journey>.outcome.test.tsx` (frontend, Playwright-driven) | `tests/outcome/` / `src/outcome/` |
Every test that creates files, directories, or temporary data on disk must follow these rules:
- Use temporary directories: prefer `tempdir()`, `tempfile::TempDir`, `tmp` from the test framework, or OS-provided temp locations. Never write test output into the source tree.
- Clean up on completion and failure: use RAII (`Drop`), `afterEach`/`afterAll` hooks, or `defer` statements to ensure cleanup runs even when tests fail.
- No residual state: after the full test suite runs, `git status` must show no untracked files from test execution.
- Dedicated output directories: if a test must write to a project-relative path, that directory must be in `.gitignore` and tests must clean it between runs.
- CI parity: cleanup behavior must be identical locally and in CI.
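A minimal sketch of the RAII cleanup rule, assuming a Rust test suite and using only the standard library so it adds no dependency (`tempfile::TempDir`, named above, provides the same behavior in a real project). `TestDir` is a hypothetical helper: it creates a uniquely named directory under `std::env::temp_dir()` and removes it in `Drop`. `Drop` also runs when a test panics under the default `panic = "unwind"` strategy, but not under `panic = "abort"`.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Keeps directory names unique within one test process.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Owns a directory under the OS temp location and deletes it on drop.
struct TestDir {
    path: PathBuf,
}

impl TestDir {
    fn new(label: &str) -> std::io::Result<TestDir> {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos())
            .unwrap_or(0);
        let n = COUNTER.fetch_add(1, Ordering::Relaxed);
        // Process id + time + counter keeps parallel tests from colliding.
        let name = format!("{}-{}-{}-{}", label, std::process::id(), nanos, n);
        let path = std::env::temp_dir().join(name);
        fs::create_dir_all(&path)?;
        Ok(TestDir { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TestDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored so Drop never panics.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```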
Every milestone must include E2E tests that go beyond compilation and verify the system works correctly at runtime. These tests prove:
- the app boots without errors
- runtime contracts are met across IPC/API boundaries
- BDD scenarios work at runtime, not just in isolation
- there are no runtime panics, unhandled rejections, or silent failures
- degraded states behave safely and visibly
- resource bounds (§4.4) hold under stress paths exercised in tests
- Test runtime behavior, not just types.
- Test the full stack where possible.
- Test degraded and failure states, not just the happy path.
- Assert against observable behavior.
- Prefer at least one test that crosses the backend/frontend boundary when both layers changed.
- Prefer at least one test that exercises a resource-bound boundary when one was introduced or modified.
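One way to make a resource bound testable, sketched under the assumption of an in-memory work queue. `BoundedQueue` and `BoundError` are hypothetical names. The bound is explicit, the at-limit behavior is a structured error rather than silent growth, and the assertions fill the queue to exactly its capacity before exercising the overflow path.

```rust
use std::collections::VecDeque;

/// Returned when the bound is reached; the limit is visible to the caller.
#[derive(Debug, PartialEq, Eq)]
enum BoundError {
    Full { capacity: usize },
}

/// A queue with a declared capacity: it rejects new items instead of growing.
struct BoundedQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> BoundedQueue<T> {
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        BoundedQueue { items: VecDeque::with_capacity(capacity), capacity }
    }

    fn push(&mut self, item: T) -> Result<(), BoundError> {
        if self.items.len() >= self.capacity {
            return Err(BoundError::Full { capacity: self.capacity });
        }
        self.items.push_back(item);
        // Invariant: length never exceeds the declared bound.
        debug_assert!(self.items.len() <= self.capacity);
        Ok(())
    }

    fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}
```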
Above unit / integration / E2E sits the Outcome layer — the smallest layer but the highest authority. The pyramid is authority-inverted:
- Outcome: smallest layer, HIGHEST authority
- E2E
- Integration
- Unit: largest layer, base authority
Outcome tests are user-centric and cross-system: input → backend → storage → processing → UI → user outcome. They prove the promised user outcome exists and that existing critical outcomes still exist. The authority inversion (§6.12): if 1000 unit tests pass but one Outcome Scenario, Critical User Journey, or required Regression-Matrix row fails, the milestone fails. /slo-verify's Outcome Validation pass runs them front-to-end (Playwright for UI) over the highest applicable layer chain — a mock-only assertion never satisfies an Outcome test.
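The authority inversion can be read as a gate function. This is a hypothetical sketch, not `/slo-verify`'s implementation: `VerificationRun`, `LayerResult`, and `milestone_verdict` are illustrative names, and the single `outcome` field stands in for Outcome Scenarios, Critical User Journeys, and required Regression-Matrix rows together. A thousand passing unit tests cannot offset one Outcome failure, and `None` models a milestone declared not value-bearing.

```rust
/// Pass/fail counts for one layer of the test pyramid.
#[derive(Debug, Clone, Copy)]
struct LayerResult {
    passed: usize,
    failed: usize,
}

/// Hypothetical summary of one verification run for a milestone.
struct VerificationRun {
    unit: LayerResult,
    integration: LayerResult,
    e2e: LayerResult,
    /// Outcome Scenarios + CUJs + required Regression-Matrix rows.
    /// `None` means the milestone is declared not value-bearing.
    outcome: Option<LayerResult>,
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Pass,
    Fail(&'static str),
}

fn milestone_verdict(run: &VerificationRun) -> Verdict {
    // Highest authority first: one Outcome failure fails the milestone,
    // no matter how many lower-layer tests passed.
    if let Some(outcome) = run.outcome {
        if outcome.failed > 0 {
            return Verdict::Fail("outcome");
        }
        if outcome.passed == 0 {
            // A value-bearing milestone must actually run its Outcome tests.
            return Verdict::Fail("outcome: no tests ran");
        }
    }
    // Lower layers still gate completion; they just cannot override Outcome.
    for (name, layer) in [("e2e", run.e2e), ("integration", run.integration), ("unit", run.unit)] {
        if layer.failed > 0 {
            return Verdict::Fail(name);
        }
    }
    Verdict::Pass
}
```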
A new dependency is allowed only if the milestone explicitly includes:
- package/crate name
- version/range if known
- why existing code/tools are insufficient
- security and maintenance rationale
- license rationale if applicable
- build/runtime cost rationale
- tests covering the integration
- rollback/removal path if the dependency proves unsuitable
Any schema, config, or persisted-state change requires:
- migration plan
- backward compatibility strategy
- migration tests
- rollback strategy if relevant
- documentation updates
- old-version fixture or compatibility test where possible
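A sketch of a migration with an old-version fixture test, assuming a versioned settings record and a toy `version=N;key=value` text format invented for illustration. `SettingsV1`, `SettingsV2`, `PersistedSettings`, `DEFAULT_MAX_HISTORY`, `parse`, and `migrate` are all hypothetical. The loader accepts every persisted version it has ever written, upgrades old data forward with an explicit default, and rejects unknown versions with an error instead of guessing.

```rust
/// Version 1 of a hypothetical persisted settings record.
#[derive(Debug, Clone, PartialEq)]
struct SettingsV1 {
    scan_interval_secs: u64,
}

/// Version 2 adds a bounded history size.
#[derive(Debug, Clone, PartialEq)]
struct SettingsV2 {
    scan_interval_secs: u64,
    max_history: usize,
}

/// Every persisted shape the loader must still accept.
enum PersistedSettings {
    V1(SettingsV1),
    V2(SettingsV2),
}

/// Explicit default applied when upgrading v1 data.
const DEFAULT_MAX_HISTORY: usize = 500;

/// Parses the illustrative "version=N;key=value;..." format.
fn parse(raw: &str) -> Result<PersistedSettings, String> {
    let (mut version, mut interval, mut history) = (None, None, None);
    for pair in raw.split(';').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("malformed pair: {pair}"))?;
        match key {
            "version" => version = Some(value.parse::<u32>().map_err(|e| e.to_string())?),
            "scan_interval_secs" => interval = Some(value.parse::<u64>().map_err(|e| e.to_string())?),
            "max_history" => history = Some(value.parse::<usize>().map_err(|e| e.to_string())?),
            other => return Err(format!("unknown key: {other}")),
        }
    }
    let interval = interval.ok_or("missing scan_interval_secs")?;
    match version {
        Some(1) => Ok(PersistedSettings::V1(SettingsV1 { scan_interval_secs: interval })),
        Some(2) => Ok(PersistedSettings::V2(SettingsV2 {
            scan_interval_secs: interval,
            max_history: history.ok_or("missing max_history")?,
        })),
        Some(v) => Err(format!("unsupported version: {v}")),
        None => Err("missing version".to_string()),
    }
}

/// Forward migration: old data is upgraded, never silently discarded.
fn migrate(stored: PersistedSettings) -> SettingsV2 {
    match stored {
        PersistedSettings::V1(v1) => SettingsV2 {
            scan_interval_secs: v1.scan_interval_secs,
            max_history: DEFAULT_MAX_HISTORY,
        },
        PersistedSettings::V2(v2) => v2,
    }
}
```

In a real project the fixture would be a file captured from an actual old build and checked in next to the test.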
Each milestone must state exactly one of the following:
- No refactor permitted beyond direct implementation
- Minimal local refactor permitted in listed files only
- Targeted refactor permitted for [specific reason]
Copy this table into each milestone section and fill it in during execution.
| Step | Command / Check | Expected Result | Actual Result | Pass/Fail | Notes |
|---|---|---|---|---|---|
| Baseline tests | [command] | all pre-existing tests green | | | |
| BDD tests created | [files] | fail for expected reason | | | |
| E2E stubs created | [files] | fail for expected reason | | | |
| Implementation | [summary] | contract satisfied | | | |
| Formatter | [command] | clean | | | |
| Typecheck / build check | [command] | clean | | | |
| Static analyzer / linter | [command] | clean (no new warnings) | | | |
| Dependency audit (if deps changed) | [command] | pass or documented exception | | | |
| Full tests | [command] | green | | | |
| E2E runtime | [command] | green | | | |
| Build/boot | [command] | boots cleanly | | | |
| Smoke tests | [steps] | all checked | | | |
| Resource-bound verification | [bound + test] | bound encoded; test exercises near-limit behavior | | | |
| Invariant/assertion verification | [invariant + test] | encoded; test triggers under fault injection if applicable | | | |
| Debugger / state inspection | [what was inspected] | hypothesis confirmed before code change | | | |
| Test artifact cleanup | `git status` | no untracked test artifacts | | | |
| `.gitignore` review | review `.gitignore` | patterns current, no stale entries | | | |
| Compatibility checks | [checks] | no regressions | | | |
Before marking a milestone done, answer every question.
- Did I change only allowed files?
- Did I avoid unrelated refactors?
- Did I preserve all listed public interfaces and compatibility requirements?
- Did I add tests for failure modes, not just happy paths?
- Did I add or update assertions/invariants where assumptions matter?
- Did I bound new resource growth or document why it cannot be bounded?
- Did I run formatter, typecheck, and static analysis to a clean result (or document a local minimal waiver)?
- Did I use a debugger or state-inspection tool when failures were not explained by compiler/test/stack-trace?
- Did I remove temporary debug code, mocks, placeholders, and commented-out dead code?
- Did I update documentation to match the implementation?
- Is every assumption either verified or explicitly documented as unresolved?
- Do all tests clean up their output artifacts? Does `git status` show a clean working tree?
- Is `.gitignore` up to date with any new generated files or build outputs?
- Is the milestone truly done according to its Definition of Done?
If any answer is "no", the milestone is not complete.
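The first self-review question can be checked mechanically. A hedged sketch: `out_of_scope` is a hypothetical helper that takes the changed paths (for example, the output of `git diff --name-only` against the milestone's base commit) and returns every path that is on neither the "Files allowed to change" list nor the "New files allowed" list from the Contract Block.

```rust
use std::collections::BTreeSet;

/// Returns changed paths that appear on neither allow-list, in input order.
fn out_of_scope<'a>(
    changed: &[&'a str],
    allowed: &[&'a str],
    new_allowed: &[&'a str],
) -> Vec<&'a str> {
    // Union of both Contract Block lists.
    let permitted: BTreeSet<&'a str> = allowed.iter().chain(new_allowed.iter()).copied().collect();
    changed
        .iter()
        .copied()
        .filter(|f| !permitted.contains(f))
        .collect()
}
```

An empty result answers "yes"; any returned path is either reverted or added to the contract by the user before the milestone can close.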
Path: `docs/slo/lessons/<prefix>-m<N>.md`

```markdown
# Lessons Learned — <prefix> Milestone <N>
## What changed
- [summary]
## Design decisions and why
- [decision] — [reason]
## Assumptions verified
- [assumption] — [evidence]
## Assumptions still unresolved
- [assumption] — [risk / follow-up]
## Mistakes made
- [mistake]
## Root causes
- [root cause]
## What was harder than expected
- [note]
## Invariants/assertions added or strengthened
- [invariant]
## Resource bounds established or verified
- [bound]
## Debugging / inspection notes
- [what was inspected and what it revealed]
## Naming conventions established
- [types, files, tests, events, commands]
## Test patterns that worked well
- [pattern]
## Missing tests that should exist now
- [test]
## Rules for the next milestone
- [rule]
## Template improvements suggested
- [improvement]
```

Path: `docs/slo/completion/<prefix>-m<N>.md`

```markdown
# Completion Summary — <prefix> Milestone <N>
## Goal completed
- [what capability now exists]
## Files changed
- [file]
## Tests added
- [test file]
## Runtime validations added
- [e2e file]
## Static analysis and formatter evidence
- [command and result]
## Compatibility checks performed
- [check]
## Invariants/assertions added
- [invariant]
## Resource bounds added or verified
- [bound]
## Documentation updated
- [doc and section]
## .gitignore changes
- [patterns added or removed]
## Test artifact cleanup verified
- [confirmation that git status is clean after test run]
## Deferred follow-ups
- [follow-up]
## Known non-blocking limitations
- [limitation]
```

Goal: [One-sentence description of what this milestone accomplishes. What capability exists at the end that did not exist before?]
Context: [2–4 sentences describing the current state relevant to this milestone. Reference specific files, comments, interfaces, and why this change is needed.]
Carmack-style reliability goal: [Which guardrail is strengthened — debugger visibility, static analysis, assertions, bounded resources, type/schema safety, compatibility, etc.]
Important design rule: [One key design decision that must guide implementation.]
Refactor budget: [No refactor permitted beyond direct implementation | Minimal local refactor permitted in listed files only | Targeted refactor permitted for ...]
| Field | Value |
|---|---|
| Inputs | [user input, command input, event input, state input] |
| Outputs | [UI state, return values, persisted state, events] |
| Interfaces touched | [commands, APIs, routes, events, structs, files] |
| Files allowed to change | [explicit list] |
| Files to read before changing anything | [explicit list] |
| New files allowed | [explicit list or none] |
| New dependencies allowed | [explicit list or none] |
| Migration allowed | [yes or no] |
| Compatibility commitments | [what must still work] |
| Resource bounds introduced/changed | [bounds and behavior at limit, per §4.4] |
| Invariants/assertions required | [list, per §4.3] |
| Debugger / inspection expectation | [what must be inspectable, per §4.1] |
| Static analysis gates | [formatter / typecheck / linter / audit commands, per §4.2] |
| Exemplar code to copy | [paths and code shapes to follow, or N/A — docs-only] |
| Anti-exemplar code not to copy | [paths or patterns to avoid, or N/A — no anti-exemplar identified, see <reason>] |
| Refactoring discipline | [cite skills/slo-plan/references/refactoring-discipline.md when refactor budget permits refactoring, or N/A — no refactoring performed, see <reason>] |
| AI tolerance contract | [required for AI/LLM behavior: accepted variance, deterministic boundary, eval evidence, retry / fallback, must-never outcomes, sample budget; or N/A — no AI component] |
| Forbidden shortcuts | [mocks in prod, TODOs, silent fallbacks, broad refactor, etc.] |
| Data classification (optional) | [Public / Internal / Confidential / Restricted — per project threat-model conventions] |
| Proactive controls in play (optional) | [OWASP Proactive Controls 2024 cited by name, e.g., C1 Implement Access Control, C4 Address Security from the Start, C9 Implement Security Logging and Monitoring — never a bare number (OWASP renumbered C1–C10 between 2018 and 2024)] |
| Abuse acceptance scenarios (optional) | [tm-<feature>-abuse-N: <description> — mitigation noted in BDD] |
| Measurement deliverables (required for value-bearing milestones) | [which named events / runtime metrics / saved queries this milestone ships, the guardrail owner, and the readout date — ties to the §5A Measurement Contract; or N/A — not value-bearing, see <reason>] |
| Outcome Validation deliverables (required for value-bearing milestones) | [the §5C Outcome statement + Success Criteria + per-layer Front-to-End Validation path; ties to §5C; or N/A — not value-bearing, see <reason>] |
| Critical user journeys (required for value-bearing milestones) | [the cuj-<slug>-N ids this milestone must keep working front-to-end; or N/A — not value-bearing, see <reason>] |
- [Explicit non-goal]
- [Explicit non-goal]
- Complete the Global Entry Rules (Section 7).
- Read `docs/slo/lessons/<prefix>-m<N-1>.md` and apply relevant corrections.
- Read the allowed files before editing.
- Copy the Evidence Log template into this milestone section or working notes.
- Re-state the milestone constraints before coding (include resource bounds, invariants, static-analysis gates).
| File | Planned Change |
|---|---|
| [existing file path] | [summary of change] |
| [new file path if allowed] | NEW: [what this file does] |
| `.gitignore` | Add patterns for any new generated files, build outputs, or test artifacts |
- Write BDD test stubs first for all scenarios below.
- Write E2E runtime validation stubs first for all tests below.
- Encode declared invariants/assertions (§4.3) and resource bounds (§4.4) in tests or production code.
- Implement the smallest safe change that satisfies the contract.
- Make all BDD tests pass.
- Run formatter, typecheck, static analyzer.
- Run the full test suite.
- Run E2E runtime validation.
- Verify test artifact cleanup: `git status` confirms no untracked test output remains.
- Update `.gitignore`: add patterns for any new generated files; remove stale ones.
- Run smoke tests.
- Complete the Self-Review Gate.
Feature: [feature name]
| Scenario | Category | Given | When | Then |
|---|---|---|---|---|
| [Scenario name] | happy path | [Precondition] | [Action] | [Expected outcome] |
| [Scenario name] | invalid input | [Precondition] | [Action] | [Expected outcome] |
| [Scenario name] | empty state | [Precondition] | [Action] | [Expected outcome] |
| [Scenario name] | partial failure | [Precondition] | [Action] | [Expected outcome] |
| [Scenario name] | resource bound | [Near limit] | [Operation] | [Bounded behavior] |
| [Scenario name] | assertion violation | [Invalid invariant state] | [Operation] | [Visible failure / contract error] |
| [Scenario name] | compatibility | [Old behavior/state] | [Operation] | [Still works] |
Add more rows as needed. If a category does not apply, state why under Notes.
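For the assertion-violation row, one approach is to make the invalid state unrepresentable and return a contract error at the boundary. The `RiskScore` type, its 0 to 100 range, and `ContractError` are hypothetical. Construction is the only place the range is checked; code that receives a `RiskScore` can rely on it, and the `debug_assert!` documents the invariant in debug and test builds.

```rust
/// Structured error returned at a public boundary instead of panicking.
#[derive(Debug, PartialEq, Eq)]
enum ContractError {
    OutOfRange { field: &'static str, value: i64, min: i64, max: i64 },
}

/// A risk score that is valid by construction: 0..=100.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct RiskScore(u8);

impl RiskScore {
    const MAX: u8 = 100;

    /// The only way to build a RiskScore; out-of-range input is a visible error.
    fn new(value: i64) -> Result<RiskScore, ContractError> {
        if (0..=Self::MAX as i64).contains(&value) {
            Ok(RiskScore(value as u8))
        } else {
            Err(ContractError::OutOfRange {
                field: "risk_score",
                value,
                min: 0,
                max: Self::MAX as i64,
            })
        }
    }

    fn get(self) -> u8 {
        self.0
    }
}

/// Internal code relies on the invariant; the assertion states it explicitly.
fn severity_label(score: RiskScore) -> &'static str {
    debug_assert!(score.get() <= RiskScore::MAX, "RiskScore invariant broken");
    match score.get() {
        0..=39 => "low",
        40..=79 => "medium",
        _ => "high",
    }
}
```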
These are the primary Definition of Done: the user value this milestone promises (ties to §5C). All Outcome Scenarios MUST be automated, and `/slo-verify`'s Outcome Validation pass runs them front-to-end at runtime. Each carries a frozen id `oc-<slug>-N` (contiguous from 1, never renumbered) and is outcome-shaped: one observable user outcome plus follow-on Ands. A single-And, trivially-true, or mock-only scenario is non-conformant (`/slo-plan` rejects it; `/slo-critique` flags it as `ask`). For non-value-bearing milestones, write `N/A — not value-bearing, see <reason>`.
| ID | Type | Scenario (Given / When / Then + And…) |
|---|---|---|
| `oc-<slug>-1` | user value | Given [precondition] When [action] Then [observable user outcome] And [severity/visibility shown] And [appears in history] And [survives restart] |
| `oc-<slug>-2` | security (cite `tm-<slug>-abuse-N`) | Given user A and user B When B requests A's data Then access is denied And an audit event is created |
| `oc-<slug>-3` | reliability | Given [dependency outage] When the operation runs Then it completes with local data And the outage is visible |
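The "contiguous from 1" id rule is easy to lint. A hypothetical sketch: `check_ids` verifies that ids, in table order, share one prefix and slug and run 1, 2, 3, ... with no gaps or duplicates. The same check applies to `cuj-<slug>-N` ids. It cannot detect renumbering on its own; that would require comparing against the previously committed table.

```rust
/// Checks that ids follow `<prefix>-<slug>-N` and run contiguously from 1.
fn check_ids(prefix: &str, slug: &str, ids: &[&str]) -> Result<(), String> {
    let head = format!("{prefix}-{slug}-");
    for (index, id) in ids.iter().enumerate() {
        let expected = index + 1;
        // Strip the shared prefix, then require a numeric suffix.
        let number: usize = id
            .strip_prefix(head.as_str())
            .ok_or_else(|| format!("{id}: expected prefix {head}"))?
            .parse()
            .map_err(|_| format!("{id}: suffix is not a number"))?;
        if number != expected {
            return Err(format!("{id}: expected {head}{expected}"));
        }
    }
    Ok(())
}
```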
The end-to-end paths this milestone must keep working. Each has a frozen id `cuj-<slug>-N` (contiguous from 1, never renumbered) and is a mandatory automated test run front-to-end by `/slo-verify`'s Outcome Validation pass. For non-value-bearing milestones, write `N/A — not value-bearing, see <reason>`.

| ID | Journey (ordered front-to-end path) |
|---|---|
| `cuj-<slug>-1` | [e.g. sensitive file exists → scan runs → finding generated → risk visible → user remediates] |
| `cuj-<slug>-2` | [e.g. false positive → user dismisses → finding stays dismissed across restart] |
"Did this milestone break anything important?" Every core capability resolves to exactly one of pass | not_applicable | waived_with_reason — never blank (mirrors the §5B Bundle discipline). Failure of any required row blocks milestone completion (§6.12).
| Capability | Must still pass | Evidence path | Resolution (`pass` \| `not_applicable` \| `waived_with_reason`) |
|---|---|---|---|
| Login / auth | yes | | |
| Device / data sync | yes | | |
| Findings / core feature | yes | | |
| Dashboard / primary view | yes | | |
| Notifications | yes | | |
Add the capabilities that actually exist in this product. Remove rows that do not apply, unless a reader might expect the capability; keep those rows and mark them `not_applicable` instead of deleting them.
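A hypothetical parser for the Resolution column, assuming a waiver is written as `waived_with_reason: <reason>` (the template does not fix a syntax). `Resolution`, `parse_resolution`, and `check_matrix` are illustrative names. The enum has no blank variant, so a blank cell or a waiver without a reason cannot parse, and `check_matrix` fails on the first unresolved row.

```rust
/// Every regression row resolves to exactly one of these; "blank" has no variant.
#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    Pass,
    NotApplicable,
    WaivedWithReason(String),
}

/// Parses one Resolution cell; surrounding whitespace is ignored.
fn parse_resolution(cell: &str) -> Result<Resolution, String> {
    let cell = cell.trim();
    match cell {
        "" => Err("blank resolution: every row must resolve".to_string()),
        "pass" => Ok(Resolution::Pass),
        "not_applicable" => Ok(Resolution::NotApplicable),
        _ => match cell.strip_prefix("waived_with_reason:") {
            Some(reason) if !reason.trim().is_empty() => {
                Ok(Resolution::WaivedWithReason(reason.trim().to_string()))
            }
            Some(_) => Err("waived_with_reason needs a reason".to_string()),
            None => Err(format!("unknown resolution: {cell}")),
        },
    }
}

/// The matrix is complete only if every row parses.
fn check_matrix(cells: &[&str]) -> Result<Vec<Resolution>, String> {
    cells.iter().map(|c| parse_resolution(c)).collect()
}
```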
- [Existing test suite or feature that must still pass]
- [Specific edge case to verify]
- [Backward compatibility check]
- [Persistence/config/state compatibility check if relevant]
- [Public API/command still behaves the same]
- [Existing route/page still renders correctly]
- [Persisted state remains readable]
- [Existing tests for related features still pass]
File: [backend E2E test file path]
| E2E Test | What It Proves | Pass Criteria |
|---|---|---|
| [test_function_name] | [Runtime behavior validated] | [Specific assertion criteria] |
| [test_function_name] | [Runtime behavior validated] | [Specific assertion criteria] |
File: [frontend E2E test file path]
| E2E Test | What It Proves | Pass Criteria |
|---|---|---|
| [test name] | [Runtime behavior validated] | [Specific assertion criteria] |
- [Manual verification step — what to do and what to observe]
- [Manual verification step]
- `[test command]` passes
- App launches without errors
- Static analysis passes
- `git status` shows no untracked test artifacts
- `.gitignore` covers all new generated files and build outputs
| Step | Command / Check | Expected Result | Actual Result | Pass/Fail | Notes |
|---|---|---|---|---|---|
| Baseline tests | [command] | all green | | | |
| BDD tests created | [files] | fail for expected reason | | | |
| E2E stubs created | [files] | fail for expected reason | | | |
| Implementation | [summary] | contract satisfied | | | |
| Formatter | [command] | clean | | | |
| Typecheck / build check | [command] | clean | | | |
| Static analyzer / linter | [command] | clean | | | |
| Dependency audit (if deps changed) | [command] | pass or documented exception | | | |
| Full tests | [command] | green | | | |
| E2E runtime | [command] | green | | | |
| Build/boot | [command] | boots cleanly | | | |
| Smoke tests | [steps] | all checked | | | |
| Resource-bound verification | [bound + test] | bound encoded; test exercises near-limit | | | |
| Invariant/assertion verification | [invariant + test] | encoded; test triggers under fault injection if applicable | | | |
| Debugger / state inspection | [what was inspected] | hypothesis confirmed before code change | | | |
| Test artifact cleanup | `git status` | no untracked test artifacts | | | |
| `.gitignore` review | review `.gitignore` | patterns current, no stale entries | | | |
| Compatibility checks | [checks] | no regressions | | | |
The milestone is done only when all of the following are true:
- all Outcome Scenarios (`oc-<slug>-N`) pass front-to-end at runtime (the primary Definition of Done; §5C, §6.12), or the milestone is `N/A — not value-bearing`
- all Critical User Journeys (`cuj-<slug>-N`) pass front-to-end
- the Core Capability Regression Matrix has no blank rows and no required row failing (every row resolves to `pass`, `not_applicable`, or `waived_with_reason`)
- all listed BDD scenarios pass
- all listed E2E runtime validations pass
- full existing test suite remains green
- formatter, typecheck, and static analyzer pass (or local minimal waiver justified)
- dependency audit passes if dependencies changed
- smoke tests are checked off
- compatibility checklist is complete
- declared resource bounds (§4.4) are encoded and tested
- declared invariants/assertions (§4.3) are encoded and tested
- no forbidden shortcuts remain in production code
- all tests clean up their output artifacts: `git status` is clean
- `.gitignore` is up to date with any new generated files or build outputs
- docs are updated to match implementation
- lessons file is written (including assumptions verified / unresolved, invariants, resource bounds, debugging notes)
- completion summary is written
- Milestone Tracker is updated
Complete the Global Exit Rules above. Key documentation updates:
- ARCHITECTURE.md: [What to document]
- README.md: [What to update]
- Other docs: [What to update]
- [Why certain coverage categories do not apply]
- [Any explicit deferred work for future milestone]
Track which documents need updating per milestone.
| Milestone | ARCHITECTURE.md Update | README.md Update | .gitignore Update | Other Docs |
|---|---|---|---|---|
| 1 | [Section to add/update] | [Section to add/update] | [Patterns to add/remove] | [Section/file] |
| 2 | [Section to add/update] | [Section to add/update] | [Patterns to add/remove] | [Section/file] |
| 3 | [Section to add/update] | [Section to add/update] | [Patterns to add/remove] | [Section/file] |
Use this before writing production code:
Restate the milestone goal, allowed files, forbidden changes, compatibility requirements, dependency/migration rules, required tests, required runtime validation, resource bounds, invariants/assertions, static-analysis gates, debugger expectation, and the exact Definition of Done. Then list the smallest implementation approach that satisfies the contract without widening scope, and explain how the user-facing result reduces user decisions or reviewer work.
This template is the v4 evolution of docs/slo/templates/runbook-template_v_3_template.md. It folds in language-independent Carmack-style reliability controls (debugger-first inspection, mandatory static analysis, assertion-driven invariants, bounded resource design, "make invalid states unrepresentable", stricter evidence capture) on top of v3's SunLit-specific structure (carry-forward from prior retros, abuse-acceptance scenarios, Data classification + Proactive controls + threat-model integration). v3 remains in place as a historical artifact for runbooks already authored against it; v4 is the canonical going-forward template that /slo-plan produces.