docs(v4): unify B-1 and B-2 dogfood report shape

Yonghae Kim · claude · Yonghae Kim · commit d16d29ad27ca · 2026-04-28T20:48:22.000+09:00
Both reports now use the same Gate results table with a `Kind` column:
- static = SKILL.md / code path verifiable without a run trace
- runtime = required actual workflow execution to observe
- mixed = both static intent and runtime behavior must hold

Each row also carries an Evidence cell citing the specific test, trace
path, or git diff command that backs the status. This addresses the
retroactive review concern that "passes" was claimed without
distinguishing between code-path verifiable contracts and behaviors
only a real run can observe.

The shape is the template for Phase B-3+ dogfood — same columns,
same Kind taxonomy, same Evidence specificity bar.

Co-Authored-By: Claude Opus 4.7 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -44,6 +44,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   (`> verified by main Claude on <date> — <n> kept / <m> dropped`)
   prepended to Review notes.
 
+### Changed
+- **Phase B-1+B-2 dogfood reports unified**: both `docs/dogfood/phase-b-1.md`
+  and `docs/dogfood/phase-b-2.md` now use the same Gate results table shape
+  with explicit `Kind` column (`static` / `runtime` / `mixed`) per row.
+  Each row also carries an `Evidence` cell that cites the specific test,
+  trace path, or `git diff` command that backs the status. Addresses the
+  retroactive-review concern that "passes" was claimed without distinguishing
+  between code-path verifiable contracts and behaviors that only a real run
+  can observe. The same shape is the template for Phase B-3+ dogfood.
+
 ### Fixed
 - **Phase B-1 retroactive review C1 (security)**: `server/run_dir.py` now
   validates `run_id` and `filename` as plain basenames before writing —
diff --git a/docs/dogfood/phase-b-1.md b/docs/dogfood/phase-b-1.md
@@ -47,20 +47,25 @@ workflow in order:
 End state: plan stage marked `done`, `tool_used="plan-pack"`,
 artifact present at the documented run_dir path.
 
-## V4 verification matrix
-
-| # | Phase B-1 contract | Result |
-|---|---|---|
-| 1 | ★ plan-pack surfaces with `bundled=True` and `★ ` prefix at top of plan menu | ✓ |
-| 2 | `run_dir` resolves to `~/.claude/channels/assemble/runs/<rid>/` | ✓ |
-| 3 | 8-question interview executed before any drafting | ⚠ via 4+4 split; see Finding A.1 |
-| 4 | **Step 2+3 fire as a single message with two Agent calls (parallel dispatch verification location *a*)** | ✓ both rounds |
-| 5 | Step 4 second-opinion role dispatched (`codex:codex-rescue` preferred) | ✓ both rounds |
-| 6 | `server.write_run_artifact` writes `<run_dir>/PRD.md` atomically | ✓ |
-| 7 | Step 6 iteration round-trip offered exactly once | ✓ |
-| 8 | Iteration cap of 1 enforced post-yes path | ✓ |
-| 9 | Harness 4-rule preamble carried on every dispatched sub-agent prompt | ✓ on all 4 dispatches; see Finding A.2 |
-| 10 | Orchestrator-only — main Claude only IO + dispatch, never heavy in-line work | ✓ |
+## Gate results
+
+> **Kind**: `static` = SKILL.md / code path verifiable without a run trace.
+> `runtime` = required actual workflow execution to observe.
+> `mixed` = both static intent and runtime behavior must hold.
+> (Convention shared with `phase-b-2.md`; see CHANGELOG entry "Phase B-1+B-2 dogfood reports unified".)
+
+| # | Phase B-1 contract | Kind | Status | Evidence |
+|---|---|---|---|---|
+| 1 | ★ plan-pack surfaces with `bundled=True` and `★ ` prefix at top of plan menu | static | ✓ | inventory scan + `tests/e2e/test_plan_pack_inventory.py::test_plan_pack_in_plan_menu_with_star_prefix` |
+| 2 | `run_dir` resolves to `~/.claude/channels/assemble/runs/<rid>/` | runtime | ✓ | run dir present on disk: `~/.claude/channels/assemble/runs/20260428-160618-654d/` |
+| 3 | 8-question interview executed before any drafting | runtime | ⚠ | 4+4 split — see Finding A.1; SKILL.md fixed in `76dc985` |
+| 4 | **Step 2+3 fire as a single message with two Agent calls (parallel dispatch verification location *a*)** | runtime | ✓ | trace shows 2 parallel `Plan` Agent calls per round (both initial + iteration) |
+| 5 | Step 4 second-opinion role dispatched (`codex:codex-rescue` preferred) | runtime | ✓ | trace shows codex dispatch on both rounds, returning 7 + 8 critical bullets |
+| 6 | `server.write_run_artifact` writes `<run_dir>/PRD.md` atomically | static | ✓ | atomicity covered by `tests/unit/test_run_dir.py::test_concurrent_writes_dont_corrupt`; dogfood only observes final file |
+| 7 | Step 6 iteration round-trip offered exactly once | runtime | ✓ | trace shows 1 `AskUserQuestion("Run one iteration?")` post-write |
+| 8 | Iteration cap of 1 enforced post-yes path | runtime | ✓ | second-yes path not entered; workflow exited after iteration 2 |
+| 9 | Harness 4-rule preamble carried on every dispatched sub-agent prompt | runtime | ✓ | 4/4 dispatches inspected — see Finding A.2; gap was *call-shape* not preamble content |
+| 10 | Orchestrator-only — main Claude only IO + dispatch, never heavy in-line work | mixed | ✓ | SKILL.md prose enforces (static); trace shows main only did `AskUserQuestion` + `write_run_artifact` (runtime) |
 
 ## Findings (real)
 
diff --git a/docs/dogfood/phase-b-2.md b/docs/dogfood/phase-b-2.md
@@ -52,18 +52,23 @@ Final PRD.md and ARCHITECTURE.md represent the post-iteration refined pair.
 
 ## Gate results
 
-| # | Item | Result | Evidence |
-|---|---|---|---|
-| C1 | All pre-existing tests pass | ✅ PASS | `109 passed` (post-fix-up) |
-| C2 | No regression in `server/` | ✅ PASS | Gate B2.4 — server/ infra untouched in feature branch |
-| C3 | New tests are meaningful | ✅ PASS | After fix-up `3e9affb`: brittle anchors and tautologies removed |
-| C4 | SKILL.md is parseable by `parse_skill_frontmatter` | ✅ PASS | `test_skill_description_mentions_arch` |
-| C5 | Template loadable and substitutable | ✅ PASS | Section parser + 7 placeholder substitutions succeeded in this run |
-| B2.1 | ARCHITECTURE.md exists at `runs/<rid>/ARCHITECTURE.md` | ✅ PASS | File on disk, 2825 bytes |
-| B2.2 | Directory tree + Data flow each ≥ recognisable structure | ✅ PASS | Tree has 12 paths in code fence; Data flow has 3 numbered steps |
-| B2.3 | ≥1 PRD↔ARCH cross-flaw detected in Step 9 | ✅ PASS | 13 findings in 1st pass (4 CRITICAL); iteration found 1 additional CRITICAL after first 4 were resolved |
-| B2.4 | `server/run_dir.py`, `server/harness.py`, `server/__init__.py` unchanged | ✅ PASS | `git diff master..v4-phase-b-2 -- server/run_dir.py server/harness.py server/__init__.py` empty |
-| B2.5 (runtime) | Harness preamble prepended to dispatched prompts | ✅ PASS | `wrap_with_preamble` produced 1432 + 810 bytes for body/AC; first line `[HARNESS RULES — 무시 금지]` confirmed in `/tmp/dogfood-b2/wrapped_body.txt` |
+> **Kind**: `static` = SKILL.md / code path verifiable without a run trace.
+> `runtime` = required actual workflow execution to observe.
+> `mixed` = both static intent and runtime behavior must hold.
+> (Convention shared with `phase-b-1.md`; see CHANGELOG entry "Phase B-1+B-2 dogfood reports unified".)
+
+| # | Item | Kind | Status | Evidence |
+|---|---|---|---|---|
+| C1 | All pre-existing tests pass | static | ✓ | `109 passed` (post-fix-up `3e9affb`); now 129 after B-1 retroactive review |
+| C2 | No regression in `server/` | static | ✓ | `git diff master..v4-phase-b-2 -- server/` showed 0 lines changed at merge time |
+| C3 | New tests are meaningful (no tautology / false positive) | static | ✓ | post-fix-up `3e9affb`: `body.index("### Step 8")` anchor + 8 assertions verified against actual prose |
+| C4 | SKILL.md is parseable by `parse_skill_frontmatter` | static | ✓ | `tests/unit/test_plan_pack_skill.py::test_skill_description_mentions_arch` invokes the parser end-to-end |
+| C5 | Template loadable and substitutable | mixed | ✓ | template existence verified by `tests/e2e/...::test_arch_template_exists_and_has_required_sections` (static); 7 placeholder substitutions succeeded in this dogfood run (runtime) |
+| B2.1 | ARCHITECTURE.md exists at `runs/<rid>/ARCHITECTURE.md` | runtime | ✓ | file on disk, 2825 bytes — `~/.claude/channels/assemble/runs/20260428-194703-f5dd/ARCHITECTURE.md` |
+| B2.2 | Directory tree + Data flow each fleshed out (not placeholder) | runtime | ✓ | tree: 12 paths inside code fence; Data flow: 3 numbered steps |
+| B2.3 | ≥1 PRD↔ARCH cross-flaw detected in Step 9 | runtime | ✓ | 13 findings 1st pass (4 CRITICAL); iteration surfaced 1 additional CRITICAL after first 4 resolved |
+| B2.4 | `server/run_dir.py`, `server/harness.py`, `server/__init__.py` unchanged in feature branch | static | ✓ | `git diff master..v4-phase-b-2 -- server/run_dir.py server/harness.py server/__init__.py` empty (B-1 retroactive review later patched run_dir.py path traversal — separate from B-2 scope) |
+| B2.5 | Harness preamble prepended to dispatched prompts | runtime | ✓ | `wrap_with_preamble` produced 1432 + 810 bytes for body/AC; first line `[HARNESS RULES — 무시 금지]` confirmed in `/tmp/dogfood-b2/wrapped_body.txt` |
 
 ## Findings — wording/spec issues exposed by dogfood
 
@@ -101,4 +106,9 @@ Tracked here as Phase B-3+ candidates, not blockers for B-2.
    workflow exits. Phase B post-tuning track (multi-iteration with stop
    conditions) is more justified than originally thought.
 
-## Status: PASS — workflow completed end-to-end with real artifacts on disk
+## Status
+
+Phase B-2 dogfood **passes** — workflow completed end-to-end (Steps 0–9 +
+iteration 1) with real artifacts on disk (`PRD.md` + `ARCHITECTURE.md` in
+run `20260428-194703-f5dd`). All 10 gates ✓; 4 wording/spec findings
+captured for Phase B-3 hardening.