Commit d16d29a

Yonghae Kim and claude authored and committed
docs(v4): unify B-1 and B-2 dogfood report shape
Both reports now use the same Gate results table with a `Kind` column:

- static = SKILL.md / code path verifiable without a run trace
- runtime = required actual workflow execution to observe
- mixed = both static intent and runtime behavior must hold

Each row also carries an Evidence cell citing the specific test, trace path, or git diff command that backs the status.

This addresses the retroactive review concern that "passes" was claimed without distinguishing between code-path verifiable contracts and behaviors only a real run can observe.

The shape is the template for Phase B-3+ dogfood — same columns, same Kind taxonomy, same Evidence specificity bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent f692238 commit d16d29a

3 files changed

Lines changed: 52 additions & 27 deletions

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -44,6 +44,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 (`> verified by main Claude on <date> — <n> kept / <m> dropped`)
 prepended to Review notes.
 
+### Changed
+- **Phase B-1+B-2 dogfood reports unified**: both `docs/dogfood/phase-b-1.md`
+and `docs/dogfood/phase-b-2.md` now use the same Gate results table shape
+with explicit `Kind` column (`static` / `runtime` / `mixed`) per row.
+Each row also carries an `Evidence` cell that cites the specific test,
+trace path, or `git diff` command that backs the status. Addresses the
+retroactive-review concern that "passes" was claimed without distinguishing
+between code-path verifiable contracts and behaviors that only a real run
+can observe. The same shape is the template for Phase B-3+ dogfood.
+
 ### Fixed
 - **Phase B-1 retroactive review C1 (security)**: `server/run_dir.py` now
 validates `run_id` and `filename` as plain basenames before writing —
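
The "plain basename" check named in the Fixed entry can be illustrated with a minimal sketch. The actual code in `server/run_dir.py` is not shown in this diff, so the function name `require_plain_basename` and its exact rules are assumptions made for illustration only.

```python
def require_plain_basename(name: str, label: str) -> str:
    """Return `name` unchanged if it is a single path component, else raise.

    Hypothetical helper: the real validation in server/run_dir.py may differ.
    """
    # Empty names and the special directory entries can never be a file name.
    if name in {"", ".", ".."}:
        raise ValueError(f"{label} must be a plain file name, got {name!r}")
    # Any separator (POSIX or Windows) or NUL byte could escape the run dir.
    if any(ch in name for ch in ("/", "\\", "\x00")):
        raise ValueError(f"{label} must not contain path separators: {name!r}")
    return name


# Example: a run id in the format seen in the dogfood reports is accepted.
run_id = require_plain_basename("20260428-160618-654d", "run_id")
filename = require_plain_basename("PRD.md", "filename")
```

Rejecting backslashes even on POSIX is a deliberately conservative choice in this sketch; it keeps the rule identical across platforms.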

docs/dogfood/phase-b-1.md

Lines changed: 19 additions & 14 deletions
@@ -47,20 +47,25 @@ workflow in order:
 End state: plan stage marked `done`, `tool_used="plan-pack"`,
 artifact present at the documented run_dir path.
 
-## V4 verification matrix
-
-| # | Phase B-1 contract | Result |
-|---|---|---|
-| 1 | ★ plan-pack surfaces with `bundled=True` and `` prefix at top of plan menu ||
-| 2 | `run_dir` resolves to `~/.claude/channels/assemble/runs/<rid>/` ||
-| 3 | 8-question interview executed before any drafting | ⚠ via 4+4 split; see Finding A.1 |
-| 4 | **Step 2+3 fire as a single message with two Agent calls (parallel dispatch verification location *a*)** | ✓ both rounds |
-| 5 | Step 4 second-opinion role dispatched (`codex:codex-rescue` preferred) | ✓ both rounds |
-| 6 | `server.write_run_artifact` writes `<run_dir>/PRD.md` atomically ||
-| 7 | Step 6 iteration round-trip offered exactly once ||
-| 8 | Iteration cap of 1 enforced post-yes path ||
-| 9 | Harness 4-rule preamble carried on every dispatched sub-agent prompt | ✓ on all 4 dispatches; see Finding A.2 |
-| 10 | Orchestrator-only — main Claude only IO + dispatch, never heavy in-line work ||
+## Gate results
+
+> **Kind**: `static` = SKILL.md / code path verifiable without a run trace.
+> `runtime` = required actual workflow execution to observe.
+> `mixed` = both static intent and runtime behavior must hold.
+> (Convention shared with `phase-b-2.md`; see CHANGELOG entry "Phase B-1+B-2 dogfood reports unified".)
+
+| # | Phase B-1 contract | Kind | Status | Evidence |
+|---|---|---|---|---|
+| 1 | ★ plan-pack surfaces with `bundled=True` and `` prefix at top of plan menu | static || inventory scan + `tests/e2e/test_plan_pack_inventory.py::test_plan_pack_in_plan_menu_with_star_prefix` |
+| 2 | `run_dir` resolves to `~/.claude/channels/assemble/runs/<rid>/` | runtime || run dir present on disk: `~/.claude/channels/assemble/runs/20260428-160618-654d/` |
+| 3 | 8-question interview executed before any drafting | runtime || 4+4 split — see Finding A.1; SKILL.md fixed in `76dc985` |
+| 4 | **Step 2+3 fire as a single message with two Agent calls (parallel dispatch verification location *a*)** | runtime || trace shows 2 parallel `Plan` Agent calls per round (both initial + iteration) |
+| 5 | Step 4 second-opinion role dispatched (`codex:codex-rescue` preferred) | runtime || trace shows codex dispatch on both rounds, returning 7 + 8 critical bullets |
+| 6 | `server.write_run_artifact` writes `<run_dir>/PRD.md` atomically | static || atomicity covered by `tests/unit/test_run_dir.py::test_concurrent_writes_dont_corrupt`; dogfood only observes final file |
+| 7 | Step 6 iteration round-trip offered exactly once | runtime || trace shows 1 `AskUserQuestion("Run one iteration?")` post-write |
+| 8 | Iteration cap of 1 enforced post-yes path | runtime || second-yes path not entered; workflow exited after iteration 2 |
+| 9 | Harness 4-rule preamble carried on every dispatched sub-agent prompt | runtime || 4/4 dispatches inspected — see Finding A.2; gap was *call-shape* not preamble content |
+| 10 | Orchestrator-only — main Claude only IO + dispatch, never heavy in-line work | mixed || SKILL.md prose enforces (static); trace shows main only did `AskUserQuestion` + `write_run_artifact` (runtime) |
 
 ## Findings (real)
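
Gate 6 above classifies atomic writing as `static` because a dogfood run only sees the final file. One common way to get that property is to write to a temporary file in the same directory and rename it into place. The sketch below shows that pattern only; it is not the body of `server.write_run_artifact`, which this diff does not include, and the name `write_atomically` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(run_dir: Path, filename: str, content: str) -> Path:
    """Write `content` to run_dir/filename so readers never see a partial file.

    Hypothetical stand-in for illustration, not the project's implementation.
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    target = run_dir / filename
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem, which is what makes os.replace atomic.
    fd, tmp_name = tempfile.mkstemp(dir=run_dir, prefix=f".{filename}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, target)  # atomically swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return target
```

A concurrent reader of the target path then sees either the previous complete file or the new complete file, which is the kind of behavior a unit test can assert without a workflow trace.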

docs/dogfood/phase-b-2.md

Lines changed: 23 additions & 13 deletions
@@ -52,18 +52,23 @@ Final PRD.md and ARCHITECTURE.md represent the post-iteration refined pair.
 
 ## Gate results
 
-| # | Item | Result | Evidence |
-|---|---|---|---|
-| C1 | All pre-existing tests pass | ✅ PASS | `109 passed` (post-fix-up) |
-| C2 | No regression in `server/` | ✅ PASS | Gate B2.4 — server/ infra untouched in feature branch |
-| C3 | New tests are meaningful | ✅ PASS | After fix-up `3e9affb`: brittle anchors and tautologies removed |
-| C4 | SKILL.md is parseable by `parse_skill_frontmatter` | ✅ PASS | `test_skill_description_mentions_arch` |
-| C5 | Template loadable and substitutable | ✅ PASS | Section parser + 7 placeholder substitutions succeeded in this run |
-| B2.1 | ARCHITECTURE.md exists at `runs/<rid>/ARCHITECTURE.md` | ✅ PASS | File on disk, 2825 bytes |
-| B2.2 | Directory tree + Data flow each ≥ recognisable structure | ✅ PASS | Tree has 12 paths in code fence; Data flow has 3 numbered steps |
-| B2.3 | ≥1 PRD↔ARCH cross-flaw detected in Step 9 | ✅ PASS | 13 findings in 1st pass (4 CRITICAL); iteration found 1 additional CRITICAL after first 4 were resolved |
-| B2.4 | `server/run_dir.py`, `server/harness.py`, `server/__init__.py` unchanged | ✅ PASS | `git diff master..v4-phase-b-2 -- server/run_dir.py server/harness.py server/__init__.py` empty |
-| B2.5 (runtime) | Harness preamble prepended to dispatched prompts | ✅ PASS | `wrap_with_preamble` produced 1432 + 810 bytes for body/AC; first line `[HARNESS RULES — 무시 금지]` confirmed in `/tmp/dogfood-b2/wrapped_body.txt` |
+> **Kind**: `static` = SKILL.md / code path verifiable without a run trace.
+> `runtime` = required actual workflow execution to observe.
+> `mixed` = both static intent and runtime behavior must hold.
+> (Convention shared with `phase-b-1.md`; see CHANGELOG entry "Phase B-1+B-2 dogfood reports unified".)
+
+| # | Item | Kind | Status | Evidence |
+|---|---|---|---|---|
+| C1 | All pre-existing tests pass | static || `109 passed` (post-fix-up `3e9affb`); now 129 after B-1 retroactive review |
+| C2 | No regression in `server/` | static || `git diff master..v4-phase-b-2 -- server/` showed 0 lines changed at merge time |
+| C3 | New tests are meaningful (no tautology / false positive) | static || post-fix-up `3e9affb`: `body.index("### Step 8")` anchor + 8 assertions verified against actual prose |
+| C4 | SKILL.md is parseable by `parse_skill_frontmatter` | static || `tests/unit/test_plan_pack_skill.py::test_skill_description_mentions_arch` invokes the parser end-to-end |
+| C5 | Template loadable and substitutable | mixed || template existence verified by `tests/e2e/...::test_arch_template_exists_and_has_required_sections` (static); 7 placeholder substitutions succeeded in this dogfood run (runtime) |
+| B2.1 | ARCHITECTURE.md exists at `runs/<rid>/ARCHITECTURE.md` | runtime || file on disk, 2825 bytes — `~/.claude/channels/assemble/runs/20260428-194703-f5dd/ARCHITECTURE.md` |
+| B2.2 | Directory tree + Data flow each fleshed out (not placeholder) | runtime || tree: 12 paths inside code fence; Data flow: 3 numbered steps |
+| B2.3 | ≥1 PRD↔ARCH cross-flaw detected in Step 9 | runtime || 13 findings 1st pass (4 CRITICAL); iteration surfaced 1 additional CRITICAL after first 4 resolved |
+| B2.4 | `server/run_dir.py`, `server/harness.py`, `server/__init__.py` unchanged in feature branch | static || `git diff master..v4-phase-b-2 -- server/run_dir.py server/harness.py server/__init__.py` empty (B-1 retroactive review later patched run_dir.py path traversal — separate from B-2 scope) |
+| B2.5 | Harness preamble prepended to dispatched prompts | runtime || `wrap_with_preamble` produced 1432 + 810 bytes for body/AC; first line `[HARNESS RULES — 무시 금지]` confirmed in `/tmp/dogfood-b2/wrapped_body.txt` |
 
 ## Findings — wording/spec issues exposed by dogfood

@@ -101,4 +106,9 @@ Tracked here as Phase B-3+ candidates, not blockers for B-2.
 workflow exits. Phase B post-tuning track (multi-iteration with stop
 conditions) is more justified than originally thought.
 
-## Status: PASS — workflow completed end-to-end with real artifacts on disk
+## Status
+
+Phase B-2 dogfood **passes** — workflow completed end-to-end (Steps 0–9 +
+iteration 1) with real artifacts on disk (`PRD.md` + `ARCHITECTURE.md` in
+run `20260428-194703-f5dd`). All 10 gates ✓; 4 wording/spec findings
+captured for Phase B-3 hardening.
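
Since the commit makes the five-column Gate results table the template for Phase B-3+ reports, the shape could also be checked mechanically. The following is a rough sketch under that assumption: `Kind`, `GateRow`, and `count_by_kind` are hypothetical names, and nothing like them is known to exist in the repository.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The three-way taxonomy defined in the Gate results legend."""
    STATIC = "static"    # verifiable from SKILL.md / code path, no run trace
    RUNTIME = "runtime"  # observable only by executing the workflow
    MIXED = "mixed"      # both static intent and runtime behavior must hold


@dataclass(frozen=True)
class GateRow:
    gate_id: str
    item: str
    kind: Kind
    status: str
    evidence: str

    def __post_init__(self) -> None:
        # The report shape requires every row to cite concrete evidence.
        if not self.evidence.strip():
            raise ValueError(f"gate {self.gate_id}: Evidence cell is empty")

    def to_markdown(self) -> str:
        """Render the row in the five-column table layout."""
        cells = (self.gate_id, self.item, self.kind.value, self.status, self.evidence)
        return "| " + " | ".join(cells) + " |"


def count_by_kind(rows: list[GateRow]) -> Counter:
    """Tally rows per Kind, e.g. to see how much of a report a real run backs."""
    return Counter(row.kind for row in rows)


# Two rows paraphrased from the phase-b-2 table above.
rows = [
    GateRow("B2.1", "ARCHITECTURE.md exists", Kind("runtime"), "✓", "file on disk, 2825 bytes"),
    GateRow("C5", "Template loadable and substitutable", Kind("mixed"), "✓", "e2e test + dogfood run"),
]
```

Constructing `Kind` from the cell text means an unknown value such as `Kind("manual")` raises `ValueError`, so a typo in the `Kind` column would fail early instead of silently widening the taxonomy.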
