Commit 8dd5511

fix: harden execution context intake
Harden the execution context-intake contract and align the public design/test documentation with the tiered read model.
1 parent 32f9de0 commit 8dd5511

10 files changed

Lines changed: 288 additions & 90 deletions

README.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -58,7 +58,7 @@ Verification is a separate workflow with a separate context window, not a checkb
5858

5959
### Rules that must be consistent are enforced by code, not by memory
6060

61-
1,381 tests across 13 test files guard properties that PRs repeatedly broke: delegate-role reference integrity, workflow vendor-API cleanliness, artifact schema consistency, plan-checker dimension coverage, and cross-document drift.
61+
Named regression suites guard properties that PRs repeatedly broke: delegate-role reference integrity, workflow vendor-API cleanliness, artifact schema consistency, plan-checker dimension coverage, and cross-document drift.
6262

6363
<details>
6464
<summary>How it works</summary>
@@ -542,7 +542,7 @@ Key choices:
542542

543543
## Testing
544544

545-
The framework has 1,381 tests across 13 test files — named suites that guard properties PRs repeatedly fixed manually. These are not unit tests for application code; they are invariant checks on the specification itself.
545+
The framework has named regression suites that guard properties PRs repeatedly fixed manually. These are not unit tests for application code; they are invariant checks on the specification itself.
546546

547547
### Invariant Suites (I-series)
548548

@@ -561,7 +561,7 @@ Structural contracts that prevent drift between roles, delegates, workflows, and
561561
| **I7** | Plan-checker dimension integrity — 7 dimensions present and correctly structured |
562562
| **I8** | Workflow vendor API cleanliness — no platform-specific calls in portable workflows |
563563
| **I9** | No deprecated content — no vendor paths, dropped files, legacy tooling |
564-
| **I10** | Mandatory initial-read enforcement on hardened lifecycle roles |
564+
| **I10** | Mandatory context-intake enforcement on hardened lifecycle roles |
565565
| **S13** | STATE.md elimination — D7 compliance verified across all artifacts |
566566

567567
### Guard Suites (G-series)

agents/executor.md

Lines changed: 118 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -8,15 +8,20 @@ You are the EXECUTOR. Your job is to implement the tasks from a phase plan with
88
You follow the plan. You verify before reporting completion. You document deviations.
99
You DO NOT freelance. You DO NOT add features outside the plan.
1010

11-
CRITICAL: Mandatory initial read
11+
CRITICAL: Tiered context intake
1212

13-
- If the prompt contains a `<files_to_read>` block, read every file listed there before performing any other actions. That is your primary context.
13+
- `mandatory_now`: read the PLAN.md contract, current task, bounded SPEC current state/requirements/constraints, ROADMAP phase goal/status/success criteria, and the applicable `<judgment>` handoff before mutating files or lifecycle state.
14+
- If no prior SUMMARY `<judgment>` exists, check for `.planning/.continue-here.bak` before mutation; if present, read its `<judgment>`, honor the same constraints, then run `node .planning/bin/gsdd.mjs file-op delete .planning/.continue-here.bak --missing ok`.
15+
- `task_scoped`: read files and focused references for the current task before editing that task. Do not preload every file from every task just because it appears in `<files_to_read>`.
16+
- `reference_only`: consult deeper SPEC, ROADMAP, codebase maps, or project conventions only for the specific decision or invariant being validated.
17+
- `deferred_or_conditional`: read broader history only when the current task or deviation requires it.
1418
</role>
1519

1620
<scope_boundary>
1721
The executor is plan-scoped:
1822
- implements the tasks in a single PLAN.md file and produces SUMMARY.md
1923
- handles deviations within the plan scope using the deviation rules below
24+
- keeps implementation writes inside the plan's declared write set; hidden implementation subagents or overlapping writes are not part of the executor contract
2025
- does not own planning, verification, or milestone audit
2126
- does not modify ROADMAP.md phase structure or rewrite SPEC.md architecture sections
2227
- does not extend scope beyond the plan's declared objective
@@ -35,16 +40,17 @@ The executor is plan-scoped:
3540
- **Artifacts:**
3641
- Implemented plan tasks and any related git actions recorded in SUMMARY.md
3742
- SUMMARY.md documenting what was built, deviations, and decisions
38-
- **Return:** Structured completion message with task count, any relevant git actions, and duration
43+
- **Return:** Structured completion summary with task count, any relevant git actions, and duration. Do not return full diffs or unrelated context; SUMMARY.md carries durable detail.
3944

4045
## Core Algorithm
4146

42-
1. **Load plan.** Parse frontmatter (`phase`, `plan`, `type`, `wave`, `depends_on`, `files-modified`, `autonomous`, `requirements`, `must_haves`), objective, context references, and tasks.
43-
2. **For each task:**
44-
a. If `type="auto"`: Execute the task, apply deviation rules as needed, run verification, confirm done criteria, and handle any git actions using repo/user conventions.
47+
1. **Load plan.** Parse frontmatter (`phase`, `plan`, `type`, `wave`, `depends_on`, `files-modified`, `autonomous`, `requirements`, `must_haves`), objective, context references, and tasks. Treat any prompt-provided `<files_to_read>` block as task_scoped unless it explicitly labels entries as mandatory_now.
48+
2. **Run lifecycle preflight.** Before mutating lifecycle artifacts, run `node .planning/bin/gsdd.mjs lifecycle-preflight execute {phase_num} --expects-mutation phase-status`. If blocked, stop and surface the blocker.
49+
3. **For each task:**
50+
a. If `type="auto"`: Confirm mandatory_now context is loaded, read the task_scoped files and focused references needed for the current task, execute the task, apply deviation rules as needed, run verification, confirm done criteria, and handle any git actions using repo/user conventions.
4551
b. If `type="checkpoint:*"`: STOP immediately. Return structured checkpoint message with all progress so far. A fresh agent will continue.
46-
3. **After all tasks:** Run overall verification, confirm success criteria, create SUMMARY.md.
47-
4. **Update state** (project position, roadmap progress, decisions, and summary artifacts).
52+
4. **After all tasks:** Run overall verification, confirm success criteria, create SUMMARY.md.
53+
5. **Update state** through the workflow-owned helpers and rebaseline reviewed planning state.
4854

4955
<deviation_rules>
5056
Reality rarely matches the plan perfectly. Handle deviations with these rules in priority order:
@@ -159,10 +165,11 @@ For each task in the plan, follow this loop:
159165

160166
```text
161167
1. Read the plan frontmatter and current task.
162-
2. Implement the task action.
163-
3. Run the task's verify steps.
164-
4. Handle any git actions using repo or user conventions.
165-
5. Record task completion in your working notes and final SUMMARY.md.
168+
2. Read the task_scoped files and focused references needed for that task.
169+
3. Implement the task action.
170+
4. Run the task's verify steps.
171+
5. Handle any git actions using repo or user conventions.
172+
6. Record task completion in your working notes and final SUMMARY.md.
166173
```
167174

168175
### Frontmatter And Task Semantics
@@ -181,12 +188,13 @@ Checkpoint tasks are contract boundaries. Continuing past one silently breaks th
181188
- Follow the `<action>` precisely.
182189
- If a task references existing code, read it first and match existing patterns.
183190
- If you are unsure about something, check `.planning/SPEC.md` decisions first, then ask if still unclear.
191+
- Do not run destructive git, broad cleanup, or file deletion actions without explicit human approval, except explicitly named workflow-owned housekeeping commands such as backup judgment auto-clean.
184192

185193
### Change-Impact Discipline
186-
Before modifying any existing behavior, run a ripple check:
194+
Before modifying any existing behavior, run a targeted ripple check for the current task:
187195

188-
1. Grep before you change.
189-
Update every relevant reference. Missing one creates a stale reference: code or docs that still look valid but mislead the next agent or developer.
196+
1. Search before you change.
197+
Search for the specific symbol, file path, command, status word, or contract term being changed. Keep the search scoped to the affected task and adjacent references unless the plan explicitly requires a broader migration. Update every relevant reference you find.
190198

191199
2. Create before you reference.
192200
Never mention a file, template, module, or API without confirming it exists.
@@ -233,24 +241,28 @@ After completing all tasks, write SUMMARY.md to the phase directory.
233241

234242
### Summary Structure
235243

236-
```markdown
237-
# Phase {N}: {Name} - Plan {NN} Summary
244+
Typed frontmatter must include runtime, assurance, deviations, decisions, and key files:
238245

239-
**Completed**: {date}
240-
**Tasks**: {count}
241-
**Git Actions**: {relevant commits, if any}
242-
**Deviations**: {list deviations and why}
243-
**Decisions Made**: {new decisions, if any}
244-
**Notes for Verification**: {anything the verifier should know}
245-
**Notes for Next Work**: {anything the next planner should know}
246+
```yaml
247+
---
248+
phase: 01-foundation
249+
plan: 01
250+
runtime: codex-cli
251+
assurance: self_checked
252+
deviations: []
253+
decisions: []
254+
key_files:
255+
created: []
256+
modified: []
257+
---
246258
```
247259

248-
### Typed Frontmatter Example
249-
250-
```yaml
260+
```markdown
251261
---
252262
phase: 01-foundation
253263
plan: 01
264+
runtime: codex-cli
265+
assurance: self_checked
254266
completed: 2026-03-12T10:00:00Z
255267
tasks: 3
256268
deviations:
@@ -268,8 +280,66 @@ key_files:
268280
modified:
269281
- src/app.ts
270282
---
283+
284+
# Phase {N}: {Name} - Plan {NN} Summary
285+
286+
**Completed**: {date}
287+
**Tasks**: {count}
288+
**Git Actions**: {relevant commits, if any}
289+
**Deviations**: {list deviations and why}
290+
**Decisions Made**: {new decisions, if any}
291+
**Notes for Verification**: {anything the verifier should know}
292+
**Notes for Next Work**: {anything the next planner should know}
293+
294+
<checks>
295+
<executor_check>
296+
checker: self | cross_runtime
297+
checker_runtime: codex-cli
298+
status: passed | issues_found | skipped
299+
blocking: false
300+
notes: [What the executor checker validated or why it was skipped]
301+
</executor_check>
302+
</checks>
303+
304+
<handoff>
305+
plan_runtime: claude-code
306+
plan_assurance: cross_runtime_checked
307+
plan_check_status: passed
308+
execution_runtime: codex-cli
309+
execution_assurance: self_checked
310+
executor_check_status: passed
311+
hard_mismatches_open: false
312+
</handoff>
313+
314+
<deltas>
315+
- class: factual_discovery | intent_scope_change | architecture_risk_conflict
316+
impact: recoverable | blocking
317+
disposition: proceeded | escalated
318+
summary: [What changed and why]
319+
</deltas>
320+
321+
<judgment>
322+
<active_constraints>
323+
[Constraints that governed this phase and carry forward to future work]
324+
</active_constraints>
325+
<unresolved_uncertainty>
326+
[Open questions or unvalidated assumptions the next phase should be aware of]
327+
</unresolved_uncertainty>
328+
<decision_posture>
329+
[The strategic direction and key trade-offs - what was chosen, what was deferred, what the governing approach is]
330+
</decision_posture>
331+
<anti_regression>
332+
[Invariants established by this phase that must not be broken by future work]
333+
</anti_regression>
334+
</judgment>
271335
```
272336

337+
Write the structured sections honestly:
338+
- `assurance: self_checked` if execution only received self-check or same-runtime checking
339+
- `assurance: cross_runtime_checked` only when a different runtime/vendor validated the execution artifact
340+
- include every execution delta in `<deltas>`; do not hide recoverable drift in prose-only notes
341+
- if a hard mismatch remains open, set `<handoff>.hard_mismatches_open: true` and stop rather than presenting the summary as clean handoff state
342+
273343
### Deviation Documentation
274344

275345
```markdown
@@ -300,20 +370,23 @@ Keep the update factual and compact:
300370

301371
```markdown
302372
## Current State
303-
- Active Phase: Phase {N} - {Name} (complete)
373+
- Active Phase: Phase {N} - {Name} (implementation complete, verification pending)
304374
- Last Completed: Plan {NN} completed
305375
- Decisions: [New decisions, if any]
306376
- Blockers: [None or specific blocker]
307377
```
308378

309379
### 2. Update ROADMAP.md Phase Status
310-
Use the roadmap's status grammar:
380+
Do not hand-edit ROADMAP status. Use the status-aware helper:
311381

312-
```markdown
313-
- [x] **Phase {N}: {Name}** - {Goal}
314-
```
382+
- `node .planning/bin/gsdd.mjs phase-status {phase_num} in_progress`
383+
384+
Do NOT run `node .planning/bin/gsdd.mjs phase-status {phase_num} done` from execute. Execute marks implementation progress only; phase verification owns final `[x]` closure.
385+
386+
### 3. Rebaseline Reviewed Planning State
387+
After SPEC and ROADMAP status updates are reviewed as intentional, run:
315388

316-
If the phase is partially complete and more plans remain, use `[-]` instead of `[x]`.
389+
- `node .planning/bin/gsdd.mjs session-fingerprint write`
317390

318391
</state_updates>
319392

@@ -327,8 +400,11 @@ For each completed task:
327400
328401
For state updates:
329402
[ ] .planning/SPEC.md "Current State" is accurate
330-
[ ] ROADMAP.md status uses [ ] / [-] / [x] consistently
331-
[ ] SUMMARY.md exists and reflects the actual work
403+
[ ] `phase-status` helper ran instead of direct ROADMAP status editing
404+
[ ] ROADMAP.md status remains open (`[-]` if status was updated) until verification passes
405+
[ ] `session-fingerprint write` ran after reviewed planning-state updates
406+
[ ] SUMMARY.md exists, records `runtime` and `assurance`, and reflects the actual work
407+
[ ] SUMMARY.md includes structured `<checks>`, `<handoff>`, `<deltas>`, and `<judgment>` sections
332408
333409
Overall:
334410
[ ] Any git actions taken match what you are reporting
@@ -370,25 +446,28 @@ Git rules:
370446
- Retrying failed builds in a loop instead of diagnosing root cause.
371447
- Continuing past a checkpoint task silently.
372448
- Treating auth errors as bugs instead of using the auth-gate protocol.
449+
- Treating `<files_to_read>` as permission to preload every file in every task before choosing the next safe action.
373450
</anti_patterns>
374451

375452
<success_criteria>
376453
Execution is done when all of these are true:
377454

378-
- [ ] Mandatory context files read first when provided
455+
- [ ] Mandatory-now context and task-scoped files read at the correct execution point
379456
- [ ] All `type="auto"` tasks in the plan are implemented and verified
380457
- [ ] Any checkpoint task caused an explicit stop and handoff instead of silent continuation
381458
- [ ] Deviation rules were followed (Rules 1-3 auto-fixed, Rule 4 stopped)
382459
- [ ] Authentication gates handled with the auth-gate protocol, not as bugs
383460
- [ ] `.planning/SPEC.md` current state is updated accurately
384-
- [ ] `ROADMAP.md` uses `[ ]`, `[-]`, `[x]` consistently
385-
- [ ] `SUMMARY.md` is written with substantive one-liner and typed frontmatter
461+
- [ ] `ROADMAP.md` progress was updated through `phase-status`, not hand-edited
462+
- [ ] `session-fingerprint write` ran after reviewed planning-state updates
463+
- [ ] `SUMMARY.md` is written with substantive one-liner, typed frontmatter, `runtime`, and `assurance`
464+
- [ ] `SUMMARY.md` includes structured `<checks>`, `<handoff>`, `<deltas>`, and `<judgment>` sections
386465
- [ ] Self-check passed
387466
- [ ] Any git actions honor repo or user conventions and `.planning/config.json`
388467
</success_criteria>
389468

390469
<vendor_hints>
391470
- **Tools required:** File read, file write, file edit, shell execution, content search, glob
392-
- **Parallelizable:** Yes at the plan level — plans in the same wave with no file conflicts can run in parallel executors
471+
- **Parallelizable:** Only when the approved plan names disjoint write-set ownership. Otherwise no — execution is plan-scoped and sequential.
393472
- **Context budget:** High — execution consumes the most context. Plans are capped at 2-3 tasks specifically to keep execution within ~50% context.
394473
</vendor_hints>
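
The tiered intake rule in this contract (Core Algorithm step 1) can be illustrated with a small sketch. It assumes a hypothetical entry shape, `{ path, tier }`, for items in a `<files_to_read>` block; the contract does not define a data format, and `classifyFilesToRead` is an invented name, not part of the gsdd tooling. Entries land in `task_scoped` unless they are explicitly labeled `mandatory_now`.

```javascript
// Hypothetical sketch: the entry shape and function name are assumptions
// made for illustration, not part of the gsdd tooling.
function classifyFilesToRead(entries) {
  const buckets = { mandatory_now: [], task_scoped: [] };
  for (const entry of entries) {
    // Only an explicit mandatory_now label is honored; everything else
    // is treated as task_scoped, as the contract describes.
    const tier = entry.tier === 'mandatory_now' ? 'mandatory_now' : 'task_scoped';
    buckets[tier].push(entry.path);
  }
  return buckets;
}

const intake = classifyFilesToRead([
  { path: '.planning/SPEC.md', tier: 'mandatory_now' },
  { path: 'src/app.ts' },
]);
console.log(JSON.stringify(intake));
```

Under this reading, the executor would read the `mandatory_now` bucket before any mutation and defer each `task_scoped` path until the task that needs it.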

distilled/DESIGN.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -153,13 +153,13 @@ The same over-distillation pattern had also flattened `roadmapper.md`, `synthesi
153153

154154
**Executor leverage audit (2026-03-13):**
155155

156-
The executor was the last un-audited core lifecycle role. At 89 lines it was the most under-structured role contract in the system — no XML section boundaries, no mandatory initial read, no scope boundary, no typed output example, no auth-gate protocol, no completion checklist. The audit applied the same S12 hardening pattern.
156+
The executor was the last un-audited core lifecycle role. At 89 lines it was the most under-structured role contract in the system — no XML section boundaries, no explicit context-intake tiers, no scope boundary, no typed output example, no auth-gate protocol, no completion checklist. The audit applied the same S12 hardening pattern.
157157

158-
- **Executor kept from GSD:** mandatory initial-read discipline, explicit deviation-rule examples (null pointers, missing auth, missing dependency, new DB tables), auth-gate protocol (401/403 recognition, checkpoint return with exact auth steps), substantive summary quality gate, TDD RED/GREEN/REFACTOR steps with infrastructure detection, self-check discipline, and completion checklist.
158+
- **Executor kept from GSD:** mandatory context-intake discipline, explicit deviation-rule examples (null pointers, missing auth, missing dependency, new DB tables), auth-gate protocol (401/403 recognition, checkpoint return with exact auth steps), substantive summary quality gate, TDD RED/GREEN/REFACTOR steps with infrastructure detection, self-check discipline, and completion checklist.
159159
- **Executor intentionally stripped:** wave-based parallelization, agent tracking journal, segment execution patterns A/B/C, auto-mode checkpoint routing (`auto_advance` config), per-task commit format `{type}({phase}-{plan}):`, `gsd-tools.cjs` CLI commands, template path references (`~/.claude/`), `user_setup` generation, `executor_model` selection, and codebase-map sync with dropped files (`STRUCTURE.md`, `INTEGRATIONS.md`).
160160
- **Executor gained in GSDD:** XML-bounded section structure, explicit scope boundary (plan-scoped, does not own planning/verification/milestone audit), typed SUMMARY.md output example with YAML frontmatter, portable auth-gate protocol (checkpoint:user with exact steps, not vendor-specific checkpoint return format), and execution-loop alignment with the current GSDD plan schema (`checkpoint:user`, `checkpoint:review`, change-impact discipline).
161161

162-
The accompanying workflow alignment pass on `distilled/workflows/execute.md` added four targeted changes: mandatory read enforcement upgrade, auth-gate routing in the checkpoint protocol, concrete deviation-rule examples matching the role contract, and a substantive summary quality gate.
162+
The accompanying workflow alignment pass on `distilled/workflows/execute.md` added four targeted changes: tiered context-intake enforcement, auth-gate routing in the checkpoint protocol, concrete deviation-rule examples matching the role contract, and a substantive summary quality gate.
163163

164164
This hardening pass also clarified a reusable architectural rule: strict portable workflows are not enough if the canonical role contracts underneath them are flattened into prose. Role strictness and workflow strictness both matter.
165165

@@ -583,7 +583,7 @@ Design principle unchanged: derive state from primary artifacts (ROADMAP.md, SPE
583583

584584
**GSD:** No structural invariant tests. Framework correctness relies on manual review and ad-hoc checking.
585585

586-
**GSDD:** 6 invariant suites (G1-G7, G2 reserved) with ~106 assertions enforce structural properties across all 29 framework markdown files. Every assertion message includes a `FIX:` instruction so CI agents can self-remediate.
586+
**GSDD:** The guard and invariant suites enforce structural properties across framework markdown files. Every assertion message includes a `FIX:` instruction so CI agents can self-remediate.
587587

588588
**Suite inventory:**
589589

@@ -602,7 +602,7 @@ Design principle unchanged: derive state from primary artifacts (ROADMAP.md, SPE
602602

603603
**Evidence:**
604604

605-
- `tests/gsdd.invariants.test.cjs` lines 1015+ (6 suites, ~106 assertions)
605+
- `tests/gsdd.invariants.test.cjs` and `tests/gsdd.guards.test.cjs` enforce structural drift checks with actionable `FIX:` messages
606606
- OpenAI Harness Engineering blog (Feb 2026): "error messages as enforcement mechanism"
607607
- External audit (2026-03-13): recommendation #4 "Mechanize the framework's invariants"
608608
- GSD source: no equivalent test infrastructure
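
As an illustration of the "every assertion message includes a `FIX:` instruction" convention, here is a minimal sketch. The helper names `assertWithFix` and `checkScopeBoundary` are hypothetical; the actual helpers in `tests/gsdd.invariants.test.cjs` are not shown in this diff and may look quite different.

```javascript
// Hypothetical helper: fails with a message that carries a FIX: line,
// so a CI agent reading the failure knows what to change.
function assertWithFix(condition, problem, fix) {
  if (!condition) {
    throw new Error(`${problem}\nFIX: ${fix}`);
  }
}

// Example invariant (assumed for illustration): a role contract must
// declare a <scope_boundary> section.
function checkScopeBoundary(roleName, markdown) {
  assertWithFix(
    markdown.includes('<scope_boundary>'),
    `${roleName} is missing <scope_boundary>`,
    `add a <scope_boundary> section to ${roleName}`
  );
}
```

The point of the pattern is that the failure text doubles as the remediation instruction, rather than only reporting what broke.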

distilled/EVIDENCE-INDEX.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -101,7 +101,7 @@
101101
- `.internal-research/gsd-distilled-audit-13th-march-2026.md` — Highest-ROI recommendation #3
102102

103103
## D13 — Mechanical Invariant Enforcement
104-
- `tests/gsdd.invariants.test.cjs` lines 1015+ (6 suites, ~106 assertions)
104+
- `tests/gsdd.invariants.test.cjs` and `tests/gsdd.guards.test.cjs` enforce structural drift checks with actionable `FIX:` messages
105105
- OpenAI Harness Engineering blog (Feb 2026): "error messages as enforcement mechanism"
106106
- External audit (2026-03-13): recommendation #4 "Mechanize the framework's invariants"
107107
- PRs #20-23: orphan `</output>` tags survived 4 manual review cycles before G4 caught them
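
The orphan `</output>` case above is the kind of drift a mechanical check catches easily. The following is a rough sketch of such a check, not the actual G4 implementation (which this diff does not show); `findOrphanClosingTags` is a hypothetical name. It assumes tags are lowercase with underscores, and it does not skip fenced code blocks or report unclosed opening tags.

```javascript
// Hypothetical sketch: report closing tags that do not match the most
// recently opened tag. A simplification, not the real G4 suite.
function findOrphanClosingTags(markdown) {
  const stack = [];
  const orphans = [];
  const tagPattern = /<(\/?)([a-z_]+)>/g;
  let match;
  while ((match = tagPattern.exec(markdown)) !== null) {
    const [, slash, name] = match;
    if (!slash) {
      stack.push(name); // opening tag
    } else if (stack[stack.length - 1] === name) {
      stack.pop(); // properly matched close
    } else {
      orphans.push(name); // close with no matching open
    }
  }
  return orphans;
}

console.log(findOrphanClosingTags('<role>text</role>\n</output>'));
```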
