Commit d3473aa

Your Name and claude committed
feat: add Runtime Morphology Theory Draft v0.1 (Phase 4-2)
Consolidate all 12 evidence-graded theory candidates into a single structured draft with clear separation of supported claims, caveated claims, exploratory directions, and deferred claims.

The draft explicitly states what it does NOT support: prediction, anomaly detection, auto-diagnosis, universal prompt policy, routing default strategy, safety-control automation, cross-lane aggregation without disclosure.

Includes evidence upgrade path (deferred → exploratory → supported_with_caveat → supported), known blocker inventory (10 blockers with affected candidates and resolution paths), and maintenance rules.

Update Phase 4 README: Phase 4-1 complete, Phase 4-2 active.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 7fa9db2 commit d3473aa

2 files changed

Lines changed: 234 additions & 4 deletions

docs/research/phase4/README.md

Lines changed: 7 additions & 4 deletions
@@ -61,6 +61,7 @@ Every candidate must include:
 
 ## Documents
 
+- [Runtime Morphology Theory Draft v0.1](runtime_morphology_theory_draft_v0.1.md): Consolidated theory draft: supported claims, caveated claims, exploratory directions, deferred claims, boundaries, blockers, upgrade path
 - [Evidence Grading Matrix](evidence_grading_matrix_v0.2.5.md): Systematic evidence-grade review of all 12 candidates with blockers, promotion conditions, and falsification conditions
 - [Theory Candidate Inventory](theory_candidate_inventory_v0.2.5.md): All current theory candidates with evidence grades, supporting data, and caveats
 - [Safety-Control Runtime Morphology](safety_control_morphology_candidates_v0.2.5.md): Phase 4 theory candidate direction studying runtime control morphology at safety boundaries (exploratory, not validated)
@@ -81,10 +82,12 @@ Every candidate must include:
 
 ## Current State
 
-Phase 4-1 (evidence grading pass) is active. Three documents published:
+Phase 4-1 (evidence grading pass): **complete**. Phase 4-2 (theory draft skeleton): **active**.
 
-- **Evidence grading matrix**: systematic review of all 12 candidates with blockers, promotion conditions, and falsification conditions
+Four documents published:
+- **Runtime Morphology Theory Draft v0.1**: consolidated draft with supported/caveated/exploratory/deferred claims, boundaries, blockers, and upgrade path
+- **Evidence grading matrix**: systematic review of all 12 candidates
 - **Theory candidate inventory**: 12 candidates across 5 domains
-- **Safety-control runtime morphology**: 5 exploratory candidates defining the safety-control domain
+- **Safety-control runtime morphology**: 5 exploratory candidates
 
-Grade distribution: 2 `supported`, 1 `supported_with_caveat`, 6 `exploratory`, 3 `deferred`. Evidence ceiling: `supported`. No candidate exceeds its grade. No candidate carries policy recommendations. Phase 4 is in theory drafting and evidence grading, not final theory publication.
+Grade distribution: 2 `supported`, 1 `supported_with_caveat`, 6 `exploratory`, 3 `deferred`. Evidence ceiling: `supported`. No candidate exceeds its grade. Phase 4 is in theory drafting, not final theory publication.
Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
# Runtime Morphology Theory Draft v0.1

**Status**: Draft. Evidence-graded. Not final theory.

This document consolidates all 12 Phase 4 theory candidates into a single structured draft. It separates current supported claims from caveated claims, exploratory directions, and deferred claims. It does not publish conclusions. It does not recommend policy.

## Corpus Snapshot

2026-06-13. 992 metadata sessions, 1,517 data sessions, 131,952 events. 7 runtimes, 9 task types. Native strict: 100 sessions. Lanes: `direct_prompt_native` (101), `superpowers_workflow_intervention` (8), `controlled_prompt_morphology` (3), `routed_prompt_intervention` (0).

---

## 1. Current Supported Claims

These claims have the strongest evidence in the current corpus. They are scoped to the `direct_prompt_native` lane and the current corpus snapshot. They are not universal claims.

### T-RM-001: dominant_chain as Default Native Morphology

| Field | Value |
|-------|-------|
| **Claim** | In the current native strict lane, `dominant_chain` is the default runtime morphology. |
| **Grade** | `supported` |
| **Lane** | `direct_prompt_native` |
| **Denominator** | 100 native strict sessions |
| **Evidence** | 93/100 native strict sessions exhibit dominant_chain. 5 runtimes represented (claude-code, opencode, codex, aider, Sisyphus). 8 task types. |
| **Caveats** | Runtime distribution uneven (claude-code + opencode = 96%). Aider and Sisyphus under-represented (1 session each). |
| **Falsification** | If >=15% of sessions in a new runtime show non-dominant_chain default morphology, the claim must be qualified per-runtime. |

### T-RM-002: multi_root_exploration as Minority Morphology

| Field | Value |
|-------|-------|
| **Claim** | `multi_root_exploration` is a minority morphology in native real_work sessions, not a default path. |
| **Grade** | `supported` |
| **Lane** | `direct_prompt_native` |
| **Denominator** | 100 native strict sessions |
| **Evidence** | 1/100 native strict sessions exhibit multi_root_exploration. Single session is opencode. |
| **Caveats** | Low incidence may be task-mix dependent (current corpus dominated by feature_add and exploration). Not necessarily a universal property. |
| **Falsification** | If >=5% of exploration or review sessions show multi_root, the "minority" claim needs task-type qualification. |

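
The falsification conditions for T-RM-001 and T-RM-002 are share thresholds, so they can be checked mechanically once sessions carry a morphology label. The sketch below is only an illustration: the `Session` record and its field names are hypothetical, not part of the corpus tooling, and it assumes "multi_root" in the T-RM-002 row means the `multi_root_exploration` label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Hypothetical record shape; these field names are assumptions.
    runtime: str
    task_type: str
    morphology: str


def meets_share(count: int, total: int, percent: int) -> bool:
    """True when count/total >= percent%, using integer arithmetic."""
    return total > 0 and count * 100 >= percent * total


def rm001_needs_runtime_qualification(new_runtime_sessions: list[Session]) -> bool:
    """T-RM-001: >=15% of a new runtime's sessions are not dominant_chain."""
    off_default = sum(s.morphology != "dominant_chain" for s in new_runtime_sessions)
    return meets_share(off_default, len(new_runtime_sessions), 15)


def rm002_needs_task_qualification(sessions: list[Session]) -> bool:
    """T-RM-002: >=5% of exploration or review sessions are multi_root_exploration."""
    subset = [s for s in sessions if s.task_type in {"exploration", "review"}]
    multi_root = sum(s.morphology == "multi_root_exploration" for s in subset)
    return meets_share(multi_root, len(subset), 5)
```

Integer comparison avoids floating-point edge cases at the exact threshold (for example 3 of 20 sessions is exactly 15%). An empty session list never triggers a qualification, which matches the idea that a condition cannot be met without a denominator.
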
---

## 2. Caveated Claims

These claims have supporting evidence but carry explicit limitations that prevent them from reaching `supported`.

### T-RM-003: feature_add Tendency Toward dominant_chain

| Field | Value |
|-------|-------|
| **Claim** | In the current native lane, `feature_add` tasks tend toward `dominant_chain` topology. |
| **Grade** | `supported_with_caveat` |
| **Lane** | `direct_prompt_native` |
| **Denominator** | 37 feature_add sessions in native strict |
| **Evidence** | 37/37 feature_add sessions exhibit dominant_chain. |
| **Why caveated** | Branch_collapse claim could not be tested. Single topology outcome may be artifact of task simplicity, not structural property of feature_add. Other task types not systematically compared. |
| **Falsification** | If a feature_add session with >=100 events shows non-dominant_chain, or a multi-file feature_add shows multi_root or branchy topology, the claim must be qualified. |

---

## 3. Exploratory Directions

These directions have visible signals or literature support, but denominators are insufficient for confidence. None support policy, product, or strategy recommendations.

### T-WI-001: Superpowers Workflow May Amplify Trace Volume

| Field | Value |
|-------|-------|
| **Claim** | `superpowers_workflow_intervention` sessions may exhibit amplified event density and long-chain structure compared to native direct-prompt sessions. |
| **Grade** | `exploratory` |
| **Denominator** | 8 SP sessions (5 tagged, 3 manual annotation) |
| **Blocker** | Single runtime (claude-code). 3 outlier sessions dominate lane metrics. No task annotation. Cross-lane comparison restricted to trends. |
| **Promotion** | >=10 SP sessions across >=2 runtimes with task annotation and formal within-lane event density distribution. |

### T-SC-001: Safety-Control Boundaries May Alter Runtime Topology

| Field | Value |
|-------|-------|
| **Claim** | Safety-control boundaries such as need_review, hard-stop, and fallback rules may alter runtime morphology by increasing explicit stopping, clarification requests, or branch-collapse behavior. |
| **Grade** | `exploratory` |
| **Denominator** | TBD: no sessions annotated with safety-control boundary markers |
| **Blocker** | No annotated safety-boundary sessions. need_review_triggered not instrumented. Safety-relevant task types not isolated. |
| **Promotion** | >=10 sessions with annotated safety-control boundaries; comparison with matched non-safety tasks shows detectable difference. |

### T-SC-002: Task-Completion Pressure May Produce Safety-Control Collapse

| Field | Value |
|-------|-------|
| **Claim** | When task-completion pressure conflicts with safety-control boundaries, agents may exhibit safety-control collapse: continuing toward completion despite uncertainty, missing evidence, or required human review. |
| **Grade** | `exploratory` |
| **Denominator** | TBD: no operational collapse definition or annotated collapse sessions |
| **Blocker** | No operational definition of safety-control collapse at tool-call level. No annotated collapse sessions. No baseline rate. |
| **Promotion** | Operational collapse definition validated on >=5 sessions; collapse rate compared across >=2 lanes. |

### T-SC-003: Workflow Intervention May Reduce Unsafe Continuation

| Field | Value |
|-------|-------|
| **Claim** | Workflow interventions such as staged verification, routed constrained prompts, or superpowers-style workflows may reduce unsafe continuation, but may increase event_count and trace length. |
| **Grade** | `exploratory` |
| **Denominator** | 8 SP vs 101 native (asymmetric; trend reporting only) |
| **Blocker** | Single-runtime SP lane. No safety-control signal annotation. Cross-lane comparison restricted to trends. |
| **Promotion** | Safety-control signal annotation on >=10 sessions per lane; >=2 runtimes in SP lane. |

### T-SC-004: Human Intervention as External Safety-Control Signal

| Field | Value |
|-------|-------|
| **Claim** | Human intervention may function as an external safety-control signal that induces topology regime shifts distinguishable from self-correction patterns. |
| **Grade** | `exploratory` |
| **Denominator** | 5 native sessions with human_intervention=True |
| **Blocker** | 5-session denominator. No intervention type annotation (safety-correction vs task-correction). No pre/post topology comparison. |
| **Promotion** | >=10 human_intervention sessions with annotated intervention type; detectable regime shift in pre/post comparison. |

### T-SC-005: Near-Failure More Informative Than Final Labels

| Field | Value |
|-------|-------|
| **Claim** | Near-failure and safety-control recovery patterns may be more informative than final success/failure labels for understanding agent safety behavior. |
| **Grade** | `exploratory` |
| **Denominator** | 5 near-failure (human_intervention=True), 1 failure (success=False) |
| **Blocker** | Near-failure definition limited to human_intervention=True. Small denominator. No internal pattern comparison with clean-success sessions. |
| **Promotion** | Expanded near-failure definition; >=10 near-failure sessions; internal pattern comparison with matched clean-success sessions. |

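
Several promotion rows in this section share the same denominator conditions: a minimum session count, a minimum number of runtimes, and task annotation. The following is a minimal sketch of how those conditions could be listed as still unmet for a candidate. The function name and parameters are hypothetical, and qualitative conditions such as "formal within-lane event density distribution" are deliberately not modelled.

```python
def unmet_promotion_conditions(session_count, runtimes, task_annotated,
                               min_sessions=10, min_runtimes=2):
    """List the denominator conditions that still block promotion."""
    unmet = []
    if session_count < min_sessions:
        unmet.append(f"sessions: {session_count} of {min_sessions}")
    distinct = len(set(runtimes))  # duplicates count once
    if distinct < min_runtimes:
        unmet.append(f"runtimes: {distinct} of {min_runtimes}")
    if not task_annotated:
        unmet.append("task annotation missing")
    return unmet


# T-WI-001 as described in its table: 8 SP sessions, claude-code only,
# no task annotation.
t_wi_001 = unmet_promotion_conditions(8, ["claude-code"], task_annotated=False)
```

Returning the list of unmet conditions rather than a single boolean keeps the blocker visible, which fits the draft's rule that no upgrade is automatic: an empty list only means the denominator conditions no longer stand in the way of a grading pass.
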
---

## 4. Deferred Claims

These claims cannot be evaluated against the current corpus. They are deferred, not falsified.

### T-FM-001: Failure Morphology Underdetermined

| Field | Value |
|-------|-------|
| **Claim** | Current failure and near-failure sample density is insufficient to characterize failure morphology. |
| **Grade** | `deferred` |
| **Why deferred** | Tier 2 criteria not met: native failure 1/10, near-failure 5/10. |
| **Reopen condition** | Native failure >= 10 AND near-failure >= 10 AND multi-runtime failure coverage >= 3. |

### T-RP-001: Routed-Prompt Morphology Unobserved

| Field | Value |
|-------|-------|
| **Claim** | `routed_prompt_intervention` morphology is currently unobserved. No theory statement can be made. |
| **Grade** | `deferred` |
| **Why deferred** | 0 tagged routed sessions. Parser detection gate BLOCKED. |
| **Reopen condition** | >=5 tagged routed sessions with causetrace_tags; gate OPEN. |

### T-PM-001: Controlled Prompt Morphology at Pilot-Level Evidence

| Field | Value |
|-------|-------|
| **Claim** | Controlled prompt morphology comparison is at pilot-level evidence only. Prompt posture effects on topology are not characterized. |
| **Grade** | `deferred` |
| **Why deferred** | 3 pilot sessions with no variant tagging. Controlled benchmark protocol not operational. Gate BLOCKED. |
| **Reopen condition** | Controlled benchmark protocol operational; >=5 sessions per variant with prompt tags. |

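
The T-FM-001 reopen condition is a conjunction of three thresholds, reported in "have/need" form (1/10, 5/10). As a rough sketch, the gate can be expressed as a progress table plus a logical AND. The key names below are hypothetical labels for the three criteria, and a criterion whose current count is not stated in this draft (multi-runtime failure coverage) is treated as unknown and therefore unmet.

```python
# Thresholds taken from the T-FM-001 reopen condition; key names are assumptions.
T_FM_001_THRESHOLDS = {"native_failure": 10, "near_failure": 10, "failure_runtimes": 3}


def gate_progress(counts, thresholds):
    """Map each criterion to (have, need, met); unknown counts are unmet."""
    progress = {}
    for name, need in thresholds.items():
        have = counts.get(name)  # None when the count is not known
        progress[name] = (have, need, have is not None and have >= need)
    return progress


def gate_open(counts, thresholds):
    """A gate opens only when every criterion is met (logical AND)."""
    return all(met for _, _, met in gate_progress(counts, thresholds).values())


# Counts stated in the draft: native failure 1/10, near-failure 5/10.
current = {"native_failure": 1, "near_failure": 5}
```

With the counts above, `gate_open(current, T_FM_001_THRESHOLDS)` is false, consistent with the `deferred` grade.
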
---

## 5. Theory Boundaries

This draft explicitly does NOT support:

| Exclusion | Rationale |
|-----------|-----------|
| Prediction of agent behavior | No model trained; claims are descriptive, not predictive |
| Anomaly detection | No baseline distribution for "normal" morphology across all conditions |
| Automatic diagnosis | Morphology interpretation is contextual, not automatable |
| Universal prompt policy | No evidence that isolated findings generalize across runtimes/tasks |
| Routing default strategy | Routed lane has 0 sessions; no comparison basis |
| Safety-control automation | All T-SC candidates are `exploratory`; no operational definitions validated |
| Cross-lane aggregation without disclosure | Violates Phase 3E lane separation rules |
| Promotion of `exploratory` or `deferred` claims | Violates evidence grading rules |

---

## 6. Known Blockers

| Blocker | Affected Candidates | Resolution |
|---------|---------------------|------------|
| Failure sample scarcity | T-FM-001 | Background acquisition; no timeline commitment |
| Near-failure density | T-FM-001, T-SC-005 | Expand near-failure definition; pilot annotation |
| Routed lane absence | T-RP-001 | Upstream prompt-routing-skill tag emission testing |
| Controlled benchmark gap | T-PM-001 | Define and operationalize benchmark protocol |
| Intervention lane small denominators | T-WI-001, T-SC-003 | Natural accumulation; no forced data generation |
| Single-runtime SP lane | T-WI-001, T-SC-003 | Natural diversification |
| No safety-control annotation | T-SC-001 through T-SC-005 | Pilot annotation on safety-relevant session subset |
| No operational collapse definition | T-SC-002 | Define using observable tool-call patterns |
| No task annotation for intervention lanes | T-WI-001, T-PM-001 | Task type annotation pass (low priority) |
| Per-runtime imbalance | T-RM-001, T-RM-002 | Natural diversification; monitor per-runtime |

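
The blocker table is keyed by blocker, but a grading pass usually asks the inverse question: which blockers stand in front of a given candidate? The sketch below transcribes the table into a dictionary and inverts it. The variable and function names are illustrative only, and "T-SC-001 through T-SC-005" is expanded into the five individual IDs.

```python
from collections import defaultdict

# Transcribed from the Known Blockers table (blocker -> affected candidates).
BLOCKERS = {
    "Failure sample scarcity": ["T-FM-001"],
    "Near-failure density": ["T-FM-001", "T-SC-005"],
    "Routed lane absence": ["T-RP-001"],
    "Controlled benchmark gap": ["T-PM-001"],
    "Intervention lane small denominators": ["T-WI-001", "T-SC-003"],
    "Single-runtime SP lane": ["T-WI-001", "T-SC-003"],
    "No safety-control annotation": [f"T-SC-{n:03d}" for n in range(1, 6)],
    "No operational collapse definition": ["T-SC-002"],
    "No task annotation for intervention lanes": ["T-WI-001", "T-PM-001"],
    "Per-runtime imbalance": ["T-RM-001", "T-RM-002"],
}


def blockers_by_candidate(blockers):
    """Invert the table: candidate ID -> blockers, in table order."""
    index = defaultdict(list)
    for blocker, candidates in blockers.items():
        for candidate in candidates:
            index[candidate].append(blocker)
    return dict(index)


index = blockers_by_candidate(BLOCKERS)
```

For example, `index["T-SC-003"]` lists three blockers (small denominators, single-runtime SP lane, and missing safety-control annotation). T-RM-003 does not appear in the index because the table lists no blocker for it.
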
---

## 7. Evidence Upgrade Path

| From | To | Required |
|------|----|----------|
| `deferred` | `exploratory` | Gate condition met (denominator threshold, tag accumulation, or protocol operational) |
| `exploratory` | `supported_with_caveat` | Denominator >= 10 in lane, >= 2 runtimes, task annotation present, blocker resolved |
| `supported_with_caveat` | `supported` | Caveat resolved with additional evidence; multi-condition comparison complete |
| `supported` | (none) | Ceiling. No higher grade under Phase 4 rules. |

Upgrades are triggered by corpus growth, not calendar. No upgrade is automatic.

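
The upgrade path is an ordered ladder with a ceiling, which maps naturally onto an ordered enumeration. The sketch below is one possible encoding, not the project's tooling. It assumes upgrades move one step at a time, which the table suggests but does not state outright, and it only validates the transition itself, not the evidence required for it.

```python
from enum import IntEnum
from typing import Optional


class Grade(IntEnum):
    DEFERRED = 0
    EXPLORATORY = 1
    SUPPORTED_WITH_CAVEAT = 2
    SUPPORTED = 3  # ceiling under Phase 4 rules


def next_grade(grade: Grade) -> Optional[Grade]:
    """Return the grade one step up, or None at the ceiling."""
    if grade is Grade.SUPPORTED:
        return None
    return Grade(grade + 1)


def is_valid_upgrade(current: Grade, proposed: Grade) -> bool:
    """Assumption: only single-step upgrades are valid."""
    return next_grade(current) == proposed
```

Under this encoding, `is_valid_upgrade(Grade.EXPLORATORY, Grade.SUPPORTED)` is false because it skips `supported_with_caveat`. A passing check still leaves the decision to a grading pass, since no upgrade is automatic.
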
---

## 8. Maintenance

This draft is a living document. Update when:

- Corpus snapshot changes materially (next grading pass trigger)
- A gate opens (routed or controlled lanes reach threshold)
- A new blocker is identified or an existing blocker is resolved
- A candidate is promoted, demoted, or falsified
- A structural evidence gap is filled (safety-control annotation, task annotation, runtime diversification)

Do NOT update to add new theory candidates without a grading pass that identifies a genuine structural gap.

---

## References

- [Evidence Grading Matrix](evidence_grading_matrix_v0.2.5.md): Full candidate details with individual blocker/promotion/falsification records
- [Theory Candidate Inventory](theory_candidate_inventory_v0.2.5.md): Original candidate definitions and domain map
- [Safety-Control Runtime Morphology](safety_control_morphology_candidates_v0.2.5.md): Full T-SC candidate definitions, observable signals, non-goals
- [Phase 3D Closure Report](../phase3d/closure_report_v0.2.5.md): Hypothesis registry and Tier 1 validation
- [Phase 3E Closure Report](../phase3e/closure_report_v0.2.5.md): Intervention lane infrastructure and Tier 2 deferral
