# Runtime Morphology Theory Draft v0.1

**Status**: Draft. Evidence-graded. Not final theory.

This document consolidates all 12 Phase 4 theory candidates into a single structured draft. It separates currently supported claims from caveated claims, exploratory directions, and deferred claims. It does not publish conclusions or recommend policy.

## Corpus Snapshot

- **Snapshot date**: 2026-06-13
- **Volume**: 992 metadata sessions, 1,517 data sessions, 131,952 events
- **Coverage**: 7 runtimes, 9 task types
- **Native strict**: 100 sessions
- **Lanes**: `direct_prompt_native` (101), `superpowers_workflow_intervention` (8), `controlled_prompt_morphology` (3), `routed_prompt_intervention` (0)

---

## 1. Current Supported Claims

These claims have the strongest evidence in the current corpus. They are scoped to the `direct_prompt_native` lane and the current corpus snapshot. They are not universal claims.

### T-RM-001: dominant_chain as Default Native Morphology

| Field | Value |
|-------|-------|
| **Claim** | In the current native strict lane, `dominant_chain` is the default runtime morphology. |
| **Grade** | `supported` |
| **Lane** | `direct_prompt_native` |
| **Denominator** | 100 native strict sessions |
| **Evidence** | 93/100 native strict sessions exhibit `dominant_chain`. 5 runtimes are represented (claude-code, opencode, codex, aider, Sisyphus) across 8 task types. |
| **Caveats** | Runtime distribution is uneven (claude-code + opencode = 96% of sessions). Aider and Sisyphus are under-represented (1 session each). |
| **Falsification** | If >=15% of sessions in a new runtime show a morphology other than `dominant_chain`, the claim must be qualified per runtime. |
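
The falsification condition above is a per-runtime threshold test. The Python sketch below shows one way to run that check. It assumes each native strict session can be reduced to a hypothetical `(runtime, morphology)` pair. The function name and the `min_sessions` guard are illustrative and are not part of any existing grading pipeline.

```python
from collections import Counter

# T-RM-001 falsification threshold: share of non-dominant_chain sessions per runtime.
FALSIFICATION_THRESHOLD = 0.15


def runtimes_requiring_qualification(sessions, threshold=FALSIFICATION_THRESHOLD, min_sessions=1):
    """Return {runtime: non-dominant_chain share} for runtimes at or above the threshold.

    `sessions` is an iterable of hypothetical (runtime, morphology) pairs.
    Runtimes with fewer than `min_sessions` sessions are skipped.
    """
    totals = Counter()
    non_dominant = Counter()
    for runtime, morphology in sessions:
        totals[runtime] += 1
        if morphology != "dominant_chain":
            non_dominant[runtime] += 1

    flagged = {}
    for runtime, total in totals.items():
        if total < min_sessions:
            continue
        share = non_dominant[runtime] / total
        if share >= threshold:
            flagged[runtime] = share
    return flagged


if __name__ == "__main__":
    # Toy input for illustration only; these are not corpus sessions.
    toy = (
        [("runtime_a", "dominant_chain")] * 9
        + [("runtime_a", "multi_root_exploration")]
        + [("runtime_b", "dominant_chain")] * 3
        + [("runtime_b", "branchy")] * 2
    )
    print(runtimes_requiring_qualification(toy))  # prints {'runtime_b': 0.4}
```

Aider and Sisyphus have one session each, so a single observation would move their share between 0% and 100%. Raising `min_sessions` is one way to stop such runtimes from triggering the rule too early, at the cost of delaying the check.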

### T-RM-002: multi_root_exploration as Minority Morphology

| Field | Value |
|-------|-------|
| **Claim** | `multi_root_exploration` is a minority morphology in native real_work sessions, not a default path. |
| **Grade** | `supported` |
| **Lane** | `direct_prompt_native` |
| **Denominator** | 100 native strict sessions |
| **Evidence** | 1 of 100 native strict sessions exhibits `multi_root_exploration`; that single session is from opencode. |
| **Caveats** | Low incidence may depend on task mix (the current corpus is dominated by feature_add and exploration tasks), so it is not necessarily a universal property. |
| **Falsification** | If >=5% of exploration or review sessions show `multi_root_exploration`, the "minority" claim needs task-type qualification. |
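
T-RM-002's falsification rule applies to each task type separately, not to the corpus as a whole. The sketch below runs that stratified check. It assumes sessions are hypothetical `(task_type, morphology)` pairs, and the function names are illustrative.

```python
from collections import Counter

# Task types and threshold named in the T-RM-002 falsification condition.
WATCHED_TASK_TYPES = ("exploration", "review")
MULTI_ROOT_THRESHOLD = 0.05


def multi_root_incidence(sessions):
    """Map task_type -> share of sessions whose morphology is multi_root_exploration.

    `sessions` is an iterable of hypothetical (task_type, morphology) pairs.
    """
    totals = Counter()
    hits = Counter()
    for task_type, morphology in sessions:
        totals[task_type] += 1
        if morphology == "multi_root_exploration":
            hits[task_type] += 1
    return {task_type: hits[task_type] / n for task_type, n in totals.items()}


def task_types_needing_qualification(sessions):
    """Watched task types whose multi_root_exploration share reaches the threshold."""
    incidence = multi_root_incidence(sessions)
    return sorted(
        t for t in WATCHED_TASK_TYPES
        if incidence.get(t, 0.0) >= MULTI_ROOT_THRESHOLD
    )
```

Stratifying matters because the corpus-wide 1/100 rate can hide a higher rate in a task type that is rare in the current mix.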

---

## 2. Caveated Claims

These claims have supporting evidence but carry explicit limitations that prevent them from reaching `supported`.

### T-RM-003: feature_add Tendency Toward dominant_chain

| Field | Value |
|-------|-------|
| **Claim** | In the current native lane, `feature_add` tasks tend toward `dominant_chain` topology. |
| **Grade** | `supported_with_caveat` |
| **Lane** | `direct_prompt_native` |
| **Denominator** | 37 feature_add sessions in native strict |
| **Evidence** | 37/37 feature_add sessions exhibit `dominant_chain`. |
| **Why caveated** | The branch_collapse claim could not be tested. The single observed topology may be an artifact of task simplicity rather than a structural property of feature_add. Other task types were not systematically compared. |
| **Falsification** | If a feature_add session with >=100 events shows a morphology other than `dominant_chain`, or a multi-file feature_add session shows `multi_root_exploration` or branchy topology, the claim must be qualified. |
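
The two falsification triggers for T-RM-003 can be written as a single filter. The sketch below assumes a hypothetical `Session` record. The `files_touched` field and the `"branchy"` morphology label are illustrative stand-ins, not documented corpus fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Hypothetical record; files_touched and the "branchy" label are illustrative.
    session_id: str
    task_type: str
    event_count: int
    files_touched: int
    morphology: str


BRANCHY_MORPHOLOGIES = {"multi_root_exploration", "branchy"}


def t_rm_003_counterexamples(sessions):
    """Return ids of feature_add sessions that meet either T-RM-003 falsification trigger."""
    hits = []
    for s in sessions:
        if s.task_type != "feature_add":
            continue
        # Trigger 1: a long session (>=100 events) that is not dominant_chain.
        long_non_chain = s.event_count >= 100 and s.morphology != "dominant_chain"
        # Trigger 2: a multi-file session with multi-root or branchy topology.
        multi_file_branchy = s.files_touched > 1 and s.morphology in BRANCHY_MORPHOLOGIES
        if long_non_chain or multi_file_branchy:
            hits.append(s.session_id)
    return hits
```

An empty result is consistent with the claim. It does not settle the untested branch_collapse question behind the caveat.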

---

## 3. Exploratory Directions

These directions have visible signals or literature support, but their denominators are too small for confident claims. None of them supports policy, product, or strategy recommendations.

### T-WI-001: Superpowers Workflow May Amplify Trace Volume

| Field | Value |
|-------|-------|
| **Claim** | `superpowers_workflow_intervention` (SP) sessions may exhibit amplified event density and long-chain structure compared to native direct-prompt sessions. |
| **Grade** | `exploratory` |
| **Denominator** | 8 SP sessions (5 tagged, 3 manually annotated) |
| **Blocker** | Single runtime (claude-code). Three outlier sessions dominate lane metrics. No task annotation. Cross-lane comparison is restricted to trends. |
| **Promotion** | >=10 SP sessions across >=2 runtimes, with task annotation and a formal within-lane distribution of event density. |
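
Three outlier sessions dominate lane metrics, so a within-lane distribution summary should rely on order statistics rather than the mean. The sketch below uses Python's standard `statistics` module. It assumes per-session event count as the density proxy, because the draft does not fix a definition of event density. The gate function only restates the promotion row.

```python
import statistics


def lane_event_summary(event_counts):
    """Summarize per-session event counts within one lane.

    The median and interquartile range are less sensitive than the mean to a
    handful of very long sessions. statistics.quantiles needs at least two
    data points on Python versions before 3.13.
    """
    q1, _, q3 = statistics.quantiles(event_counts, n=4)
    return {
        "n": len(event_counts),
        "mean": statistics.fmean(event_counts),
        "median": statistics.median(event_counts),
        "iqr": q3 - q1,
    }


def t_wi_001_promotable(n_sessions, n_runtimes, task_annotated, has_distribution):
    """Restates the T-WI-001 promotion row as a boolean gate."""
    return (
        n_sessions >= 10
        and n_runtimes >= 2
        and task_annotated
        and has_distribution
    )
```

Reporting mean and median together makes outlier dominance visible: a large gap between them signals that a few sessions are driving the lane's average.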

### T-SC-001: Safety-Control Boundaries May Alter Runtime Topology

| Field | Value |
|-------|-------|
| **Claim** | Safety-control boundaries such as need_review, hard-stop, and fallback rules may alter runtime morphology by increasing explicit stopping, clarification requests, or branch-collapse behavior. |
| **Grade** | `exploratory` |
| **Denominator** | TBD (no sessions annotated with safety-control boundary markers) |
| **Blocker** | No sessions are annotated with safety-control boundaries. The `need_review_triggered` signal is not instrumented. Safety-relevant task types are not isolated. |
| **Promotion** | >=10 sessions with annotated safety-control boundaries, and a comparison with matched non-safety tasks that shows a detectable difference. |

### T-SC-002: Task-Completion Pressure May Produce Safety-Control Collapse

| Field | Value |
|-------|-------|
| **Claim** | When task-completion pressure conflicts with safety-control boundaries, agents may exhibit safety-control collapse: continuing toward completion despite uncertainty, missing evidence, or required human review. |
| **Grade** | `exploratory` |
| **Denominator** | TBD (no operational collapse definition and no annotated collapse sessions) |
| **Blocker** | No operational definition of safety-control collapse at the tool-call level. No annotated collapse sessions. No baseline rate. |
| **Promotion** | An operational collapse definition validated on >=5 sessions; collapse rate compared across >=2 lanes. |
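
The blocker asks for an operational definition at the tool-call level. The sketch below is one possible shape, offered only as a hypothetical starting point: collapse is a completion action taken while an uncertainty signal is still unresolved. Every event kind in it is an invented placeholder. None of them is instrumented in the current corpus, and the definition itself is unvalidated.

```python
# Hypothetical event kinds; none of these are instrumented in the current corpus.
UNCERTAINTY = {"need_review", "missing_evidence", "uncertainty_flag"}
HALT = {"stop", "clarification_request", "human_review"}
COMPLETION_ACTIONS = {"file_write", "file_edit", "commit"}


def first_collapse_index(event_kinds):
    """Return the index of the first completion action taken while an
    uncertainty signal is unresolved, or None if no such action occurs.

    An uncertainty signal stays pending until a halt-type event resolves it.
    """
    pending = False
    for i, kind in enumerate(event_kinds):
        if kind in UNCERTAINTY:
            pending = True
        elif kind in HALT:
            pending = False
        elif kind in COMPLETION_ACTIONS and pending:
            return i
    return None
```

Under this framing, validating the definition on >=5 sessions would mean checking whether the indices it flags match human judgments of where collapse occurred.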

### T-SC-003: Workflow Intervention May Reduce Unsafe Continuation

| Field | Value |
|-------|-------|
| **Claim** | Workflow interventions such as staged verification, routed constrained prompts, or superpowers-style workflows may reduce unsafe continuation, but may increase `event_count` and trace length. |
| **Grade** | `exploratory` |
| **Denominator** | 8 SP vs. 101 native sessions (asymmetric; trend reporting only) |
| **Blocker** | Single-runtime SP lane. No safety-control signal annotation. Cross-lane comparison is restricted to trends. |
| **Promotion** | Safety-control signal annotation on >=10 sessions per lane; >=2 runtimes in the SP lane. |
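
The 8-vs-101 asymmetry is why cross-lane results stay at trend level. The same observed rate carries very different uncertainty at each denominator. The sketch below makes that concrete with the Wilson score interval, a standard binomial confidence interval chosen here for illustration; the draft does not specify an interval method. The counts in the example are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts with roughly equal observed rates (~25%).
    for label, k, n in [("SP lane", 2, 8), ("native lane", 25, 101)]:
        lo, hi = wilson_interval(k, n)
        print(f"{label}: n={n}, interval width={hi - lo:.3f}")
```

For similar observed rates, the interval at n=8 is about three times wider than at n=101. Apparent lane differences of that size can therefore sit inside the noise.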

### T-SC-004: Human Intervention as External Safety-Control Signal

| Field | Value |
|-------|-------|
| **Claim** | Human intervention may function as an external safety-control signal that induces topology regime shifts distinguishable from self-correction patterns. |
| **Grade** | `exploratory` |
| **Denominator** | 5 native sessions with `human_intervention=True` |
| **Blocker** | Only 5 sessions. No annotation of intervention type (safety correction vs. task correction). No comparison of topology before and after intervention. |
| **Promotion** | >=10 human_intervention sessions with annotated intervention type, and a detectable regime shift in the pre/post comparison. |

### T-SC-005: Near-Failure More Informative Than Final Labels

| Field | Value |
|-------|-------|
| **Claim** | Near-failure and safety-control recovery patterns may be more informative than final success/failure labels for understanding agent safety behavior. |
| **Grade** | `exploratory` |
| **Denominator** | 5 near-failure sessions (`human_intervention=True`), 1 failure (`success=False`) |
| **Blocker** | Near-failure is currently defined only as `human_intervention=True`. Small denominator. No comparison of internal patterns against clean-success sessions. |
| **Promotion** | An expanded near-failure definition; >=10 near-failure sessions; internal pattern comparison against matched clean-success sessions. |
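
T-SC-005 aims to move beyond the current labeling. Writing that labeling down in a few lines makes the narrowness of the near-failure definition explicit. The sketch below rests on two assumptions: the `Outcome` enum, and the rule that failure takes precedence when both flags apply. The draft does not say whether the failure session also had human intervention.

```python
from enum import Enum


class Outcome(Enum):
    FAILURE = "failure"
    NEAR_FAILURE = "near_failure"
    CLEAN_SUCCESS = "clean_success"


def label_outcome(success: bool, human_intervention: bool) -> Outcome:
    """Label a session under the current narrow definitions.

    failure:       success is False
    near_failure:  success is True and human_intervention is True
    clean_success: everything else
    Failure is checked first (an assumed precedence).
    """
    if not success:
        return Outcome.FAILURE
    if human_intervention:
        return Outcome.NEAR_FAILURE
    return Outcome.CLEAN_SUCCESS
```

An expanded near-failure definition would add conditions to the `NEAR_FAILURE` branch. The matched clean-success comparison set is whatever remains in the final branch.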

---

## 4. Deferred Claims

These claims cannot be evaluated against the current corpus. They are deferred, not falsified.

### T-FM-001: Failure Morphology Underdetermined

| Field | Value |
|-------|-------|
| **Claim** | Current failure and near-failure sample density is insufficient to characterize failure morphology. |
| **Grade** | `deferred` |
| **Why deferred** | Tier 2 criteria not met (observed/required): native failures 1/10, near-failures 5/10. |
| **Reopen condition** | Native failures >= 10 AND near-failures >= 10 AND failure coverage across >= 3 runtimes. |

### T-RP-001: Routed-Prompt Morphology Unobserved

| Field | Value |
|-------|-------|
| **Claim** | `routed_prompt_intervention` morphology is currently unobserved. No theory statement can be made. |
| **Grade** | `deferred` |
| **Why deferred** | 0 tagged routed sessions. The parser detection gate is BLOCKED. |
| **Reopen condition** | >=5 tagged routed sessions with `causetrace_tags`, and the gate is OPEN. |

### T-PM-001: Controlled Prompt Morphology at Pilot-Level Evidence

| Field | Value |
|-------|-------|
| **Claim** | Controlled prompt morphology comparison is at pilot-level evidence only. The effects of prompt posture on topology are not characterized. |
| **Grade** | `deferred` |
| **Why deferred** | 3 pilot sessions, none with variant tagging. The controlled benchmark protocol is not operational. Gate BLOCKED. |
| **Reopen condition** | The controlled benchmark protocol is operational, with >=5 prompt-tagged sessions per variant. |
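
The three reopen conditions in this section are conjunctions of counts and gate states, so they can be checked together. The sketch below uses a hypothetical `DeferredGateInputs` record whose field names are illustrative, not corpus columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeferredGateInputs:
    # Hypothetical counters; field names are illustrative, not corpus columns.
    native_failures: int
    near_failures: int
    runtimes_with_failures: int
    tagged_routed_sessions: int
    routed_gate_open: bool
    benchmark_protocol_operational: bool
    min_sessions_per_prompt_variant: int


def reopen_status(g: DeferredGateInputs) -> dict:
    """Evaluate each deferred claim's reopen condition; True means it can be reopened."""
    return {
        "T-FM-001": (
            g.native_failures >= 10
            and g.near_failures >= 10
            and g.runtimes_with_failures >= 3
        ),
        "T-RP-001": g.tagged_routed_sessions >= 5 and g.routed_gate_open,
        "T-PM-001": (
            g.benchmark_protocol_operational
            and g.min_sessions_per_prompt_variant >= 5
        ),
    }
```

The snapshot values reported in this draft are 1 native failure, 5 near-failures, 0 tagged routed sessions, a BLOCKED routed gate and a non-operational benchmark protocol. With those values, every entry evaluates to `False`, whatever is assumed for the fields the draft does not report.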

---

## 5. Theory Boundaries

This draft explicitly does NOT support:

| Exclusion | Rationale |
|-----------|-----------|
| Prediction of agent behavior | No model has been trained; claims are descriptive, not predictive |
| Anomaly detection | No baseline distribution of "normal" morphology across all conditions |
| Automatic diagnosis | Morphology interpretation is contextual, not automatable |
| Universal prompt policy | No evidence that isolated findings generalize across runtimes or tasks |
| Routing default strategy | Routed lane has 0 sessions; no basis for comparison |
| Safety-control automation | All T-SC candidates are `exploratory`; no operational definitions have been validated |
| Cross-lane aggregation without disclosure | Violates Phase 3E lane separation rules |
| Promotion of `exploratory` or `deferred` claims | Violates evidence grading rules |

---

## 6. Known Blockers

| Blocker | Affected Candidates | Resolution |
|---------|---------------------|------------|
| Failure sample scarcity | T-FM-001 | Background acquisition; no timeline commitment |
| Near-failure density | T-FM-001, T-SC-005 | Expand near-failure definition; pilot annotation |
| Routed lane absence | T-RP-001 | Test tag emission in the upstream prompt-routing-skill |
| Controlled benchmark gap | T-PM-001 | Define and operationalize the benchmark protocol |
| Small intervention-lane denominators | T-WI-001, T-SC-003 | Natural accumulation; no forced data generation |
| Single-runtime SP lane | T-WI-001, T-SC-003 | Natural diversification |
| No safety-control annotation | T-SC-001 through T-SC-005 | Pilot annotation on a safety-relevant subset of sessions |
| No operational collapse definition | T-SC-002 | Define using observable tool-call patterns |
| No task annotation for intervention lanes | T-WI-001, T-PM-001 | Task type annotation pass (low priority) |
| Per-runtime imbalance | T-RM-001, T-RM-002 | Natural diversification; monitor per runtime |

---

## 7. Evidence Upgrade Path

| From | To | Required |
|------|----|----------|
| `deferred` | `exploratory` | Gate condition met (denominator threshold, tag accumulation, or protocol operational) |
| `exploratory` | `supported_with_caveat` | Denominator >= 10 in lane, >= 2 runtimes, task annotation present, blocker resolved |
| `supported_with_caveat` | `supported` | Caveat resolved with additional evidence; multi-condition comparison complete |
| `supported` | (none) | Ceiling: no higher grade exists under Phase 4 rules. |

Upgrades are triggered by corpus growth, not by the calendar. No upgrade is automatic.
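
The upgrade path is a one-rung-at-a-time ladder with a ceiling. The sketch below encodes it. The `Grade` enum and both function names are illustrative. Whether requirements are met is passed in as a reviewer's judgment rather than computed, in line with the rule that no upgrade is automatic; only the `exploratory` row, which has numeric criteria, gets a helper.

```python
from enum import IntEnum


class Grade(IntEnum):
    DEFERRED = 0
    EXPLORATORY = 1
    SUPPORTED_WITH_CAVEAT = 2
    SUPPORTED = 3


def exploratory_to_caveated_ready(denominator, runtimes, task_annotated, blocker_resolved):
    """Restates the exploratory -> supported_with_caveat row as a boolean check."""
    return denominator >= 10 and runtimes >= 2 and task_annotated and blocker_resolved


def next_grade(current: Grade, requirements_met: bool) -> Grade:
    """Advance at most one rung, and only when that rung's requirements are met.

    requirements_met is supplied by a reviewer, not inferred here.
    """
    if current is Grade.SUPPORTED or not requirements_met:
        return current  # supported is the ceiling; unmet requirements mean no change
    return Grade(current + 1)
```

Take T-WI-001's current position as an example: 8 sessions, one runtime, no task annotation, blocker unresolved. `exploratory_to_caveated_ready(8, 1, False, False)` is `False`, so `next_grade` leaves it at `EXPLORATORY`.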

---

## 8. Maintenance

This draft is a living document. Update it when:

- The corpus snapshot changes materially (this triggers the next grading pass)
- A gate opens (routed or controlled lanes reach threshold)
- A new blocker is identified or an existing blocker is resolved
- A candidate is promoted, demoted, or falsified
- A structural evidence gap is filled (safety-control annotation, task annotation, runtime diversification)

Do NOT add new theory candidates unless a grading pass identifies a genuine structural gap.

---

## References

- [Evidence Grading Matrix](evidence_grading_matrix_v0.2.5.md): full candidate details with individual blocker, promotion, and falsification records
- [Theory Candidate Inventory](theory_candidate_inventory_v0.2.5.md): original candidate definitions and domain map
- [Safety-Control Runtime Morphology](safety_control_morphology_candidates_v0.2.5.md): full T-SC candidate definitions, observable signals, and non-goals
- [Phase 3D Closure Report](../phase3d/closure_report_v0.2.5.md): hypothesis registry and Tier 1 validation
- [Phase 3E Closure Report](../phase3e/closure_report_v0.2.5.md): intervention lane infrastructure and Tier 2 deferral