| 1 | +# Phase 4 Theory Candidate Inventory v0.2.5 |
| 2 | + |
| 3 | +This document lists all current runtime morphology theory candidates with evidence grades, supporting data, caveats, and falsification conditions. It consolidates Phase 3D hypothesis validation results and Phase 3E intervention-aware findings. |
| 4 | + |
| 5 | +No candidate here is a finalized theory. All are drafts with explicit evidence boundaries. |
| 6 | + |
| 7 | +## Corpus Snapshot |
| 8 | + |
| 9 | +- Date: 2026-06-13 |
| 10 | +- Metadata sessions: 992 |
| 11 | +- Data sessions: 1,517 |
| 12 | +- Events: 131,952 |
| 13 | +- Runtime breadth: 7 |
| 14 | +- Task breadth: 9 |
| 15 | +- Native strict sessions: 100 |
| 16 | +- Lanes: direct_prompt_native (101), superpowers_workflow_intervention (8), controlled_prompt_morphology (3), routed_prompt_intervention (0) |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## T-RM-001: Dominant Chain as Default Native Morphology |
| 21 | + |
| 22 | +| Field | Value | |
| 23 | +|-------|-------| |
| 24 | +| **Claim** | In the current native strict lane, `dominant_chain` is the default runtime morphology. | |
| 25 | +| **Evidence grade** | `supported` | |
| 26 | +| **Lane** | `direct_prompt_native` | |
| 27 | +| **Denominator** | 100 native strict sessions | |
| 28 | +| **Supporting data** | 93/100 native strict sessions exhibit dominant_chain topology. | |
| 29 | +| **Runtime distribution** | claude-code (50), opencode (46), codex (3), aider (1), Sisyphus (1) — 5 runtimes | |
| 30 | +| **Task distribution** | 8 task types represented; feature_add (37), exploration (28), bug_fix (12) are top 3 | |
| 31 | +| **Caveats** | Runtime distribution is uneven: claude-code and opencode together account for 96 of the sessions listed above. codex, aider, and Sisyphus are under-represented. | |
| 32 | +| **Falsification condition** | If >=15% of native strict sessions in a new runtime show non-dominant_chain default morphology, this candidate must be qualified per-runtime. | |
| 33 | +| **Status** | `active` | |
| 34 | +| **Source hypotheses** | H-RM-001 (Phase 3D Tier 1, supported) | |
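
The falsification condition for T-RM-001 is a simple threshold test, which can be sketched as below. This is a minimal illustration, not part of the inventory's tooling: the function name, the list-of-labels input, and the `"other"` placeholder label are all assumptions. The current corpus is used only as sample input; the condition itself applies to sessions from a new runtime.

```python
# Hypothetical helper: the inventory states the rule but not an implementation.
THRESHOLD_PCT = 15  # ">=15%" from the T-RM-001 falsification condition


def needs_per_runtime_qualification(topologies, default="dominant_chain",
                                    threshold_pct=THRESHOLD_PCT):
    """Return True if the share of non-default topologies meets the threshold."""
    if not topologies:
        raise ValueError("no sessions: the condition cannot be evaluated")
    non_default = sum(1 for t in topologies if t != default)
    # Integer comparison avoids floating-point edge cases at exactly 15%.
    return non_default * 100 >= threshold_pct * len(topologies)


# Sample input shaped like the table: 93 of 100 sessions are dominant_chain.
# "other" is a placeholder; the table does not break down the remaining 7.
current = ["dominant_chain"] * 93 + ["other"] * 7
print(needs_per_runtime_qualification(current))
```

With 7 of 100 sessions outside `dominant_chain`, the share is below the 15% threshold, so the check does not trigger on this input.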
| 35 | + |
| 36 | +## T-RM-002: Multi-Root Exploration as Minority Morphology |
| 37 | + |
| 38 | +| Field | Value | |
| 39 | +|-------|-------| |
| 40 | +| **Claim** | `multi_root_exploration` is a minority morphology in native real_work sessions, not a default path. | |
| 41 | +| **Evidence grade** | `supported` | |
| 42 | +| **Lane** | `direct_prompt_native` | |
| 43 | +| **Denominator** | 100 native strict sessions | |
| 44 | +| **Supporting data** | 1 of 100 native strict sessions exhibits multi_root_exploration. | |
| 45 | +| **Runtime distribution** | The single multi_root_exploration session ran on opencode. | |
| 46 | +| **Task distribution** | N/A (single session) | |
| 47 | +| **Caveats** | Low incidence rate may be a property of the current task mix (dominated by feature_add and exploration), not a universal property. | |
| 48 | +| **Falsification condition** | If >=5% of native sessions in exploration or review task types show multi_root_exploration, the "minority" claim needs qualification. | |
| 49 | +| **Status** | `active` | |
| 50 | +| **Source hypotheses** | H-RM-003 (Phase 3D Tier 1, supported) | |
| 51 | + |
| 52 | +## T-RM-003: Feature_Add Tendency Toward Dominant Chain |
| 53 | + |
| 54 | +| Field | Value | |
| 55 | +|-------|-------| |
| 56 | +| **Claim** | In the current native lane, `feature_add` tasks tend toward `dominant_chain` topology. | |
| 57 | +| **Evidence grade** | `supported_with_caveat` | |
| 58 | +| **Lane** | `direct_prompt_native` | |
| 59 | +| **Denominator** | 37 feature_add sessions in native strict | |
| 60 | +| **Supporting data** | 37/37 feature_add sessions exhibit dominant_chain. Branch collapse was not testable (insufficient collapse samples). | |
| 61 | +| **Runtime distribution** | Primarily claude-code and opencode | |
| 62 | +| **Caveats** | Single topology outcome may be an artifact of task simplicity in the current corpus, not a structural property of feature_add. Branch collapse claim could not be evaluated. | |
| 63 | +| **Falsification condition** | If a feature_add session with >=100 events shows non-dominant_chain topology, or if a multi-file feature_add session shows multi_root or branchy topology, the claim must be qualified. | |
| 64 | +| **Status** | `active` | |
| 65 | +| **Source hypotheses** | H-TT-002 (Phase 3D Tier 1, supported with caveat) | |
| 66 | + |
| 67 | +## T-WI-001: Superpowers Workflow May Amplify Trace Volume |
| 68 | + |
| 69 | +| Field | Value | |
| 70 | +|-------|-------| |
| 71 | +| **Claim** | `superpowers_workflow_intervention` sessions may exhibit amplified event density and long-chain structure compared to native direct-prompt sessions, but sample size is insufficient for stable comparison. | |
| 72 | +| **Evidence grade** | `exploratory` | |
| 73 | +| **Lane** | `superpowers_workflow_intervention` | |
| 74 | +| **Denominator** | 8 sessions (5 tagged, 3 manually annotated) | |
| 75 | +| **Supporting data** | 3 large superpowers (SP) sessions account for 41,221 of 42,465 lane events (avg ~13,740 events/session). Native lane avg: 318 events/session. No formal comparison performed (cross-lane comparison restricted to trend reporting only). | |
| 76 | +| **Runtime distribution** | claude-code only (8/8) | |
| 77 | +| **Task distribution** | Not annotated for SP lane sessions | |
| 78 | +| **Caveats** | Single runtime (claude-code only). 3 outlier sessions dominate lane metrics. This is an exploratory observation, not a validated finding, and must not be generalized to "superpowers always amplifies trace volume." | |
| 79 | +| **Falsification condition** | If 10+ additional SP sessions across >=2 runtimes show event density within native range (200-500 events/session), the amplification signal may be an artifact of the 3 large annotation sessions. | |
| 80 | +| **Status** | `active` | |
| 81 | +| **Source hypotheses** | None direct; derived from Phase 3E-1 lane baseline observation | |
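
The event figures quoted for T-WI-001 can be broken down with plain arithmetic, as in the sketch below. It uses only the numbers in the table; the variable names are illustrative, and the split of the lane into "3 large sessions" and "the other 5" is an assumption read off the 8-session denominator. It is not a cross-lane comparison.

```python
# Figures taken from the T-WI-001 table.
lane_events = 42_465
lane_sessions = 8
large_events = 41_221
large_sessions = 3

# Average for the three large sessions (the table rounds this to ~13,740).
large_avg = large_events / large_sessions

# What remains for the other sessions in the lane.
rest_sessions = lane_sessions - large_sessions
rest_events = lane_events - large_events
rest_avg = rest_events / rest_sessions

# Share of lane events contributed by the three large sessions.
large_share = large_events / lane_events

print(f"large sessions avg: {large_avg:.0f} events/session")
print(f"remaining {rest_sessions} sessions: {rest_events} events, "
      f"avg {rest_avg:.1f} events/session")
print(f"share of lane events in large sessions: {large_share:.1%}")
```

The breakdown makes the outlier caveat concrete: almost all lane events come from three sessions, so lane-level averages describe those sessions more than the lane.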
| 82 | + |
| 83 | +## T-FM-001: Failure Morphology Underdetermined |
| 84 | + |
| 85 | +| Field | Value | |
| 86 | +|-------|-------| |
| 87 | +| **Claim** | Current failure and near-failure sample density is insufficient to characterize failure morphology. Failure topology cannot be typed. | |
| 88 | +| **Evidence grade** | `deferred` | |
| 89 | +| **Lane** | `direct_prompt_native` | |
| 90 | +| **Denominator** | 1 native failure (success=False), 5 near-failure (human_intervention=True) out of 101 native sessions | |
| 91 | +| **Supporting data** | 1/101 native failure, 5/101 near-failure. Tier 2 readiness: failure 1/10 NOT MET, near-failure 5/10 NOT MET. | |
| 92 | +| **Runtime distribution** | N/A | |
| 93 | +| **Task distribution** | N/A | |
| 94 | +| **Caveats** | Low failure rate may reflect genuine agent effectiveness for current task types, or insufficient coverage of failure-prone task categories. | |
| 95 | +| **Falsification condition** | When native failure >= 10 and near-failure >= 10, re-evaluate. If failure topology is then characterizable, this deferral is resolved. | |
| 96 | +| **Status** | `active` | |
| 97 | +| **Source hypotheses** | H-FM-001, H-FM-002, H-EV-004, H-EV-005 (Phase 3D Tier 2, all deferred) | |
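
The Tier 2 readiness check behind T-FM-001 amounts to two minimum-count gates. A minimal sketch, assuming a hypothetical `tier2_readiness` function and a dictionary result; only the counts and the threshold of 10 come from the table.

```python
# Hypothetical gate check; the threshold of 10 comes from the
# falsification condition ("native failure >= 10 and near-failure >= 10").
MIN_SAMPLES = 10


def tier2_readiness(failures, near_failures, minimum=MIN_SAMPLES):
    """Report whether each sample count meets the minimum, and whether both do."""
    failure_met = failures >= minimum
    near_failure_met = near_failures >= minimum
    return {
        "failure": failure_met,
        "near_failure": near_failure_met,
        "re_evaluate": failure_met and near_failure_met,
    }


# Current corpus figures from the table: 1 failure, 5 near-failures.
status = tier2_readiness(failures=1, near_failures=5)
for gate, met in status.items():
    print(f"{gate}: {'MET' if met else 'NOT MET'}")
```

Both gates must be met before re-evaluation; meeting only one leaves the candidate deferred.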
| 98 | + |
| 99 | +## T-RP-001: Routed-Prompt Morphology Unobserved |
| 100 | + |
| 101 | +| Field | Value | |
| 102 | +|-------|-------| |
| 103 | +| **Claim** | `routed_prompt_intervention` morphology is currently unobserved. No theory statement can be made about the effect of prompt routing on topology. | |
| 104 | +| **Evidence grade** | `deferred` | |
| 105 | +| **Lane** | `routed_prompt_intervention` | |
| 106 | +| **Denominator** | 0 sessions | |
| 107 | +| **Supporting data** | prompt-routing-skill tag emission spec is defined. Capture path exists. 0 tagged sessions in corpus. Parser detection gate BLOCKED. | |
| 108 | +| **Runtime distribution** | N/A | |
| 109 | +| **Task distribution** | N/A | |
| 110 | +| **Caveats** | Absence is a corpus gap, not evidence that routing has no effect. | |
| 111 | +| **Falsification condition** | When >=5 routed sessions carry causetrace_tags, gate opens and basic lane characterization can begin. | |
| 112 | +| **Status** | `active` | |
| 113 | +| **Source hypotheses** | None (lane unpopulated) | |
| 114 | + |
| 115 | +## T-PM-001: Controlled Prompt Morphology at Pilot-Level Evidence |
| 116 | + |
| 117 | +| Field | Value | |
| 118 | +|-------|-------| |
| 119 | +| **Claim** | Controlled prompt morphology comparison is at pilot-level evidence only. Prompt posture effects on topology are not characterized. | |
| 120 | +| **Evidence grade** | `deferred` | |
| 121 | +| **Lane** | `controlled_prompt_morphology` | |
| 122 | +| **Denominator** | 3 pilot sessions | |
| 123 | +| **Supporting data** | 3 sessions, 135 events total, avg 45 events/session. No prompt variant labeling. Parser detection gate BLOCKED. | |
| 124 | +| **Runtime distribution** | claude-code only | |
| 125 | +| **Task distribution** | Not annotated | |
| 126 | +| **Caveats** | Pilot sessions are minimal and lack variant tagging. Cannot distinguish A/B/C prompt postures. | |
| 127 | +| **Falsification condition** | When controlled benchmark protocol is operational and >=5 sessions per variant carry prompt tags, re-evaluate. | |
| 128 | +| **Status** | `active` | |
| 129 | +| **Source hypotheses** | H-EG-001 (Phase 3D Tier 3, deferred) | |
| 130 | + |
| 131 | +--- |
| 132 | + |
| 133 | +## Evidence Grade Distribution |
| 134 | + |
| 135 | +| Grade | Count | Candidates | |
| 136 | +|-------|-------|------------| |
| 137 | +| `supported` | 2 | T-RM-001, T-RM-002 | |
| 138 | +| `supported_with_caveat` | 1 | T-RM-003 | |
| 139 | +| `exploratory` | 1 | T-WI-001 | |
| 140 | +| `deferred` | 3 | T-FM-001, T-RP-001, T-PM-001 | |
| 141 | +| `inconclusive` | 0 | — | |
| 142 | + |
| 143 | +## Theory Domain Map |
| 144 | + |
| 145 | +``` |
| 146 | +Runtime Morphology (T-RM) |
| 147 | +├── T-RM-001: dominant_chain as default [supported] |
| 148 | +├── T-RM-002: multi_root as minority [supported] |
| 149 | +└── T-RM-003: feature_add → dominant_chain [supported_with_caveat] |
| 150 | + |
| 151 | +Workflow Intervention (T-WI) |
| 152 | +└── T-WI-001: SP may amplify trace volume [exploratory] |
| 153 | + |
| 154 | +Failure Morphology (T-FM) |
| 155 | +└── T-FM-001: failure morphology underdetermined [deferred] |
| 156 | + |
| 157 | +Routed Prompt (T-RP) |
| 158 | +└── T-RP-001: routed-prompt unobserved [deferred] |
| 159 | + |
| 160 | +Prompt Morphology (T-PM) |
| 161 | +└── T-PM-001: controlled prompt pilot-only [deferred] |
| 162 | +``` |
| 163 | + |
| 164 | +## Operating Rules |
| 165 | + |
| 166 | +- Do not promote a candidate beyond its evidence grade without new corpus evidence. |
| 167 | +- Do not remove deferred candidates — they document gaps, not failures. |
| 168 | +- Do not merge T-WI-001 into native morphology conclusions. |
| 169 | +- Do not use T-RM-001 as a universal claim — it is scoped to the current native strict lane. |
| 170 | +- All deferred candidates carry explicit re-evaluation criteria. |
| 171 | +- Negative spaces (T-FM-001, T-RP-001, T-PM-001) are first-class entries. |
| 172 | + |
| 173 | +## What Is NOT Here |
| 174 | + |
| 175 | +- Prediction models or anomaly scorers |
| 176 | +- Automatic diagnosis rules |
| 177 | +- Universal prompt policy recommendations |
| 178 | +- Cross-lane aggregated claims |
| 179 | +- Claims without denominators |
| 180 | +- Claims without falsification conditions |
| 181 | +- Tool-specific topology prescriptions |
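
The exclusions above (no claims without denominators or falsification conditions) could be enforced mechanically. The sketch below is one possible shape, not an existing tool: the `Candidate` dataclass, its field names, and the `validate` function are assumptions. The grade names are the ones used in this inventory.

```python
from dataclasses import dataclass

# Grades listed in the Evidence Grade Distribution table.
ALLOWED_GRADES = {
    "supported", "supported_with_caveat", "exploratory",
    "deferred", "inconclusive",
}


@dataclass
class Candidate:
    """Hypothetical record mirroring a subset of the per-candidate table fields."""
    candidate_id: str
    evidence_grade: str
    denominator: str
    falsification_condition: str


def validate(candidate):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if candidate.evidence_grade not in ALLOWED_GRADES:
        problems.append(f"unknown evidence grade: {candidate.evidence_grade!r}")
    if not candidate.denominator.strip():
        problems.append("missing denominator")
    if not candidate.falsification_condition.strip():
        problems.append("missing falsification condition")
    return problems


# T-RP-001 as recorded in the inventory: a deferred entry still passes,
# because "0 sessions" is an explicit denominator.
t_rp_001 = Candidate(
    candidate_id="T-RP-001",
    evidence_grade="deferred",
    denominator="0 sessions",
    falsification_condition=(
        "When >=5 routed sessions carry causetrace_tags, gate opens "
        "and basic lane characterization can begin."
    ),
)
print(validate(t_rp_001))
```

A record with an empty denominator or falsification condition would return a non-empty problem list and could be rejected before it enters the inventory.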