| 1 | +# Phase 3E: Controlled Transition & Intervention-aware Validation |
| 2 | + |
| 3 | +Phase 3E validates selected runtime morphology hypotheses under controlled or intervention-aware conditions. It does not enter Phase 4 theory finalization. |
| 4 | + |
| 5 | +## Position |
| 6 | + |
| 7 | +- Phase 3D: complete |
| 8 | +- Phase 3E: active |
| 9 | +- Phase 4: not open |
| 10 | + |
| 11 | +## Mission |
| 12 | + |
| 13 | +Validate how events, observations, interventions, and workflow conditions relate to topology transitions. Specifically: |
| 14 | + |
| 15 | +``` |
| 16 | +event / observation / intervention / workflow condition |
| 17 | +→ topology transition |
| 18 | +``` |
| 19 | + |
| 20 | +## Scope |
| 21 | + |
| 22 | +### Active lanes (all kept separate) |
| 23 | + |
| 24 | +| Lane | Description | Merge into native? | |
| 25 | +|------|-------------|-------------------| |
| 26 | +| `direct_prompt_native` | User gave the agent a task directly | Baseline | |
| 27 | +| `routed_prompt_intervention` | `prompt-routing-skill` selected posture first | No | |
| 28 | +| `superpowers_workflow_intervention` | Structured workflow plugin changed execution shape | No | |
| 29 | +| `controlled_prompt_morphology` | Controlled prompt comparison or pilot run | No | |
| 30 | +| `external_trajectory` | External data source | No | |
| 31 | + |
| 32 | +### Validation targets |
| 33 | + |
| 34 | +- controlled benchmark protocol activation |
| 35 | +- intervention lane comparison (routed vs superpowers vs controlled) |
| 36 | +- correction-trigger studies (test failure, tool error, human correction, explicit correction mark) |
| 37 | +- observation-triggered transition studies (contradictory outputs, test failures, shell errors) |
| 38 | +- prompt posture / routing impact on retry, branch, and convergence |
| 39 | +- workflow intervention impact on topology (superpowers, subagent dispatching, structured workflows) |
| 40 | +- natural accumulation of Tier 2 samples (failure, near-failure, human-intervention), with opportunistic validation |
| 41 | + |
| 42 | +### Evidence gates |
| 43 | + |
| 44 | +Every Phase 3E claim must report: |
| 45 | + |
| 46 | +- lane |
| 47 | +- corpus snapshot |
| 48 | +- denominator |
| 49 | +- runtime distribution |
| 50 | +- task_type distribution |
| 51 | +- intervention type (if applicable) |
| 52 | +- whether the result is exploratory or validation-grade |
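The evidence gates above can be sketched as a record type plus a completeness check. This is a minimal illustration, not part of the project: `ClaimRecord`, `gate_problems`, and the field names are hypothetical, and the reading of "if applicable" as "the lane name ends in `_intervention`" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Lane names are taken from the "Active lanes" table.
LANES = {
    "direct_prompt_native",
    "routed_prompt_intervention",
    "superpowers_workflow_intervention",
    "controlled_prompt_morphology",
    "external_trajectory",
}
GRADES = {"exploratory", "validation-grade"}


@dataclass(frozen=True)
class ClaimRecord:  # hypothetical record type
    lane: str
    corpus_snapshot: str
    denominator: int
    runtime_distribution: Dict[str, int]
    task_type_distribution: Dict[str, int]
    grade: str
    intervention_type: Optional[str] = None


def gate_problems(claim: ClaimRecord) -> List[str]:
    """Return the evidence-gate items that are missing or invalid."""
    problems = []
    if claim.lane not in LANES:
        problems.append("lane")
    if not claim.corpus_snapshot.strip():
        problems.append("corpus snapshot")
    if claim.denominator <= 0:
        problems.append("denominator")
    if not claim.runtime_distribution:
        problems.append("runtime distribution")
    if not claim.task_type_distribution:
        problems.append("task_type distribution")
    # Assumption: "if applicable" covers the two *_intervention lanes.
    if claim.lane.endswith("_intervention") and not claim.intervention_type:
        problems.append("intervention type")
    if claim.grade not in GRADES:
        problems.append("exploratory or validation-grade")
    return problems
```

An empty list means the claim reports every gate item; any other result names what is still missing before the claim can be recorded.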
| 53 | + |
| 54 | +## Non-goals |
| 55 | + |
| 56 | +- Phase 4 theory finalization |
| 57 | +- Prediction of agent behavior |
| 58 | +- Anomaly scoring or detection |
| 59 | +- Automatic diagnosis |
| 60 | +- Universal prompt policy recommendations |
| 61 | +- Cross-lane aggregation without lane disclosure |
| 62 | +- Promoting Tier 2 hypotheses to conclusions without sufficient evidence |
| 63 | +- Changing topology taxonomy |
| 64 | +- Changing readiness gates without explicit justification |
| 65 | + |
| 66 | +## Deferred Hypotheses Carried Forward |
| 67 | + |
| 68 | +### From Phase 3D Tier 2 (failure / intervention morphology) |
| 69 | + |
| 70 | +These are registry entries and have not been validated. Validation is deferred until the corpus naturally accumulates more samples. |
| 71 | + |
| 72 | +- H-FM-001: failure/near-failure sessions enriched for retry_heavy or branchy topology |
| 73 | +- H-FM-002: failed sessions less likely to show branch_collapse |
| 74 | +- H-IM-001: human intervention acts as external correction trigger |
| 75 | +- H-IM-002: post-intervention traces show topology regime shifts |
| 76 | +- H-EV-004: failure sessions may contain silent divergence-like patterns |
| 77 | +- H-EV-005: human intervention may produce topology regime shifts |
| 78 | + |
| 79 | +Target: opportunistic validation when native failure >= 10, near-failure >= 10, multi-runtime failure coverage >= 3. |
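The target above amounts to a three-way threshold check. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes "multi-runtime failure coverage" means the number of distinct runtimes with at least one failure sample.

```python
# Thresholds as stated in the target line.
MIN_NATIVE_FAILURES = 10
MIN_NEAR_FAILURES = 10
MIN_RUNTIMES_WITH_FAILURES = 3


def tier2_validation_ready(
    native_failures: int,
    near_failures: int,
    runtimes_with_failures: int,
) -> bool:
    """True only when all three accumulation thresholds are met."""
    return (
        native_failures >= MIN_NATIVE_FAILURES
        and near_failures >= MIN_NEAR_FAILURES
        and runtimes_with_failures >= MIN_RUNTIMES_WITH_FAILURES
    )
```

All three conditions are joined with `and`, so a large failure count in a single runtime does not unlock validation by itself.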
| 80 | + |
| 81 | +### From Phase 3D Tier 3 (controlled benchmark / external lane) |
| 82 | + |
| 83 | +Activate when controlled benchmark protocol is operational. |
| 84 | + |
| 85 | +- H-OT-001: test failures trigger corrective branch exploration |
| 86 | +- H-OT-002: contradictory tool observations precede branch_collapse |
| 87 | +- H-EG-001: controlled benchmark lanes show lower branch entropy after task normalization |
| 88 | +- H-EG-002: external trajectories over-represent retry-heavy and branchy morphologies |
| 89 | +- H-EV-002: external tool observations may substitute for epistemic verbalization as correction triggers |
| 90 | +- H-EV-003: branch collapse may occur after uncertainty resolution signals |
| 91 | + |
| 92 | +### From Phase 3D Tier 4 (literature-inspired, registry-only) |
| 93 | + |
| 94 | +Maintain in registry for future corpus expansion. |
| 95 | + |
| 96 | +- H-EV-001: uncertainty verbalization may precede exploratory topology |
| 97 | +- H-LH-001: long-horizon tasks produce more fan-in and branch-collapse |
| 98 | +- H-LH-002: multi-file tasks increase root spawning and transition entropy |
| 99 | + |
| 100 | +## Operating Rules |
| 101 | + |
| 102 | +- All claims must bind to a specific corpus snapshot and lane. |
| 103 | +- Every percentage must include its denominator. |
| 104 | +- Every runtime conclusion must disclose runtime distribution. |
| 105 | +- Negative results are first-class entries and must not be deleted. |
| 106 | +- Do not promote hypotheses to conclusions without corpus-backed validation. |
| 107 | +- Do not enter Phase 4. |
| 108 | +- Do not implement prediction, anomaly detection, or automatic diagnosis. |
| 109 | +- Do not merge intervention lanes into the native direct-prompt baseline. |
| 110 | +- Do not change topology taxonomy or readiness gates unless explicitly justified. |
| 111 | +- Cross-lane comparison may report trends only. |
| 112 | +- Intervention-lane findings do not become universal policy without additional validation. |
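One way to make the "every percentage must include its denominator" rule hard to violate is to format rates only through a helper that always prints the raw counts. A minimal sketch, assuming a hypothetical `format_rate` helper and an output format chosen purely for illustration:

```python
def format_rate(count: int, denominator: int) -> str:
    """Render a rate as 'count/denominator (percent%)' so the denominator is never dropped."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= count <= denominator:
        raise ValueError("count must be between 0 and denominator")
    percent = 100.0 * count / denominator
    return f"{count}/{denominator} ({percent:.1f}%)"
```

For example, `format_rate(3, 12)` yields `3/12 (25.0%)`, which keeps a small-sample result visibly small.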
| 113 | + |
| 114 | +## Background Processes |
| 115 | + |
| 116 | +- Intervention-aware acquisition continues (formerly Phase 3D-T2B). |
| 117 | +- Native lane maintained as a living baseline. |
| 118 | +- Opportunistic validation of Tier 2 failure/intervention hypotheses continues. |
| 119 | + |
| 120 | +## Current State |
| 121 | + |
| 122 | +Phase 3E is newly opened. The first action is to activate the controlled benchmark protocol and begin lane-separated intervention comparisons. No hypotheses in the carried-forward set are yet validation-ready. |