
Commit f2c55ad

Your Name and claude committed
Close Phase 3D, open Phase 3E
Phase 3D is complete:
- Hypothesis registry established (19 hypotheses, 8 categories)
- Tier 1 validation complete (3 supported, 1 inconclusive, 1 not supported)
- Tier 2 honestly deferred (failure samples genuinely rare: 1/100)
- Closure report at phase3d/closure_report_v0.2.5.md

Phase 3E is now active:
- Controlled Transition & Intervention-aware Validation
- Lanes: direct_prompt_native, routed_prompt_intervention, superpowers_workflow_intervention, controlled_prompt_morphology
- Deferred Tier 2/3/4 hypotheses carried forward
- Phase 4 remains not open
- Background acquisition continues, not a phase gate

Research index updated to reflect current phase roadmap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent fceb756 commit f2c55ad

3 files changed

Lines changed: 158 additions & 25 deletions

docs/research/README.md

Lines changed: 31 additions & 21 deletions
@@ -2,33 +2,43 @@
22

33
This directory groups the research tracks and branch studies that sit alongside the main `causetrace` runtime-morphology work.
44

5-
## Active Research Tracks
6-
7-
- [Phase 3A](phase3a/README.md)
8-
- [Phase 3B](phase3b/README.md)
9-
- [Phase 3C](phase3c/README.md)
10-
- [Phase 3D](phase3d/README.md)
11-
12-
## Current Research Status
5+
## Research Phase Status
6+
7+
| Phase | Status | Summary |
8+
|-------|--------|---------|
9+
| Phase 2.5 | complete | Baseline infrastructure |
10+
| Phase 3A | complete | Descriptive corpus |
11+
| Phase 3B | complete | Topology taxonomy |
12+
| Phase 3C | complete | Metadata & provenance |
13+
| [Phase 3D](phase3d/README.md) | **complete** | Hypothesis registry + Tier 1 validation |
14+
| [Phase 3E](phase3e/README.md) | **active** | Controlled transition & intervention-aware validation |
15+
| Phase 4 | **not open** | Theory finalization |
16+
17+
## Current Corpus Snapshot
18+
19+
- sessions: `1351`
20+
- events: `128,552`
21+
- strict research-grade sessions: `157`
22+
- native strict sessions: `100`
23+
- agent field coverage: `100%` (inline)
24+
- provider field coverage: `99.8%` (inline)
25+
- runtime breadth: `7`
26+
- task breadth: `9`
1327

14-
`causetrace` now has enough corpus scale to support validation-oriented work, but not enough metadata density to support theory finalization or default automation policy.
28+
## Phase 3D Closure Summary
1529

16-
Current snapshot:
30+
Phase 3D delivered the hypothesis registry (19 hypotheses, 8 categories), completed Tier 1 validation (3 supported, 1 inconclusive, 1 not supported), and honestly deferred Tier 2 (failure samples genuinely rare in real agent behavior: 1/100 native failure, 0/100 near-failure). See [closure report](phase3d/closure_report_v0.2.5.md).
1731

18-
- sessions: `1315`
19-
- events: `64429`
20-
- strict research-grade sessions: `157`
21-
- dominant_chain: `1111`
22-
- mixed: `195`
23-
- retry-heavy: `541`
24-
- branchy sessions: `179`
25-
- long sessions >=100 events: `53`
32+
## Phase 3E Active Scope
2633

27-
The next mainline stage is:
34+
Controlled transition and intervention-aware validation. Lanes kept separate:
2835

29-
`Phase 3D-T2B: Intervention-aware Acquisition`
36+
- `direct_prompt_native`
37+
- `routed_prompt_intervention`
38+
- `superpowers_workflow_intervention`
39+
- `controlled_prompt_morphology`
3040

31-
This stage continues Tier 2 acquisition while keeping workflow-intervention lanes separate from the native direct-prompt baseline.
41+
Deferred hypotheses from Phase 3D Tier 2/3/4 carried forward. Tier 2 validation is opportunistic (background acquisition), not a phase gate. See [Phase 3E README](phase3e/README.md).
3242

3343
## Cross-project Branch Studies
3444

docs/research/phase3d/status.md

Lines changed: 5 additions & 4 deletions
@@ -1,6 +1,6 @@
1-
# Phase 3D Status (v0.2.5)
1+
# Phase 3D Status (v0.2.5) — CLOSED
22

3-
Phase 3D is recommended for graduation. See [closure report](closure_report_v0.2.5.md) for full assessment.
3+
Phase 3D is complete. See [closure report](closure_report_v0.2.5.md) for full assessment.
44

55
It delivered the hypothesis registry layer for runtime morphology research. Tier 1 validation is complete. Tier 2 is deferred honestly (failure samples genuinely rare in real agent behavior, not an execution failure).
66

@@ -10,8 +10,9 @@ It delivered the hypothesis registry layer for runtime morphology research. Tier
1010
- Phase 3A: complete
1111
- Phase 3B: complete
1212
- Phase 3C: complete
13-
- Phase 3D: recommended for graduation
14-
- Phase 3E: preparing
13+
- Phase 3D: complete
14+
- Phase 3E: active
15+
- Phase 4: not open
1516

1617
## Current Corpus Baseline
1718

docs/research/phase3e/README.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
1+
# Phase 3E: Controlled Transition & Intervention-aware Validation
2+
3+
Phase 3E validates selected runtime morphology hypotheses under controlled or intervention-aware conditions. It does not enter Phase 4 theory finalization.
4+
5+
## Position
6+
7+
- Phase 3D: complete
8+
- Phase 3E: active
9+
- Phase 4: not open
10+
11+
## Mission
12+
13+
Validate the relationship between events, observations, interventions, workflow conditions and topology transitions. Specifically:
14+
15+
```
16+
event / observation / intervention / workflow condition
17+
→ topology transition
18+
```
19+
20+
## Scope
21+
22+
### Active lanes (all kept separate)
23+
24+
| Lane | Description | Merge into native? |
25+
|------|-------------|-------------------|
26+
| `direct_prompt_native` | User gave the agent a task directly | Baseline |
27+
| `routed_prompt_intervention` | `prompt-routing-skill` selected posture first | No |
28+
| `superpowers_workflow_intervention` | Structured workflow plugin changed execution shape | No |
29+
| `controlled_prompt_morphology` | Controlled prompt comparison or pilot run | No |
30+
| `external_trajectory` | External data source | No |
31+
32+
### Validation targets
33+
34+
- controlled benchmark protocol activation
35+
- intervention lane comparison (routed vs superpowers vs controlled)
36+
- correction-trigger studies (test failure, tool error, human correction, explicit correction mark)
37+
- observation-triggered transition studies (contradictory outputs, test failures, shell errors)
38+
- prompt posture / routing impact on retry, branch, and convergence
39+
- workflow intervention impact on topology (superpowers, subagent dispatching, structured workflows)
40+
- Tier 2 sample natural accumulation (failure, near-failure, human-intervention) with opportunistic validation
41+
42+
### Evidence gates
43+
44+
Every Phase 3E claim must report:
45+
46+
- lane
47+
- corpus snapshot
48+
- denominator
49+
- runtime distribution
50+
- task_type distribution
51+
- intervention type (if applicable)
52+
- whether the result is exploratory or validation-grade
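
The evidence gates above could be enforced mechanically before a claim is recorded. The following is only a minimal sketch: `ClaimReport`, `missing_evidence`, the field names, and the rule that an intervention type is required for lanes whose name ends in `_intervention` are all assumptions made for illustration, not part of `causetrace`.

```python
from dataclasses import dataclass
from typing import Optional

# Lane names as listed in the Phase 3E README.
KNOWN_LANES = {
    "direct_prompt_native",
    "routed_prompt_intervention",
    "superpowers_workflow_intervention",
    "controlled_prompt_morphology",
    "external_trajectory",
}
GRADES = {"exploratory", "validation-grade"}


@dataclass
class ClaimReport:
    """Hypothetical record holding the fields every Phase 3E claim must report."""
    lane: str
    corpus_snapshot: str
    denominator: int
    runtime_distribution: dict
    task_type_distribution: dict
    grade: str
    intervention_type: Optional[str] = None


def missing_evidence(report: ClaimReport) -> list[str]:
    """Return the names of evidence-gate fields that are absent or invalid."""
    problems = []
    if report.lane not in KNOWN_LANES:
        problems.append("lane")
    if not report.corpus_snapshot:
        problems.append("corpus_snapshot")
    if report.denominator <= 0:
        problems.append("denominator")
    if not report.runtime_distribution:
        problems.append("runtime_distribution")
    if not report.task_type_distribution:
        problems.append("task_type_distribution")
    if report.grade not in GRADES:
        problems.append("grade")
    # Assumption: "if applicable" is read as "the lane is an intervention lane".
    if report.lane.endswith("_intervention") and not report.intervention_type:
        problems.append("intervention_type")
    return problems
```

A claim would be accepted only when `missing_evidence` returns an empty list; returning the full list of gaps, rather than failing on the first one, keeps review feedback complete in a single pass.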
53+
54+
## Non-goals
55+
56+
- Phase 4 theory finalization
57+
- Prediction of agent behavior
58+
- Anomaly scoring or detection
59+
- Automatic diagnosis
60+
- Universal prompt policy recommendations
61+
- Cross-lane aggregation without lane disclosure
62+
- Promoting Tier 2 hypotheses to conclusions without sufficient evidence
63+
- Changing topology taxonomy
64+
- Changing readiness gates without explicit justification
65+
66+
## Deferred Hypotheses Carried Forward
67+
68+
### From Phase 3D Tier 2 (failure / intervention morphology)
69+
70+
Registry entries, not validated. Validation deferred until corpus naturally accumulates more samples.
71+
72+
- H-FM-001: failure/near-failure sessions enriched for retry_heavy or branchy topology
73+
- H-FM-002: failed sessions less likely to show branch_collapse
74+
- H-IM-001: human intervention acts as external correction trigger
75+
- H-IM-002: post-intervention traces show topology regime shifts
76+
- H-EV-004: failure sessions may contain silent divergence-like patterns
77+
- H-EV-005: human intervention may produce topology regime shifts
78+
79+
Target: opportunistic validation when native failure >= 10, near-failure >= 10, multi-runtime failure coverage >= 3.
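
The target above is a conjunction of three thresholds, which can be sketched as a single check. The function name, parameter names, and the example runtime count are hypothetical; only the threshold values and the 1/100 and 0/100 counts come from this document.

```python
def tier2_validation_ready(
    native_failures: int,
    near_failures: int,
    runtimes_with_failures: int,
) -> bool:
    """True only when all three Tier 2 accumulation targets are met."""
    return (
        native_failures >= 10
        and near_failures >= 10
        and runtimes_with_failures >= 3
    )


# Phase 3D closure counts: 1 native failure and 0 near-failures out of 100.
# The runtime coverage value (1) is an assumed placeholder, not a reported figure.
print(tier2_validation_ready(1, 0, 1))  # False
```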
80+
81+
### From Phase 3D Tier 3 (controlled benchmark / external lane)
82+
83+
Activate when controlled benchmark protocol is operational.
84+
85+
- H-OT-001: test failures trigger corrective branch exploration
86+
- H-OT-002: contradictory tool observations precede branch_collapse
87+
- H-EG-001: controlled benchmark lanes show lower branch entropy after task normalization
88+
- H-EG-002: external trajectories over-represent retry-heavy and branchy morphologies
89+
- H-EV-002: external tool observations may substitute for epistemic verbalization as correction triggers
90+
- H-EV-003: branch collapse may occur after uncertainty resolution signals
91+
92+
### From Phase 3D Tier 4 (literature-inspired, registry-only)
93+
94+
Maintain in registry for future corpus expansion.
95+
96+
- H-EV-001: uncertainty verbalization may precede exploratory topology
97+
- H-LH-001: long-horizon tasks produce more fan-in and branch-collapse
98+
- H-LH-002: multi-file tasks increase root spawning and transition entropy
99+
100+
## Operating Rules
101+
102+
- All claims must bind to a specific corpus snapshot and lane.
103+
- Every percentage must include its denominator.
104+
- Every runtime conclusion must disclose runtime distribution.
105+
- Negative results are first-class entries and must not be deleted.
106+
- Do not promote hypotheses to conclusions without corpus-backed validation.
107+
- Do not enter Phase 4.
108+
- Do not implement prediction, anomaly detection, or automatic diagnosis.
109+
- Do not merge intervention lanes into the native direct-prompt baseline.
110+
- Do not change topology taxonomy or readiness gates unless explicitly justified.
111+
- Cross-lane comparison may report trends only.
112+
- Intervention-lane findings do not become universal policy without additional validation.
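
One way to make the denominator rule hard to violate is to format every rate through a helper that refuses to emit a bare percentage. This is a sketch under that assumption; `format_rate` is a hypothetical name and not an existing `causetrace` function.

```python
def format_rate(count: int, denominator: int) -> str:
    """Render a rate as 'count/denominator (pct%)' so the denominator is never dropped."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= count <= denominator:
        raise ValueError("count must lie between 0 and the denominator")
    return f"{count}/{denominator} ({100 * count / denominator:.1f}%)"


# The native failure rate reported in the Phase 3D closure summary.
print(format_rate(1, 100))  # 1/100 (1.0%)
```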
113+
114+
## Background Processes
115+
116+
- Intervention-aware acquisition continues (formerly Phase 3D-T2B).
117+
- Native lane maintained as a living baseline.
118+
- Tier 2 failure/intervention opportunistic validation.
119+
120+
## Current State
121+
122+
Phase 3E is newly opened. The first action is to activate the controlled benchmark protocol and begin lane-separated intervention comparisons. No hypotheses in the carried-forward set are yet validation-ready.
