Commit a4e8641

Merge branch 'main' into feat/persistent-shared-domains
2 parents: 6ca0ed6 + a7c972b

90 files changed

Lines changed: 6312 additions & 1780 deletions

Lines changed: 119 additions & 0 deletions

---
name: autoresearch
description: Autonomous goal-directed iteration. Apply Karpathy's autoresearch principles to ANY task. Loops autonomously (modify, verify, keep/discard, repeat). Supports an optional loop count via Claude Code's /loop command.
version: 1.0.1
---

# Claude Autoresearch: Autonomous Goal-directed Iteration

Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work, not just ML research.

**Core idea:** You are an autonomous agent. Modify → Verify → Keep/Discard → Repeat.

## When to Activate

- User invokes `/autoresearch` or `/ug:autoresearch`
- User says "work autonomously", "iterate until done", "keep improving", or "run overnight"
- Any task requiring repeated iteration cycles with measurable outcomes

## Optional: Controlled Loop Count

By default, autoresearch loops **forever** until manually interrupted. Users can optionally specify a **loop count** to limit iterations using Claude Code's built-in `/loop` command.

> **Requires:** Claude Code v1.0.32+ (the `/loop` command was introduced in this version)

### Usage

**Unlimited (default):**
```
/autoresearch
Goal: Increase test coverage to 90%
```

**Bounded (N iterations):**
```
/loop 25 /autoresearch
Goal: Increase test coverage to 90%
```

This chains `/autoresearch` with `/loop 25` and runs at most 25 iteration cycles. After the 25th iteration, or earlier if the goal is reached first, Claude stops and prints a final summary.

### When to Use Bounded Loops

| Scenario | Recommendation |
|----------|---------------|
| Run overnight, review in morning | Unlimited (default) |
| Quick 30-min improvement session | `/loop 10 /autoresearch` |
| Targeted fix with known scope | `/loop 5 /autoresearch` |
| Exploratory: see if approach works | `/loop 15 /autoresearch` |
| CI/CD pipeline integration | `/loop N /autoresearch` (set N based on time budget) |

### Behavior with Loop Count

When a loop count is specified:
- Claude runs at most N iterations through the autoresearch loop
- After iteration N, Claude prints a **final summary** with baseline → current best and the counts of keeps, discards, and crashes
- If the goal is achieved before N iterations, Claude reports early completion, prints the final summary, and stops
- All other rules (atomic changes, mechanical verification, auto-rollback) still apply
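
The stopping rules above can be sketched as a small Python driver. This is a minimal sketch, not how Claude Code implements `/loop`. `run_loop`, `run_iteration`, and `goal_achieved` are hypothetical names, and `max_iterations=None` stands in for unbounded mode.

```python
from typing import Callable, List, Optional


def run_loop(
    run_iteration: Callable[[int], str],
    goal_achieved: Callable[[], bool],
    max_iterations: Optional[int] = None,
) -> List[str]:
    """Drive the loop and return the status of every iteration run.

    run_iteration(i) performs one full cycle and returns "keep", "discard",
    or "crash". With max_iterations=None (unbounded mode) the loop never
    exits on its own; only a manual interrupt (KeyboardInterrupt) stops it.
    """
    statuses: List[str] = []
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        statuses.append(run_iteration(iteration))
        # Early completion applies only in bounded mode.
        if max_iterations is not None and goal_achieved():
            print(f"Goal achieved at iteration {iteration}!")
            break
    return statuses
```

For example, `run_loop(step, goal, max_iterations=25)` returns at most 25 statuses. It returns fewer if `goal()` becomes true first. The returned list is enough to compute the keep/discard/crash counts for the final summary.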

## Setup Phase (Do Once)

1. **Read all in-scope files** for full context before any modification
2. **Define the goal**: what does "better" mean? Extract or ask for a mechanical metric:
   - Code: tests pass, build succeeds, performance benchmark improves
   - Content: word count target hit, SEO score improves, readability score improves
   - Design: Lighthouse score improves, accessibility audit passes
   - If no metric exists → define one with the user, or use the simplest proxy (e.g. "compiles without errors")
3. **Define scope constraints**: which files can you modify? Which are read-only?
4. **Create a results log**: track every iteration (see `references/results-logging.md`)
5. **Establish baseline**: run verification on the current state. Record it as iteration #0
6. **Confirm and go**: show the user the setup, get confirmation, then BEGIN THE LOOP

## The Loop

Read `references/autonomous-loop-protocol.md` for full protocol details.

```
LOOP (FOREVER or N times):
  1. Review: Read current state + git history + results log
  2. Ideate: Pick next change based on goal, past results, what hasn't been tried
  3. Modify: Make ONE focused change to in-scope files
  4. Commit: Git commit the change (before verification)
  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
  6. Decide:
     - IMPROVED → Keep commit, log "keep", advance
     - SAME/WORSE → Roll back (git reset --hard HEAD~1), log "discard"
       (exception: same metric with simpler code → keep)
     - CRASHED → Try to fix (max 3 attempts), else roll back, log "crash", and move on
  7. Log: Record result in results log
  8. Repeat: Go to step 1.
     - If unbounded: NEVER STOP. NEVER ASK "should I continue?"
     - If bounded (N): Stop after N iterations (or earlier if the goal is achieved), print final summary
```
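
Steps 3-7 of the loop can be sketched in Python as a function over injected steps. `modify`, `commit`, `verify`, `fix`, and `rollback` are hypothetical callables, not an existing API. `verify` returns the metric, or `None` when the run crashes. The sketch assumes a higher metric is better and leaves out the simplicity exception.

```python
from typing import Callable, Optional, Tuple

MAX_FIX_ATTEMPTS = 3


def run_iteration(
    best: float,
    modify: Callable[[], None],
    commit: Callable[[], None],
    verify: Callable[[], Optional[float]],
    fix: Callable[[], bool],
    rollback: Callable[[], None],
) -> Tuple[str, float]:
    """Run one experiment and return (status, new_best).

    Assumes a higher metric is better.
    """
    modify()
    commit()  # commit BEFORE verifying, so rollback is a single reset
    metric = verify()

    # Crash handling: up to 3 fix attempts, re-verifying after each one.
    attempts = 0
    while metric is None and attempts < MAX_FIX_ATTEMPTS:
        attempts += 1
        if not fix():  # fix() is expected to amend the experiment commit
            break
        metric = verify()

    if metric is None:
        rollback()
        return "crash", best
    if metric > best:
        return "keep", metric  # commit stays; it becomes the new baseline
    rollback()
    return "discard", best
```

Injecting the steps as callables keeps the control flow testable without a real repository. In practice `commit` and `rollback` would shell out to `git commit` and `git reset --hard HEAD~1`.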

## Critical Rules

1. **Loop until done**: in unbounded mode, loop until interrupted; in bounded mode, loop N times (or until the goal is met), then summarize.
2. **Read before write**: Always understand full context before modifying
3. **One change per iteration**: Atomic changes. If it breaks, you know exactly why
4. **Mechanical verification only**: No subjective "looks good". Use metrics
5. **Automatic rollback**: Failed changes revert instantly. No debates
6. **Simplicity wins**: Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
7. **Git is memory**: Every experiment is committed before verification, and discarded ones are rolled back, so history holds only kept changes. The agent reads history to learn patterns
8. **When stuck, think harder**: Re-read files, re-read the goal, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access/permissions

## Principles Reference

See `references/core-principles.md` for the 7 generalizable principles from autoresearch.

## Adapting to Different Domains

| Domain | Metric | Scope | Verify Command |
|--------|--------|-------|----------------|
| Backend code | Tests pass + coverage % | `src/**/*.ts` | `npm test` |
| Frontend UI | Lighthouse score | `src/components/**` | `npx lighthouse <url>` |
| ML training | val_bpb / loss | `train.py` | `uv run train.py` |
| Blog/content | Word count + readability | `content/*.md` | Custom script |
| Performance | Benchmark time (ms) | Target files | `npm run bench` |
| Refactoring | Tests pass + LOC reduced | Target module | `npm test && wc -l <files>` |

Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.
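
Rows of the domain table can be captured as explicit settings before the loop starts. The sketch below is a hypothetical Python config object; `LoopConfig` and its fields are illustrative. The regex patterns also assume output formats that your tools may not actually print.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LoopConfig:
    """Hypothetical per-domain settings mirroring one row of the table."""
    domain: str
    metric: str
    scope: Tuple[str, ...]       # globs the agent may modify
    verify_cmd: Tuple[str, ...]  # command run once per iteration
    metric_pattern: str          # regex with one capture group (assumed output format)
    lower_is_better: bool


CONFIGS = {
    "backend": LoopConfig(
        domain="Backend code",
        metric="coverage %",
        scope=("src/**/*.ts",),
        verify_cmd=("npm", "test"),
        metric_pattern=r"Coverage:\s*([0-9.]+)%",  # hypothetical output line
        lower_is_better=False,
    ),
    "ml": LoopConfig(
        domain="ML training",
        metric="val_bpb",
        scope=("train.py",),
        verify_cmd=("uv", "run", "train.py"),
        metric_pattern=r"val_bpb:\s*([0-9.]+)",  # assumes a "val_bpb: <value>" line
        lower_is_better=True,
    ),
    "performance": LoopConfig(
        domain="Performance",
        metric="benchmark time (ms)",
        scope=(),  # fill in the target files for your project
        verify_cmd=("npm", "run", "bench"),
        metric_pattern=r"([0-9.]+)\s*ms",  # hypothetical benchmark output
        lower_is_better=True,
    ),
}
```

Using tuples keeps the frozen dataclass hashable. `lower_is_better` is the one field the decide step needs in order to know which direction counts as "improved".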
Lines changed: 152 additions & 0 deletions

# Autonomous Loop Protocol

Detailed protocol for the autoresearch iteration loop. SKILL.md has the summary; this file has the full rules.

## Loop Modes

Autoresearch supports two loop modes:

- **Unbounded (default):** Loop forever until manually interrupted (`Ctrl+C`)
- **Bounded:** Loop at most N times when chained with `/loop N` (requires Claude Code v1.0.32+)

When bounded, track `current_iteration` against `max_iterations`. After the final iteration, or as soon as the goal is achieved, print a summary and stop.

## Phase 1: Review (30 seconds)

Before each iteration, build situational awareness:

```
1. Read current state of in-scope files (full context)
2. Read last 10-20 entries from results log
3. Read git log --oneline -20 to see recent changes
4. Identify: what worked, what failed, what's untried
5. If bounded: check current_iteration vs max_iterations
```

**Why read every time?** After rollbacks, state may differ from what you expect. Never assume; always verify.

## Phase 2: Ideate (Strategic)

Pick the NEXT change. Priority order:

1. **Fix crashes/failures** from the previous iteration first
2. **Exploit successes**: if the last change improved the metric, try variants in the same direction
3. **Explore new approaches**: try something the results log shows hasn't been attempted
4. **Combine near-misses**: two changes that individually didn't help might work together
5. **Simplify**: remove code while maintaining the metric. Simpler = better
6. **Radical experiments**: when incremental changes stall, try something dramatically different

**Anti-patterns:**
- Don't repeat the exact same change that was already discarded
- Don't make multiple unrelated changes at once (you can't attribute the improvement)
- Don't chase marginal gains with ugly complexity

**Bounded mode consideration:** If few iterations remain (fewer than 3), prioritize exploiting successes over exploration.

## Phase 3: Modify (One Atomic Change)

- Make ONE focused change to in-scope files
- The change should be explainable in one sentence
- Write the description BEFORE making the change (forces clarity)

## Phase 4: Commit (Before Verification)

```bash
git add <changed-files>
git commit -m "experiment: <one-sentence description>"
```

Commit BEFORE running verification so rollback is clean: `git reset --hard HEAD~1` drops the experiment commit in one step.

## Phase 5: Verify (Mechanical Only)

Run the agreed-upon verification command. Capture output.

**Timeout rule:** If verification takes more than 2x its normal (baseline) duration, kill it and treat the run as a crash.

**Extract metric:** Parse the verification output for the specific metric number.
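
The timeout rule can be sketched with Python's `subprocess.run`, which kills and reaps the child when `timeout` expires and then raises `TimeoutExpired`. Two assumptions are built in. First, `baseline_seconds` is the duration measured for iteration #0. Second, the caller decides whether a non-zero exit code is a crash or just a worse result: a failing test suite may be "worse" rather than "crashed".

```python
import subprocess
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VerifyResult:
    output: str                # combined stdout + stderr ("" after a timeout)
    returncode: Optional[int]  # None when the run was killed for timing out
    elapsed: float             # wall-clock seconds

    @property
    def timed_out(self) -> bool:
        return self.returncode is None


def run_verification(cmd: List[str], baseline_seconds: float) -> VerifyResult:
    """Run the verify command, killing it after 2x the baseline duration."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=2 * baseline_seconds,  # timeout rule: >2x normal = crash
        )
    except subprocess.TimeoutExpired:
        # run() has already killed and waited for the child at this point.
        return VerifyResult("", None, time.monotonic() - start)
    return VerifyResult(proc.stdout + proc.stderr, proc.returncode,
                        time.monotonic() - start)
```

A timed-out run returns an empty `output` on purpose. It should be logged as a crash, not parsed for a metric.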

## Phase 6: Decide (No Ambiguity)

```
# Check crashes first: a crashed run has no valid metric to compare
IF crashed:
    # Attempt fix (max 3 tries)
    IF fixable:
        # Amend, so a single reset still removes the whole experiment
        Fix → git commit --amend → re-verify
    ELSE:
        STATUS = "crash"
        git reset --hard HEAD~1
ELIF metric_improved:
    STATUS = "keep"
    # Do nothing: commit stays
ELSE:  # metric same or worse
    STATUS = "discard"
    git reset --hard HEAD~1
```

**Simplicity override:** If the metric barely improved (by less than 0.1%) but the change adds significant complexity, treat it as "discard". If the metric is unchanged but the code is simpler, treat it as "keep".
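
The decision rule, including the simplicity override, can be written as a pure Python function. This is a sketch under three assumptions. `lower_is_better` selects the metric direction, which is true for val_bpb or benchmark time. `loc_delta` is the change in lines of code, where a negative value means simpler code. "Significant complexity" is approximated by the hypothetical threshold `COMPLEXITY_LINES = 20`; only the 0.1% figure comes from the protocol.

```python
from typing import Optional

BARELY_IMPROVED = 0.001  # "less than 0.1%" relative gain, from the protocol
COMPLEXITY_LINES = 20    # hypothetical proxy for "significant complexity"


def decide(
    best: float,
    new: Optional[float],
    loc_delta: int,
    lower_is_better: bool = False,
) -> str:
    """Return "keep", "discard", or "crash" for one experiment."""
    if new is None:  # crashed runs have no metric, so check them first
        return "crash"
    gain = (best - new) if lower_is_better else (new - best)
    relative_gain = gain / abs(best) if best != 0 else gain
    if gain > 0:
        # Simplicity override: a tiny gain does not pay for lots of new code.
        if relative_gain < BARELY_IMPROVED and loc_delta > COMPLEXITY_LINES:
            return "discard"
        return "keep"
    if gain == 0 and loc_delta < 0:
        return "keep"  # same metric, simpler code
    return "discard"
```

Keeping this logic free of side effects makes it easy to unit-test. The git reset then happens in one place, based on the returned status.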
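
## Phase 7: Log Results

Append to results log (TSV format):

```
iteration	commit	metric	status	description
42	a1b2c3d	0.9821	keep	increase attention heads from 8 to 12
43	-	0.9845	discard	switch optimizer to SGD
44	-	0.0000	crash	double batch size (OOM)
```

The metric in this example appears to be val_bpb, where lower is better. Iteration 43 (0.9845) did worse than the kept 0.9821, so it was discarded. Discarded and crashed rows record `-` as the commit because the experiment commit was reset.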
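
Appending a row in the format above can be sketched with Python's standard `csv` module using a tab delimiter. The function name `append_result` is hypothetical. The `-` commit and the `0.0000` crash metric follow the example rows.

```python
import csv
from pathlib import Path
from typing import Optional

FIELDS = ["iteration", "commit", "metric", "status", "description"]


def append_result(
    log_path: Path,
    iteration: int,
    commit: Optional[str],
    metric: Optional[float],
    status: str,
    description: str,
) -> None:
    """Append one row to the TSV results log, writing the header if new."""
    is_new = not log_path.exists() or log_path.stat().st_size == 0
    value = 0.0 if metric is None else metric  # crashes log 0.0000
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow([
            iteration,
            commit or "-",  # discarded/crashed commits were reset away
            f"{value:.4f}",
            status,
            description,
        ])
```

Opening in append mode means an interrupted loop never loses earlier rows. `csv.DictReader(f, delimiter="\t")` reads the log back for the Review phase.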

## Phase 8: Repeat

### Unbounded Mode (default)

Go to Phase 1. **NEVER STOP. NEVER ASK IF YOU SHOULD CONTINUE.**

### Bounded Mode (with /loop N)

```
IF goal_achieved:
    Print: "Goal achieved at iteration {current_iteration}! Final metric: {value}"
    Print final summary
    STOP
ELIF current_iteration < max_iterations:
    Go to Phase 1
ELSE:
    Print final summary
    STOP
```

**Final summary format:**
```
=== Autoresearch Complete ({current_iteration}/{max_iterations} iterations) ===
Baseline: {baseline} → Final: {current} ({delta})
Keeps: X | Discards: Y | Crashes: Z
Best iteration: #{n} - {description}
```
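
A summary block like the one above can be built from the parsed results log. This Python sketch assumes rows come from `csv.DictReader`, so every value is a string, and that the baseline row (#0) is excluded. It treats the last kept metric as the current state.

```python
from typing import Dict, List


def final_summary(
    rows: List[Dict[str, str]],
    baseline: float,
    max_iterations: int,
    lower_is_better: bool = False,
) -> str:
    """Render the final summary from results-log rows (baseline row excluded)."""
    counts = {s: sum(r["status"] == s for r in rows) for s in ("keep", "discard", "crash")}
    kept = [r for r in rows if r["status"] == "keep"]
    # The current state is whatever the last kept commit measured.
    final = float(kept[-1]["metric"]) if kept else baseline
    lines = [
        f"=== Autoresearch Complete ({len(rows)}/{max_iterations} iterations) ===",
        f"Baseline: {baseline:.4f} → Final: {final:.4f} ({final - baseline:+.4f})",
        f"Keeps: {counts['keep']} | Discards: {counts['discard']} | Crashes: {counts['crash']}",
    ]
    if kept:
        pick = min if lower_is_better else max
        best = pick(kept, key=lambda r: float(r["metric"]))
        lines.append(f"Best iteration: #{best['iteration']} - {best['description']}")
    else:
        lines.append("Best iteration: none (no changes kept)")
    return "\n".join(lines)
```

The `+.4f` format always prints a sign, so a falling val_bpb shows up as a negative delta.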

### When Stuck (>5 consecutive discards)

Applies to both modes:
1. Re-read ALL in-scope files from scratch
2. Re-read the original goal/direction
3. Review the entire results log for patterns
4. Try combining 2-3 previously successful changes
5. Try the OPPOSITE of what hasn't been working
6. Try a radical architectural change
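
The "stuck" trigger can be checked mechanically from the statuses in the results log. In this Python sketch only literal `discard` entries count toward the streak, so a keep or a crash resets it. Counting crashes as well would be an equally reasonable reading of the rule.

```python
from typing import List

STUCK_THRESHOLD = 5  # ">5 consecutive discards", from the protocol


def consecutive_discards(statuses: List[str]) -> int:
    """Count the "discard" statuses at the end of the log."""
    count = 0
    for status in reversed(statuses):
        if status != "discard":
            break
        count += 1
    return count


def is_stuck(statuses: List[str]) -> bool:
    return consecutive_discards(statuses) > STUCK_THRESHOLD
```

Running this check during the Review phase makes the switch to the recovery steps above a mechanical decision rather than a judgment call.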

## Crash Recovery

- Syntax error → fix immediately; don't count it as a separate iteration
- Runtime error → attempt a fix (max 3 tries), then move on
- Resource exhaustion (OOM) → revert, try a smaller variant
- Infinite loop/hang → kill after timeout, revert, avoid that approach
- External dependency failure → skip, log, try a different approach

## Communication

- **DO NOT** ask "should I keep going?" In unbounded mode, the answer is always YES. In bounded mode, continue until N is reached (or the goal is achieved).
- **DO NOT** summarize after each iteration; just log and continue
- **DO** print a brief one-line status every ~5 iterations (e.g., "Iteration 25: metric at 0.95, 8 keeps / 17 discards")
- **DO** alert if you discover something surprising or game-changing
- **DO** print a final summary when a bounded loop completes
Lines changed: 92 additions & 0 deletions

# Core Principles: From Karpathy's Autoresearch

7 universal principles extracted from autoresearch, applicable to ANY autonomous work.

## 1. Constraint = Enabler

Autonomy succeeds through intentional constraint, not despite it.

| Autoresearch | Generalized |
|--------------|-------------|
| 630-line codebase | Bounded scope that fits agent context |
| 5-minute time budget | Fixed iteration cost |
| One metric (val_bpb) | Single mechanical success criterion |

**Why:** Constraints enable agent confidence (full context understood), verification simplicity (no ambiguity), and iteration velocity (low cost = rapid feedback loops).

**Apply:** Before starting, define: Which files are in scope? What's the ONE metric? What's the time budget per iteration?

## 2. Separate Strategy from Tactics

Humans set direction. Agents execute iterations.

| Strategic (Human) | Tactical (Agent) |
|-------------------|------------------|
| "Improve page load speed" | "Lazy-load images, code-split routes" |
| "Increase test coverage" | "Add tests for uncovered edge cases" |
| "Refactor auth module" | "Extract middleware, simplify handlers" |

**Why:** Humans understand WHY. Agents handle HOW. Mixing these roles wastes both human creativity and agent iteration speed.

**Apply:** Get clear direction from the user (or `program.md`). Then iterate autonomously on implementation.

## 3. Metrics Must Be Mechanical

If you can't verify with a command, you can't iterate autonomously.

- Tests pass/fail (exit code 0)
- Benchmark time in milliseconds
- Coverage percentage
- Lighthouse score
- File size in bytes
- Lines of code count

**Anti-pattern:** "Looks better", "probably improved", "seems cleaner" → these KILL autonomous loops because there's no decision function.

**Apply:** Define the `grep` command (or equivalent) that extracts your metric BEFORE starting.
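
A `grep`-equivalent metric extractor can be sketched in Python with the standard `re` module. The sketch assumes the verify command prints a line such as `val_bpb: 0.9821`; that output format is an assumption, so adapt the pattern to what your tool actually emits. Taking the last match mirrors `grep ... | tail -1`.

```python
import re
from typing import Optional


def extract_metric(output: str, pattern: str) -> Optional[float]:
    """Return the last number captured by `pattern` in `output`, or None.

    `pattern` must contain exactly one capture group around the number.
    Using the LAST match suits logs that print the metric repeatedly.
    """
    matches = re.findall(pattern, output)
    if not matches:
        return None  # no metric found: treat the run as a crash
    return float(matches[-1])
```

Returning `None` rather than raising keeps the "no metric" case on the same path as other crashes in the decide step.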

## 4. Verification Must Be Fast

If verification takes longer than the work itself, incentives misalign.

| Fast (enables iteration) | Slow (kills iteration) |
|-------------------------|----------------------|
| Unit tests (seconds) | Full E2E suite (minutes) |
| Type check (seconds) | Manual QA (hours) |
| Lint check (instant) | Code review (async) |

**Apply:** Use the FASTEST verification that still catches real problems. Save slow verification for after the loop.

## 5. Iteration Cost Shapes Behavior

- Cheap iteration → bold exploration, many experiments
- Expensive iteration → conservative, few experiments

Autoresearch: 5-minute cost → ~12 experiments/hour, roughly 100 over an 8-hour night.

Software: 10-second test → up to 360 experiments/hour (3600 s / 10 s), before counting the agent's own editing time.

**Apply:** Minimize iteration cost. Use fast tests, incremental builds, targeted verification. Every minute saved = more experiments run.
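
The throughput figures above follow from dividing the time window by the cost of one cycle. This small Python helper reproduces them. `overhead_seconds` is a hypothetical allowance for the agent's own review, ideation, and editing time, which the figures above ignore.

```python
def experiments_per_window(
    seconds_per_iteration: float,
    window_hours: float,
    overhead_seconds: float = 0.0,
) -> int:
    """Number of complete iterations that fit in a time window."""
    cycle = seconds_per_iteration + overhead_seconds
    return int(window_hours * 3600 // cycle)


print(experiments_per_window(300, 8))      # 5-minute runs over an 8-hour night
print(experiments_per_window(10, 1))       # 10-second tests for one hour
print(experiments_per_window(10, 1, 20))   # same, with 20 s of agent overhead
```

These calls print 96, 360, and 120. The third shows that once tests are fast, the agent's own per-iteration overhead dominates throughput.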

## 6. Git as Memory and Audit Trail

Every successful change is committed. This enables:
- **Causality tracking**: which change drove improvement?
- **Stacking wins**: each commit builds on prior successes
- **Pattern learning**: agent sees what worked in THIS codebase
- **Human review**: researcher inspects agent's decision sequence

**Apply:** Commit before verify. Revert on failure. Agent reads its own git history to inform next experiment.

## 7. Honest Limitations

State what the system can and cannot do. Don't oversell.

Autoresearch CANNOT: change the tokenizer, replace human direction, or guarantee meaningful improvements.

**Apply:** At setup, explicitly state constraints. If the agent hits a wall it can't solve (missing permissions, external dependency, needs human judgment), say so clearly instead of guessing.

## The Meta-Principle

> Autonomy scales when you constrain scope, clarify success, mechanize verification, and let agents optimize tactics while humans optimize strategy.

This isn't "removing humans." It's reassigning human effort from execution to direction. Humans become MORE valuable by focusing on irreducibly creative/strategic work.
