---
name: autoresearch
description: Autonomous goal-directed iteration. Applies Karpathy's autoresearch principles to ANY task. Loops autonomously (modify, verify, keep/discard, repeat). Supports an optional loop count via Claude Code's /loop command.
version: 1.0.1
---

# Claude Autoresearch: Autonomous Goal-Directed Iteration

Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work, not just ML research.

**Core idea:** You are an autonomous agent. Modify → Verify → Keep/Discard → Repeat.

## When to Activate

- User invokes `/autoresearch` or `/ug:autoresearch`
- User says "work autonomously", "iterate until done", "keep improving", or "run overnight"
- Any task that requires repeated iteration cycles with measurable outcomes

## Optional: Controlled Loop Count

By default, autoresearch loops **forever** until manually interrupted. Users can optionally cap the number of iterations with a **loop count**, using Claude Code's built-in `/loop` command.

> **Requires:** Claude Code v1.0.32+ (the `/loop` command was introduced in this version)

### Usage

**Unlimited (default):**
```
/autoresearch
Goal: Increase test coverage to 90%
```

**Bounded (N iterations):**
```
/loop 25 /autoresearch
Goal: Increase test coverage to 90%
```

This chains `/autoresearch` with `/loop 25`, running up to 25 iteration cycles. After the 25th iteration (or earlier, if the goal is reached), Claude stops and prints a final summary.

### When to Use Bounded Loops

| Scenario | Recommendation |
|----------|----------------|
| Run overnight, review in the morning | Unlimited (default) |
| Quick 30-minute improvement session | `/loop 10 /autoresearch` |
| Targeted fix with known scope | `/loop 5 /autoresearch` |
| Exploratory: see if an approach works | `/loop 15 /autoresearch` |
| CI/CD pipeline integration | `/loop N /autoresearch` (set N based on time budget) |

### Behavior with Loop Count

When a loop count is specified:
- Claude runs at most N iterations of the autoresearch loop
- After iteration N, Claude prints a **final summary**: baseline → current best, plus the number of keeps, discards, and crashes
- If the goal is achieved before N iterations, Claude reports early completion and stops
- All other rules (atomic changes, mechanical verification, auto-rollback) still apply

## Setup Phase (Do Once)

1. **Read all in-scope files** for full context before making any modification
2. **Define the goal**: What does "better" mean? Extract or ask for a mechanical metric:
   - Code: tests pass, build succeeds, performance benchmark improves
   - Content: word count target hit, SEO score improves, readability score improves
   - Design: Lighthouse score improves, accessibility audit passes
   - If no metric exists, define one with the user, or use the simplest proxy (e.g., "compiles without errors")
3. **Define scope constraints**: Which files can you modify? Which are read-only?
4. **Create a results log**: Track every iteration (see `references/results-logging.md`)
5. **Establish a baseline**: Run verification on the current state. Record it as iteration #0
6. **Confirm and go**: Show the user the setup, get confirmation, then BEGIN THE LOOP

## The Loop

Read `references/autonomous-loop-protocol.md` for full protocol details.

```
LOOP (FOREVER or N times):
  1. Review: Read current state + git history + results log
  2. Ideate: Pick the next change based on the goal, past results, and what hasn't been tried
  3. Modify: Make ONE focused change to in-scope files
  4. Commit: Git commit the change (before verification)
  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
  6. Decide:
     - IMPROVED → Keep commit, log "keep", advance
     - SAME/WORSE → Git revert, log "discard" (exception: SAME result with less code is kept; see Critical Rules #6)
     - CRASHED → Try to fix (max 3 attempts); if still broken, git revert, log "crash", and move on
  7. Log: Record the result in the results log
  8. Repeat: Go to step 1.
     - If unbounded: NEVER STOP. NEVER ASK "should I continue?"
     - If bounded (N): Stop after N iterations (or when the goal is met), print final summary
```

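Steps 3 through 6 of one iteration can be sketched in Python as follows. The sketch makes three assumptions:

- It runs inside a git working tree.
- The metric is numeric, and higher is better unless `higher_is_better=False`.
- `modify`, `verify` (returning `None` on a crash), and `try_fix` are hypothetical callables.

Crash fixes are amended into the experiment's commit, so a single `git revert` undoes the whole attempt. The simplicity exception for ties is left out of this sketch.

```python
import subprocess
from typing import Callable, Optional, Tuple


def git(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def commit_change(message: str) -> str:
    """Stage everything and commit it; returns the short commit hash."""
    git("add", "-A")
    git("commit", "-m", message)
    return git("rev-parse", "--short", "HEAD")


def amend_change() -> None:
    """Fold a crash fix into the experiment's commit."""
    git("add", "-A")
    git("commit", "--amend", "--no-edit")


def revert_last() -> None:
    """Undo the experiment with a revert commit; the attempt stays in history."""
    git("revert", "--no-edit", "HEAD")


def run_iteration(
    best: float,
    modify: Callable[[], str],              # applies ONE change, returns a description
    verify: Callable[[], Optional[float]],  # metric, or None if the run crashed
    try_fix: Callable[[], None],            # hypothetical crash-repair step
    commit: Callable[[str], str] = commit_change,
    amend: Callable[[], None] = amend_change,
    revert: Callable[[], None] = revert_last,
    max_fix_attempts: int = 3,
    higher_is_better: bool = True,
) -> Tuple[str, float]:
    """One modify -> commit -> verify -> decide cycle. Returns (status, new best)."""
    commit(modify())                        # commit BEFORE verification
    metric = verify()
    attempts = 0
    while metric is None and attempts < max_fix_attempts:
        try_fix()
        amend()
        attempts += 1
        metric = verify()
    if metric is None:
        revert()                            # still crashing after 3 fixes
        return "crash", best
    improved = metric > best if higher_is_better else metric < best
    if improved:
        return "keep", metric
    revert()                                # SAME or WORSE: roll back
    return "discard", best
```

The git operations are injectable parameters, so the decision logic can be exercised without a repository. In real use the defaults shell out to `git`.
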
## Critical Rules

1. **Loop until done**: If unbounded, loop until interrupted. If bounded, loop N times, then summarize.
2. **Read before write**: Always understand the full context before modifying anything
3. **One change per iteration**: Keep changes atomic. If something breaks, you know exactly why
4. **Mechanical verification only**: No subjective "looks good". Use metrics
5. **Automatic rollback**: Failed changes are reverted instantly. No debates
6. **Simplicity wins**: Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
7. **Git is memory**: Every kept change is committed. The agent reads the history to learn patterns
8. **When stuck, think harder**: Re-read files, re-read the goal, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access or permissions

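Rule 6 leaves "tiny" and "ugly" to judgment. One way to make it mechanical is to weigh the metric gain against lines of code added, treating LOC as a rough proxy for complexity. This is a hedged sketch, not a rule taken from autoresearch itself. `decide`, `min_gain`, and `max_added_lines` are hypothetical, and the thresholds would need tuning per project.

```python
def decide(best: float, new: float, loc_before: int, loc_after: int,
           min_gain: float = 0.01, max_added_lines: int = 20,
           higher_is_better: bool = True, tol: float = 1e-9) -> str:
    """Return "keep" or "discard" for a change that verified without crashing.

    LOC is a rough proxy for complexity; min_gain and max_added_lines are
    hypothetical thresholds for "tiny improvement" and "ugly complexity".
    """
    gain = new - best if higher_is_better else best - new
    added_lines = loc_after - loc_before
    if gain > tol:
        if gain < min_gain and added_lines > max_added_lines:
            return "discard"  # tiny improvement + a lot of new code
        return "keep"
    if abs(gain) <= tol and added_lines < 0:
        return "keep"         # equal result, less code
    return "discard"          # worse, or equal without simplification
```

The tolerance `tol` keeps floating-point noise from turning an "equal" result into a spurious improvement or regression.
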
## Principles Reference

See `references/core-principles.md` for the 7 generalizable principles from autoresearch.

## Adapting to Different Domains

| Domain | Metric | Scope | Verify Command |
|--------|--------|-------|----------------|
| Backend code | Tests pass + coverage % | `src/**/*.ts` | `npm test` |
| Frontend UI | Lighthouse score | `src/components/**` | `npx lighthouse <url>` |
| ML training | val_bpb / loss | `train.py` | `uv run train.py` |
| Blog/content | Word count + readability | `content/*.md` | Custom script |
| Performance | Benchmark time (ms) | Target files | `npm run bench` |
| Refactoring | Tests pass + LOC reduced | Target module | `npm test && wc -l <files>` |

Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.
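
Every row in the table reduces to the same shape: run a command, read one number. The sketch below shows that adapter under two assumptions: the command prints the metric somewhere in its output, and a regex with one capture group can find it. `run_verify` and any patterns passed to it are hypothetical. It deliberately collapses crashes, timeouts, non-zero exits, and missing output into `None`, so a failing `npm test` is treated like a crash here; a real setup may want to separate those cases.

```python
import re
import subprocess
from typing import Optional, Sequence


def run_verify(cmd: Sequence[str], metric_pattern: str,
               timeout_s: float = 600) -> Optional[float]:
    """Run a domain's verify command and parse one number from its output.

    Returns None when the command cannot start, times out, exits non-zero,
    or prints nothing matching metric_pattern (capture group 1 is the metric).
    """
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True,
                              timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return None
    if proc.returncode != 0:
        return None
    match = re.search(metric_pattern, proc.stdout + proc.stderr)
    return float(match.group(1)) if match else None
```

For example, `run_verify(["npm", "run", "bench"], r"([\d.]+)\s*ms")` would only work if the benchmark actually prints something like `123.4 ms`. Adjust the pattern to your tool's real output.
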