erigontech
diff --git a/.claude/skills/autoresearch/SKILL.md b/.claude/skills/autoresearch/SKILL.md
Lines changed: 119 additions & 0 deletions
diff --git a/.claude/skills/autoresearch/references/autonomous-loop-protocol.md b/.claude/skills/autoresearch/references/autonomous-loop-protocol.md
Lines changed: 152 additions & 0 deletions
diff --git a/.claude/skills/autoresearch/references/core-principles.md b/.claude/skills/autoresearch/references/core-principles.md
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,119 @@
+---
+name: autoresearch
+description: Autonomous Goal-directed Iteration. Apply Karpathy's autoresearch principles to ANY task. Loops autonomously — modify, verify, keep/discard, repeat. Supports optional loop count via Claude Code's /loop command.
+version: 1.0.1
+---
+
+# Claude Autoresearch — Autonomous Goal-directed Iteration
+
+Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work — not just ML research.
+
+**Core idea:** You are an autonomous agent. Modify → Verify → Keep/Discard → Repeat.
+
+## When to Activate
+
+- User invokes `/autoresearch` or `/ug:autoresearch`
+- User says "work autonomously", "iterate until done", "keep improving", "run overnight"
+- Any task requiring repeated iteration cycles with measurable outcomes
+
+## Optional: Controlled Loop Count
+
+By default, autoresearch loops **forever** until manually interrupted. However, users can optionally specify a **loop count** to limit iterations using Claude Code's built-in `/loop` command.
+
+> **Requires:** Claude Code v1.0.32+ (the `/loop` command was introduced in this version)
+
+### Usage
+
+**Unlimited (default):**
+```
+/autoresearch
+Goal: Increase test coverage to 90%
+```
+
+**Bounded (N iterations):**
+```
+/loop 25 /autoresearch
+Goal: Increase test coverage to 90%
+```
+
+This chains `/autoresearch` with `/loop 25`, running at most 25 iteration cycles. After the 25th iteration, or earlier if the goal is reached, Claude stops and prints a final summary.
+
+### When to Use Bounded Loops
+
+| Scenario | Recommendation |
+|----------|---------------|
+| Run overnight, review in morning | Unlimited (default) |
+| Quick 30-min improvement session | `/loop 10 /autoresearch` |
+| Targeted fix with known scope | `/loop 5 /autoresearch` |
+| Exploratory — see if approach works | `/loop 15 /autoresearch` |
+| CI/CD pipeline integration | `/loop N /autoresearch` (set N based on time budget) |
+
+### Behavior with Loop Count
+
+When a loop count is specified:
+- Claude runs up to N iterations through the autoresearch loop
+- After iteration N, Claude prints a **final summary**: baseline → current best, plus counts of keeps, discards, and crashes
+- If the goal is achieved before N iterations, Claude prints early completion and stops
+- All other rules (atomic changes, mechanical verification, auto-rollback) still apply
+
+## Setup Phase (Do Once)
+
+1. **Read all in-scope files** for full context before any modification
+2. **Define the goal** — What does "better" mean? Extract or ask for a mechanical metric:
+   - Code: tests pass, build succeeds, performance benchmark improves
+   - Content: word count target hit, SEO score improves, readability score
+   - Design: lighthouse score, accessibility audit passes
+   - If no metric exists → define one with user, or use simplest proxy (e.g. "compiles without errors")
+3. **Define scope constraints** — Which files can you modify? Which are read-only?
+4. **Create a results log** — Track every iteration (see `references/results-logging.md`)
+5. **Establish baseline** — Run verification on current state. Record as iteration #0
+6. **Confirm and go** — Show user the setup, get confirmation, then BEGIN THE LOOP
+
+## The Loop
+
+Read `references/autonomous-loop-protocol.md` for full protocol details.
+
+```
+LOOP (FOREVER or N times):
+  1. Review: Read current state + git history + results log
+  2. Ideate: Pick next change based on goal, past results, what hasn't been tried
+  3. Modify: Make ONE focused change to in-scope files
+  4. Commit: Git commit the change (before verification)
+  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
+  6. Decide:
+     - IMPROVED → Keep commit, log "keep", advance
+     - SAME/WORSE → Roll back the commit (git reset --hard HEAD~1), log "discard"
+     - CRASHED → Try to fix (max 3 attempts), else log "crash" and move on
+  7. Log: Record result in results log
+  8. Repeat: Go to step 1.
+     - If unbounded: NEVER STOP. NEVER ASK "should I continue?"
+     - If bounded (N): Stop after N iterations, print final summary
+```
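The loop above can be sketched as a small Python driver. This is only an illustration under stated assumptions: `propose`, `verify`, and `rollback` are hypothetical callbacks rather than part of the skill, a higher metric is assumed to be better, and crash handling is left out for brevity.

```python
from typing import Callable, Optional


def run_loop(
    baseline: float,
    propose: Callable[[int], str],   # hypothetical: makes ONE change, returns its description
    verify: Callable[[], float],     # hypothetical: runs the mechanical metric
    rollback: Callable[[], None],    # hypothetical: undoes the last experiment commit
    max_iterations: Optional[int] = None,  # None means unbounded
) -> list:
    """Modify -> verify -> keep/discard, assuming a higher metric is better."""
    best = baseline
    log = []
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        description = propose(iteration)
        metric = verify()
        if metric > best:
            best = metric          # improvement: the change stays
            status = "keep"
        else:
            rollback()             # same or worse: undo the change
            status = "discard"
        log.append((iteration, metric, status, description))
    return log
```

With `max_iterations=None` the `while` condition never becomes false, which mirrors the unbounded mode; passing an integer gives the bounded mode.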
+
+## Critical Rules
+
+1. **Loop until done** — Unbounded: loop until interrupted. Bounded: loop N times then summarize.
+2. **Read before write** — Always understand full context before modifying
+3. **One change per iteration** — Atomic changes. If it breaks, you know exactly why
+4. **Mechanical verification only** — No subjective "looks good". Use metrics
+5. **Automatic rollback** — Failed changes revert instantly. No debates
+6. **Simplicity wins** — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
+7. **Git is memory** — Every kept change committed. Agent reads history to learn patterns
+8. **When stuck, think harder** — Re-read files, re-read goal, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access/permissions
+
+## Principles Reference
+
+See `references/core-principles.md` for the 7 generalizable principles from autoresearch.
+
+## Adapting to Different Domains
+
+| Domain | Metric | Scope | Verify Command |
+|--------|--------|-------|----------------|
+| Backend code | Tests pass + coverage % | `src/**/*.ts` | `npm test` |
+| Frontend UI | Lighthouse score | `src/components/**` | `npx lighthouse` |
+| ML training | val_bpb / loss | `train.py` | `uv run train.py` |
+| Blog/content | Word count + readability | `content/*.md` | Custom script |
+| Performance | Benchmark time (ms) | Target files | `npm run bench` |
+| Refactoring | Tests pass + LOC reduced | Target module | `npm test && wc -l` |
+
+Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.
@@ -0,0 +1,152 @@
+# Autonomous Loop Protocol
+
+Detailed protocol for the autoresearch iteration loop. SKILL.md has the summary; this file has the full rules.
+
+## Loop Modes
+
+Autoresearch supports two loop modes:
+
+- **Unbounded (default):** Loop forever until manually interrupted (`Ctrl+C`)
+- **Bounded:** Loop at most N times when chained with `/loop N` (requires Claude Code v1.0.32+), stopping early if the goal is achieved
+
+When bounded, track `current_iteration` against `max_iterations`. After the final iteration, print a summary and stop.
+
+## Phase 1: Review (30 seconds)
+
+Before each iteration, build situational awareness:
+
+```
+1. Read current state of in-scope files (full context)
+2. Read last 10-20 entries from results log
+3. Read git log --oneline -20 to see recent changes
+4. Identify: what worked, what failed, what's untried
+5. If bounded: check current_iteration vs max_iterations
+```
+
+**Why read every time?** After rollbacks, state may differ from what you expect. Never assume — always verify.
+
+## Phase 2: Ideate (Strategic)
+
+Pick the NEXT change. Priority order:
+
+1. **Fix crashes/failures** from previous iteration first
+2. **Exploit successes** — if last change improved metric, try variants in same direction
+3. **Explore new approaches** — try something the results log shows hasn't been attempted
+4. **Combine near-misses** — two changes that individually didn't help might work together
+5. **Simplify** — remove code while maintaining metric. Simpler = better
+6. **Radical experiments** — when incremental changes stall, try something dramatically different
+
+**Anti-patterns:**
+- Don't repeat exact same change that was already discarded
+- Don't make multiple unrelated changes at once (can't attribute improvement)
+- Don't chase marginal gains with ugly complexity
+
+**Bounded mode consideration:** If remaining iterations are limited (<3 left), prioritize exploiting successes over exploration.
+
+## Phase 3: Modify (One Atomic Change)
+
+- Make ONE focused change to in-scope files
+- The change should be explainable in one sentence
+- Write the description BEFORE making the change (forces clarity)
+
+## Phase 4: Commit (Before Verification)
+
+```bash
+git add <changed-files>
+git commit -m "experiment: <one-sentence description>"
+```
+
+Commit BEFORE running verification so rollback is clean: `git reset --hard HEAD~1`
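A minimal sketch of this commit-then-rollback step, assuming Python's `subprocess` is used to drive git. The helper names `commit_commands`, `rollback_command`, and `run_all` are hypothetical; only the git invocations themselves come from the protocol above.

```python
import subprocess
from typing import List, Sequence


def commit_commands(changed_files: Sequence[str], description: str) -> List[List[str]]:
    """Build the argv lists for committing one experiment."""
    return [
        ["git", "add", "--", *changed_files],
        ["git", "commit", "-m", f"experiment: {description}"],
    ]


def rollback_command() -> List[str]:
    """Argv for dropping the most recent experiment commit."""
    return ["git", "reset", "--hard", "HEAD~1"]


def run_all(commands: Sequence[Sequence[str]], cwd: str = ".") -> None:
    """Execute each command in order, raising if any of them fails."""
    for argv in commands:
        subprocess.run(list(argv), cwd=cwd, check=True)
```

Because `git reset --hard` also discards uncommitted changes to tracked files, the sketch assumes the working tree holds nothing but the experiment when the rollback runs.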
+
+## Phase 5: Verify (Mechanical Only)
+
+Run the agreed-upon verification command. Capture output.
+
+**Timeout rule:** If verification exceeds 2x normal time, kill and treat as crash.
+
+**Extract metric:** Parse the verification output for the specific metric number.
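The timeout rule and metric extraction might look like the following sketch. It assumes the verification command prints a line such as `coverage: 87.5%`; that output format, the default regex, and the function names are illustrative assumptions, not something the protocol prescribes.

```python
import re
import subprocess
from typing import Optional, Sequence


def run_verification(
    argv: Sequence[str], normal_seconds: float
) -> Optional[subprocess.CompletedProcess]:
    """Run the verify command; return None if it exceeds 2x its normal time."""
    try:
        return subprocess.run(
            list(argv), capture_output=True, text=True, timeout=2 * normal_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; the caller treats this as a crash
        return None


def extract_metric(
    output: str, pattern: str = r"coverage:\s*([0-9]+(?:\.[0-9]+)?)"
) -> Optional[float]:
    """Pull the first number matching the (assumed) pattern out of the output."""
    match = re.search(pattern, output)
    return float(match.group(1)) if match else None
```

Returning `None` in both failure cases keeps the caller's decision simple: no result or no metric means the iteration cannot be scored.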
+
+## Phase 6: Decide (No Ambiguity)
+
+```
+IF crashed:
+    # A crashed run yields no metric, so check this first
+    # Attempt fix (max 3 tries)
+    IF fixable:
+        Fix → re-commit → re-verify
+    ELSE:
+        STATUS = "crash"
+        git reset --hard HEAD~1
+ELIF metric_improved:
+    STATUS = "keep"
+    # Do nothing; the commit stays
+ELSE:
+    # Metric is the same or worse
+    STATUS = "discard"
+    git reset --hard HEAD~1
+
+**Simplicity override:** If the metric improved by less than 0.1% but the change adds significant complexity, treat it as "discard". If the metric is unchanged but the code is simpler, treat it as "keep".
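The decision rule and simplicity override above can be sketched as one pure function. Several things here are assumptions: a higher metric is better, the 0.1% threshold is read as a relative gain, and the `adds_complexity` and `simplifies` flags stand in for a judgment the protocol leaves to the agent.

```python
def decide(
    best: float,
    new: float,
    crashed: bool = False,
    adds_complexity: bool = False,  # assumed flag: change adds significant complexity
    simplifies: bool = False,       # assumed flag: change makes the code simpler
    min_gain: float = 0.001,        # 0.1%, interpreted as relative gain
) -> str:
    """Return "keep", "discard", or "crash"; assumes a higher metric is better."""
    if crashed:
        return "crash"
    if new > best:
        relative_gain = (new - best) / abs(best) if best else float("inf")
        if adds_complexity and relative_gain < min_gain:
            return "discard"        # tiny win is not worth the extra complexity
        return "keep"
    if new == best and simplifies:
        return "keep"               # equal result with less code still counts
    return "discard"
```

For a metric where lower is better (such as a loss), the comparisons would need to be flipped.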
+
+## Phase 7: Log Results
+
+Append to results log (TSV format):
+
+```
+iteration  commit   metric   status   description
+42         a1b2c3d  0.9821   keep     increase attention heads from 8 to 12
+43         -        0.9845   discard  switch optimizer to SGD
+44         -        0.0000   crash    double batch size (OOM)
+```
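Appending a row in this format could be done with Python's `csv` module, as in the sketch below. The function names and the four-decimal metric formatting are assumptions chosen to match the example rows; the column names come from the header line above.

```python
import csv
from typing import Optional, TextIO

FIELDS = ["iteration", "commit", "metric", "status", "description"]


def write_header(log: TextIO) -> None:
    """Write the column header once, when the log is created."""
    csv.writer(log, delimiter="\t", lineterminator="\n").writerow(FIELDS)


def append_result(
    log: TextIO,
    iteration: int,
    commit: Optional[str],   # None for discarded or crashed iterations
    metric: float,
    status: str,
    description: str,
) -> None:
    """Append one tab-separated row; a missing commit hash is logged as "-"."""
    writer = csv.writer(log, delimiter="\t", lineterminator="\n")
    writer.writerow([iteration, commit or "-", f"{metric:.4f}", status, description])
```

Using a real tab delimiter keeps descriptions that contain spaces in a single column.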
+
+## Phase 8: Repeat
+
+### Unbounded Mode (default)
+
+Go to Phase 1. **NEVER STOP. NEVER ASK IF YOU SHOULD CONTINUE.**
+
+### Bounded Mode (with /loop N)
+
+```
+IF goal_achieved:
+    Print: "Goal achieved at iteration {current_iteration}! Final metric: {value}"
+    Print final summary
+    STOP
+ELIF current_iteration < max_iterations:
+    Go to Phase 1
+ELSE:
+    Print final summary
+    STOP
+
+**Final summary format:**
+```
+=== Autoresearch Complete ({completed}/{N} iterations) ===
+Baseline: {baseline} → Final: {current} ({delta})
+Keeps: X | Discards: Y | Crashes: Z
+Best iteration: #{n} — {description}
+```
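A sketch of how the first three summary lines could be produced from logged rows. It assumes each row is an `(iteration, metric, status, description)` tuple and that the final metric is the one from the last kept iteration. The "Best iteration" line is left out because the protocol does not define how "best" is chosen.

```python
from collections import Counter
from typing import Sequence, Tuple

Row = Tuple[int, float, str, str]  # (iteration, metric, status, description)


def final_summary(baseline: float, rows: Sequence[Row], max_iterations: int) -> str:
    """Format the summary header, the metric delta, and the status counts."""
    counts = Counter(status for _, _, status, _ in rows)
    kept = [row for row in rows if row[2] == "keep"]
    current = kept[-1][1] if kept else baseline  # nothing kept: still at baseline
    return "\n".join(
        [
            f"=== Autoresearch Complete ({len(rows)}/{max_iterations} iterations) ===",
            f"Baseline: {baseline} → Final: {current} ({current - baseline:+.4f})",
            f"Keeps: {counts['keep']} | Discards: {counts['discard']} "
            f"| Crashes: {counts['crash']}",
        ]
    )
```

`Counter` returns 0 for statuses that never occurred, so a run with no crashes still prints `Crashes: 0`.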
+
+### When Stuck (>5 consecutive discards)
+
+Applies to both modes:
+1. Re-read ALL in-scope files from scratch
+2. Re-read the original goal/direction
+3. Review entire results log for patterns
+4. Try combining 2-3 previously successful changes
+5. Try the OPPOSITE of what hasn't been working
+6. Try a radical architectural change
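Detecting the stuck condition from the results log can be sketched as a count of trailing discards. The function names are hypothetical, and the sketch assumes that a "crash" or "keep" entry breaks the streak, which the protocol does not state explicitly.

```python
from typing import Sequence


def consecutive_discards(statuses: Sequence[str]) -> int:
    """Count how many of the most recent iterations were discards."""
    count = 0
    for status in reversed(statuses):
        if status != "discard":
            break
        count += 1
    return count


def is_stuck(statuses: Sequence[str], threshold: int = 5) -> bool:
    """True once there are MORE than `threshold` discards in a row."""
    return consecutive_discards(statuses) > threshold
```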
+
+## Crash Recovery
+
+- Syntax error → fix immediately, don't count as separate iteration
+- Runtime error → attempt fix (max 3 tries), then move on
+- Resource exhaustion (OOM) → revert, try smaller variant
+- Infinite loop/hang → kill after timeout, revert, avoid that approach
+- External dependency failure → skip, log, try different approach
+
+## Communication
+
+- **DO NOT** ask "should I keep going?" — in unbounded mode, YES. ALWAYS. In bounded mode, continue until N is reached.
+- **DO NOT** summarize after each iteration — just log and continue
+- **DO** print a brief one-line status every ~5 iterations (e.g., "Iteration 25: metric at 0.95, 8 keeps / 17 discards")
+- **DO** alert if you discover something surprising or game-changing
+- **DO** print a final summary when bounded loop completes
@@ -0,0 +1,92 @@
+# Core Principles — From Karpathy's Autoresearch
+
+7 universal principles extracted from autoresearch, applicable to ANY autonomous work.
+
+## 1. Constraint = Enabler
+
+Autonomy succeeds through intentional constraint, not despite it.
+
+| Autoresearch | Generalized |
+|--------------|-------------|
+| 630-line codebase | Bounded scope that fits agent context |
+| 5-minute time budget | Fixed iteration cost |
+| One metric (val_bpb) | Single mechanical success criterion |
+
+**Why:** Constraints enable agent confidence (full context understood), verification simplicity (no ambiguity), and iteration velocity (low cost = rapid feedback loops).
+
+**Apply:** Before starting, define: what files are in-scope? What's the ONE metric? What's the time budget per iteration?
+
+## 2. Separate Strategy from Tactics
+
+Humans set direction. Agents execute iterations.
+
+| Strategic (Human) | Tactical (Agent) |
+|-------------------|------------------|
+| "Improve page load speed" | "Lazy-load images, code-split routes" |
+| "Increase test coverage" | "Add tests for uncovered edge cases" |
+| "Refactor auth module" | "Extract middleware, simplify handlers" |
+
+**Why:** Humans understand WHY. Agents handle HOW. Mixing these roles wastes both human creativity and agent iteration speed.
+
+**Apply:** Get clear direction from user (or program.md). Then iterate autonomously on implementation.
+
+## 3. Metrics Must Be Mechanical
+
+If you can't verify with a command, you can't iterate autonomously.
+
+- Tests pass/fail (exit code 0)
+- Benchmark time in milliseconds
+- Coverage percentage
+- Lighthouse score
+- File size in bytes
+- Lines of code count
+
+**Anti-pattern:** "Looks better", "probably improved", "seems cleaner" → these KILL autonomous loops because there's no decision function.
+
+**Apply:** Define the `grep` command (or equivalent) that extracts your metric BEFORE starting.
+
+## 4. Verification Must Be Fast
+
+If verification takes longer than making the change, the loop spends most of its time waiting and iteration stalls.
+
+| Fast (enables iteration) | Slow (kills iteration) |
+|-------------------------|----------------------|
+| Unit tests (seconds) | Full E2E suite (minutes) |
+| Type check (seconds) | Manual QA (hours) |
+| Lint check (instant) | Code review (async) |
+
+**Apply:** Use the FASTEST verification that still catches real problems. Save slow verification for after the loop.
+
+## 5. Iteration Cost Shapes Behavior
+
+- Cheap iteration → bold exploration, many experiments
+- Expensive iteration → conservative, few experiments
+
+Autoresearch: 5-minute cost → about 12 experiments/hour, roughly 100 per night.
+Software: 10-second test → up to 360 experiments/hour.
+
+**Apply:** Minimize iteration cost. Use fast tests, incremental builds, targeted verification. Every minute saved = more experiments run.
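The throughput figures in this section follow from simple division, sketched below. The 8-hour "night" is an assumption used only to check the order of magnitude, and the result is an upper bound because it ignores the time spent making each change.

```python
def experiments_per_window(cost_seconds: float, window_hours: float) -> int:
    """Upper bound on experiments that fit in a time window at a fixed cost each."""
    return int(window_hours * 3600 // cost_seconds)


# 10-second verification over one hour
per_hour = experiments_per_window(10, 1)    # 360

# 5-minute experiments over an assumed 8-hour night
per_night = experiments_per_window(300, 8)  # 96, i.e. roughly 100
```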
+
+## 6. Git as Memory and Audit Trail
+
+Every change is committed before verification, and only successful changes stay in history. This enables:
+- **Causality tracking** — which change drove improvement?
+- **Stacking wins** — each commit builds on prior successes
+- **Pattern learning** — agent sees what worked in THIS codebase
+- **Human review** — researcher inspects agent's decision sequence
+
+**Apply:** Commit before verify. Revert on failure. Agent reads its own git history to inform next experiment.
+
+## 7. Honest Limitations
+
+State what the system can and cannot do. Don't oversell.
+
+Autoresearch CANNOT: change tokenizer, replace human direction, guarantee meaningful improvements.
+
+**Apply:** At setup, explicitly state constraints. If agent hits a wall it can't solve (missing permissions, external dependency, needs human judgment), say so clearly instead of guessing.
+
+## The Meta-Principle
+
+> Autonomy scales when you constrain scope, clarify success, mechanize verification, and let agents optimize tactics while humans optimize strategy.
+
+This isn't "removing humans." It's reassigning human effort from execution to direction. Humans become MORE valuable by focusing on irreducibly creative/strategic work.