claygeo
diff --git a/LOOP-LOG.md b/LOOP-LOG.md
Lines changed: 67 additions & 0 deletions
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
Lines changed: 184 additions & 34 deletions
@@ -424,6 +424,73 @@ Tier 2 (if time):
 
 ---
 
+---
+
+## Iteration #7 — no-wait mode (operator: "just keep building")
+
+**Wake reason:** Operator interrupted iter #7 ScheduleWakeup wait with "why dont you just keep working, why wait till the morning, just keep building." Executed iter #7 work continuously without waiting for the 20:08 fallback.
+
+### Built this iteration
+
+**1. /plan-eng-review on V2 benchmark plan (parallel agent)**
+- Verdict: **APPROVE WITH CHANGES**
+- 3 blockers identified:
+  1. No reproducibility protocol (single-run-per-contract on stochastic agent ≠ benchmark)
+  2. Tier C N=5 too small (1/5 = 20% ± enormous CI; need ≥10 with severity-weighted scoring + near-miss controls)
+  3. Tier C false-find disclosure flow undefined (must close BEFORE sourcing, not after)
+- Other gaps: model version pinning, seed/temp spec, no held-out set, no inter-rater rubric
+- Phase 1 schema work + Phase 2 sourcing → run in parallel (saves 1 day)
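
The small-N concern in blocker 2 can be made concrete with a confidence interval. The sketch below uses the Wilson score interval at 95% (z = 1.96); the choice of Wilson and the function name `wilsonInterval` are assumptions for illustration, since the review does not say which interval method it had in mind. Under that assumption, one find out of 5 gives roughly 3.6% to 62.4%, and one out of 10 gives roughly 1.8% to 40.4%.

```typescript
// Wilson score interval for a binomial proportion (95% when z = 1.96).
// Illustrative only: the plan does not specify an interval method.
interface Interval {
  low: number;
  high: number;
}

function wilsonInterval(successes: number, trials: number, z = 1.96): Interval {
  if (trials <= 0 || successes < 0 || successes > trials) {
    throw new RangeError("need trials > 0 and 0 <= successes <= trials");
  }
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  const center = (p + z2 / (2 * trials)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials))) / denom;
  return { low: Math.max(0, center - half), high: Math.min(1, center + half) };
}

// One find out of 5 contracts versus one find out of 10.
const nFive = wilsonInterval(1, 5);
const nTen = wilsonInterval(1, 10);
const pct = (x: number): string => (x * 100).toFixed(1) + "%";
console.log(`1/5:  ${pct(nFive.low)} to ${pct(nFive.high)}`);
console.log(`1/10: ${pct(nTen.low)} to ${pct(nTen.high)}`);
```

Doubling N narrows the interval but does not make it tight, which is consistent with the review pairing the larger N with severity-weighted scoring and near-miss controls.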
+
+**2. /codex outside-voice on V2 plan (parallel agent)**
+- Verdict: **GO with constraints**
+- Tier C downside math is asymmetric in operator's favor IF pre-committed to publish-whatever-happens
+- Real failure mode: operator chickens out post-result and silently cuts Tier C (PRE-COMMIT THE DISCLOSURE POST)
+- Inspect-AI integration: sequential not parallel (corpus first, framework second)
+- Operator's 1.5 YOE filter: v2 only flips outcomes for ~15-20% of funnel (the high-ceiling AI-safety path)
+- 6-day hard cap on v2 work; if budget runs out, ship Tier C standalone — gravy is gravy
+
+**3. Integrated both reviews into PLAN-V2-BENCHMARK-EXPANSION.md**
+- Tier C: 5 → 10 contracts (statistical, not vibe). Includes 2-3 near-miss controls. Severity-weighted scoring.
+- Reproducibility protocol added: n=3, model version `claude-sonnet-4-20250514` pinned, temperature/top-p specified, prompt git SHA logged
+- Tier C disclosure pre-commitment (drafted text written into the plan; operator can't quietly cut Tier C post-result)
+- 6-day hard cap with kill criteria (Phase 2 standalone if Phase 3+ runs out of time)
+- Pause/reassess gate: if Tier C FPR > 80%, halt before Phase 3 spend
+- Phase 1 schema parallel with Phase 2 sourcing
+- Total revised budget: $110-150 API (was $30-50; bumped for n=3 × 57 contracts)
+
+**4. ARCHITECTURE.md polish** — recruiter-readable rewrite
+- Added 60-second TL;DR at top with cross-links to leaderboard, PROOF.md, V2 plan
+- Added "Two Projects" comparison table (Solhunt vs Solhunt-Duel)
+- New Solhunt-Duel architecture section: full mermaid sequence diagram of Red↔Blue duel + verifyPatch flow + 4 gates table
+- Convergence taxonomy (HARDENED, BLUE_FAILED, RED_FAILED, SAME_CLASS_ESCAPED, TIMEOUT) with definitions
+- Original Solhunt sections preserved (renamed for clarity)
+- Updated data model ER diagram with duel_runs + duel_rounds tables
+- New "Why fresh-address bytecode cloning" design decision (Solhunt-Duel-specific gotcha)
+- Honest limitations section: small-N, sandbox limits, MEV invisibility, multi-contract chain failures
+- Added Claude Code CLI backend to model-abstraction diagram (Max-subscription overnight runs)
+
+**5. INTERVIEW-WALKTHROUGH-5MIN.md** — screen-call architecture script
+- 5-beat structure (premise / setup / 4 gates / honest failure / what's next)
+- ~750-900 words spoken, fits 5 minutes with interviewer-interrupt buffer
+- Variants: 3-min screening, 15-min deep technical, behavioral/hiring-manager
+- "What NOT to say" list (no SCONE direct compare, no oversell convergence, no startup pitch, no trash adjacent work)
+- Practice notes: read aloud twice, time yourself, the 67%/13% beat is most underdone on first try
+- Honest meta: match artifact to audience (Curaleaf for payments roles, Solhunt-Duel for AI safety / agent eval)
+
+### Iter #8 plan (continuing without wait)
+
+Tier 1:
+- [ ] Tier C contract candidate list seed — research 7-10 audited-clean mainnet contracts for V2 sourcing (no execution, just candidate research)
+- [ ] Code4rena Tier B candidate research — find 3-5 high-confidence Code4rena finding candidates (verified mainnet contracts, single-contract attack vectors, vuln class our corpus lacks)
+
+Tier 2 (if time):
+- [ ] Inspect-AI PR opportunity research — find ONE substantive PR target on Anthropic's eval framework
+- [ ] README.md polish — recruiter-front-door optimization
+
+**Strict skip:** any operator-only task (Substack publish, Sourcegraph submit, etc.).
+
+---
+
 ## Operator wake-up summary (will be in last iteration before morning)
 
 [empty — will populate near morning]
 
@@ -1,6 +1,97 @@
-# solhunt Architecture
+# Architecture
 
-## End-to-end scan flow
+> **TL;DR (60-second read):** This repo holds two related but separate projects.
+> **Solhunt** is a single-agent scanner: give it a contract, and it writes a Foundry exploit if it finds one, otherwise it emits a structured no-find report. Numbers: 67.7% on a curated 32-contract DeFiHackLabs subset and 13.7% on a 95-contract random draw, both published as measured. **Solhunt-Duel** sits on top: Red writes the exploit, Blue writes a Solidity patch, and a server-side harness enforces four gates the LLMs cannot see or modify (`exploitNeutralized`, `benignPassed`, `freshAttackerNeutralized`, `storageLayoutChanged == false`) before declaring convergence. The premise: agents will lie about success if you let them, so the verdict lives outside the agent.
+>
+> **Live artifacts:** [leaderboard](https://solhunt-duel.netlify.app/leaderboard/) · [gate verifier walkthrough (PROOF.md)](PROOF.md) · [v2 corpus expansion plan](PLAN-V2-BENCHMARK-EXPANSION.md)
+
+---
+
+## The two projects
+
+| | Solhunt (predecessor) | Solhunt-Duel (current) |
+|---|---|---|
+| **Mode** | Single-agent: Red writes exploits | Adversarial: Red writes exploits, Blue writes patches |
+| **Verdict source** | Foundry forge_test exit code | Foundry forge_test against four server-side gates |
+| **Convergence claim** | "Found exploit" iff forge passes | "Hardened" iff Red exploits AND Blue patches AND all 4 gates green |
+| **Output** | Per-contract exploit-or-no-find report | Per-duel round-by-round trace + final convergence label |
+| **Headline numbers** | 67.7% curated / 13.7% random (32 / 95 contracts) | 1 hardened / 3 red-failed / 3 blue-failed / 1 same-class-escaped / 2 timeout (10 contracts in Phase 4) |
+
+The two share a Docker sandbox, an Anvil fork, and Supabase persistence. They differ in agent loop and verifier.
+
+---
+
+## Solhunt-Duel — adversarial loop with server-side gates
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant Op as Operator
+    participant Orch as Orchestrator
+    participant Red as Red Agent
+    participant Blue as Blue Agent
+    participant Verify as verifyPatch (harness)
+    participant Anvil as Anvil Fork
+    participant SB as Supabase
+
+    Op->>Orch: duel --target dexible
+    Orch->>Anvil: Start fork at historical block
+    Orch->>Anvil: Clone runtime bytecode to fresh address
+    Note over Anvil: Fresh addr means no constructor state<br/>(catches uninitialized-storage bugs)
+
+    loop Up to N rounds
+        Orch->>Red: Source + prior round context
+        Red->>Anvil: Iterative tool calls (read/edit/forge_test)
+        Red-->>Orch: Exploit.t.sol (or no-find)
+
+        alt Red found exploit
+            Orch->>Blue: Source + Red's exploit
+            Blue->>Anvil: Iterative patch attempts
+            Blue->>Verify: verify_patch (after each patch)
+            Verify->>Anvil: Build patched + extract bytecode + storage layout
+            Verify->>Anvil: vm.etch patched bytecode at fresh addr
+            Verify->>Anvil: Run sanity (exploit on ORIGINAL = PASS expected)
+            Verify->>Anvil: Run exploit on PATCHED = FAIL expected
+            Verify->>Anvil: Run exploit with FRESH attacker label = FAIL expected
+            Verify->>Anvil: Run benign suite on PATCHED = PASS expected
+            Verify->>Verify: Compare storage layouts = unchanged expected
+            Verify-->>Blue: { exploitNeutralized, benignPassed, freshAttackerNeutralized, storageLayoutChanged, regressions, error? }
+            Blue-->>Orch: Final patch (when 4 gates green) or budget exhausted
+        end
+
+        Orch->>SB: Round audit trail (red turns, blue turns, verify verdicts)
+    end
+
+    Orch->>SB: Duel result (HARDENED / BLUE_FAILED / RED_FAILED / SAME_CLASS_ESCAPED / TIMEOUT)
+    Orch->>Op: Convergence label + leaderboard row
+```
+
+The four gates and what they catch:
+
+| Gate | Computed in `verifyPatch()` | Defeats |
+|---|---|---|
+| `exploitNeutralized` | exploit FAILS on patched bytecode | "patch did nothing" |
+| `freshAttackerNeutralized` | exploit FAILS from a different EOA | "patch only blocks the original attacker address" |
+| `benignPassed` | benign happy-path tests still PASS | "patch deleted the function entirely" |
+| `storageLayoutChanged == false` | original vs patched storage layout slots/offsets/types match | "patch silently bricks existing state" |
+
+Source: [`src/sandbox/patch-harness.ts`](https://github.com/claygeo/solhunt-duel/blob/master/src/sandbox/patch-harness.ts) · Full walkthrough: [PROOF.md](PROOF.md)
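
A minimal sketch of how the four gates combine into a single pass/fail decision. The field names come from the verdict object in the sequence diagram; the `PatchVerdict` interface, the `string[]` type for `regressions`, and the helpers `failedGates` and `allGatesGreen` are hypothetical and may not match what `patch-harness.ts` actually exports.

```typescript
// Verdict shape assumed from the diagram; types are guesses for illustration.
interface PatchVerdict {
  exploitNeutralized: boolean;
  benignPassed: boolean;
  freshAttackerNeutralized: boolean;
  storageLayoutChanged: boolean;
  regressions: string[]; // assumed: names of benign tests that regressed
  error?: string;
}

// Returns the names of the gates that are not green.
function failedGates(v: PatchVerdict): string[] {
  const failed: string[] = [];
  if (!v.exploitNeutralized) failed.push("exploitNeutralized");
  if (!v.freshAttackerNeutralized) failed.push("freshAttackerNeutralized");
  if (!v.benignPassed) failed.push("benignPassed");
  if (v.storageLayoutChanged) failed.push("storageLayoutChanged");
  return failed;
}

// A patch counts only when the harness ran cleanly and every gate is green.
function allGatesGreen(v: PatchVerdict): boolean {
  return v.error === undefined && failedGates(v).length === 0;
}

// A patch that "fixes" the bug by deleting the vulnerable function:
// the exploit no longer works, but the benign suite breaks.
const deletedFunctionPatch: PatchVerdict = {
  exploitNeutralized: true,
  benignPassed: false,
  freshAttackerNeutralized: true,
  storageLayoutChanged: false,
  regressions: ["testHappyPath"],
};
console.log(allGatesGreen(deletedFunctionPatch), failedGates(deletedFunctionPatch));
```

Note that `storageLayoutChanged` is the one field where `false` is the passing value, so it has to be inverted when counting green gates.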
+
+**Convergence taxonomy** (what each leaderboard label means):
+
+- `HARDENED` — Red found an exploit, Blue produced a patch, all 4 gates green
+- `BLUE_FAILED` — Red found exploit, Blue exhausted budget without all 4 gates green
+- `RED_FAILED` — Red emitted no-find within budget; contract may be safe OR our agent missed
+- `SAME_CLASS_ESCAPED` — Blue's patch passed gates, Red pivoted to a DIFFERENT vulnerability of the same class — escape, not hardening
+- `TIMEOUT` — wall-clock cap hit before any agent emitted a final verdict
+
+The taxonomy is deliberately five-way, not two-way. "Did Blue patch successfully" and "is the contract now safe" are different questions; the labels keep them separate.
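
The five labels can be read as a small decision procedure. The sketch below is one way to express it, assuming a hypothetical `DuelOutcome` summary with four boolean inputs and assuming that a timeout takes precedence over the other labels; the real orchestrator may track different state and order the checks differently.

```typescript
type Convergence =
  | "HARDENED"
  | "BLUE_FAILED"
  | "RED_FAILED"
  | "SAME_CLASS_ESCAPED"
  | "TIMEOUT";

// Hypothetical end-of-duel summary; field names are illustrative.
interface DuelOutcome {
  timedOut: boolean; // wall-clock cap hit before a final verdict
  redFoundExploit: boolean;
  blueAllGatesGreen: boolean; // Blue's final patch passed all 4 gates
  redEscapedSameClass: boolean; // Red found a different bug of the same class after the patch
}

function classify(o: DuelOutcome): Convergence {
  if (o.timedOut) return "TIMEOUT";
  if (!o.redFoundExploit) return "RED_FAILED";
  if (!o.blueAllGatesGreen) return "BLUE_FAILED";
  // Gates passed: the remaining question is whether Red pivoted afterwards.
  return o.redEscapedSameClass ? "SAME_CLASS_ESCAPED" : "HARDENED";
}

console.log(
  classify({
    timedOut: false,
    redFoundExploit: true,
    blueAllGatesGreen: true,
    redEscapedSameClass: true,
  })
);
```

The last branch is where "did Blue patch successfully" and "is the contract now safe" come apart: both outcomes have all gates green, and only one is labeled `HARDENED`.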
+
+---
+
+## Solhunt (single-agent scanner) — predecessor
+
+The original loop. Used to produce the headline 67.7% / 13.7% numbers.
 
 ```mermaid
 sequenceDiagram
@@ -10,7 +101,7 @@ sequenceDiagram
     participant ES as Etherscan API
     participant Docker as Docker Sandbox
     participant Anvil as Anvil Fork
-    participant LLM as LLM (Claude/Qwen)
+    participant LLM as LLM (Claude Sonnet 4)
     participant SB as Supabase
 
     User->>CLI: benchmark --dataset X --model Y
@@ -41,14 +132,42 @@ sequenceDiagram
     CLI->>User: Results table + cost summary
 ```
 
-## Data model
+### Agent loop state machine
+
+```mermaid
+stateDiagram-v2
+    [*] --> Read: initial prompt
+    Read --> Identify: source read
+    Identify --> Write: pattern detected
+    Write --> Test: exploit.t.sol created
+    Test --> Write: compile error<br/>(str_replace)
+    Test --> Test: forge fails<br/>(retry different vector)
+    Test --> Report: forge passes
+    Identify --> Report: no exploit found<br/>after iter N
+    Write --> Report: iteration budget hit
+    Report --> [*]
+
+    Read: read source files
+    Identify: identify attack vector
+    Write: write exploit test
+    Test: run forge_test
+    Report: emit SOLHUNT_REPORT
+```
+
+---
+
+## Shared infrastructure
+
+### Data model
 
 ```mermaid
 erDiagram
     benchmark_runs ||--o{ scan_runs : produces
     scan_runs ||--o{ tool_calls : logs
     contracts ||--o{ scan_runs : scanned
     scan_runs ||--o{ artifacts : generates
+    duel_runs ||--o{ duel_rounds : has
+    duel_rounds ||--o{ scan_runs : reuses
 
     benchmark_runs {
         uuid id PK
@@ -77,6 +196,26 @@ erDiagram
         string conversation_path
     }
 
+    duel_runs {
+        uuid id PK
+        uuid contract_id FK
+        string convergence
+        int rounds
+        float wall_time_seconds
+        float notional_cost
+        timestamp created_at
+    }
+
+    duel_rounds {
+        uuid id PK
+        uuid duel_run_id FK
+        int round_index
+        bool exploit_neutralized
+        bool benign_passed
+        bool fresh_attacker_neutralized
+        bool storage_layout_changed
+    }
+
     tool_calls {
         int id PK
         uuid scan_run_id FK
@@ -100,30 +239,6 @@ erDiagram
     }
 ```
 
-## Agent loop state machine
-
-```mermaid
-stateDiagram-v2
-    [*] --> Read: initial prompt
-    Read --> Identify: source read
-    Identify --> Write: pattern detected
-    Write --> Test: exploit.t.sol created
-    Test --> Write: compile error<br/>(str_replace)
-    Test --> Test: forge fails<br/>(retry different vector)
-    Test --> Report: forge passes
-    Identify --> Report: no exploit found<br/>after iter N
-    Write --> Report: iteration budget hit
-    Report --> [*]
-
-    Read: read source files
-    Identify: identify attack vector
-    Write: write exploit test
-    Test: run forge_test
-    Report: emit SOLHUNT_REPORT
-```
-
-## Key design decisions
-
 ### Why Docker sandbox per scan
 - Isolation: agent can't escape or affect host
 - Reproducibility: every scan starts from identical state
@@ -153,29 +268,39 @@ stateDiagram-v2
 - LLMs emit lowercase hex addresses frequently
 - Forge rejects with EIP-55 checksum errors
 - Agent wastes 5-10 iterations fighting this
-- Fix: regex replace all 0x[40 hex] with keccak256-computed checksums on every .sol file write
+- Fix: regex replace all `0x[40 hex]` with keccak256-computed checksums on every .sol file write
 
 ### Why vm.prank false-positive guard
 - `vm.prank(admin)` makes next call appear from admin
 - Agent discovered it could "exploit" access-controlled functions this way
 - But pranking as owner to call owner functions proves nothing
 - System prompt now lists valid uses (whale, EOA, governance-after-vote) and flags invalid use
 
-## Cost circuit breaker
+### Why fresh-address bytecode cloning (Solhunt-Duel only)
+- Original contract address has constructor state, immutables, possibly initializer storage
+- `vm.etch` swaps runtime bytecode but does NOT replay the constructor
+- If we just etched on the original address, Blue's patches that depend on initializer storage would silently fail
+- Fix: clone bytecode to a fresh deterministic address with `anvil_setCode`, then run all verify stages against that address. Each stage calls `anvil_setCode` with the correct variant (original / patched) before running.
+
+---
+
+## Cost controls
 
 Two layers of protection:
 
-1. **Failure circuit breaker** (existing):
+1. **Failure circuit breaker** (existing, pre-Duel):
    - If last 3 contracts all failed without producing a report → stop
    - If last 3 contracts all hit the same error → stop
 
-2. **Budget circuit breaker** (new):
+2. **Budget circuit breaker**:
    - `--max-budget <usd>` global cap
    - Checks cumulative cost between batches
    - Stops immediately if cap exceeded
    - Warns at 75% usage
 
-Without these, a stuck agent in a 30-iteration loop at $3+ per contract could burn through a $80 budget in the first ~27 contracts.
+Without these, a stuck agent in a 30-iteration loop at $3+ per contract could burn through an $80 budget in the first ~27 contracts.
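
The two layers above can be sketched as a single check run between batches. This is a hedged illustration, not the repo's implementation: `ContractResult`, `BreakerDecision`, and `checkBreakers` are hypothetical names, and the order in which the checks run is an assumption.

```typescript
// Hypothetical per-contract summary used only for this sketch.
interface ContractResult {
  producedReport: boolean;
  error?: string;
  costUsd: number;
}

type BreakerDecision =
  | { action: "continue" }
  | { action: "warn"; reason: string }
  | { action: "stop"; reason: string };

function checkBreakers(results: ContractResult[], maxBudgetUsd: number): BreakerDecision {
  // Budget breaker: stop as soon as cumulative cost exceeds the cap.
  const spent = results.reduce((sum, r) => sum + r.costUsd, 0);
  if (spent > maxBudgetUsd) {
    return { action: "stop", reason: `budget exceeded: $${spent.toFixed(2)}` };
  }

  // Failure breaker: look only at the last 3 contracts.
  const lastThree = results.slice(-3);
  if (lastThree.length === 3) {
    if (lastThree.every((r) => !r.producedReport)) {
      return { action: "stop", reason: "last 3 contracts produced no report" };
    }
    const firstError = lastThree[0]?.error;
    if (firstError !== undefined && lastThree.every((r) => r.error === firstError)) {
      return { action: "stop", reason: `last 3 contracts hit the same error: ${firstError}` };
    }
  }

  // Early warning at 75% of the cap.
  if (spent >= 0.75 * maxBudgetUsd) {
    return { action: "warn", reason: `75% of budget used: $${spent.toFixed(2)}` };
  }
  return { action: "continue" };
}

// The arithmetic from the text: 27 contracts at $3 each is $81, over an $80 cap.
const stuckRun: ContractResult[] = Array.from({ length: 27 }, () => ({
  producedReport: true,
  costUsd: 3,
}));
console.log(checkBreakers(stuckRun, 80));
```

Because the check runs between batches, the cap can be overshot by up to one batch of spend before the stop fires.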
+
+---
 
 ## Model abstraction
 
@@ -188,6 +313,31 @@ flowchart LR
     Provider -->|openai format| Gemini[Gemini 2.0 Flash]
     Provider -->|anthropic SDK| Direct[Direct Anthropic]
     Provider -->|localhost| Ollama[Ollama local models]
+    Provider -->|claude-cli| ClaudeCLI[Claude Code CLI]
 ```
 
-One provider abstraction, multiple backends. Qwen-specific handling: append `/no_think` to disable reasoning on local models. Cost calculated per-token against PRICING table.
+One provider abstraction, multiple backends. Qwen-specific handling: append `/no_think` to disable reasoning on local models. Cost is calculated per token against the PRICING table. Solhunt-Duel adds a Claude Code CLI backend (billed against a Max subscription rather than metered API usage) for autonomous overnight runs.
+
+---
+
+## Where to read the code
+
+For the gate verifier (the load-bearing claim of Solhunt-Duel): start at [`src/sandbox/patch-harness.ts:89`](https://github.com/claygeo/solhunt-duel/blob/master/src/sandbox/patch-harness.ts#L89) — the `verifyPatch()` function is the entire gate-checking pipeline in 130 lines.
+
+For the duel orchestration: [`src/duel/orchestrator.ts`](https://github.com/claygeo/solhunt-duel/blob/master/src/duel/orchestrator.ts) — Red/Blue round coordination, fresh-address bytecode cloning, audit trail.
+
+For the single-agent scanner: [`src/agent/`](https://github.com/claygeo/solhunt-duel/tree/master/src/agent) — read loop, prompts, tool definitions.
+
+For the benchmark runner: [`src/bench/`](https://github.com/claygeo/solhunt-duel/tree/master/src/bench) — phase0 (single-agent) and phase1 (duel) runner entry points.
+
+---
+
+## Honest limitations
+
+- The Phase 4 duel set is 10 contracts. That's small-N. The v2 expansion plan in [PLAN-V2-BENCHMARK-EXPANSION.md](PLAN-V2-BENCHMARK-EXPANSION.md) is how we get to 50+ with vuln-class diversity and an adversarial-clean baseline.
+- The 67.7% / 13.7% gap is real and unresolved. Sandbox limitations dominate the failure modes on random samples (compiler version mismatches, cross-contract dependencies the harness doesn't pre-load). See [README §The numbers](https://github.com/claygeo/solhunt-duel/blob/master/README.md#the-numbers--be-precise) for the failure breakdown.
+- The gates falsify positive claims, not negative ones. A `HARDENED` verdict means "we ran the four checks and they all passed." It does NOT mean "the contract is now bulletproof." Different claim, different scope.
+- MEV and front-running are invisible to the verifier. It runs with deterministic block timing on a frozen fork, so real-world transaction ordering is not modeled.
+- Multi-contract attack chains that need specific pool / oracle state at specific blocks fail on our harness fork even when the exploit works on mainnet. We don't pre-load arbitrary DeFi state. Those collapse to RED_FAILED, not falsely-hardened runs.
+
+If you find a counter-example to any of these, please file an issue. Honest negative results compound; spin destroys.