
Commit d995263

docs: ARCHITECTURE recruiter-readable + V2 plan integrated reviews + interview walkthrough
Iteration #7 — review feedback integrated + new artifacts:

PLAN-V2-BENCHMARK-EXPANSION.md UPDATES (post-review)
Both /plan-eng-review (APPROVE WITH CHANGES) and /codex outside-voice (GO with constraints) reviewed the v1 plan. Integrated their feedback:
- Tier C: 5 -> 10 contracts. 1/5 = 20% +/- enormous CI is a vibe, not a measurement. 10 with severity-weighted scoring + 2-3 near-miss controls (audited-clean contracts that had a finding patched in audit).
- Reproducibility protocol: n=3 runs/contract, median + IQR reported, model version pinned (claude-sonnet-4-20250514), temperature/top-p specified, prompt git SHA logged.
- Tier C disclosure PRE-COMMITTED — text written into the leaderboard before any run. Real failure mode is operator silently cutting Tier C post-result; pre-commit prevents that.
- 6-day hard cap on v2 work; kill criteria if Phase 3+ runs out.
- Pause/reassess gate: halt before Phase 3 spend if Tier C FPR > 80%.
- Phase 1 schema parallel with Phase 2 sourcing (saves 1 day).
- Total revised budget: $110-150 API (was $30-50; bumped for n=3 x 57 contracts).

ARCHITECTURE.md (recruiter-readable rewrite)
- 60-second TL;DR at top with leaderboard / PROOF / V2 plan cross-links
- "Two Projects" comparison table (Solhunt single-agent vs Solhunt-Duel adversarial)
- New Solhunt-Duel architecture section: mermaid sequence diagram of Red<->Blue + verifyPatch + 4 gates table
- Convergence taxonomy: HARDENED / BLUE_FAILED / RED_FAILED / SAME_CLASS_ESCAPED / TIMEOUT (definitions)
- Updated ER diagram: duel_runs + duel_rounds tables
- New "Why fresh-address bytecode cloning" design decision
- Honest limitations: small-N, sandbox limits, MEV invisibility, multi-contract chain failures
- Code-pointer section: where to read each subsystem

INTERVIEW-WALKTHROUGH-5MIN.md (NEW)
- 5-beat structure for screen calls: premise / setup / 4 gates / honest failure / what's next, ~750-900 words, fits 5 min with interruption buffer
- Variants: 3-min screening, 15-min deep technical, behavioral/hiring-manager
- "What NOT to say" list (no SCONE direct compare, no oversell convergence, no startup pitch, no trashing adjacent work)
- Practice notes: read aloud twice, time yourself, the 67%/13% beat is the most underdone on first practice
- Honest meta: match artifact to audience
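A back-of-envelope reading of the revised budget, assuming the full $110-150 is spent on the n=3 × 57 contract runs and spread evenly across them (the helper is illustrative, not part of the repo):

```typescript
// Illustrative helper: spread a total API budget range evenly over all runs.
// Assumes every contract gets exactly `runsPerContract` runs and that nothing
// else (retries, reviews, dev iterations) draws on the same budget.
interface PerRunBudget {
  runs: number;
  lowUsd: number;
  highUsd: number;
}

function perRunBudget(
  totalLowUsd: number,
  totalHighUsd: number,
  contracts: number,
  runsPerContract: number,
): PerRunBudget {
  const runs = contracts * runsPerContract;
  return { runs, lowUsd: totalLowUsd / runs, highUsd: totalHighUsd / runs };
}

const revised = perRunBudget(110, 150, 57, 3);
console.log(
  `${revised.runs} runs -> $${revised.lowUsd.toFixed(2)}-$${revised.highUsd.toFixed(2)} per run`,
);
// 171 runs -> $0.64-$0.88 per run
```

Under that assumption, the revised range works out to roughly 64 to 88 cents of API spend per contract run.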
1 parent ef7df9a commit d995263

4 files changed

Lines changed: 476 additions & 61 deletions


LOOP-LOG.md

Lines changed: 67 additions & 0 deletions
@@ -424,6 +424,73 @@ Tier 2 (if time):
 
 ---
 
+---
+
+## Iteration #7 — no-wait mode (operator: "just keep building")
+
+**Wake reason:** Operator interrupted iter #7 ScheduleWakeup wait with "why dont you just keep working, why wait till the morning, just keep building." Executed iter #7 work continuously without waiting for the 20:08 fallback.
+
+### Built this iteration
+
+**1. /plan-eng-review on V2 benchmark plan (parallel agent)**
+- Verdict: **APPROVE WITH CHANGES**
+- 3 blockers identified:
+   1. No reproducibility protocol (single-run-per-contract on stochastic agent ≠ benchmark)
+   2. Tier C N=5 too small (1/5 = 20% ± enormous CI; need ≥10 with severity-weighted scoring + near-miss controls)
+   3. Tier C false-find disclosure flow undefined (must close BEFORE sourcing, not after)
+- Other gaps: model version pinning, seed/temp spec, no held-out set, no inter-rater rubric
+- Phase 1 schema work + Phase 2 sourcing → run in parallel (saves 1 day)
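To put a number on "1/5 = 20% ± enormous CI", here is a minimal sketch using the Wilson score interval. The choice of interval is ours; neither review names one. Wilson is a common choice for small samples because it stays inside [0, 1].

```typescript
// Wilson score interval for a binomial proportion k/n.
// z = 1.96 gives an approximate 95% interval.
function wilsonInterval(k: number, n: number, z = 1.96): [number, number] {
  if (n <= 0 || k < 0 || k > n) throw new RangeError("need n > 0 and 0 <= k <= n");
  const p = k / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const margin = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return [Math.max(0, center - margin), Math.min(1, center + margin)];
}

// Same 20% point estimate, two sample sizes.
const cases: Array<[number, number]> = [
  [1, 5],
  [2, 10],
];
for (const [k, n] of cases) {
  const [lo, hi] = wilsonInterval(k, n);
  console.log(`${k}/${n} = ${(k / n).toFixed(2)}, 95% CI [${lo.toFixed(2)}, ${hi.toFixed(2)}]`);
}
// 1/5 = 0.20, 95% CI [0.04, 0.62]
// 2/10 = 0.20, 95% CI [0.06, 0.51]
```

Going from 1/5 to 2/10 narrows the 95% interval from roughly 0.59 wide to roughly 0.45 wide, so even N=10 gives a coarse estimate on its own.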
+
+**2. /codex outside-voice on V2 plan (parallel agent)**
+- Verdict: **GO with constraints**
+- Tier C downside math is asymmetric in operator's favor IF pre-committed to publish-whatever-happens
+- Real failure mode: operator chickens out post-result and silently cuts Tier C (PRE-COMMIT THE DISCLOSURE POST)
+- Inspect-AI integration: sequential not parallel (corpus first, framework second)
+- Operator's 1.5 YOE filter: v2 only flips outcomes for ~15-20% of funnel (the high-ceiling AI-safety path)
+- 6-day hard cap on v2 work; if budget runs out, ship Tier C standalone — gravy is gravy
+
+**3. Integrated both reviews into PLAN-V2-BENCHMARK-EXPANSION.md**
+- Tier C: 5 → 10 contracts (statistical, not vibe). Includes 2-3 near-miss controls. Severity-weighted scoring.
+- Reproducibility protocol added: n=3, model version `claude-sonnet-4-20250514` pinned, temperature/top-p specified, prompt git SHA logged
+- Tier C disclosure pre-commitment (drafted text written into the plan; operator can't quietly cut Tier C post-result)
+- 6-day hard cap with kill criteria (Phase 2 standalone if Phase 3+ runs out of time)
+- Pause/reassess gate: if Tier C FPR > 80%, halt before Phase 3 spend
+- Phase 1 schema parallel with Phase 2 sourcing
+- Total revised budget: $110-150 API (was $30-50; bumped for n=3 × 57 contracts)
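A sketch of how the n=3 summary and the pause/reassess gate could be computed. Two assumptions the plan does not state: quartiles use linear interpolation between the sorted runs, and each Tier C contract contributes one boolean ("reported a find") after its runs are aggregated, with every find on an audited-clean contract counted as a false positive. Function names are hypothetical.

```typescript
// Quantile by linear interpolation between order statistics of a sorted array.
function quantile(sorted: readonly number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Median + IQR over the repeated runs of one contract.
function summarizeRuns(scores: readonly number[]): { median: number; iqr: number } {
  if (scores.length === 0) throw new RangeError("need at least one run");
  const sorted = [...scores].sort((a, b) => a - b);
  return {
    median: quantile(sorted, 0.5),
    iqr: quantile(sorted, 0.75) - quantile(sorted, 0.25),
  };
}

// Fraction of Tier C contracts on which the agent reported a find.
function tierCFalsePositiveRate(reportedFind: readonly boolean[]): number {
  if (reportedFind.length === 0) throw new RangeError("no Tier C contracts");
  return reportedFind.filter(Boolean).length / reportedFind.length;
}

// Pause/reassess gate: halt before Phase 3 spend if FPR exceeds 80%.
function shouldHaltBeforePhase3(fpr: number, threshold = 0.8): boolean {
  return fpr > threshold;
}

const summary = summarizeRuns([10, 30, 20]); // three runs of one contract
console.log(summary); // { median: 20, iqr: 10 }
const tierC = [true, true, true, true, true, true, true, true, true, false];
console.log(shouldHaltBeforePhase3(tierCFalsePositiveRate(tierC))); // true (FPR 0.9 > 0.8)
```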
+
+**4. ARCHITECTURE.md polish** — recruiter-readable rewrite
+- Added 60-second TL;DR at top with cross-links to leaderboard, PROOF.md, V2 plan
+- Added "Two Projects" comparison table (Solhunt vs Solhunt-Duel)
+- New Solhunt-Duel architecture section: full mermaid sequence diagram of Red↔Blue duel + verifyPatch flow + 4 gates table
+- Convergence taxonomy (HARDENED, BLUE_FAILED, RED_FAILED, SAME_CLASS_ESCAPED, TIMEOUT) with definitions
+- Original Solhunt sections preserved (renamed for clarity)
+- Updated data model ER diagram with duel_runs + duel_rounds tables
+- New "Why fresh-address bytecode cloning" design decision (Solhunt-Duel-specific gotcha)
+- Honest limitations section: small-N, sandbox limits, MEV invisibility, multi-contract chain failures
+- Added Claude Code CLI backend to model-abstraction diagram (Max-subscription overnight runs)
+
+**5. INTERVIEW-WALKTHROUGH-5MIN.md** — screen-call architecture script
+- 5-beat structure (premise / setup / 4 gates / honest failure / what's next)
+- ~750-900 words spoken, fits 5 minutes with interviewer-interrupt buffer
+- Variants: 3-min screening, 15-min deep technical, behavioral/hiring-manager
+- "What NOT to say" list (no SCONE direct compare, no oversell convergence, no startup pitch, no trashing adjacent work)
+- Practice notes: read aloud twice, time yourself, the 67%/13% beat is most underdone on first try
+- Honest meta: match artifact to audience (Curaleaf for payments roles, Solhunt-Duel for AI safety / agent eval)
+
+### Iter #8 plan (continuing without wait)
+
+Tier 1:
+- [ ] Tier C contract candidate list seed — research 7-10 audited-clean mainnet contracts for V2 sourcing (no execution, just candidate research)
+- [ ] Code4rena Tier B candidate research — find 3-5 high-confidence Code4rena finding candidates (verified mainnet contracts, single-contract attack vectors, vuln class our corpus lacks)
+
+Tier 2 (if time):
+- [ ] Inspect-AI PR opportunity research — find ONE substantive PR target on the UK AI Security Institute's Inspect eval framework
+- [ ] README.md polish — recruiter-front-door optimization
+
+**Strict skip:** any operator-only task (Substack publish, Sourcegraph submit, etc.).
+
+---
+
 ## Operator wake-up summary (will be in last iteration before morning)
 
 [empty — will populate near morning]

docs/ARCHITECTURE.md

Lines changed: 184 additions & 34 deletions
@@ -1,6 +1,97 @@
-# solhunt Architecture
+# Architecture
 
-## End-to-end scan flow
+> **TL;DR (60-second read):** This repo holds two related but separate projects.
+> **Solhunt** is a single-agent scanner: give it a contract, it writes a Foundry exploit if it can find one, otherwise it emits a structured no-find report. Numbers: 67.7% on a curated 32-contract DeFiHackLabs subset, 13.7% on a 95-contract random draw, both published honestly. **Solhunt-Duel** sits on top: Red writes the exploit, Blue writes a Solidity patch, and a server-side harness enforces four gates the LLMs cannot see or modify (`exploitNeutralized`, `benignPassed`, `freshAttackerNeutralized`, `storageLayoutChanged == false`) before declaring convergence. The premise: agents will lie about success if you let them, so the verdict lives outside the agent.
+>
+> **Live artifacts:** [leaderboard](https://solhunt-duel.netlify.app/leaderboard/) · [gate verifier walkthrough (PROOF.md)](PROOF.md) · [v2 corpus expansion plan](PLAN-V2-BENCHMARK-EXPANSION.md)
+
+---
+
+## The two projects
+
+| | Solhunt (predecessor) | Solhunt-Duel (current) |
+|---|---|---|
+| **Mode** | Single-agent: Red writes exploits | Adversarial: Red writes exploits, Blue writes patches |
+| **Verdict source** | Foundry forge_test exit code | Foundry forge_test against four server-side gates |
+| **Convergence claim** | "Found exploit" iff forge passes | "Hardened" iff Red exploits AND Blue patches AND all 4 gates green |
+| **Output** | Per-contract exploit-or-no-find report | Per-duel round-by-round trace + final convergence label |
+| **Headline numbers** | 67.7% curated / 13.7% random (32 / 95 contracts) | 1 hardened / 3 red-failed / 3 blue-failed / 1 same-class-escaped / 2 timeout (10 contracts in Phase 4) |
+
+The two share the Docker sandbox, Anvil fork, and Supabase persistence layer. They differ in the agent loop and the verifier.
+
+---
+
+## Solhunt-Duel — adversarial loop with server-side gates
+
+```mermaid
+sequenceDiagram
+autonumber
+participant Op as Operator
+participant Orch as Orchestrator
+participant Red as Red Agent
+participant Blue as Blue Agent
+participant Verify as verifyPatch (harness)
+participant Anvil as Anvil Fork
+participant SB as Supabase
+
+Op->>Orch: duel --target dexible
+Orch->>Anvil: Start fork at historical block
+Orch->>Anvil: Clone runtime bytecode to fresh address
+Note over Anvil: Fresh addr means no constructor state<br/>(catches uninitialized-storage bugs)
+
+loop Up to N rounds
+Orch->>Red: Source + prior round context
+Red->>Anvil: Iterative tool calls (read/edit/forge_test)
+Red-->>Orch: Exploit.t.sol (or no-find)
+
+alt Red found exploit
+Orch->>Blue: Source + Red's exploit
+Blue->>Anvil: Iterative patch attempts
+Blue->>Verify: verify_patch (after each patch)
+Verify->>Anvil: Build patched + extract bytecode + storage layout
+Verify->>Anvil: vm.etch patched bytecode at fresh addr
+Verify->>Anvil: Run sanity (exploit on ORIGINAL = PASS expected)
+Verify->>Anvil: Run exploit on PATCHED = FAIL expected
+Verify->>Anvil: Run exploit with FRESH attacker label = FAIL expected
+Verify->>Anvil: Run benign suite on PATCHED = PASS expected
+Verify->>Verify: Compare storage layouts = unchanged expected
+Verify-->>Blue: { exploitNeutralized, benignPassed, freshAttackerNeutralized, storageLayoutChanged, regressions, error? }
+Blue-->>Orch: Final patch (when 4 gates green) or budget exhausted
+end
+
+Orch->>SB: Round audit trail (red turns, blue turns, verify verdicts)
+end
+
+Orch->>SB: Duel result (HARDENED / BLUE_FAILED / RED_FAILED / SAME_CLASS_ESCAPED / TIMEOUT)
+Orch->>Op: Convergence label + leaderboard row
+```
+
+The four gates and what they catch:
+
+| Gate | Computed in `verifyPatch()` | Defeats |
+|---|---|---|
+| `exploitNeutralized` | exploit FAILS on patched bytecode | "patch did nothing" |
+| `freshAttackerNeutralized` | exploit FAILS from a different EOA | "patch only blocks the original attacker address" |
+| `benignPassed` | benign happy-path tests still PASS | "patch deleted the function entirely" |
+| `storageLayoutChanged == false` | original vs patched storage layout slots/offsets/types match | "patch silently bricks existing state" |
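A minimal sketch of reducing a `verifyPatch()` result to a single green/not-green decision. Field names follow the result object in the sequence diagram; the field types, the handling of `error`, and the helper names are assumptions rather than the code in `patch-harness.ts`.

```typescript
// Shape of the verifyPatch() result as listed in the sequence diagram;
// the concrete types (notably `regressions` and `error`) are assumptions.
interface PatchVerdict {
  exploitNeutralized: boolean;
  benignPassed: boolean;
  freshAttackerNeutralized: boolean;
  storageLayoutChanged: boolean;
  regressions: string[];
  error?: string;
}

// Names of the gates that are NOT green; an empty list means all four pass.
function failingGates(v: PatchVerdict): string[] {
  const failing: string[] = [];
  if (!v.exploitNeutralized) failing.push("exploitNeutralized");
  if (!v.freshAttackerNeutralized) failing.push("freshAttackerNeutralized");
  if (!v.benignPassed) failing.push("benignPassed");
  if (v.storageLayoutChanged) failing.push("storageLayoutChanged == false");
  return failing;
}

// Assumption: a harness error never counts as green, even if the flags look fine.
function allGatesGreen(v: PatchVerdict): boolean {
  return v.error === undefined && failingGates(v).length === 0;
}

const deletedFunction: PatchVerdict = {
  exploitNeutralized: true, // the exploit cannot call what no longer exists...
  benignPassed: false, // ...but the happy-path tests break too
  freshAttackerNeutralized: true,
  storageLayoutChanged: false,
  regressions: [],
};
console.log(failingGates(deletedFunction)); // [ 'benignPassed' ]
```

The deleted-function case neutralizes the exploit but fails `benignPassed`, which is exactly the "patch deleted the function entirely" row above.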
+
+Source: [`src/sandbox/patch-harness.ts`](https://github.com/claygeo/solhunt-duel/blob/master/src/sandbox/patch-harness.ts) · Full walkthrough: [PROOF.md](PROOF.md)
+
+**Convergence taxonomy** (what each leaderboard label means):
+
+- `HARDENED` — Red found an exploit, Blue produced a patch, all 4 gates green
+- `BLUE_FAILED` — Red found exploit, Blue exhausted budget without all 4 gates green
+- `RED_FAILED` — Red emitted no-find within budget; contract may be safe OR our agent missed it
+- `SAME_CLASS_ESCAPED` — Blue's patch passed gates, Red pivoted to a DIFFERENT vulnerability of the same class — escape, not hardening
+- `TIMEOUT` — wall-clock cap hit before any agent emitted a final verdict
+
+The taxonomy is deliberately five-way, not two-way. "Did Blue patch successfully" and "is the contract now safe" are different questions; the labels keep them separate.
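The five labels can be read as an ordered decision over a finished duel. This is one hypothetical encoding of the definitions above; `DuelOutcome` is a flattened summary invented for the sketch, not the orchestrator's data model.

```typescript
type Convergence = "HARDENED" | "BLUE_FAILED" | "RED_FAILED" | "SAME_CLASS_ESCAPED" | "TIMEOUT";

// Hypothetical flattened summary of a finished duel; the real orchestrator
// records far more per round.
interface DuelOutcome {
  timedOutBeforeVerdict: boolean; // wall-clock cap hit before any agent emitted a final verdict
  redFoundExploit: boolean; // Red produced a working exploit
  bluePassedAllGates: boolean; // some Blue patch got all 4 gates green
  redEscapedSameClass: boolean; // after that patch, Red found a different bug of the same class
}

function classifyDuel(o: DuelOutcome): Convergence {
  if (o.timedOutBeforeVerdict) return "TIMEOUT";
  if (!o.redFoundExploit) return "RED_FAILED";
  if (!o.bluePassedAllGates) return "BLUE_FAILED";
  if (o.redEscapedSameClass) return "SAME_CLASS_ESCAPED";
  return "HARDENED";
}

console.log(
  classifyDuel({
    timedOutBeforeVerdict: false,
    redFoundExploit: true,
    bluePassedAllGates: true,
    redEscapedSameClass: true,
  }),
); // SAME_CLASS_ESCAPED
```

The order of checks carries the point of the taxonomy: green gates are necessary for `HARDENED` but not sufficient, since a same-class escape after a green patch gets its own label.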
+
+---
+
+## Solhunt (single-agent scanner) — predecessor
+
+The original loop, used to produce the headline 67.7% / 13.7% numbers.
 
 ```mermaid
 sequenceDiagram
@@ -10,7 +101,7 @@ sequenceDiagram
 participant ES as Etherscan API
 participant Docker as Docker Sandbox
 participant Anvil as Anvil Fork
-participant LLM as LLM (Claude/Qwen)
+participant LLM as LLM (Claude Sonnet 4)
 participant SB as Supabase
 
 User->>CLI: benchmark --dataset X --model Y
@@ -41,14 +132,42 @@ sequenceDiagram
 CLI->>User: Results table + cost summary
 ```
 
-## Data model
+### Agent loop state machine
+
+```mermaid
+stateDiagram-v2
+[*] --> Read: initial prompt
+Read --> Identify: source read
+Identify --> Write: pattern detected
+Write --> Test: exploit.t.sol created
+Test --> Write: compile error<br/>(str_replace)
+Test --> Test: forge fails<br/>(retry different vector)
+Test --> Report: forge passes
+Identify --> Report: no exploit found<br/>after iter N
+Write --> Report: iteration budget hit
+Report --> [*]
+
+Read: read source files
+Identify: identify attack vector
+Write: write exploit test
+Test: run forge_test
+Report: emit SOLHUNT_REPORT
+```
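The state diagram above maps directly onto a transition table. In this sketch the event names are invented labels for the diagram's edges and `Done` stands in for the terminal `[*]`; it is an illustration, not the agent's actual loop code.

```typescript
// Transition table for the agent loop diagram. Event names are invented
// labels for the diagram's edges; "Done" stands in for [*].
type State = "Read" | "Identify" | "Write" | "Test" | "Report" | "Done";
type Event =
  | "sourceRead"
  | "patternDetected"
  | "noExploitFound"
  | "exploitWritten"
  | "budgetHit"
  | "compileError"
  | "forgeFails"
  | "forgePasses"
  | "reportEmitted";

const TRANSITIONS: Record<State, Partial<Record<Event, State>>> = {
  Read: { sourceRead: "Identify" },
  Identify: { patternDetected: "Write", noExploitFound: "Report" },
  Write: { exploitWritten: "Test", budgetHit: "Report" },
  Test: { compileError: "Write", forgeFails: "Test", forgePasses: "Report" },
  Report: { reportEmitted: "Done" },
  Done: {},
};

function step(state: State, event: Event): State {
  const next = TRANSITIONS[state][event];
  if (next === undefined) throw new Error(`no transition from ${state} on ${event}`);
  return next;
}

// The loop starts in Read (the diagram's "initial prompt" edge).
function replay(events: readonly Event[]): State {
  return events.reduce<State>(step, "Read");
}

const finalState = replay([
  "sourceRead",
  "patternDetected",
  "exploitWritten",
  "compileError",
  "exploitWritten",
  "forgeFails",
  "forgePasses",
]);
console.log(finalState); // Report
```

Undefined transitions throw instead of being ignored, so a replayed trace that skips a step (say, `forgePasses` straight from `Read`) fails loudly.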
+
+---
+
+## Shared infrastructure
+
+### Data model
 
 ```mermaid
 erDiagram
 benchmark_runs ||--o{ scan_runs : produces
 scan_runs ||--o{ tool_calls : logs
 contracts ||--o{ scan_runs : scanned
 scan_runs ||--o{ artifacts : generates
+duel_runs ||--o{ duel_rounds : has
+duel_rounds ||--o{ scan_runs : reuses
 
 benchmark_runs {
 uuid id PK
@@ -77,6 +196,26 @@ erDiagram
 string conversation_path
 }
 
+duel_runs {
+uuid id PK
+uuid contract_id FK
+string convergence
+int rounds
+float wall_time_seconds
+float notional_cost
+timestamp created_at
+}
+
+duel_rounds {
+uuid id PK
+uuid duel_run_id FK
+int round_index
+bool exploit_neutralized
+bool benign_passed
+bool fresh_attacker_neutralized
+bool storage_layout_changed
+}
+
 tool_calls {
 int id PK
 uuid scan_run_id FK
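The `duel_rounds` gate columns correspond one-to-one with the gate fields that `verifyPatch()` returns in the duel sequence diagram. A sketch of that mapping, assuming each round's verifier result is persisted as-is; the type and function names are hypothetical:

```typescript
// Gate fields as returned by verifyPatch() in the duel diagram.
interface GateResult {
  exploitNeutralized: boolean;
  benignPassed: boolean;
  freshAttackerNeutralized: boolean;
  storageLayoutChanged: boolean;
}

// Row shape mirroring the duel_rounds entity in the ER diagram.
interface DuelRoundRow {
  id: string;
  duel_run_id: string;
  round_index: number;
  exploit_neutralized: boolean;
  benign_passed: boolean;
  fresh_attacker_neutralized: boolean;
  storage_layout_changed: boolean;
}

// Hypothetical mapper: camelCase verifier output -> snake_case columns.
function toDuelRoundRow(
  id: string,
  duelRunId: string,
  roundIndex: number,
  g: GateResult,
): DuelRoundRow {
  return {
    id,
    duel_run_id: duelRunId,
    round_index: roundIndex,
    exploit_neutralized: g.exploitNeutralized,
    benign_passed: g.benignPassed,
    fresh_attacker_neutralized: g.freshAttackerNeutralized,
    storage_layout_changed: g.storageLayoutChanged,
  };
}

// Placeholder IDs; the real table uses uuids.
const row = toDuelRoundRow("round-uuid", "run-uuid", 0, {
  exploitNeutralized: true,
  benignPassed: true,
  freshAttackerNeutralized: true,
  storageLayoutChanged: false,
});
console.log(row.storage_layout_changed); // false
```

Keeping all four booleans per round, rather than only the final `convergence` string on `duel_runs`, leaves enough in the table to recheck a label against its round trail.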
@@ -100,30 +239,6 @@ erDiagram
 }
 ```
 
-## Agent loop state machine
-
-```mermaid
-stateDiagram-v2
-[*] --> Read: initial prompt
-Read --> Identify: source read
-Identify --> Write: pattern detected
-Write --> Test: exploit.t.sol created
-Test --> Write: compile error<br/>(str_replace)
-Test --> Test: forge fails<br/>(retry different vector)
-Test --> Report: forge passes
-Identify --> Report: no exploit found<br/>after iter N
-Write --> Report: iteration budget hit
-Report --> [*]
-
-Read: read source files
-Identify: identify attack vector
-Write: write exploit test
-Test: run forge_test
-Report: emit SOLHUNT_REPORT
-```
-
-## Key design decisions
-
 ### Why Docker sandbox per scan
 - Isolation: agent can't escape or affect host
 - Reproducibility: every scan starts from identical state
@@ -153,29 +268,39 @@ stateDiagram-v2
 - LLMs emit lowercase hex addresses frequently
 - Forge rejects with EIP-55 checksum errors
 - Agent wastes 5-10 iterations fighting this
-- Fix: regex replace all 0x[40 hex] with keccak256-computed checksums on every .sol file write
+- Fix: regex replace all `0x[40 hex]` with keccak256-computed checksums on every .sol file write
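A self-contained sketch of the checksum fix. Node's built-in `crypto` module provides NIST SHA3-256, whose padding differs from the Keccak-256 that EIP-55 uses, so the block carries a small Keccak-256 of its own; production code would more likely call a library helper such as ethers' `getAddress`. All function names here are illustrative.

```typescript
// Dependency-free Keccak-256 (Ethereum's original Keccak padding, NOT NIST
// SHA3-256) over BigInt 64-bit lanes, following the compact structure of the
// public-domain tiny_sha3 reference. Slow, but fine for a few addresses.
const MASK = (1n << 64n) - 1n;
const RC: bigint[] = [
  0x0000000000000001n, 0x0000000000008082n, 0x800000000000808An, 0x8000000080008000n,
  0x000000000000808Bn, 0x0000000080000001n, 0x8000000080008081n, 0x8000000000008009n,
  0x000000000000008An, 0x0000000000000088n, 0x0000000080008009n, 0x000000008000000An,
  0x000000008000808Bn, 0x800000000000008Bn, 0x8000000000008089n, 0x8000000000008003n,
  0x8000000000008002n, 0x8000000000000080n, 0x000000000000800An, 0x800000008000000An,
  0x8000000080008081n, 0x8000000000008080n, 0x0000000080000001n, 0x8000000080008008n,
];
const ROTC = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44];
const PILN = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1];

const rotl = (x: bigint, n: number): bigint => ((x << BigInt(n)) | (x >> BigInt(64 - n))) & MASK;

function keccakF(st: bigint[]): void {
  const bc: bigint[] = new Array(5).fill(0n);
  for (let round = 0; round < 24; round++) {
    // theta
    for (let i = 0; i < 5; i++) bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
    for (let i = 0; i < 5; i++) {
      const d = bc[(i + 4) % 5] ^ rotl(bc[(i + 1) % 5], 1);
      for (let j = 0; j < 25; j += 5) st[j + i] ^= d;
    }
    // rho + pi
    let t = st[1];
    for (let i = 0; i < 24; i++) {
      const j = PILN[i];
      const saved = st[j];
      st[j] = rotl(t, ROTC[i]);
      t = saved;
    }
    // chi
    for (let j = 0; j < 25; j += 5) {
      for (let i = 0; i < 5; i++) bc[i] = st[j + i];
      for (let i = 0; i < 5; i++) st[j + i] ^= ~bc[(i + 1) % 5] & MASK & bc[(i + 2) % 5];
    }
    // iota
    st[0] ^= RC[round];
  }
}

function keccak256(data: Uint8Array): Uint8Array {
  const rate = 136; // bytes absorbed per permutation for a 256-bit output
  const padded = new Uint8Array((Math.floor(data.length / rate) + 1) * rate);
  padded.set(data);
  padded[data.length] ^= 0x01; // Keccak padding byte (SHA3-256 would use 0x06)
  padded[padded.length - 1] ^= 0x80;
  const st: bigint[] = new Array(25).fill(0n);
  for (let off = 0; off < padded.length; off += rate) {
    for (let i = 0; i < rate / 8; i++) {
      let lane = 0n; // lanes are little-endian 64-bit words
      for (let b = 7; b >= 0; b--) lane = (lane << 8n) | BigInt(padded[off + 8 * i + b]);
      st[i] ^= lane;
    }
    keccakF(st);
  }
  const out = new Uint8Array(32);
  for (let i = 0; i < 32; i++) out[i] = Number((st[i >> 3] >> BigInt(8 * (i & 7))) & 0xffn);
  return out;
}

// EIP-55: uppercase hex letter i when nibble i of keccak256(lowercase hex) is >= 8.
function toChecksumAddress(address: string): string {
  const hex = address.slice(2).toLowerCase();
  const hash = keccak256(Uint8Array.from(hex, (c) => c.charCodeAt(0)));
  let out = "0x";
  for (let i = 0; i < 40; i++) {
    const nibble = (hash[i >> 1] >> (i % 2 === 0 ? 4 : 0)) & 0xf;
    out += nibble >= 8 ? hex[i].toUpperCase() : hex[i];
  }
  return out;
}

// The rewrite step applied to a .sol source string before it is written.
function checksumAllAddresses(solSource: string): string {
  return solSource.replace(/\b0x[0-9a-fA-F]{40}\b/g, (m) => toChecksumAddress(m));
}

console.log(checksumAllAddresses("address constant TARGET = 0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed;"));
// address constant TARGET = 0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed;
```

The `\b` word boundaries are what keep 32-byte constants (64 hex digits) from being partially rewritten.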
 
 ### Why vm.prank false-positive guard
 - `vm.prank(admin)` makes next call appear from admin
 - Agent discovered it could "exploit" access-controlled functions this way
 - But pranking as owner to call owner functions proves nothing
 - System prompt now lists valid uses (whale, EOA, governance-after-vote) and flags invalid use
 
-## Cost circuit breaker
+### Why fresh-address bytecode cloning (Solhunt-Duel only)
+- Original contract address has constructor state, immutables, possibly initializer storage
+- vm.etch swaps runtime bytecode but does NOT replay the constructor
+- If we just etched on the original address, Blue's patches that depend on initializer storage would silently fail
+- Fix: clone bytecode to a fresh deterministic address with anvil_setCode, then run all verify stages against that address. Each stage calls anvil_setCode with the correct variant (original or patched) before running.
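A sketch of the per-stage setup, building the `anvil_setCode` JSON-RPC requests (params: address, then runtime bytecode) for the four verify stages in the duel diagram. The fixed placeholder address, stage list and helper names are assumptions; the harness chooses its own deterministic address.

```typescript
// Before each verify stage, install the right runtime bytecode at ONE fresh
// address via Anvil's anvil_setCode JSON-RPC method.
type Variant = "original" | "patched";

interface Stage {
  name: string;
  variant: Variant;
  expect: "PASS" | "FAIL";
}

const STAGES: readonly Stage[] = [
  { name: "sanity: exploit on original", variant: "original", expect: "PASS" },
  { name: "exploit on patched", variant: "patched", expect: "FAIL" },
  { name: "exploit from fresh attacker", variant: "patched", expect: "FAIL" },
  { name: "benign suite on patched", variant: "patched", expect: "PASS" },
];

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown[];
}

// Placeholder address (42 chars including 0x); not the harness's real choice.
const FRESH_ADDRESS = "0x" + "f0001".padStart(40, "0");

function setCodeRequests(
  bytecode: Record<Variant, string>,
  target: string = FRESH_ADDRESS,
): JsonRpcRequest[] {
  return STAGES.map(
    (stage, i): JsonRpcRequest => ({
      jsonrpc: "2.0",
      id: i + 1,
      method: "anvil_setCode",
      params: [target, bytecode[stage.variant]],
    }),
  );
}

// Placeholder bytecode strings, not real contracts.
const reqs = setCodeRequests({ original: "0x6080aa", patched: "0x6080bb" });
console.log(reqs.map((r) => r.params[1]));
// [ '0x6080aa', '0x6080bb', '0x6080bb', '0x6080bb' ]
```

Each request would be POSTed to the fork's RPC endpoint (Anvil listens on http://127.0.0.1:8545 by default) right before its forge run, so no stage inherits bytecode installed by the previous one.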
+
+---
+
+## Cost controls
 
 Two layers of protection:
 
-1. **Failure circuit breaker** (existing):
+1. **Failure circuit breaker** (existing, pre-Duel):
    - If last 3 contracts all failed without producing a report → stop
    - If last 3 contracts all hit the same error → stop
 
-2. **Budget circuit breaker** (new):
+2. **Budget circuit breaker**:
    - `--max-budget <usd>` global cap
    - Checks cumulative cost between batches
    - Stops immediately if cap exceeded
    - Warns at 75% usage
 
-Without these, a stuck agent in a 30-iteration loop at $3+ per contract could burn through a $80 budget in the first ~27 contracts.
+Without these, a stuck agent in a 30-iteration loop at $3+ per contract could burn through an $80 budget in the first ~27 contracts.
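A minimal sketch of the two layers. The thresholds (a 3-contract window, a 75% warning, stopping once the cap is exceeded) come from the text; the class shape, method names and decision strings are hypothetical.

```typescript
interface ContractResult {
  producedReport: boolean;
  error?: string;
  costUsd: number;
}

type BreakerDecision = "continue" | "warn" | "stop";

class CostControls {
  private readonly maxBudgetUsd: number;
  private readonly windowSize: number;
  private recent: ContractResult[] = [];
  private spentUsd = 0;

  constructor(maxBudgetUsd: number, windowSize = 3) {
    this.maxBudgetUsd = maxBudgetUsd;
    this.windowSize = windowSize;
  }

  // Called between batches with the results of the contracts just finished.
  record(batch: readonly ContractResult[]): BreakerDecision {
    for (const r of batch) {
      this.spentUsd += r.costUsd;
      this.recent.push(r);
      if (this.recent.length > this.windowSize) this.recent.shift();
    }
    if (this.spentUsd > this.maxBudgetUsd) return "stop"; // budget breaker
    if (this.failureBreakerTripped()) return "stop"; // failure breaker
    return this.spentUsd >= 0.75 * this.maxBudgetUsd ? "warn" : "continue";
  }

  // Trips when the last N contracts all lacked a report or all hit the same error.
  private failureBreakerTripped(): boolean {
    if (this.recent.length < this.windowSize) return false;
    const noReports = this.recent.every((r) => !r.producedReport);
    const firstError = this.recent[0].error;
    const sameError = firstError !== undefined && this.recent.every((r) => r.error === firstError);
    return noReports || sameError;
  }
}

const controls = new CostControls(80);
const stuck: ContractResult = { producedReport: false, error: "iteration budget hit", costUsd: 3 };
console.log(controls.record([stuck, stuck])); // continue
console.log(controls.record([stuck])); // stop (3 contracts in a row without a report)
```

In this sketch a stuck $3-per-contract run would hit an $80 cap only on contract 27 (80 / 3 ≈ 26.7), while the failure breaker stops it after 3.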
+
+---
 
 ## Model abstraction
 
@@ -188,6 +313,31 @@ flowchart LR
 Provider -->|openai format| Gemini[Gemini 2.0 Flash]
 Provider -->|anthropic SDK| Direct[Direct Anthropic]
 Provider -->|localhost| Ollama[Ollama local models]
+Provider -->|claude-cli| ClaudeCLI[Claude Code CLI]
 ```
 
-One provider abstraction, multiple backends. Qwen-specific handling: append `/no_think` to disable reasoning on local models. Cost calculated per-token against PRICING table.
+One provider abstraction, multiple backends. Qwen-specific handling: append `/no_think` to disable reasoning on local models. Cost calculated per-token against PRICING table. Solhunt-Duel adds a Claude Code CLI backend (billed against a Max subscription rather than metered API usage) for autonomous overnight runs.
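A hypothetical sketch of the provider layer's two backend-specific concerns named above: the Qwen `/no_think` suffix and per-token costing against a pricing table. Backend names echo the flowchart; model IDs, prices and helper names are placeholders, not Solhunt's actual PRICING table.

```typescript
type Backend = "gemini" | "anthropic" | "ollama" | "claude-cli";

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Price {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Placeholder entry for illustration only.
const PRICING: Record<string, Price> = {
  "example-hosted-model": { inputPerMTok: 3, outputPerMTok: 15 },
};

// Qwen-specific handling from the text: append /no_think to disable reasoning.
function prepareUserPrompt(model: string, prompt: string): string {
  return model.toLowerCase().includes("qwen") ? `${prompt} /no_think` : prompt;
}

function costUsd(backend: Backend, model: string, usage: Usage): number {
  // Claude Code CLI runs bill against a subscription, not metered tokens,
  // so this sketch reports no per-token API cost for them.
  if (backend === "claude-cli") return 0;
  const price = PRICING[model];
  if (price === undefined) throw new Error(`no PRICING entry for ${model}`);
  return (usage.inputTokens * price.inputPerMTok + usage.outputTokens * price.outputPerMTok) / 1_000_000;
}

console.log(prepareUserPrompt("qwen3:8b", "Find the exploit.")); // Find the exploit. /no_think
console.log(costUsd("anthropic", "example-hosted-model", { inputTokens: 100_000, outputTokens: 20_000 })); // 0.6
```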
+
+---
+
+## Where to read the code
+
+For the gate verifier (the load-bearing claim of Solhunt-Duel): start at [`src/sandbox/patch-harness.ts:89`](https://github.com/claygeo/solhunt-duel/blob/master/src/sandbox/patch-harness.ts#L89) — the `verifyPatch()` function is the entire gate-checking pipeline in 130 lines.
+
+For the duel orchestration: [`src/duel/orchestrator.ts`](https://github.com/claygeo/solhunt-duel/blob/master/src/duel/orchestrator.ts) — Red/Blue round coordination, fresh-address bytecode cloning, audit trail.
+
+For the single-agent scanner: [`src/agent/`](https://github.com/claygeo/solhunt-duel/tree/master/src/agent) — read loop, prompts, tool definitions.
+
+For the benchmark runner: [`src/bench/`](https://github.com/claygeo/solhunt-duel/tree/master/src/bench) — phase0 (single-agent) and phase1 (duel) runner entry points.
+
+---
+
+## Honest limitations
+
+- The Phase 4 duel set is 10 contracts. That's small-N. The v2 expansion plan in [PLAN-V2-BENCHMARK-EXPANSION.md](PLAN-V2-BENCHMARK-EXPANSION.md) is how we get to 50+ with vuln-class diversity and adversarial-clean baseline.
+- The 67.7% / 13.7% gap is real and unresolved. Sandbox limitations dominate the failure modes on random samples (compiler version mismatches, cross-contract dependencies the harness doesn't pre-load). See [README §The numbers](https://github.com/claygeo/solhunt-duel/blob/master/README.md#the-numbers--be-precise) for the failure breakdown.
+- The gates falsify positive claims, not negative ones. A `HARDENED` verdict means "we ran the four checks and they all passed." It does NOT mean "the contract is now bulletproof." Different claim, different scope.
+- MEV and front-running are invisible to the verifier: it runs on a frozen fork with deterministic block timing, so real-world transaction ordering is not modeled.
+- Multi-contract attack chains that need specific pool / oracle state at specific blocks fail on our harness fork even when the exploit works on mainnet. We don't pre-load arbitrary DeFi state. Those collapse to RED_FAILED, not falsely hardened runs.
+
+If you find a counter-example to any of these, please file an issue. Honest negative results compound; spin destroys.
