Iteration #7 — review feedback integrated + new artifacts:
PLAN-V2-BENCHMARK-EXPANSION.md UPDATES (post-review)
Both /plan-eng-review (APPROVE WITH CHANGES) and /codex outside-voice (GO with
constraints) reviewed the v1 plan. Integrated their feedback:
- Tier C: 5 -> 10 contracts. 1/5 = 20% +/- enormous CI is a vibe, not a
measurement. 10 with severity-weighted scoring + 2-3 near-miss controls
(audited-clean contracts that had a finding patched in audit).
- Reproducibility protocol: n=3 runs/contract, median + IQR reported, model
version pinned (claude-sonnet-4-20250514), temperature/top-p specified, prompt
git SHA logged.
- Tier C disclosure PRE-COMMITTED: text written into the leaderboard before
any run. The real failure mode is the operator silently cutting Tier C
post-result; pre-committing prevents that.
- 6-day hard cap on v2 work; kill criteria if Phase 3+ runs out of time.
- Pause/reassess gate: halt before Phase 3 spend if Tier C FPR > 80%.
- Phase 1 schema parallel with Phase 2 sourcing (saves 1 day).
- Total revised budget: $110-150 API (was $30-50; bumped for n=3 x 57 contracts).
ARCHITECTURE.md (recruiter-readable rewrite)
- 60-second TL;DR at top with leaderboard / PROOF / V2 plan cross-links
- "Two Projects" comparison table (Solhunt single-agent vs Solhunt-Duel adversarial)
- New Solhunt-Duel architecture section: mermaid sequence diagram of Red<->Blue
+ verifyPatch + 4 gates table
- Convergence taxonomy: HARDENED / BLUE_FAILED / RED_FAILED / SAME_CLASS_ESCAPED
/ TIMEOUT (definitions)
- Updated ER diagram: duel_runs + duel_rounds tables
- New "Why fresh-address bytecode cloning" design decision
- Honest limitations: small-N, sandbox limits, MEV invisibility, multi-contract
chain failures
- Code-pointer section: where to read each subsystem
INTERVIEW-WALKTHROUGH-5MIN.md (NEW)
- 5-beat structure for screen calls: premise / setup / 4 gates / honest failure /
what's next, ~750-900 words, fits 5 min with interruption buffer
- Variants: 3-min screening, 15-min deep technical, behavioral/hiring-manager
- "What NOT to say" list (no SCONE direct compare, no oversell convergence,
no startup pitch, no trashing adjacent work)
- Practice notes: read aloud twice and time yourself; the 67%/13% beat is the
one most often underdone on a first practice run
- Honest meta: match artifact to audience
**Wake reason:** Operator interrupted iter #7 ScheduleWakeup wait with "why dont you just keep working, why wait till the morning, just keep building." Executed iter #7 work continuously without waiting for the 20:08 fallback.

### Built this iteration

**1. /plan-eng-review on V2 benchmark plan (parallel agent)**
- Verdict: **APPROVE WITH CHANGES**
- 3 blockers identified:
   1. No reproducibility protocol (single-run-per-contract on stochastic agent ≠ benchmark)
   2. Tier C N=5 too small (1/5 = 20% ± enormous CI; need ≥10 with severity-weighted scoring + near-miss controls)
   3. Tier C false-find disclosure flow undefined (must close BEFORE sourcing, not after)
- Other gaps: model version pinning, seed/temp spec, no held-out set, no inter-rater rubric
- Phase 1 schema work + Phase 2 sourcing → run in parallel (saves 1 day)
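
To put a number on blocker 2, here is a minimal TypeScript sketch of a 95% Wilson score interval for a hit rate. The plan does not name an interval method; Wilson is our choice because it stays inside [0, 1] and behaves reasonably at very small n. `wilsonInterval` is a hypothetical helper, not code from the repo.

```typescript
// Hypothetical helper: 95% Wilson score interval for k successes out of n trials.
// Unlike the normal (Wald) approximation, it never leaves [0, 1] at tiny n.
function wilsonInterval(k: number, n: number, z = 1.96): [number, number] {
  if (n <= 0 || k < 0 || k > n) {
    throw new RangeError(`invalid counts: k=${k}, n=${n}`);
  }
  const p = k / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return [Math.max(0, center - half), Math.min(1, center + half)];
}

// Same 20% point estimate at the old and new Tier C sizes.
for (const [k, n] of [[1, 5], [2, 10]] as const) {
  const [lo, hi] = wilsonInterval(k, n);
  console.log(`${k}/${n}: [${lo.toFixed(3)}, ${hi.toFixed(3)}]`);
}
```

Both intervals contain 20%, but at n=5 the interval runs from roughly 4% to 62%, and doubling to n=10 only narrows it to roughly 6% to 51%.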

**2. /codex outside-voice on V2 plan (parallel agent)**
- Verdict: **GO with constraints**
- Tier C downside math is asymmetric in operator's favor IF pre-committed to publish-whatever-happens
- Real failure mode: operator chickens out post-result and silently cuts Tier C (PRE-COMMIT THE DISCLOSURE POST)
- Inspect-AI integration: sequential not parallel (corpus first, framework second)
- Operator's 1.5 YOE filter: v2 only flips outcomes for ~15-20% of funnel (the high-ceiling AI-safety path)
- 6-day hard cap on v2 work; if budget runs out, ship Tier C standalone; gravy is gravy

**3. Integrated both reviews into PLAN-V2-BENCHMARK-EXPANSION.md**
- Tier C: 5 → 10 contracts (statistical, not vibe). Includes 2-3 near-miss controls. Severity-weighted scoring.
- Reproducibility protocol added: n=3, model version `claude-sonnet-4-20250514` pinned, temperature/top-p specified, prompt git SHA logged
- Tier C disclosure pre-commitment (drafted text written into the plan; operator can't quietly cut Tier C post-result)
- 6-day hard cap with kill criteria (Phase 2 standalone if Phase 3+ runs out of time)
- Pause/reassess gate: if Tier C FPR > 80%, halt before Phase 3 spend
- Phase 1 schema parallel with Phase 2 sourcing
- Total revised budget: $110-150 API (was $30-50; bumped for n=3 × 57 contracts)
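
A minimal sketch of how the reproducibility protocol and the pause/reassess gate could be encoded, under stated assumptions: the `RunConfig` field names, the `runLabel`, `summarize` and `shouldHaltBeforePhase3` helpers, and the quantile convention (linear interpolation, NumPy's default method) are ours, and FPR is assumed to mean false findings divided by Tier C contracts scanned. The model ID, n=3 and the 80% threshold come from the plan; temperature and top-p stay as plain fields because the plan says they are specified without giving values here.

```typescript
// Hypothetical run manifest: what the plan says must be pinned or logged per run.
interface RunConfig {
  model: "claude-sonnet-4-20250514"; // pinned model version (from the plan)
  temperature: number;               // specified in the plan; value not shown here
  topP: number;                      // specified in the plan; value not shown here
  promptGitSha: string;              // git SHA of the prompt files used
  runsPerContract: 3;                // n=3 (from the plan)
}

// One log line per run so every result traces back to its exact setup.
function runLabel(cfg: RunConfig, contract: string, runIndex: number): string {
  return `${contract} run ${runIndex + 1}/${cfg.runsPerContract} model=${cfg.model} prompt=${cfg.promptGitSha}`;
}

// Quantile of an ascending-sorted array by linear interpolation between order statistics.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Median + IQR over one contract's repeated runs (e.g., a per-run score).
function summarize(scores: number[]): { median: number; iqr: number } {
  if (scores.length === 0) throw new RangeError("no runs to summarize");
  const sorted = [...scores].sort((a, b) => a - b);
  return {
    median: quantile(sorted, 0.5),
    iqr: quantile(sorted, 0.75) - quantile(sorted, 0.25),
  };
}

// Pause/reassess gate: halt before Phase 3 spend if Tier C FPR exceeds 80%.
// Assumption: FPR = false findings / Tier C contracts scanned.
function shouldHaltBeforePhase3(falseFindings: number, contractsScanned: number): boolean {
  if (contractsScanned <= 0) throw new RangeError("no Tier C contracts scanned");
  return falseFindings / contractsScanned > 0.8;
}
```

With only three runs, this convention makes the IQR exactly half the range: runs scoring 0, 4 and 10 give a median of 4 and an IQR of 5. At n=3 the IQR mainly signals run-to-run disagreement rather than a fine-grained spread.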

> **TL;DR (60-second read):** This repo holds two related but separate projects.
>
> **Solhunt** is a single-agent scanner: give it a contract, and it writes a Foundry exploit if it finds one; otherwise it emits a structured no-find report. Numbers: 67.7% on a curated 32-contract DeFiHackLabs subset and 13.7% on a 95-contract random draw, both published honestly. **Solhunt-Duel** sits on top: Red writes the exploit, Blue writes a Solidity patch, and a server-side harness enforces four gates the LLMs cannot see or modify (`exploitNeutralized`, `benignPassed`, `freshAttackerNeutralized`, `storageLayoutPreserved`) before declaring convergence. The premise: agents will lie about success if you let them, so the verdict lives outside the agent.

…

| Gate | Passes when | Bad patch it catches |
|---|---|---|
| `exploitNeutralized` | exploit FAILS on patched bytecode | "patch did nothing" |
| `freshAttackerNeutralized` | exploit FAILS from a different EOA | "patch only blocks the original attacker address" |
| `benignPassed` | benign happy-path tests still PASS | "patch deleted the function entirely" |
| `storageLayoutChanged == false` | original vs patched storage layout slots/offsets/types match | "patch silently bricks existing state" |

Source: [`src/sandbox/patch-harness.ts`](https://github.com/claygeo/solhunt-duel/blob/master/src/sandbox/patch-harness.ts) · Full walkthrough: [PROOF.md](PROOF.md)
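
The gate logic reduces to a conjunction over four booleans. As a minimal sketch, assuming a result shape that mirrors the `duel_rounds` columns (the real `verifyPatch()` return type may differ), a verifier can report which gates failed instead of a bare pass/fail:

```typescript
// Hypothetical per-round gate results (names mirror the duel_rounds columns).
interface GateResults {
  exploitNeutralized: boolean;       // exploit FAILS on patched bytecode
  freshAttackerNeutralized: boolean; // exploit FAILS from a different EOA
  benignPassed: boolean;             // benign happy-path tests still PASS
  storageLayoutChanged: boolean;     // true = slots/offsets/types differ (bad)
}

// Names of the gates that are not green, in table order.
function failedGates(g: GateResults): string[] {
  const failed: string[] = [];
  if (!g.exploitNeutralized) failed.push("exploitNeutralized");
  if (!g.freshAttackerNeutralized) failed.push("freshAttackerNeutralized");
  if (!g.benignPassed) failed.push("benignPassed");
  if (g.storageLayoutChanged) failed.push("storageLayoutChanged");
  return failed;
}

// A patch only counts when every gate is green.
function allGatesGreen(g: GateResults): boolean {
  return failedGates(g).length === 0;
}
```

Returning the failing gate names keeps the reason next to the verdict; for example, a patch that deletes the vulnerable function fails only `benignPassed`.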

**Convergence taxonomy** (what each leaderboard label means):

- `HARDENED`: Red found an exploit, Blue produced a patch, and all 4 gates are green
- `BLUE_FAILED`: Red found an exploit; Blue exhausted its budget without getting all 4 gates green
- `RED_FAILED`: Red emitted a no-find within budget; the contract may be safe OR our agent missed the bug
- `SAME_CLASS_ESCAPED`: Blue's patch passed the gates, but Red pivoted to a DIFFERENT vulnerability of the same class; that is an escape, not hardening
- `TIMEOUT`: the wall-clock cap was hit before any agent emitted a final verdict

The taxonomy is deliberately five-way, not two-way. "Did Blue patch successfully?" and "Is the contract now safe?" are different questions; the labels keep them separate.
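
The five labels can be kept mutually exclusive by deriving them from one outcome record in a fixed order. This is a minimal sketch: the `DuelOutcome` fields are hypothetical, and the check order is our assumption; `src/duel/orchestrator.ts` may resolve overlapping cases differently.

```typescript
type Convergence =
  | "HARDENED"
  | "BLUE_FAILED"
  | "RED_FAILED"
  | "SAME_CLASS_ESCAPED"
  | "TIMEOUT";

// Hypothetical summary of one finished duel.
interface DuelOutcome {
  timedOutBeforeVerdict: boolean; // wall-clock cap hit before any final verdict
  redFoundExploit: boolean;       // Red produced a working exploit
  blueAllGatesGreen: boolean;     // a Blue patch passed all 4 gates
  redEscapedSameClass: boolean;   // Red then found a different same-class vuln
}

// Checked in a fixed order so exactly one label applies.
function classify(o: DuelOutcome): Convergence {
  if (o.timedOutBeforeVerdict) return "TIMEOUT";
  if (!o.redFoundExploit) return "RED_FAILED"; // safe OR our agent missed
  if (!o.blueAllGatesGreen) return "BLUE_FAILED";
  if (o.redEscapedSameClass) return "SAME_CLASS_ESCAPED";
  return "HARDENED";
}
```

In this encoding `HARDENED` is only reachable after Red has found something, so a contract where Red finds nothing can never be reported as hardened.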

---

## Solhunt (single-agent scanner) — predecessor

The original loop, used to produce the headline 67.7% / 13.7% numbers.

```mermaid
sequenceDiagram
%% …
participant ES as Etherscan API
participant Docker as Docker Sandbox
participant Anvil as Anvil Fork
participant LLM as LLM (Claude Sonnet 4)
participant SB as Supabase

User->>CLI: benchmark --dataset X --model Y
%% …
CLI->>User: Results table + cost summary
```

### Agent loop state machine

```mermaid
stateDiagram-v2
[*] --> Read: initial prompt
Read --> Identify: source read
Identify --> Write: pattern detected
Write --> Test: exploit.t.sol created
Test --> Write: compile error<br/>(str_replace)
Test --> Test: forge fails<br/>(retry different vector)
Test --> Report: forge passes
Identify --> Report: no exploit found<br/>after iter N
Write --> Report: iteration budget hit
Report --> [*]

Read: read source files
Identify: identify attack vector
Write: write exploit test
Test: run forge_test
Report: emit SOLHUNT_REPORT
```
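
Read as code, the diagram is a small transition table. The TypeScript sketch below illustrates the diagram, not the agent's implementation: the event names are our labels for the edge annotations, not identifiers from `src/agent/`, and any transition the diagram does not draw is rejected.

```typescript
type State = "Read" | "Identify" | "Write" | "Test" | "Report";
type AgentEvent =
  | "sourceRead"
  | "patternDetected"
  | "testWritten"
  | "compileError"
  | "forgeFails"
  | "forgePasses"
  | "noExploitFound"
  | "iterationBudgetHit";

// One entry per edge in the stateDiagram above.
const transitions: Record<State, Partial<Record<AgentEvent, State>>> = {
  Read: { sourceRead: "Identify" },
  Identify: { patternDetected: "Write", noExploitFound: "Report" },
  Write: { testWritten: "Test", iterationBudgetHit: "Report" },
  Test: { compileError: "Write", forgeFails: "Test", forgePasses: "Report" },
  Report: {}, // terminal: emit SOLHUNT_REPORT
};

function step(state: State, event: AgentEvent): State {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`no transition from ${state} on ${event}`);
  }
  return next;
}

// Replay a sequence of events from the initial state.
function replay(events: AgentEvent[]): State {
  return events.reduce<State>(step, "Read");
}
```

For example, a run that hits one compile error and one failing forge attempt before passing still ends in `Report`, while an event that skips a stage (such as `patternDetected` before the source is read) throws.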

---

## Shared infrastructure

### Data model

```mermaid
erDiagram
benchmark_runs ||--o{ scan_runs : produces
scan_runs ||--o{ tool_calls : logs
contracts ||--o{ scan_runs : scanned
scan_runs ||--o{ artifacts : generates
duel_runs ||--o{ duel_rounds : has
duel_rounds ||--o{ scan_runs : reuses

benchmark_runs {
uuid id PK
%% …
string conversation_path
}

duel_runs {
uuid id PK
uuid contract_id FK
string convergence
int rounds
float wall_time_seconds
float notional_cost
timestamp created_at
}

duel_rounds {
uuid id PK
uuid duel_run_id FK
int round_index
bool exploit_neutralized
bool benign_passed
bool fresh_attacker_neutralized
bool storage_layout_changed
}

tool_calls {
int id PK
uuid scan_run_id FK
%% …
}
```

### Why Docker sandbox per scan
- Isolation: agent can't escape or affect host
- Reproducibility: every scan starts from identical state

…

- LLMs frequently emit lowercase hex addresses
- Forge rejects them with EIP-55 checksum errors
- The agent wastes 5-10 iterations fighting this
- Fix: regex-replace every `0x[40 hex]` literal with its keccak256-computed checksum on every .sol file write
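
A minimal sketch of the checksum rewrite. EIP-55 itself is fixed by the standard: keccak256-hash the lowercase hex (without `0x`) and uppercase each letter whose matching hash nibble is 8 or higher. Node's `sha3-256` is the NIST SHA-3 variant, whose padding differs from Ethereum's keccak256, so it cannot stand in; to keep the sketch dependency-free the keccak256 function is injected (in practice it would come from a library such as ethers or viem). The word-bounded regex, which leaves longer hex literals such as `bytes32` constants alone, is our choice and not necessarily the repo's.

```typescript
// keccak256 over the ASCII bytes of `input`, returned as 64 hex chars with no 0x prefix.
// Injected so this sketch has no dependencies (assumption: supplied by a library).
type Keccak256Hex = (input: string) => string;

// EIP-55: uppercase each hex letter whose corresponding hash nibble is >= 8.
function toChecksumAddress(address: string, keccak256Hex: Keccak256Hex): string {
  const lower = address.slice(2).toLowerCase();
  const hash = keccak256Hex(lower);
  let out = "0x";
  for (let i = 0; i < lower.length; i++) {
    const ch = lower[i];
    // Digits are unaffected by toUpperCase, so only letters change case.
    out += parseInt(hash[i], 16) >= 8 ? ch.toUpperCase() : ch;
  }
  return out;
}

// Exactly 40 hex digits after 0x; the trailing \b skips 64-digit bytes32 literals.
const ADDRESS_LITERAL = /\b0x[0-9a-fA-F]{40}\b/g;

// Applied to every .sol file the agent writes, before forge compiles it.
function checksumAddressesInSource(source: string, keccak256Hex: Keccak256Hex): string {
  return source.replace(ADDRESS_LITERAL, (m) => toChecksumAddress(m, keccak256Hex));
}
```

Plugged into a real keccak256, the lowercase address `0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed` comes out as `0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed`, one of the EIP-55 test vectors.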

### Why vm.prank false-positive guard
- `vm.prank(admin)` makes the next call appear to come from admin
- The agent discovered it could "exploit" access-controlled functions this way
- But pranking as the owner to call owner-only functions proves nothing
- The system prompt now lists valid uses (whale, EOA, governance-after-vote) and flags invalid use

### Why fresh-address bytecode cloning
- The original contract address has constructor state, immutables, and possibly initializer storage
- `vm.etch` swaps runtime bytecode but does NOT replay the constructor
- If we just etched on the original address, Blue's patches that depend on initializer storage would silently fail
- Fix: clone the bytecode to a fresh deterministic address with `anvil_setCode`, then run all verify stages against that address. Before each stage runs, `anvil_setCode` installs the correct variant (original / patched) at that address.
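
A minimal sketch of the per-stage setup as raw JSON-RPC payloads. `anvil_setCode(address, bytecode)` is Anvil's cheat method for replacing code at an address; the stage names, the `FRESH_ADDRESS` value and the plan-building helper are hypothetical illustrations of the bullets above, not code from `src/duel/orchestrator.ts`.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown[];
}

type Variant = "original" | "patched";

interface PlanStep {
  stage: string;
  request: JsonRpcRequest;
}

// Hypothetical fresh, deterministic address (not a precompile, not a known account).
const FRESH_ADDRESS = "0x" + "0".repeat(32) + "c0ffee01";

// Assumed verify stages and the bytecode variant each one needs installed.
const STAGES: { name: string; variant: Variant }[] = [
  { name: "exploitOnOriginal", variant: "original" }, // sanity: exploit should work unpatched
  { name: "exploitOnPatched", variant: "patched" },
  { name: "freshAttackerOnPatched", variant: "patched" },
  { name: "benignOnPatched", variant: "patched" },
];

// One anvil_setCode request per stage, so every stage starts from the right variant.
function setCodePlan(
  runtimeBytecode: Record<Variant, string>,
  address: string = FRESH_ADDRESS,
): PlanStep[] {
  return STAGES.map((s, i): PlanStep => ({
    stage: s.name,
    request: {
      jsonrpc: "2.0",
      id: i + 1,
      method: "anvil_setCode",
      params: [address, runtimeBytecode[s.variant]],
    },
  }));
}
```

Each request would be POSTed to the fork's JSON-RPC endpoint (Anvil listens on `http://127.0.0.1:8545` by default) right before that stage's forge run. The storage-layout gate is left out on the assumption that it compares compiler layout output rather than fork state.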

…

One provider abstraction, multiple backends. Qwen-specific handling: append `/no_think` to disable reasoning on local models. Cost is calculated per token against the PRICING table. Solhunt-Duel adds a Claude Code CLI backend (it runs on the Max subscription rather than metered API calls) for autonomous overnight runs.
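
To illustrate the per-token cost calculation and the Qwen `/no_think` handling described above, here is a minimal sketch. The `PRICING` shape, its entries, the `qwen-local` key and the model-name matching are assumptions, and the rates are illustrative rather than copied from the repo. Reading the `notional_cost` column in `duel_runs` as this same arithmetic applied to unmetered Claude Code CLI runs is our inference.

```typescript
interface Rate {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Illustrative rates only; not copied from the repo's PRICING table.
const PRICING: Record<string, Rate> = {
  "claude-sonnet-4-20250514": { inputPerMTok: 3, outputPerMTok: 15 },
  "qwen-local": { inputPerMTok: 0, outputPerMTok: 0 }, // hypothetical local model key
};

// Per-token cost; for a subscription-backed CLI run this is a notional figure, not a bill.
function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const rate = PRICING[model];
  if (rate === undefined) throw new Error(`no PRICING entry for ${model}`);
  // Multiply before dividing so whole-number token counts stay exact longer.
  return (inputTokens * rate.inputPerMTok + outputTokens * rate.outputPerMTok) / 1e6;
}

// Qwen-specific: append /no_think to disable reasoning (the space separator is an assumption).
function preparePrompt(model: string, prompt: string): string {
  return model.toLowerCase().includes("qwen") ? `${prompt} /no_think` : prompt;
}
```

Keeping the rate table as data means adding a backend is one new entry, and a missing entry fails loudly instead of silently reporting a zero cost.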

---

## Where to read the code

For the gate verifier (the load-bearing claim of Solhunt-Duel): start at [`src/sandbox/patch-harness.ts:89`](https://github.com/claygeo/solhunt-duel/blob/master/src/sandbox/patch-harness.ts#L89). The `verifyPatch()` function is the entire gate-checking pipeline, in 130 lines.

For the duel orchestration: [`src/duel/orchestrator.ts`](https://github.com/claygeo/solhunt-duel/blob/master/src/duel/orchestrator.ts) covers Red/Blue round coordination, fresh-address bytecode cloning, and the audit trail.

For the single-agent scanner: [`src/agent/`](https://github.com/claygeo/solhunt-duel/tree/master/src/agent) holds the read loop, prompts, and tool definitions.

For the benchmark runner: [`src/bench/`](https://github.com/claygeo/solhunt-duel/tree/master/src/bench) holds the phase0 (single-agent) and phase1 (duel) runner entry points.

---

## Honest limitations

- The Phase 4 duel set is 10 contracts. That's small-N. The v2 expansion plan in [PLAN-V2-BENCHMARK-EXPANSION.md](PLAN-V2-BENCHMARK-EXPANSION.md) is how we get to 50+ with vuln-class diversity and an adversarial-clean baseline.
- The 67.7% / 13.7% gap is real and unresolved. Sandbox limitations dominate the failure modes on random samples (compiler version mismatches, cross-contract dependencies the harness doesn't pre-load). See [README §The numbers](https://github.com/claygeo/solhunt-duel/blob/master/README.md#the-numbers--be-precise) for the failure breakdown.
- The gates falsify positive claims, not negative ones. A `HARDENED` verdict means "we ran the four checks and they all passed." It does NOT mean "the contract is now bulletproof." Different claim, different scope.
- MEV and front-running are invisible to the verifier. It runs on a frozen fork with deterministic block timing, so real-world transaction ordering is not modeled.
- Multi-contract attack chains that need specific pool / oracle state at specific blocks fail on our harness fork even when the exploit works on mainnet. We don't pre-load arbitrary DeFi state. Those collapse to RED_FAILED, not falsely hardened runs.

If you find a counter-example to any of these, please file an issue. Honest negative results compound; spin destroys.