Automated vulnerability detection for Ethereum smart contracts (Solidity). Uses a multi-engine pipeline — static analysis, IR analysis, call graph, taint tracking, symbolic verification, and LLM reasoning — to find security issues in Code4rena audit contests.
```bash
# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Build
cargo build --release
# Set your API key (required for LLM stages)
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY
```

```bash
# Single contest analysis
cargo run -- --contest dataset/train/contracts/2024-01-curves
# Batch mode (all contests)
cargo run -- --batch
# Dry run (engines only, no LLM calls)
cargo run -- --contest dataset/train/contracts/2024-01-curves --dry-run
# Score predictions against labels
cargo run -- --batch --dry-run --score
# Custom output directory
cargo run -- --batch --output-dir /tmp/results
# Report formats: legacy (default), json, html
cargo run -- --batch --dry-run --output-format json
cargo run -- --batch --dry-run --output-format html --output-dir /tmp/reports
```

| Flag | Description |
|---|---|
| `--contest <PATH>` | Contest directory to analyze |
| `--batch` | Run all contests in dataset |
| `--dataset <PATH>` | Dataset directory (default: `dataset/train`) |
| `--dry-run` | Skip LLM calls, run engines only |
| `--score` | Score predictions against labels |
| `--output-dir <PATH>` | Write per-contest results to directory |
| `--output-format <FMT>` | Output format: `legacy`, `json`, `html` |
| `--min-confidence <F>` | Minimum confidence threshold (0.0–1.0) |
| `--confidence-weights <PATH>` | Custom confidence weights JSON |
| `--no-cache` | Bypass analysis cache |
| `--clear-cache` | Clear cache and exit |
| `--verbose` | DEBUG-level tracing |
| `--quiet` | Errors only |
| `--log-format <FMT>` | `text` or `json` |

- Scope filtering: reads `scope.txt`, restricts analysis to in-scope `.sol` files
- Static analysis triage: filters ~80 detectors, drops noise slugs, classifies findings
- IR analysis: contract structure, inheritance, state variables, function patterns
- Call graph analysis: inter-function call relationships, external call patterns
- Cross-contract analysis: multi-contract interactions, shared state
- Taint analysis: tracks tainted data flow from sources (`msg.sender`, `msg.value`) to sinks (`selfdestruct`, `delegatecall`, storage writes)
- Symbolic verification: validates/kills candidates using symbolic execution data
- LLM analysis: per-file and project-level prompts with full context from prior stages
- Post-processing: deduplication, confidence scoring, taxonomy normalization
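
To make the taint analysis stage concrete, here is a minimal sketch of source-to-sink propagation over a data-flow graph. It is not the project's engine: the function name `tainted_sinks`, the edge-list representation, and the node names in the usage note are all assumptions chosen for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns every sink that is reachable from a taint source by following
/// data-flow edges breadth-first. The result is sorted for stable output.
/// (Hypothetical helper; the real engine's data model is not shown in this README.)
pub fn tainted_sinks<'a>(
    edges: &[(&'a str, &'a str)],
    sources: &[&'a str],
    sinks: &[&'a str],
) -> Vec<&'a str> {
    // Adjacency list: value -> values it flows into.
    let mut flows: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        flows.entry(from).or_default().push(to);
    }

    // Every source starts out tainted; taint spreads along each edge once.
    let mut tainted: HashSet<&str> = sources.iter().copied().collect();
    let mut queue: VecDeque<&str> = sources.iter().copied().collect();
    while let Some(node) = queue.pop_front() {
        if let Some(next) = flows.get(node) {
            for &n in next {
                if tainted.insert(n) {
                    queue.push_back(n);
                }
            }
        }
    }

    // Report only the sinks that ended up tainted.
    let mut hits: Vec<&str> = sinks
        .iter()
        .copied()
        .filter(|s| tainted.contains(s))
        .collect();
    hits.sort_unstable();
    hits
}
```

Given edges `msg.value -> amount -> fee -> delegatecall` and `owner -> selfdestruct`, with `msg.sender` and `msg.value` as sources, only `delegatecall` is reported, because nothing tainted flows into `selfdestruct`.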
```mermaid
flowchart LR
    A[main.rs]
    L[loader/<br/>scope, SA, IR,<br/>symbolic data]
    E[engines/<br/>SA, IR, call graph,<br/>cross-contract,<br/>taint, symbolic]
    LLM[LLM<br/>per-file +<br/>project pass]
    P[postprocess/<br/>parse, validate,<br/>dedup, confidence]
    R[report/<br/>legacy/json/html]
    S[score.rs]
    A -->|1| L
    L -->|2| E
    E -->|3| LLM
    LLM -->|4| P
    P -->|5| R
    R -->|6| S
```

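
The `postprocess/` step in the diagram deduplicates findings and applies a confidence threshold. The sketch below shows one plausible way to do that. The `Finding` struct, the dedup key (file, vulnerability type, start line), and the keep-the-highest-confidence rule are assumptions for illustration, not the project's actual logic.

```rust
use std::collections::HashMap;

/// Illustrative finding record; field names mirror the JSON output,
/// but this struct is an assumption, not the project's type.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub file: String,
    pub vulnerability_type: String,
    pub start_line: u32,
    pub confidence: f64,
}

/// Keeps the highest-confidence finding per (file, type, start line),
/// then drops anything below `min_confidence`.
pub fn dedup_and_filter(findings: Vec<Finding>, min_confidence: f64) -> Vec<Finding> {
    let mut best: HashMap<(String, String, u32), Finding> = HashMap::new();
    for f in findings {
        let key = (f.file.clone(), f.vulnerability_type.clone(), f.start_line);
        // Replace the stored finding only if this one is strictly more confident.
        let replace = best
            .get(&key)
            .map_or(true, |kept| f.confidence > kept.confidence);
        if replace {
            best.insert(key, f);
        }
    }

    let mut out: Vec<Finding> = best
        .into_values()
        .filter(|f| f.confidence >= min_confidence)
        .collect();
    // Highest confidence first; ties broken by file and line for determinism.
    out.sort_by(|a, b| {
        b.confidence
            .total_cmp(&a.confidence)
            .then_with(|| a.file.cmp(&b.file))
            .then_with(|| a.start_line.cmp(&b.start_line))
    });
    out
}
```

Under these assumptions, the threshold argument plays the role of `--min-confidence`: two reports of the same reentrancy at line 42 collapse into the more confident one, and a 0.30-confidence finding is dropped when the threshold is 0.5.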
Training data: 10 Code4rena audit contests with labeled findings in dataset/train/labels.json.
Each contest provides pre-computed analysis data:
- IR (`ir.json`): contracts, functions, state variables, inheritance
- Static analysis (`static_analysis.json`): pattern-based detector findings
- Symbolic execution (`symbolic_execution.json`): execution paths and constraints (optional, not all contests have this)

```json
[
  {
    "contest": "2024-01-curves",
    "file": "contracts/Curves.sol",
    "vulnerability_type": "Reentrancy",
    "severity": "High",
    "description": "The sellCurvesToken function transfers ETH before updating state...",
    "confidence": 0.85,
    "source": "sa_reentrancy",
    "start_line": 42,
    "end_line": 55
  }
]
```

```bash
# Score engine predictions against training labels
cargo run -- --batch --dry-run --score
# Score with LLM predictions
cargo run -- --batch --score
```

Outputs precision, recall, and F1 per contest and aggregate, with per-severity, per-detector, and confusion matrix breakdowns.
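
For reference, the precision, recall, and F1 figures follow the standard definitions. The sketch below computes them from a set of predictions and a set of labels. The matching key (contest, file, vulnerability type) and the names `Key`, `Score`, and `score` are assumptions; the README does not state how `score.rs` matches predictions to labels.

```rust
use std::collections::HashSet;

/// Hypothetical matching key: (contest, file, vulnerability_type).
pub type Key = (String, String, String);

#[derive(Debug, PartialEq)]
pub struct Score {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Standard precision / recall / F1 over exact key matches.
/// Empty inputs yield 0.0 rather than NaN.
pub fn score(predicted: &HashSet<Key>, labeled: &HashSet<Key>) -> Score {
    // True positives: predictions that also appear in the labels.
    let tp = predicted.intersection(labeled).count() as f64;

    let precision = if predicted.is_empty() {
        0.0
    } else {
        tp / predicted.len() as f64
    };
    let recall = if labeled.is_empty() {
        0.0
    } else {
        tp / labeled.len() as f64
    };
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };

    Score { precision, recall, f1 }
}
```

With two predictions and two labels that share exactly one key, precision, recall, and F1 all come out to 0.5. A real scorer would likely need fuzzier matching (for example, overlapping line ranges), which this sketch deliberately leaves out.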