vm/benchmark: add EVM performance benchmarks targeting mainnet bottlenecks by mh0lt · Pull Request #19932 · erigontech/erigon

mh0lt · 2026-03-16T15:33:03Z

Summary

Adds a new execution/vm/benchmark/ package with targeted EVM benchmarks based on analysis of 50 mainnet blocks (14,886 txs, 1.53B gas) and bloatnet comparison
Benchmarks cover the actual hot paths: call chains (68.7% gas), storage access, token transfer patterns, and interpreter dispatch
All benchmarks use versionedio (NewWithVersionMap) to match real parallel execution overhead

Benchmark suites

| Suite | What it measures |
|-------|-----------------|
| `BenchmarkCallChain` | Nested STATICCALL/DELEGATECALL, DeFi swap patterns |
| `BenchmarkStorage` | Cold/warm SLOAD, SSTORE transitions, transient storage |
| `BenchmarkTokenTransfer` | ERC-20 transfer/transferFrom patterns |
| `BenchmarkInterpreter` | Arithmetic, stack, memory, keccak256 dispatch |

Test plan

go test -run='^$' -bench=. ./execution/vm/benchmark/ compiles and runs
CI passes

🤖 Generated with Claude Code

yperbasis

From Claude:

Issues

SSTORE benchmarks measure wrong state transitions after warmup (high)

All three SSTORE sub-benchmarks have a warmup call before b.Loop(). This modifies state, and Prepare only resets the access list, not dirty storage. So every measured iteration operates on already-mutated state:

zero-to-nonzero: Warmup writes 0xBEEF to all 100 slots. Every b.Loop() iteration then writes 0xBEEF to slots already containing 0xBEEF: a no-op SSTORE (100 gas), not zero-to-nonzero (20K gas). 0% of measured iterations test what the name says.
nonzero-to-nonzero: Warmup overwrites 1000→2000. Subsequent iterations write 2000→2000, again a no-op SSTORE.
nonzero-to-zero: Warmup clears slots. Subsequent iterations write 0→0: zero-to-zero, not nonzero-to-zero.

Fix: recreate state each iteration inside b.Loop(), or at minimum remove the warmup for these linear benchmarks.
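
A minimal sketch of why an unreverted write changes what later iterations measure. It uses a toy in-memory map, not Erigon's state API; `toyState`, `classify`, `snapshot`, and `runIterations` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// toyState is a hypothetical stand-in for contract storage; it is NOT Erigon's state type.
type toyState map[uint64]uint64

// classify names the transition implied by writing newVal over oldVal.
func classify(oldVal, newVal uint64) string {
	switch {
	case oldVal == newVal:
		return "no-op"
	case oldVal == 0:
		return "zero-to-nonzero"
	case newVal == 0:
		return "nonzero-to-zero"
	default:
		return "nonzero-to-nonzero"
	}
}

// store writes val to slot and reports which transition actually happened.
func (s toyState) store(slot, val uint64) string {
	t := classify(s[slot], val)
	s[slot] = val
	return t
}

// snapshot returns an independent copy of the state.
func (s toyState) snapshot() toyState {
	c := make(toyState, len(s))
	for k, v := range s {
		c[k] = v
	}
	return c
}

// runIterations writes val to slot 0 n times and records the transition seen each time.
// With revert=true the state is restored from a snapshot after every iteration.
func runIterations(n int, val uint64, revert bool) []string {
	s := toyState{}
	base := s.snapshot()
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, s.store(0, val))
		if revert {
			s = base.snapshot()
		}
	}
	return out
}

func main() {
	fmt.Println(runIterations(3, 0xBEEF, false)) // state carries over between iterations
	fmt.Println(runIterations(3, 0xBEEF, true))  // state restored between iterations
}
```

Without the restore, only the first write sees an empty slot, which corresponds to the warmup call in the review; every later write is a no-op. With the restore, each iteration sees the intended starting state.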

BenchmarkSLOADCold and BenchmarkStorageDiversity have the same problem (medium)

These are also linear (no inner loop), with a warmup call. After warmup, the access list is reset by Prepare, so SLOADs are cold again; that part is fine. But the "cold" designation also affects SSTORE benchmarks grouped nearby, and a reader might assume the pattern is consistent. More importantly, the warmup call consumes the one-shot gas budget and may OOG, silently returning an error. Since these don't loop internally, the warmup is unnecessary: just remove it.

Unused code (low — will fail lint)

callContract in helpers.go:92-94 — defined but never called (all benchmarks use prepareAndCall)
addrEOA in helpers.go:16 — defined but never referenced
_ bool parameter in deployCallChain (bench_call_chain_test.go:294) — dead parameter

Name helpers are verbose and have bad defaults (nit)

depthName, layerName, slotName, batchName, gasName, sizeName are all hand-written switch statements. depthName(32) returns "depth-N" instead of "depth-32". Replace with fmt.Sprintf:

func depthName(d int) string { return fmt.Sprintf("depth-%d", d) }
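
The same idea can cover all six helpers with one shared formatter. This is a sketch only: `benchName` is a hypothetical helper, and the `depth-%d` format is the only one taken from the review; the prefixes used by the other helpers are not shown in this thread.

```go
package main

import "fmt"

// benchName builds a sub-benchmark name from a prefix and a numeric parameter.
// Hypothetical helper; not part of the PR as quoted.
func benchName(prefix string, n int) string {
	return fmt.Sprintf("%s-%d", prefix, n)
}

// depthName matches the one-line replacement suggested in the review.
func depthName(d int) string { return benchName("depth", d) }

func main() {
	for _, d := range []int{1, 4, 32} {
		fmt.Println(depthName(d))
	}
}
```

Because the number is formatted directly, values that the hand-written switch did not list no longer fall through to a placeholder name.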

Errors silently discarded on all calls (low)

Every prepareAndCall result is suppressed with //nolint:errcheck. For the gas-until-OOG benchmarks this is intentional (OOG is an error). But for the linear benchmarks with computed gas limits (BenchmarkSSTORE, BenchmarkSLOADCold, BenchmarkStorageDiversity, BenchmarkERC20BatchTransfers), an unexpected OOG would silently produce garbage results. At minimum, check the warmup call:

_, _, err := prepareAndCall(cfg, addrContract, nil)
require.NoError(b, err)

Minor observations

Token contract in deployDeFiContracts will underflow after ~5000 loop iterations (500000 / 100). Doesn't affect benchmarking but is cosmetically wrong.
The README is well-written and provides good context for future developers.
Using NewWithVersionMap to mirror real parallel execution overhead is a good choice.
All APIs verified against codebase — types, signatures, and patterns match correctly.

Verdict

The benchmarks fill a real gap (existing Engine X suite covers precompiles but misses DeFi call chains, storage diversity, and compound patterns). The main issue is the SSTORE benchmarks are measuring the wrong thing: they need state reset between iterations. The unused code will likely fail make lint. Everything else is minor.

yperbasis

From Claude:

Bug: Stack leak in two benchmarks

BenchmarkStackOps and BenchmarkMixedCompute have a net +1 stack item per loop iteration, causing a stack overflow at ~1024 iterations. This makes them terminate in ~0.15ms instead of using their 100M gas budget (~168ms for equivalent benchmarks). They're measuring EVM setup overhead, not opcode dispatch.

StackOps (bench_interpreter_test.go:427-432): The loop body pushes 1 value + 8 DUPs but only has 8 POPs. Needs 9 POPs (or remove Push(0x42) from inside the loop):
Push(0x42) // +1
DUP1..DUP8 // +8
SWAP1..SWAP4 // +0
POP×8 // -8
Jump // +0
// Net: +1 per iteration → overflow at ~1024

MixedCompute (bench_interpreter_test.go:509-521): Same issue. The arithmetic section leaves 1 value, the stack ops add 4 via DUPs, the memory ops are net 0 (Push(0) MSTORE removes 1, Push(0) MLOAD adds 1), but cleanup only does 4 POPs. Net +1 per iteration.

Confirmed empirically:
BenchmarkPureArithmetic/add/100M 168ms ← correct (uses full gas)
BenchmarkStackOps/dup-swap/100M 0.15ms ← 1000x too fast (stack overflow)
BenchmarkMixedCompute/mixed/100M 0.15ms ← 1000x too fast (stack overflow)
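
The StackOps leak can be checked with a small stack-height model. The sketch below reduces each instruction to its net effect on stack height and assumes the 1024-item EVM stack limit; `op`, `stackOpsBody`, `netDelta`, and `iterationsUntilOverflow` are hypothetical names, and the `Jump` builder helper is modelled as net 0, as in the tally above.

```go
package main

import "fmt"

// op is one instruction (or builder helper) reduced to its net effect on stack height.
type op struct {
	name  string
	delta int
}

// repeat returns n copies of o.
func repeat(o op, n int) []op {
	out := make([]op, n)
	for i := range out {
		out[i] = o
	}
	return out
}

// stackOpsBody models the loop body quoted in the review, with a configurable number of POPs.
func stackOpsBody(pops int) []op {
	var body []op
	body = append(body, op{"PUSH", +1})
	body = append(body, repeat(op{"DUP", +1}, 8)...)
	body = append(body, repeat(op{"SWAP", 0}, 4)...)
	body = append(body, repeat(op{"POP", -1}, pops)...)
	body = append(body, op{"JUMP helper", 0})
	return body
}

// netDelta sums the per-op stack effects of one loop iteration.
func netDelta(body []op) int {
	d := 0
	for _, o := range body {
		d += o.delta
	}
	return d
}

// iterationsUntilOverflow simulates the loop and returns the 1-based iteration in which
// the stack height first exceeds limit, or -1 if the loop can run forever within the limit.
func iterationsUntilOverflow(body []op, limit int) int {
	grows := netDelta(body) > 0
	depth := 0
	for iter := 1; ; iter++ {
		for _, o := range body {
			depth += o.delta
			if depth > limit {
				return iter
			}
		}
		if !grows {
			return -1 // first iteration fit and the height does not increase afterwards
		}
	}
}

func main() {
	for _, pops := range []int{8, 9} {
		body := stackOpsBody(pops)
		fmt.Printf("%d POPs: net %+d, overflow iteration %d\n",
			pops, netDelta(body), iterationsUntilOverflow(body, 1024))
	}
}
```

Under this model the 8-POP body overflows slightly before iteration 1024, because the height peaks at 9 items above the iteration's starting height; the 9-POP body is balanced.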

Minor issues

Dead code in BenchmarkCallWithValue/with-value (bench_call_chain_test.go:236): The first deployContractWithBalance(statedb, addrContract, nil, ...) is immediately overwritten by the second call on line 240. Remove it.
DeFi swap balance underflow: The token contracts subtract 100 from slot 0 each loop iteration. After ~5000 inner iterations (within a single OOG call), the from balance hits 0 and wraps around to a large uint256. Subsequent SSTOREs become zero-to-nonzero transitions (20K gas instead of 5K), changing the gas cost profile mid-measurement. Consider using snapshot/revert like the SSTORE benchmarks do, or giving tokens a much larger starting balance.
makeAddrs limit (bench_call_chain_test.go:284): raw[19] = byte(i + 1) wraps at 255 addresses. Fine for current usage (max 16), but a comment noting the limit would help.
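
To illustrate the makeAddrs limit, here is a hypothetical reconstruction built around the quoted line `raw[19] = byte(i + 1)`. It assumes the other 19 bytes are zero, which the thread does not confirm; `addr` and `firstDuplicate` are names introduced for the sketch.

```go
package main

import "fmt"

// addr is a 20-byte address; a stand-in for whatever address type the real code uses.
type addr [20]byte

// makeAddrs is a hypothetical reconstruction: only the last byte varies.
func makeAddrs(n int) []addr {
	out := make([]addr, n)
	for i := range out {
		var raw addr
		raw[19] = byte(i + 1) // the int-to-byte conversion keeps only the low 8 bits
		out[i] = raw
	}
	return out
}

// firstDuplicate returns the index of the first address that repeats an earlier one, or -1.
func firstDuplicate(addrs []addr) int {
	seen := make(map[addr]bool, len(addrs))
	for i, a := range addrs {
		if seen[a] {
			return i
		}
		seen[a] = true
	}
	return -1
}

func main() {
	fmt.Println(firstDuplicate(makeAddrs(16)))   // current usage: no duplicates
	fmt.Println(makeAddrs(256)[255] == (addr{})) // last byte wrapped back to zero
	fmt.Println(firstDuplicate(makeAddrs(300)))  // index of the first repeated address
}
```

Under these assumptions the first 255 addresses are distinct and nonzero, the 256th is the all-zero address, and exact duplicates start after that, so a guard or comment at 255 is a safe bound.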

Verdict

The architecture and most benchmarks are solid. Fix the two stack-leak bugs — they're currently measuring nothing useful.

…necks

Based on analysis of 50 mainnet blocks (14,886 txs, 1.53B gas) and bloatnet comparison, these benchmarks target the actual hot paths in real block execution:

- Call chains (68.7% of mainnet gas): nested STATICCALL/DELEGATECALL, DeFi swap
- Storage access (6% of DeFi gas): cold/warm SLOAD, SSTORE transitions, transient
- Token transfers (16.7% of mainnet gas): ERC-20 transfer/transferFrom patterns
- Interpreter loop: arithmetic, stack, memory, keccak256 dispatch overhead

All benchmarks use versionedio (NewWithVersionMap) to match real parallel execution overhead. Profiling shows ~1M allocs/100M gas dominated by versionedRead/versionWritten tracking (28%), journal revert (23%), and state object storage maps (34%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix SSTORE benchmarks measuring wrong state transitions: use PushSnapshot/RevertToSnapshot to restore storage between iterations, ensuring each iteration measures the intended transition (zero-to-nonzero, nonzero-to-nonzero, nonzero-to-zero)
- Fix SLOADCold and StorageDiversity benchmarks: same snapshot/revert pattern ensures slots are cold each iteration
- Fix BatchTransfers: snapshot/revert prevents cumulative state mutation
- Remove unused code: callContract helper, addrEOA, dead bool parameter in deployCallChain
- Simplify name helpers: replace verbose switch statements with fmt.Sprintf (depthName, layerName, slotName, batchName, gasName, sizeName)
- Add explicit OOG comments on errcheck suppressions for looping benchmarks that intentionally run until out-of-gas

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

yperbasis

Still broken: Stack leaks (from review round 2)

BenchmarkStackOps (bench_interpreter_test.go:427-432) — net +1 per iteration:

Push(0x42) // +1
DUP1..DUP8 // +8
SWAP1..SWAP4 // ±0
POP ×8 // -8
// Net: +1 → overflow at ~1024 iterations

Needs 9 POPs or move Push(0x42) outside the loop (before JUMPDEST).

BenchmarkMixedCompute (bench_interpreter_test.go:509-521) — net +1 per iteration:

Push(42) Push(17) ADD Push(3) MUL // +1 (produces 1 value)
DUP1 DUP2 SWAP1 SWAP2 // +2
DUP1 DUP2 SWAP1 // +2
Push(0) MSTORE // -1
Push(0) MLOAD // +1
POP POP POP POP // -4
// 1 + 2 + 2 - 1 + 1 - 4 = +1

The same result from tracking the running stack height, relative to the height at the start of an iteration:

Arithmetic: Push Push ADD Push MUL → 1 item
Stack ops: DUP1 DUP2 → 3, SWAP1 SWAP2 → 3, DUP1 DUP2 → 5, SWAP1 → 5
Memory: Push(0) → 6, MSTORE → 4, Push(0) → 5, MLOAD → 5
Cleanup: POP×4 → 1

Net +1 per iteration. Overflows at ~1024. Both benchmarks complete in ~0.15ms instead of the expected ~168ms — they're measuring EVM startup, not opcode dispatch.
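
The recount above can be reproduced mechanically. The sketch below uses the standard pop/push counts for each opcode (ADD and MUL pop 2 and push 1, MSTORE pops 2, MLOAD pops 1 and pushes 1) and models DUPn and SWAPn only by their effect on height. `step`, `mixedBody`, and `trace` are hypothetical names, and the opcode sequence is taken from the tally in this comment rather than from the source file.

```go
package main

import "fmt"

// step is one opcode with the number of items it pops and pushes.
type step struct {
	name      string
	pop, push int
}

// mixedBody models the MixedCompute loop body with a configurable number of trailing POPs.
func mixedBody(pops int) []step {
	body := []step{
		{"PUSH 42", 0, 1}, {"PUSH 17", 0, 1}, {"ADD", 2, 1}, {"PUSH 3", 0, 1}, {"MUL", 2, 1},
		{"DUP1", 0, 1}, {"DUP2", 0, 1}, {"SWAP1", 0, 0}, {"SWAP2", 0, 0},
		{"DUP1", 0, 1}, {"DUP2", 0, 1}, {"SWAP1", 0, 0},
		{"PUSH 0", 0, 1}, {"MSTORE", 2, 0}, {"PUSH 0", 0, 1}, {"MLOAD", 1, 1},
	}
	for i := 0; i < pops; i++ {
		body = append(body, step{"POP", 1, 0})
	}
	return body
}

// trace returns the stack height after each step, starting from height start.
// It returns an error if a step would pop more items than are present.
func trace(body []step, start int) ([]int, error) {
	depth := start
	out := make([]int, 0, len(body))
	for _, s := range body {
		if depth < s.pop {
			return out, fmt.Errorf("stack underflow at %s (height %d)", s.name, depth)
		}
		depth += s.push - s.pop
		out = append(out, depth)
	}
	return out, nil
}

func main() {
	for _, pops := range []int{4, 5} {
		heights, err := trace(mixedBody(pops), 0)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d POPs: final height %d\n", pops, heights[len(heights)-1])
	}
}
```

With four POPs the body ends one item above where it started; a fifth POP brings it back to the starting height.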

Still present: Dead code in BenchmarkCallWithValue/with-value

bench_call_chain_test.go:236:
deployContractWithBalance(statedb, addrContract, nil, uint256.NewInt(1_000_000_000))
// ... immediately overwritten on line 240:
deployContractWithBalance(statedb, addrContract, code, uint256.NewInt(1_000_000_000))

First call is dead.

Still present: Token balance underflow in looping benchmarks

BenchmarkERC20Transfer, BenchmarkERC20TransferFrom, and BenchmarkDeFiSwapChain subtract 100 from a from-balance each inner loop iteration. Starting at 1,000,000 (or 500,000 for DeFi), balance hits zero after 10,000 (5,000) iterations. With 100M gas, the loop runs ~50K-200K iterations. After underflow, the uint256 wraps and subsequent SSTOREs become zero-to-nonzero (20K gas) for one iteration, shifting the gas profile. Practically negligible but cosmetically wrong; use a much larger starting balance or snapshot/revert.
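
The wraparound arithmetic can be sketched with math/big standing in for 256-bit EVM words. `subWrap` and `simulate` are hypothetical helpers; the starting balances and the step of 100 come from the paragraph above.

```go
package main

import (
	"fmt"
	"math/big"
)

// mod256 is 2^256, the modulus of EVM word arithmetic.
var mod256 = new(big.Int).Lsh(big.NewInt(1), 256)

// subWrap returns (a - b) mod 2^256, matching the wraparound of EVM SUB.
func subWrap(a, b *big.Int) *big.Int {
	r := new(big.Int).Sub(a, b)
	return r.Mod(r, mod256) // big.Int.Mod is Euclidean, so the result is non-negative
}

// simulate subtracts step from start n times with 256-bit wraparound and returns the balance.
func simulate(start, step uint64, n int) *big.Int {
	bal := new(big.Int).SetUint64(start)
	s := new(big.Int).SetUint64(step)
	for i := 0; i < n; i++ {
		bal = subWrap(bal, s)
	}
	return bal
}

func main() {
	fmt.Println(simulate(500_000, 100, 5000))                // balance after exactly draining it
	fmt.Println(simulate(500_000, 100, 5001).BitLen())       // bit length after one more transfer
	fmt.Println(simulate(1_000_000_000, 100, 200_000).Sign()) // larger balance, upper iteration estimate
}
```

A 500,000 balance reaches zero after 5,000 transfers and wraps to a 256-bit value on the next one. A balance of 1,000,000,000 is still positive after 200,000 transfers, the upper end of the iteration estimate above.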

Issues from review 1 that appear fixed

SSTORE benchmarks now use PushSnapshot/RevertToSnapshot without warmup calls
BenchmarkSLOADCold and BenchmarkStorageDiversity now use snapshot/revert
Dead code (callContract, addrEOA, dead bool param) removed

- BenchmarkStackOps: add 9th POP to balance PUSH+DUP8 (was +1/iter → overflow at 1024)
- BenchmarkMixedCompute: add 5th POP to balance full stack (was +1/iter → overflow at 1024)
- BenchmarkCallWithValue: remove dead deployContractWithBalance(nil code) call
- DeFi token contracts: increase starting balance from 500K to 1B to prevent uint256 underflow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mh0lt requested a review from yperbasis as a code owner March 16, 2026 15:33

yperbasis added the performance label Mar 16, 2026

yperbasis requested changes Mar 16, 2026

mh0lt and others added 3 commits March 16, 2026 23:19

vm/benchmark: fix chain config types after *big.Int → *uint64 migration

8d5cb1e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mh0lt force-pushed the evm-benchmarks branch from 5c027ff to 8d5cb1e Compare March 16, 2026 23:23

yperbasis requested changes Mar 17, 2026

yperbasis added this to the 3.5.0 milestone Mar 17, 2026

yperbasis approved these changes Mar 17, 2026

mh0lt merged commit a7c972b into main Mar 17, 2026
37 checks passed

mh0lt deleted the evm-benchmarks branch March 17, 2026 12:40

mh0lt merged 4 commits into main from evm-benchmarks
