Gas Benchmarks measures and compares the performance of Ethereum execution clients using deterministic engine payloads. It runs curated or captured tests across clients (Nethermind, Geth, Reth, Besu, Erigon, etc.), collects raw results, aggregates metrics (MGas/s), renders reports, and can ingest data into PostgreSQL for Grafana dashboards.
Reference: see README.md for full setup and usage details.
- Prereqs: Python 3.10+, Docker, Docker Compose, .NET 8 SDK, make
- Install deps and prepare tools:

```bash
pip install -r requirements.txt
make prepare_tools
mkdir -p results
```

- Run benchmarks (example):

```bash
bash run.sh -t "eest_tests/" -w "warmup-tests" -c "nethermind,geth,reth" -r 3
```

Outputs land in `results/` and reports in `results/reports/`.
- `run.sh`: End-to-end pipeline controller. Flags: `-t`, `-w`, `-c`, `-r`, `-i` (see README.md).
- `setup_node.py`: Brings up a specific client stack (writes `.env`, selects genesis, runs `scripts/<client>/run.sh`).
- `run_kute.py`: Executes a single test payload against the `:8551` engine endpoint with a labeled environment.
- Reporting: `report_html.py`, `report_txt.py`, `report_tables.py` consume normalized metrics to produce HTML/TXT/table outputs.
- DB ETL: `generate_postgres_schema.py` (create/update table), `fill_postgres_db.py` (bulk insert runs/specs).
- Test capture: `capture_eest_tests.py` converts Execution Layer Spec (EELS) fixtures into newline-delimited `.txt` RPC payloads.
- Author benchmark definitions in the `execution-specs` repository under `tests/benchmark/`, following the EELS conventions for deterministic inputs.
- Capture those tests into newline-delimited payload files with:

```bash
python capture_eest_tests.py -o eest_tests -x "pattern_to_exclude"
```

`capture_eest_tests.py` fetches fixture bundles from the `benchmark@v*` releases published in the `execution-specs` repository. Make sure your tests are included in such a benchmark release (or reference a specific tag via `--release-tag`) before trying to capture them; otherwise the script will not find them.

- Generate matching warmup payloads with `make_warmup_tests.py` and run the pipeline with `-t "eest_tests/"`.
- Filenames drive discovery (`utils.get_test_cases` parses names/gas values), so keep the `Scenario_<Gas>M.txt` pattern consistent.
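Filename-driven discovery means the naming pattern is effectively the API. The real parser is `utils.get_test_cases`; a simplified sketch of the convention it relies on (the `discover` helper here is illustrative, not the repo's actual function):

```python
import re
from pathlib import Path

# Illustrative re-implementation of the Scenario_<Gas>M.txt convention.
PATTERN = re.compile(r"^(?P<scenario>.+)_(?P<gas>\d+)M\.txt$")

def discover(tests_dir: str) -> list[tuple[str, int]]:
    """Return (scenario, gas_in_millions) pairs for files matching the pattern."""
    cases = []
    for path in sorted(Path(tests_dir).glob("*.txt")):
        match = PATTERN.match(path.name)
        if match:  # files that break the naming pattern are silently skipped
            cases.append((match.group("scenario"), int(match.group("gas"))))
    return cases
```

A file named `MStore_150M.txt` yields `("MStore", 150)`; anything that deviates from the pattern is invisible to discovery, which is why consistent naming matters.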
- Local runs via `run.sh`. Override images with `-i '{"client":"repo:tag"}'` or set them in `images.yaml`.
- Engine auth uses `engine-jwt/jwt.hex` (mounted or copied by compose).
- Raw artifacts:
  - Responses: `results/{client}_response_{run}_{test}_{gas}M.txt` (line-delimited JSON with `VALID` status)
  - Results: `results/{client}_results_{run}_{test}_{gas}M.txt` (measurement sections: `engine_newPayloadV4` → fields incl. `max`)
- If a response line isn't `VALID`, it won't be aggregated.
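The VALID-only filtering can be sketched in a few lines; this assumes the standard Engine API response shape with the status nested under `result` (the helper name is illustrative, not the repo's actual code):

```python
import json

def valid_lines(response_file: str) -> list[dict]:
    """Keep only responses whose payload status is VALID.

    Assumes the standard engine_newPayload response shape:
    {"jsonrpc": "2.0", "id": 1, "result": {"status": "VALID", ...}}
    """
    kept = []
    with open(response_file) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the trace file
            resp = json.loads(line)
            status = (resp.get("result") or {}).get("status")
            if status == "VALID":
                kept.append(resp)
    return kept
```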
- HTML: open `results/reports/index.html` (sortable tables; includes computer specs if present).
- TXT/table summaries: see `results/reports/` and `reports/tables_norm.txt`.
- Reporters derive titles from the captured filenames (you can optionally add a metadata file next to your dataset if you need friendlier labels) and compute Min/Max/p50/p95/p99 plus N.
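If you are writing your own reporter, the summary statistics are straightforward. This sketch uses nearest-rank percentiles, which may differ in rounding details from what the repo's reporters actually do:

```python
import math

def summarize(samples: list[float]) -> dict:
    """Min/Max/p50/p95/p99 plus N, using nearest-rank percentiles."""
    xs = sorted(samples)
    n = len(xs)

    def pct(p: int) -> float:
        # nearest-rank: smallest value with at least p% of samples at or below it
        return xs[max(0, math.ceil(p * n / 100) - 1)]

    return {"N": n, "min": xs[0], "max": xs[-1],
            "p50": pct(50), "p95": pct(95), "p99": pct(99)}
```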
- Keep logic pure where possible: parsing/aggregation in `utils.py`; orchestration in `run.sh`.
- Follow the adapter layout for new clients under `scripts/<client>/`: `docker-compose.yaml`, `run.sh`, `jwtsecret`; add genesis under `scripts/genesisfiles/<client>/`.
- Set/override default images in `images.yaml` (or via `-i`).
- Add a new report format by consuming `utils.get_gas_table(...)` and the discovered test-case metadata (filenames or any optional metadata you provide).
- Extend the DB schema via `generate_postgres_schema.py` and map fields in `fill_postgres_db.py`.
- Prefer explicit artifacts (files) over hidden state; it simplifies debugging and comparisons.
- Use `run_and_post_metrics.sh` to loop: pull → run → ingest → cleanup.
- DB setup: run `generate_postgres_schema.py` once; then point `fill_postgres_db.py` to `results/`.
- Only Kute tests are supported for now.
- Engine not reachable: ensure the `scripts/<client>/docker-compose.yaml` stack is up and `:8551` is exposed.
- Invalid responses: confirm the JWT (`engine-jwt/jwt.hex`) and genesis file selection.
- Missing Kute: run `make prepare_tools` and check `run_kute.py` paths.
- No results: verify test filenames and gas suffixes, and that `run.sh` flags point to the correct paths.
Kute is a .NET CLI tool used here to replay JSON-RPC engine messages against an execution client and measure performance. It simulates the Consensus Layer sending `engine_*` calls (plus optional `eth_*`) to the client at `:8551`, validates responses, and aggregates timings.
Location: `nethermind/tools/Nethermind.Tools.Kute/` (after running `make prepare_tools`) or https://github.com/NethermindEth/nethermind/tree/master/tools/Kute
Why it matters in this repo: `run_kute.py` wraps Kute to execute per-test payload files; reporters parse Kute outputs to compute MGas/s metrics and build reports.
Requires the .NET SDK (8+). From the tool directory:

```bash
cd nethermind/tools/Nethermind.Tools.Kute
dotnet build -c Release
```

The `run_kute.py` wrapper points to the built binary under `nethermind/tools/artifacts/bin/Nethermind.Tools.Kute/release/Nethermind.Tools.Kute` (prepared by `make prepare_tools`).
- Reads messages from a file or directory (`-i`/`--input`). Each line is a JSON-RPC request or a JSON batch.
- Authenticates using JWT (`-s`/`--secret`) with an optional TTL (`-t`/`--ttl`).
- Submits to `:8551` (`-a`/`--address`), sequentially or at a target RPS (`--rps`).
- Optionally unwraps batch requests into single requests (`-u`/`--unwrapBatch`).
- Validates responses (non-error + newPayload checks) unless `--dry` is used.
- Emits metrics (per-method durations, batch durations, totals) and can trace responses to a file (`-r`/`--responses`).
- `-i, --input <path>`: File or directory of messages (required)
- `-s, --secret <path>`: Hex JWT secret file (required)
- `-a, --address <URL>`: Engine address (default `http://localhost:8551`)
- `-t, --ttl <seconds>`: JWT TTL in seconds (default 60)
- `-o, --output <Report|Json>`: Metrics output format (default Report)
- `-r, --responses <path>`: Write JSON-RPC responses to a file
- `-f, --filters <patterns>`: Comma-separated regexes; supports limits: `pattern=N`
- `-e, --rps <int>`: Requests per second (>0 throttles; <=0 sequential)
- `-u, --unwrapBatch`: Treat batch items as individual requests
- `-d, --dry`: Don't send requests (still builds the auth token)
- `-p, --progress`: Show progress (adds startup overhead)
- Wrapper: `run_kute.py` builds the command and sets labels for remote metrics (Loki/Prometheus) using env vars. It writes stdout to `results/{client}_results_*` and engine responses to `results/{client}_response_*`.
- Orchestrator: `run.sh` calls the wrapper per test case, per client, per run; reporters then compute aggregates from these artifacts.
Minimal direct usage from the repo root (example):

```bash
./nethermind/tools/artifacts/bin/Nethermind.Tools.Kute/release/Nethermind.Tools.Kute \
  -i eest_tests/testing/000001/MStore_150M.txt \
  -s engine-jwt/jwt.hex \
  -a http://localhost:8551 \
  -r results/nethermind_response_1_MStore_150M.txt \
  -o Report
```

A "test" for Kute is a newline-delimited file of JSON-RPC requests (or batches). In this repo:
- Captured payloads live under `eest_tests/` (grouped into `setup/` and `testing/` directories) and end with `_<Gas>M.txt` (e.g., `MStore_150M.txt`).
- `utils.get_test_cases(tests_path)` discovers test names and gas variants by filename pattern; reporters use those names directly (or any optional metadata you place alongside the dataset).
- Lines must be valid JSON objects or JSON arrays (for batches). If using `--unwrapBatch`, each array item is sent individually.
- To record real traffic, run a client with `RpcRecorderState` and export logs, or use `capture_eest_tests.py` to transform Execution Layer Spec (EELS) fixtures into `.txt` lines.
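A quick way to sanity-check a payload file before pointing Kute at it is to verify every line parses as a JSON object or array (the helper name below is illustrative, not a repo utility):

```python
import json

def check_payload_file(path: str) -> list[str]:
    """Return human-readable problems; an empty list means every line parses."""
    problems = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # blank lines are simply skipped here
            try:
                msg = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: not valid JSON ({exc.msg})")
                continue
            if not isinstance(msg, (dict, list)):
                problems.append(f"line {lineno}: expected JSON object or array")
    return problems
```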
- Metrics stdout (Report|Json): per-method durations, totals, counts; gas-benchmarks parsers convert these to MGas/s and percentiles.
- Response trace file (`-r`): line-delimited JSON responses, used to validate `VALID` status for aggregation.
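The MGas/s conversion itself is just gas processed over wall time; a sketch under the assumption that durations are measured in milliseconds (unit handling in the repo's parsers may differ):

```python
def mgas_per_second(gas_used: int, duration_ms: float) -> float:
    """Throughput in millions of gas per second.

    gas_used is in plain gas units, duration_ms in milliseconds, so:
    MGas/s = (gas / 1e6) / (ms / 1e3)
    """
    return (gas_used / 1e6) / (duration_ms / 1e3)
```

For example, a 150M-gas payload processed in one second is 150 MGas/s; the percentile summaries are then computed over these per-run values.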
Execution Layer Spec Tests (EELS) is the canonical test suite and tooling for Ethereum execution clients. It generates and runs test cases across forks and exposes multiple execution modes:
- `consume direct`: call a client's test interface for fast EVM dev loops
- `consume rlp`: feed RLP blocks to simulate historical sync
- `consume engine`: drive the Engine API (post-merge) with payloads
- `execute remote`/hive: run Python tests against a live client via RPC, or on a local Hive network
In this repo, EELS is used to source benchmark scenarios and produce deterministic payloads for performance measurements.
- Authoritative scenarios for protocol/fork coverage
- Deterministic inputs across clients
- Multiple execution backends (Engine API, RLP, direct) to stress different code paths
- Works both locally and in CI/Hive setups
Clone the upstream repository (requires uv):
```bash
git clone https://github.com/ethereum/execution-specs
cd execution-specs
uv python install 3.11
uv python pin 3.11
uv sync --all-extras
```

- Execute on a live client (remote):
```bash
uv run execute remote \
  --fork=Prague \
  --rpc-endpoint=http://127.0.0.1:8545 \
  --rpc-chain-id=1 \
  --rpc-seed-key 0x<private_key> \
  tests -- -m benchmark -n 1
```

- Execute a specific test file/case remotely:
```bash
uv run execute remote --fork=Prague --rpc-endpoint=http://127.0.0.1:8545 --rpc-chain-id=1 --rpc-seed-key 0x<key> \
  ./tests/prague/.../test_x.py::test_case
```

- Run the Engine API simulator with JSON fixtures (parallel):

```bash
uv run consume engine --input=<fixture_dir> -n auto
```

- List collected tests without running:

```bash
uv run consume engine --input=<fixture_dir> --collect-only -q
```

- RLP mode (pre/post-merge forks; sync path):

```bash
uv run consume rlp --input=<fixture_dir>
```

Notes:
- `--fork` must match the target fork (e.g., Prague/Osaka; earlier forks are not supported yet).
- Remote mode needs a funded key (`--rpc-seed-key`) and `--rpc-chain-id`.
- Use pytest filters `-k` or marks `-m benchmark` to select benchmark tests.
- Place tests under `tests/benchmark/` (fork subtrees as needed) and make them filterable with `-m benchmark`.
- Avoid randomness and time-based values; explicitly set addresses, nonces, balances, gas, and data sizes so payloads can be reproduced.
- Use `@pytest.mark.valid_from("<Fork>")`/`@pytest.mark.valid_until("<Fork>")` at function/class/module level to scope forks.
- Prefer `@pytest.mark.parametrize(..., ids=[...])` to encode size/shape variations (e.g., payload byte sizes, number of txs) for consistent benchmark IDs.
- Exercise one bottleneck per test (e.g., SSTORE cold/warm, KECCAK sizes, precompile inputs). Keep pre-state minimal and reuse shared pre-state when possible to reduce setup noise.
- Choose the backend that exercises the intended path (Engine for the post-merge consensus path, RLP for sync/import, direct for fast EVM-only loops). Benchmarks should state their intent.
- Ensure tests can run under `execute remote` and/or produce fixtures consumable by `consume engine`/`rlp`.
Reference: EELS benchmark tests guide: https://github.com/ethereum/execution-specs/blob/main/docs/writing_tests/benchmarks.md
- Capture/convert fixtures to payload files for Kute with the repo tool:

```bash
python capture_eest_tests.py -o eest_tests -x "pattern_to_exclude"
```

This reads EELS fixtures and writes newline-delimited JSON-RPC payload `.txt` files (using `utils/make_rpc.jq`).

- Run the gas-benchmarks pipeline with `-t eest_tests/` to benchmark across clients:
```bash
bash run.sh -t "eest_tests/" -w "warmup-tests" -c "client1,client2" -r 3
```

- Parallelism: `-n auto` (requires pytest-xdist) for `uv run consume engine ... -n auto`
- Durations: `--durations=10` to print the slowest tests
- Verbosity/debug: `-v`, `-x`, `--pdb`, `-s`
- Upstream CI target for deployed-fork benchmarks: `uvx --with=tox-uv tox -e tests-deployed-benchmark`
- Local Hive (for `execute`): run Hive in dev mode and point EELS to the simulator (`HIVE_SIMULATOR`)
- Purpose
  - EELS: Full framework to generate and run protocol tests; can drive clients via the Engine API, direct EVM harnesses, RLP sync, or live RPC.
  - Kute: Lightweight replayer that measures Engine API performance by sending prebuilt JSON-RPC messages.
- Inputs
  - EELS: Python test modules/fixtures; generates payloads/transactions, validates outcomes.
  - Kute: Newline-delimited JSON-RPC request lines or batches.
- Execution context
  - EELS: Can build state, deploy contracts, produce payloads; validates correctness against spec expectations.
  - Kute: Assumes inputs are already valid; focuses on timing/throughput and response tracing.
- When to use
  - Use EELS to author/capture benchmark scenarios and validate behavior.
  - Use Kute to replay those scenarios uniformly across clients/images and compute perf metrics.
- Install EELS dependencies and confirm you can list/execute benchmark-marked tests.
- Create/edit benchmark tests under `tests/benchmark/` with deterministic behavior.
- Capture or transform tests into payload `.txt` files via `capture_eest_tests.py` when you want to benchmark with Kute.
- Run gas-benchmarks (`run.sh`) over clients/images; review `results/reports/`.
- For long-term tracking, ingest reports into PostgreSQL with the provided DB scripts.
- Run a small benchmark subset locally and attach `results/reports/index.html` (or TXT tables) to the PR.
- Keep changes minimal and focused (new tests, new client adapter, new reporter, or utility changes).
Your changes help maintain reliable, comparable performance signals across Ethereum execution clients. Keep runs reproducible, artifacts explicit, and reports easy to consume.