Skip to content

eest: add standalone CLI runners for statetest, blocktest, and enginetest#4101

Draft
spencer-tb wants to merge 8 commits intostatus-im:masterfrom
spencer-tb:master
Draft

eest: add standalone CLI runners for statetest, blocktest, and enginetest#4101
spencer-tb wants to merge 8 commits intostatus-im:masterfrom
spencer-tb:master

Conversation

@spencer-tb
Copy link
Copy Markdown

@spencer-tb spencer-tb commented Apr 5, 2026

Summary

Add standalone CLI wrappers for the existing EEST test runners, enabling `consume direct` integration without Hive. Each runner supports `--run=`, `--json`, `--workers=N` (accepted, sequential), and directory input.

Why standalone CLIs?

The existing test infrastructure uses unittest2 harnesses (eest_blockchain_test.nim, eest_engine_test.nim) and a separate evmstate tool. None support the flags needed for consume direct:

  • unittest2 harness — no --run filter, no --json output, hardcoded fixture paths
  • evmstate — single-file only, no directory support

The new CLI modes allow EELS `consume direct` to invoke each runner directly with fixture paths.

evmstate improvements

  • --run=<pattern>: substring filter on file paths
  • --workers=N: placeholder for future parallel support
  • Directory input: recursively finds .json files

eest_blockchain CLI mode

Standalone binary mode via when isMainModule. Walks fixture directories, processes each file through the existing processFile() which uses ForkedChainRef + block executor — same code path as consume rlp.

eest_engine CLI mode

Standalone binary mode with two paths:

  • Default: starts real HTTP RPC server per test, sends `engine_newPayloadV1-V5` + `engine_forkchoiceUpdatedV1-V4` via JSON-RPC client. Same code path as `consume engine` via Hive.
  • `--fast`: calls `BeaconEngine.newPayload()` / `forkchoiceUpdated()` directly without HTTP. Skips TCP bind/connect/close per test. 2x faster.

Benchmarks

Tested against EEST v5.3.0 stable fixtures on Apple M-series.

`evmstate` (2,674 state test files):

Mode Time
Sequential 7.3s

`eest_blockchain` (2,777 blockchain test files):

Mode Time
Default 2m15s

`eest_engine` (2,776 engine test files):

Mode Time
Default (RPC) ~2h (est)
`--fast` (BeaconEngine direct) 1m57s
`--fast` Prague only (146 files) 9.1s

100% pass rate across all three runners on v5.3.0 stable.

Usage

# State tests
./tools/evmstate/evmstate /path/to/state_tests/
./tools/evmstate/evmstate --run=eip7702 /path/to/state_tests/

# Block tests
nim c tests/eest/eest_blockchain
./tests/eest/eest_blockchain /path/to/blockchain_tests/
./tests/eest/eest_blockchain --json /path/to/blockchain_tests/

# Engine tests
nim c tests/eest/eest_engine
./tests/eest/eest_engine --fast /path/to/blockchain_tests_engine/
./tests/eest/eest_engine /path/to/blockchain_tests_engine/  # full RPC path

Related: ethereum/go-ethereum#34650, erigontech/erigon#20315, NethermindEth/nethermind#11035, besu-eth/besu#10184, lambdaclass/ethrex#6445, ethereum/execution-spec-tests#2319

- --run=<pattern>: substring filter on file paths
- --workers=N: placeholder for future parallel support
- Directory input: recursively finds .json files when path is a directory

Full suite: 2,674 files in 7.2s (sequential).
evmstate: --run, --workers, directory support
eest_blockchain: --run, --json, --workers, directory support
eest_engine: --run, --json, --workers, directory support

All runners now work as standalone binaries with consume direct
compatible flags. Engine test uses real JSON-RPC newPayloadV1-V5 +
forkchoiceUpdatedV1-V4 handlers.
blocktest --fast: bypasses ForkedChainRef, calls executor directly.
  2,777 files: 2m19s (minimal improvement — EVM execution dominates)

enginetest --fast: calls BeaconEngine.newPayload/forkchoiceUpdated
  directly without HTTP RPC server. Prague: 9.1s (was 18.9s, 2x faster)

Default modes preserve full consume rlp/engine code paths.
Process-level parallelism too heavy for nimbus binary (~100MB per fork).
In-process parallelism needs GC-safe procs (nimbus team effort).
Sequential: blocktest 2m15s, enginetest --fast 9.1s.
processBlock(skipStateRootCheck=true) in --fast mode.
Minimal speedup — the bottleneck is ledger.persist() writing to the
merkle patricia trie per block, not the root computation itself.
2m17s vs 2m15s default. Needs lightweight ledger mode for real improvement.
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 8, 2026
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
Copy link
Copy Markdown
Contributor

@kdeme kdeme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look into the details of what you implemented here to see if it is all correct because:

  1. I'm not sure what it needs to do
  2. It seems to contain quite a bit of custom logic which I believe test applications should not have. Else you are testing the test application and not the actual implementation. This is also not maintainable.

None support the flags needed for consume direct:

Sorry for my ignorance, but care to explain what consume direct means here and what the required flags are?


return testPass

proc runTestFast(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proc contains quite a lot of custom logic for a testing proc. This is not maintainable.


var testPass = true
for unit in fixture.units:
try:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a big try/except clause. We prefer to place these around specific areas where they are needed

if entry.endsWith(".json") and "/.meta/" notin entry:
result.add(entry)

proc printUsage() =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are to expand this standalone functionality, to a proper tool, we should be using confutils for cli option parsing to be consistent with our other binaries / tools.

failCount = 0

if workers > 1:
discard workers # TODO: in-process parallelism requires GC-safe procs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need this for your cli compatibility, then perhaps log here that it doesn't do anything when set >1, and also mention this on the cli flag help.

var testPass = true
for unit in fixture.units:
try:
let
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason we can't use prepareEnv here like in normal test run?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants