Skip to content

feat(specs,tests): Update all benchmark tests to use gas_benchmark_value#1935

Merged
marioevz merged 17 commits intomainfrom
update-all-benchmark-tests
Jul 28, 2025
Merged

feat(specs,tests): Update all benchmark tests to use gas_benchmark_value#1935
marioevz merged 17 commits intomainfrom
update-all-benchmark-tests

Conversation

@marioevz
Copy link
Copy Markdown
Member

@marioevz marioevz commented Jul 23, 2025

🗒️ Description

Add check for gas used in blockchain and state tests

blockchain_test and state_test now have a flag called expected_benchmark_gas_used which can be used to specify the expected gas used by the last payload (the benchmark payload).

If the value is not specified, the test is expected to use the value specified by --gas-benchmark-values entirely.

Modified all benchmark tests to use gas_benchmark_value

All benchmark tests now use gas_benchmark_value, and also expected_benchmark_gas_used if the test is not expected to use the entire gas (e.g. when the tx is not expected to run out-of-gas and instead consume a specific amount of gas).

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

@marioevz marioevz requested review from LouisTsai-Csie and jsign July 23, 2025 00:38
@marioevz marioevz marked this pull request as ready for review July 23, 2025 02:11
@jsign
Copy link
Copy Markdown
Collaborator

jsign commented Jul 24, 2025

I'll take a look at this PR by tomorrow!

OK, pushed it for today :)

@marioevz marioevz force-pushed the update-all-benchmark-tests branch from 0e966d4 to 8f1795a Compare July 24, 2025 22:45
@marioevz
Copy link
Copy Markdown
Member Author

Updated the branch with the following changes:

  • Fix test_empty_block so it no longer requests for unused fixture.
  • Improve test_worst_storage_access_cold to add cases where it (1) completes execution and SSTOREs affect the state, (2) reverts after SSTOREs, (3) runs out of gas and also reverts SSTOREs.
  • Fixes to some tests' environments and bumped the maximum environment gas limit, and now every single test can be run from a SINGLE GENESIS FILE 🎉
(ethereum-execution-spec-tests) ➜  execution-spec-tests git:(update-all-benchmark-tests) uv run groupstats ./fixtures/blockchain_tests_engine_x/pre_alloc -vv

Pre-Allocation Statistics Summary
Total groups: 1
Total tests: 3271
Total accounts: 64033

Tests and Accounts per Group
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ Group Hash  ┃ Fork   ┃ Tests ┃ Accounts ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ 0x313d28... │ Prague │  3271 │    64033 │
└─────────────┴────────┴───────┴──────────┘

No need to restart clients anymore to perform every single benchmark we have 😉

cc @jsign @LouisTsai-Csie

Copy link
Copy Markdown
Collaborator

@jsign jsign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice change! Left some opinions for your consideration.

Lateral question: is there a way to compare the generated fixtures of this branch against master to see if there are only changes in the expected tests that had bugs/improvements? (e.g., not sure if there's some produced "hash" of each fixture output that could be compared to double check).

I'm a paranoid person, so for such a big refactoring I'm always fearing something falling through the cracks.

If it sounds like a rabbit hole, dismiss.

@marioevz marioevz force-pushed the update-all-benchmark-tests branch from 8f1795a to 1fca4d9 Compare July 25, 2025 20:56
@marioevz
Copy link
Copy Markdown
Member Author

marioevz commented Jul 25, 2025

Very nice change! Left some opinions for your consideration.

Lateral question: is there a way to compare the generated fixtures of this branch against master to see if there are only changes in the expected tests that had bugs/improvements? (e.g., not sure if there's some produced "hash" of each fixture output that could be compared to double check).

I'm a paranoid person, so for such a big refactoring I'm always fearing something falling through the cracks.

If it sounds like a rabbit hole, dismiss.

I think I've got your solution with this evmone new feature: ipsilon/evmone#1274

I could give it a try to (1) test it out for pawel, (2) verify that the tests produce similar opcode numbers before and after this PR using the same gas limits.

@jsign
Copy link
Copy Markdown
Collaborator

jsign commented Jul 25, 2025

Very nice change! Left some opinions for your consideration.
Lateral question: is there a way to compare the generated fixtures of this branch against master to see if there are only changes in the expected tests that had bugs/improvements? (e.g., not sure if there's some produced "hash" of each fixture output that could be compared to double check).
I'm a paranoid person, so for such a big refactoring I'm always fearing something falling through the cracks.
If it sounds like a rabbit hole, dismiss.

I think I've got your solution with this evmone new feature: ipsilon/evmone#1274

I could give it a try to (1) test it out for pawel, (2) verify that the tests produce similar opcode numbers before and after this PR using the same gas limits.

Makes sense, since actually most "compute" test might not have a visible effect in block header fields such as state root and similar (used gas is a red herring) -- so the opcode count actually sounds better to achieve that. For the "stateful" ones, checking the block header might be enough.

@marioevz
Copy link
Copy Markdown
Member Author

Generated the fixtures from master and from this PR with the opcode counts produced by ipsilon/evmone#1274:

benchmark-fixtures-with-opcode-count.tar.gz

Each fixture in every fixture file now contains a opcode_count element in the _info object, and it contains the number of times every opcode was executed during the test generation.

I have not made the comparison because the name changes make it tricky, but will do it next week.

@marioevz
Copy link
Copy Markdown
Member Author

I gpt'd a script to get the differences (which could definitely be included in the PR to support opcode counts!) and got the results:

https://gist.github.com/marioevz/b745761e0882549c59ab9195b6865386

@jsign
Copy link
Copy Markdown
Collaborator

jsign commented Jul 26, 2025

I gpt'd a script to get the differences (which could definitely be included in the PR to support opcode counts!) and got the results:

https://gist.github.com/marioevz/b745761e0882549c59ab9195b6865386

Looks awesome!

@marioevz
Copy link
Copy Markdown
Member Author

Merging since I believe @LouisTsai-Csie comments were addressed 👍

@marioevz marioevz merged commit 377e175 into main Jul 28, 2025
16 checks passed
@marioevz marioevz deleted the update-all-benchmark-tests branch July 28, 2025 19:23
kclowes pushed a commit to kclowes/execution-spec-tests that referenced this pull request Oct 20, 2025
…alue` (ethereum#1935)

* feat(specs): Check gas used by benchmark tests matches expectation

* feat(tests): Update all benchmark tests to use `gas_benchmark_value`

* fix: tox

* fix(plugins/benchmarking): Increase GIGA_GAS

* fix(tests): Environment usages

* docs: Changelog

* fix(tox): tests-deployed-benchmark command line

* fix: Remove unused parameter

* fix: `test_worst_storage_access_cold` and add more cases

* fix: `test_worst_bytecode_single_opcode` incorrect env overwrite

* fix(plugins): Bump `BENCHMARKING_MAX_GAS` so all benchmarking tests can run from a single genesis

* feat(plugins): Pass `env` fixture to the test spec builder by default if not specified

* fix(benchmark): Remove `env` from tests where it's unused

* fix(benchmark): tests referencing the `fee_recipient` from the environment

* fix(plugins): Set benchmark environment based on test mark

* fix: minor fix

* fix(plugins): Defaults for `env` and `genesis_environment`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants