Skip to content

Conversation

@LouisTsai-Csie
Copy link
Collaborator

@LouisTsai-Csie LouisTsai-Csie commented Sep 16, 2025

🗒️ Description

Enhanced Interface

In the original benchmark helpers, the repeated code pattern is structured as:

<setup><JUMPDEST><attack><attack>...<attack><JUMP> 

This PR adds a new cleanup section before each iteration, and the interface change accordingly:

<setup><JUMPDEST><attack><attack>...<attack><cleanup><JUMP> 

Refactoring target

Updating to benchmark wrapper

  • Update to benchmark test wrapper to handle eip7825 scenario
  • Use JumpLoopGenerator or ExtCodeGenerator for repeated pattern.

Unifying the variable names

  • Use setup for the code pattern before the bytecode loop (instead of code_prefix, calldata).
  • Use attack_block for the actual benchmark execution pattern (instead of code_loop_body, attack_iter, code_sequence, loop_body, code_segment, op_sequence, or iter_block).
  • Use cleanup for the phase after the execution loop (instead of code_suffix, code_loop_footer).
  • Use gas_benchmark_value for the benchmark gas limit (instead of attack_gas_limit).
  • Use tx_gas_limit for the transaction gas limit cap (instead of tx_gas_limit_cap) -> this should be avoided as the wrapper automatically handle such scenario.

Removing redundant parameters

  • Remove gas_benchmark_value as it is configured automatically
  • Remove fork, env param if not used in function

Implementation Note

Most of the test logic remains the same, but the following cases differ and may be worth a closer look from reviewers.

  • test_worst_returndatasize_zero
  • test_worst_keccak
  • test_worst_binop_simple
  • test_worst_unop
  • test_worst_calldatacopy
  • test_block_full_data
  • test_worst_blockhash

Some test cases have not been refactored, as they would require significantly more effort and may even need to be rewritten. I suggest skipping these for now and updating them in a separate PR:

  • test_worst_bytecode_single_opcode

Additionally, some test cases may fail on the Osaka fork:

  • test_worst_address_state_cold
  • test_worst_selfdestruct_existing
  • test_worst_selfdestruct_created
  • test_worst_selfdestruct_initcode

🔗 Related Issues or PRs

Requires PR #1956 and PR #1945
Relevant discussion: issue ethereum/execution-specs#1557

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@LouisTsai-Csie LouisTsai-Csie self-assigned this Sep 16, 2025
@LouisTsai-Csie LouisTsai-Csie changed the title Refactor benchmark tests refactor(benchmark): update to benchmark test wrapper Sep 16, 2025
@LouisTsai-Csie LouisTsai-Csie force-pushed the refactor-benchmark-tests branch 2 times, most recently from eb1fc44 to 619f1cf Compare September 19, 2025 07:11
@LouisTsai-Csie LouisTsai-Csie force-pushed the refactor-benchmark-tests branch from 41bc03e to cc03678 Compare September 25, 2025 06:08
@LouisTsai-Csie LouisTsai-Csie marked this pull request as ready for review September 25, 2025 08:48
@LouisTsai-Csie LouisTsai-Csie force-pushed the refactor-benchmark-tests branch 2 times, most recently from beb239d to b51eba1 Compare September 29, 2025 09:24
@LouisTsai-Csie
Copy link
Collaborator Author

LouisTsai-Csie commented Oct 10, 2025

The change in the PR might be large, but it could be split into smaller parts:

  • benchmark_code_generator.py & benchmark.py: These two cases update the fundamental helper function.

Below are files that contains smaller amount of test cases, it would be nice if someone could first review some of them, so that i could apply initial feedback to the remaining cases.

  • test_worst_blocks.py(4)
  • test_worst_bytecode (4)
  • test_worst_memory (4)
  • test_worst_opcode (4)
  • test_worst_stateful_opcodes (10)

This is the largest one, but most of the changes are identical, i already documented the tests that contains logic changes above.

  • test_worst_compute (32)

@LouisTsai-Csie LouisTsai-Csie force-pushed the refactor-benchmark-tests branch from 5d5ffda to d77d3a4 Compare October 10, 2025 16:14
@LouisTsai-Csie
Copy link
Collaborator Author

The opcode count comparison result: https://gist.github.com/LouisTsai-Csie/74189c969201871d641a785ab9966090

@marioevz
Copy link
Member

Pushed a commit that changes the following:

  • Makes deploy_contracts and generate_transaction in BenchmarkCodeGenerator require keyword arguments.
  • Added a tx_kwargs field to BenchmarkCodeGenerator that can be used to add extra values to the benchmark transaction (like its data, or value, blobs, etc). This removed most of the generate_transaction usages in tests, so now only BenchmarkTest calls this function internally, and its usage is mostly abstracted to the user. See before and after.
  • Removed having to pass pre or post to benchmark_test so the usages are cleaner throughout the tests.
  • Added defaults to methods that have setup and cleanup to be empty bytecode (Bytecode()) so we don't have to always pass it.

Copy link
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I left only two comments but it's fine by me if we instead create issues for them.

@LouisTsai-Csie LouisTsai-Csie force-pushed the refactor-benchmark-tests branch from e2f2024 to 454bdd6 Compare October 13, 2025 06:24
@LouisTsai-Csie
Copy link
Collaborator Author

Thanks @marioevz for the fix, i rebase the code and fix some issue.

  • Update max_iteration calculation logic.
  • Refactor MSIZE benchmark test.

This is final comparison, except MSIZE, all the other benchmark tests do not reduce the total cycle count during execution.

Please review the discussion above for the MSIZE issue.

Copy link
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved again!

I've fixed the setup_blocks for them to always be included in the test.

@marioevz marioevz merged commit 131adfe into ethereum:main Oct 13, 2025
27 of 30 checks passed
kclowes pushed a commit to kclowes/execution-spec-tests that referenced this pull request Oct 20, 2025
* refactor(benchmark): update code generator interface

* refactor(benchmark): update worst bytecode scenario

* refactor(benchmark): update worst compute scenario

* refactor(benchmark): update worst memory scenario

* refactor(benchmark): update worst opcode scenario

* refactor(benchmark): update worst statefule scenario

* refactor(benchmark): update modexp cases by parameterization

* refactor: remove blockchain filler in worst block cases

* refactor: update blockhash case

* fix: resolve failing blob and block has tests

* fix: update linting comment length

* fix linting and typing issue

* fix incorrect logic for missing coverage

* refactor unop worst case

* refactor(tests/benchmark): Optimize generators usages

* refactor(benchmark): Improve max_iterations calculation

* feat(benchmark): add setup_blocks and update tests

* fix(specs): Make sure setup blocks are always used

---------

Co-authored-by: Mario Vega <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants