
feat(benchmark,fill): generate all gas value benchmarks in one fill execution #1891

@danceratopz

Description


The current state of benchmark fixture releases should be improved to make it easier to use EEST benchmarking test fixtures with Nethermind's benchmarking infrastructure.

More background from Kamil:

For a "proper" benchmarking what we need to do is to have a solid stable SINGLE genesis file on which we can execute various-sized scenarios and grasp the data from them.

So if we have 5 different sized tests - they all should be executable on single genesis file so we can execute all of them one after another, get the data and analyze.

Current approach would require us to stop node multiple times and restart which makes it difficult from a warmup reasons.

Status quo

  1. To generate benchmark tests at a specific gas value/limit, fill is called with the --block-gas-limit flag specifying the limit.
  2. fill must be executed once per gas value to be benchmarked, with a different value passed to --block-gas-limit each time.
  3. As a consequence of 2., a benchmark release consists of multiple tarballs, one per gas value.
  4. Each tarball in the release requires a different client genesis configuration, as the gasLimit specified via --block-gas-limit is set as the default gas limit. This value lands in the block header and, of course, also influences the block hash.

Goal

  1. Tests for all gas values get created in one fill execution; as a result, a benchmark release will be contained in a single tarball.
  2. All tests from one pre-alloc group, even those for different gas benchmark values, can use the same genesis configuration.

Approach

  1. Add a new command-line flag to fill (suggestion: --gas-benchmark-values) that takes a comma-separated list of the gas benchmark values.
  2. Set the default value of the Environment's gas limit to a value large enough that it is irrelevant for benchmarking: this ensures that the genesis configuration is the same for all tests generated from the single fill execution.
  3. Parametrize the tests with the values from --gas-benchmark-values. This should be done via fill's pytest_generate_tests() (parameter name suggestion: gas_benchmark_value), but only if the flag from 1. has been set; otherwise we could take the flag's default value (TBD). We need to be careful with non-benchmark tests: if a test is not a benchmark test, it should be unaffected (and the parameter should not show up in its test ID). See the example test IDs below.
    It is possible to only parametrize a test if it uses a certain pytest fixture (e.g. benchmark_gas_value). The fixture might need to be explicitly defined in filler.py (not sure off the top of my head). pytest_generate_tests() would then parametrize the fixture with these values and, indirectly, the tests that use it. See the sketch after this list.
    def pytest_generate_tests(metafunc: pytest.Metafunc):
        """
        Pytest hook used to dynamically generate test cases for each fixture format a
        given test spec supports.
        """
        ...
  4. Test functions that are included in a benchmark release must take the parametrized value from 3. as a function argument (i.e. they "use" the benchmark_gas_value pytest fixture). These tests must then generate their bytecode/payloads according to this value, not the environment's default gas limit (see the sketch following this list).
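
A minimal, self-contained sketch of how the pieces from 1.-4. could fit together in fill's plugin code (e.g. a conftest.py). This is a suggestion only, not the actual implementation: the flag handling, the placeholder default, and the ID formatting are all assumptions, and the fixture name gas_benchmark_value is just one of the two names mentioned above (benchmark_gas_value being the other); the ID formatting merely mirrors the example test IDs below.

    import pytest


    def pytest_addoption(parser: pytest.Parser) -> None:
        # Suggested flag from 1.; name and value format are still open for discussion.
        parser.addoption(
            "--gas-benchmark-values",
            action="store",
            default=None,
            help="Comma-separated list of gas values to parametrize benchmark tests with.",
        )


    @pytest.fixture
    def gas_benchmark_value(request: pytest.FixtureRequest) -> int:
        # Explicitly defined so that pytest_generate_tests() can parametrize it
        # indirectly. Behavior when --gas-benchmark-values is not set is TBD; a
        # placeholder default is used here only so the sketch stands on its own.
        return getattr(request, "param", 45_000_000)


    def pytest_generate_tests(metafunc: pytest.Metafunc) -> None:
        # Only parametrize tests that request the fixture, i.e. benchmark tests;
        # non-benchmark tests (and their test IDs) are left untouched.
        raw_values = metafunc.config.getoption("--gas-benchmark-values")
        if raw_values is None or "gas_benchmark_value" not in metafunc.fixturenames:
            return
        values = [int(v) for v in raw_values.split(",")]
        metafunc.parametrize(
            "gas_benchmark_value",
            values,
            indirect=True,
            # Render 45_000_000 as "benchmark-gas-value_45M", mirroring the
            # example test IDs below.
            ids=lambda v: f"benchmark-gas-value_{v // 1_000_000}M",
        )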

Example test ids

The benchmark test with the original id:

  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test]
    would now generate the following tests (the gas values shown are examples only):
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_45M]
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_60M]
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_80M]
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_100M]
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_150M]

A non-benchmark test ID would ideally remain unchanged:

  • tests/istanbul/eip1344_chainid/test_chainid.py::test_chainid[fork_Istanbul-state_test]
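
To make 4. from the Approach section concrete, here is a hedged sketch of a benchmark test consuming the parametrized value (it relies on the gas_benchmark_value fixture sketched above); the per-opcode gas cost and workload sizing are purely illustrative, not EEST's actual test code:

    def test_worst_selfbalance(gas_benchmark_value: int) -> None:
        # Size the workload from the parametrized benchmark gas value rather than
        # from the environment's (now intentionally very large) default gas limit.
        gas_per_selfbalance = 5  # illustrative per-opcode cost assumption
        repetitions = gas_benchmark_value // gas_per_selfbalance
        # Placeholder: the real test would build bytecode/payloads of this size.
        assert repetitions > 0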

Once this is implemented, we can collapse the following configurations into one:

benchmark_1M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 1000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_10M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 10000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_30M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 30000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_60M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 60000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_90M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 90000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_120M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 120000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
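
Hypothetically, the collapsed entry could then look something like the following, assuming the suggested --gas-benchmark-values flag accepts the raw gas values as a comma-separated list (the exact flag syntax is TBD):

benchmark:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --gas-benchmark-values 1000000,10000000,30000000,60000000,90000000,120000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true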
