Description
The current status of benchmark fixture releases should be improved to help use EEST benchmarking test fixtures with Nethermind's benchmarking infra.
More background from Kamil:
For a "proper" benchmarking what we need to do is to have a solid stable SINGLE genesis file on which we can execute various-sized scenarios and grasp the data from them.
So if we have 5 different sized tests - they all should be executable on single genesis file so we can execute all of them one after another, get the data and analyze.
The current approach would require us to stop and restart the node multiple times, which makes it difficult for warmup reasons.
Status quo
1. To generate benchmark tests at a specific gas value/limit, `fill` is called with the `--block-gas-limit` flag specifying the limit.
2. `fill` must be executed once per gas value to be benchmarked, with a different value passed to `--block-gas-limit` each time.
3. As a consequence of 2., a benchmark release consists of multiple tarballs, one per gas value, e.g. [email protected].
4. Each tarball in the release requires a different client genesis configuration, as the `gasLimit` specified via `--block-gas-limit` is set as the default gas limit. This value lands in the block header and therefore also influences the block hash.
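To make the status quo concrete, the sketch below builds the `fill` command lines that a release currently needs, one per gas limit. The fixed arguments and the list of limits are copied from the `feature.yaml` entries quoted at the end of this issue. The helper name `status_quo_fill_commands` is made up for illustration, and the function only builds argument lists; it does not run anything.

```python
# Gas limits currently benchmarked, taken from the feature.yaml entries quoted in this issue.
GAS_LIMITS = [1_000_000, 10_000_000, 30_000_000, 60_000_000, 90_000_000, 120_000_000]


def status_quo_fill_commands(gas_limits):
    """Return one `fill` argument list per gas limit (hypothetical helper)."""
    return [
        ["fill", "--from=Cancun", "--until=Prague",
         "--block-gas-limit", str(limit), "-m", "benchmark", "./tests"]
        for limit in gas_limits
    ]


commands = status_quo_fill_commands(GAS_LIMITS)
for argv in commands:
    # Each printed line is a separate run, and each run yields its own tarball and genesis.
    print(" ".join(argv))
```

Six gas limits mean six separate `fill` runs, six tarballs, and six genesis configurations, which is what forces a node restart between benchmark sizes.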
Goal
- Tests for all gas values get created in one `fill` execution; as a result, a benchmark release will now be contained in a single tarball.
- All tests from one pre-alloc group, even those for different gas benchmark values, can use the same genesis configuration.
Approach
1. Add a new command-line flag for `fill`; suggestion: `--gas-benchmark-values`. It takes a comma-separated list of the gas benchmark values.
2. Set the default value of the `Environment`'s gas limit to a value large enough to be irrelevant for the benchmarking. This ensures that the genesis configuration is the same for all tests generated from the single `fill` execution.
3. Parametrize the tests with the values from `--gas-benchmark-values`. This should be done via `fill`'s `pytest_generate_tests()` (parameter name suggestion: `gas_benchmark_value`), but only if the flag from 1. has been set; otherwise we could take the default value from the flag (TBD). We need to be careful with non-benchmark tests: a test that is not a benchmark test should be unaffected, and the parameter should not show up in its test ID. See the example test IDs below.

   It is possible to parametrize a test only if it uses a certain pytest fixture (e.g. `benchmark_gas_value`). The fixture might need to be explicitly defined in `filler.py` (not sure off the top of my head). `pytest_generate_tests()` would then parametrize the fixture with these values and thereby, indirectly, the tests that use this fixture.

   execution-spec-tests/src/pytest_plugins/filler/filler.py
   Lines 1000 to 1003 in 5878612

   ```python
   def pytest_generate_tests(metafunc: pytest.Metafunc):
       """
       Pytest hook used to dynamically generate test cases for each fixture format
       a given test spec supports.
   ```
4. Test functions that are included in a benchmark release must take the parametrized value from 3. as a function argument (they then "use" this `benchmark_gas_value` pytest fixture). These tests must then generate the bytecode / payloads according to this value, not the environment gas limit default.
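The approach above could be sketched roughly as follows. This is not the actual `filler.py` code, only a minimal illustration under stated assumptions: the flag is assumed to take values in millions of gas (the unit is not decided in this issue), the parameter is named `gas_benchmark_value` as suggested, the ID prefix `benchmark-gas-value_` is taken from the example test IDs, and `parse_gas_benchmark_values` and `max_iterations` are hypothetical helpers. Only standard pytest hook calls are used (`parser.addoption`, `metafunc.fixturenames`, `metafunc.config.getoption`, `metafunc.parametrize`).

```python
MILLION = 1_000_000  # assumption: the flag takes values in millions of gas


def parse_gas_benchmark_values(raw):
    """Turn a comma-separated string such as "45,60,80" into gas amounts."""
    values = [int(part) * MILLION for part in raw.split(",") if part.strip()]
    if not values:
        raise ValueError("--gas-benchmark-values needs at least one value")
    return values


def pytest_addoption(parser):
    """Register the flag proposed in step 1."""
    parser.addoption(
        "--gas-benchmark-values",
        action="store",
        dest="gas_benchmark_values",
        default=None,
        help="Comma-separated gas benchmark values, in millions of gas (assumed unit).",
    )


def pytest_generate_tests(metafunc):
    """Parametrize only the tests that request `gas_benchmark_value` (step 3)."""
    if "gas_benchmark_value" not in metafunc.fixturenames:
        return  # not a benchmark test: parameters and test ID stay unchanged
    raw = metafunc.config.getoption("gas_benchmark_values")
    if raw is None:
        return  # the fallback when the flag is unset is still TBD in this issue
    values = parse_gas_benchmark_values(raw)
    metafunc.parametrize(
        "gas_benchmark_value",
        values,
        ids=[f"benchmark-gas-value_{value // MILLION}M" for value in values],
    )


def max_iterations(gas_benchmark_value, gas_per_iteration, intrinsic_gas=21_000):
    """Step 4: size a workload from the parametrized value (hypothetical helper)."""
    return (gas_benchmark_value - intrinsic_gas) // gas_per_iteration
```

A benchmark test would then declare `gas_benchmark_value` as an argument and derive its bytecode or payload size from it, for example via something like `max_iterations(gas_benchmark_value, gas_per_iteration)`, instead of reading the environment's default gas limit. Tests that do not request the argument are skipped by the early return, so their IDs are not touched.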
Example test ids
The benchmark test with the original id:
tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test]
would now generate the following tests (only examples for the gas values):
- tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_45M]
- tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_60M]
- tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_80M]
- tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_100M]
- tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_150M]
A non-benchmark test ID would ideally remain unchanged:
- tests/istanbul/eip1344_chainid/test_chainid.py::test_chainid[fork_Istanbul-state_test]
Afterwards, we can collapse these into one configuration:
execution-spec-tests/.github/configs/feature.yaml
Lines 14 to 43 in 5878612
```yaml
benchmark_1M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 1000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_10M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 10000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_30M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 30000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_60M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 60000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_90M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 90000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_120M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 120000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
```
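As a rough illustration of what the collapsed configuration might look like, the sketch below merges the six per-limit entries into a single one that uses the proposed `--gas-benchmark-values` flag. Several things here are assumptions rather than decisions from this issue: the merged entry name `benchmark`, the unit of the flag (millions of gas), and the helper `collapse_benchmark_configs` itself, which exists only for this example.

```python
import shlex

MILLION = 1_000_000  # assumption: --gas-benchmark-values is given in millions of gas


def collapse_benchmark_configs(configs):
    """Merge entries that differ only in their --block-gas-limit value."""
    limits = []
    shapes = set()
    for entry in configs.values():
        tokens = shlex.split(entry["fill-params"])
        idx = tokens.index("--block-gas-limit")
        limits.append(int(tokens[idx + 1]))
        # Everything except the flag and its value must match across entries.
        rest = tuple(tokens[:idx] + tokens[idx + 2:])
        other = tuple(sorted((k, v) for k, v in entry.items() if k != "fill-params"))
        shapes.add((idx, rest, other))
    if len(shapes) != 1:
        raise ValueError("entries differ in more than --block-gas-limit")
    idx, rest, other = shapes.pop()
    values = ",".join(str(limit // MILLION) for limit in sorted(limits))
    tokens = list(rest)
    tokens[idx:idx] = ["--gas-benchmark-values", values]  # put the new flag where the old one was
    merged = dict(other)
    merged["fill-params"] = shlex.join(tokens)
    return {"benchmark": merged}  # hypothetical name for the merged entry


# The six entries from feature.yaml, rebuilt as plain dictionaries.
per_limit = {
    f"benchmark_{limit // MILLION}M": {
        "evm-type": "benchmark",
        "fill-params": f"--from=Cancun --until=Prague --block-gas-limit {limit} -m benchmark ./tests",
        "solc": "0.8.21",
        "feature_only": True,
    }
    for limit in (1_000_000, 10_000_000, 30_000_000, 60_000_000, 90_000_000, 120_000_000)
}

collapsed = collapse_benchmark_configs(per_limit)
print(collapsed["benchmark"]["fill-params"])
```

The helper refuses to merge entries that differ in anything other than the gas limit, which guards against silently dropping a per-entry setting when the configuration is collapsed.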