
feat(benchmark,fill): generate all gas value benchmarks in one fill execution #1891

@danceratopz

Description


The current state of benchmark fixture releases should be improved to make it easier to use EEST benchmarking test fixtures with Nethermind's benchmarking infrastructure.

More background from Kamil:

For a "proper" benchmarking what we need to do is to have a solid stable SINGLE genesis file on which we can execute various-sized scenarios and grasp the data from them.

So if we have 5 different sized tests - they all should be executable on single genesis file so we can execute all of them one after another, get the data and analyze.

Current approach would require us to stop node multiple times and restart which makes it difficult from a warmup reasons.

Status quo

  1. To generate benchmark tests at a specific gas value/limit, fill is called with the --block-gas-limit flag specifying the limit.
  2. fill must be executed once per gas value to be benchmarked, with a different value passed to --block-gas-limit each time.
  3. As a consequence of 2., a benchmark release consists of multiple tarballs, one per gas value.
  4. Each tarball in the release requires a different client genesis configuration, as the gasLimit specified via --block-gas-limit is set as the default gas limit. This value lands in the block header and, of course, also influences the block hash.

Goal

  1. Tests for all gas values get created in one fill execution; as a result, a benchmark release will be contained in a single tarball.
  2. All tests from one pre-alloc group, even those for different gas benchmark values, can use the same genesis configuration.

Approach

  1. Add a new command-line flag to fill (suggestion: --gas-benchmark-values) that takes a comma-separated list of the gas benchmark values.
  2. Set the default value of the Environment's gas limit to a value large enough that it is irrelevant for benchmarking: this ensures that the genesis configuration is the same for all tests generated from the single fill execution.
  3. Parametrize the tests with the values from --gas-benchmark-values. This should be done via fill's pytest_generate_tests() (parameter name suggestion: gas_benchmark_value), but only if the flag from 1. has been set; otherwise we could take the flag's default value (TBD). We need to be careful with non-benchmark tests: if a test is not a benchmark test, it should be unaffected (and the parameter should not show up in its test ID). See the example test IDs below.
    It is possible to only parametrize a test if it uses a certain pytest fixture (e.g. benchmark_gas_value). The fixture might need to be explicitly defined in filler.py (not sure off the top of my head). pytest_generate_tests() would then parametrize the fixture with these values and, indirectly, the tests that use it. See the sketch after this list.
    def pytest_generate_tests(metafunc: pytest.Metafunc):
        """
        Pytest hook used to dynamically generate test cases for each fixture format a
        given test spec supports.
        """
        ...
  4. Test functions that are included in a benchmark release must take the parametrized value from 3. as a function argument (i.e. they "use" the benchmark_gas_value pytest fixture). These tests must then generate their bytecode/payloads according to this value, not the environment's default gas limit (see the sketch following this list).
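
A minimal, self-contained sketch of how the pieces from 1.-4. could fit together in fill's plugin code (e.g. a conftest.py). This is a suggestion only, not the actual implementation: the flag handling, the placeholder default, and the ID formatting are all assumptions, and the fixture name gas_benchmark_value is just one of the two names mentioned above (benchmark_gas_value being the other); the ID formatting merely mirrors the example test IDs below.

    import pytest


    def pytest_addoption(parser: pytest.Parser) -> None:
        # Suggested flag from 1.; name and value format are still open for discussion.
        parser.addoption(
            "--gas-benchmark-values",
            action="store",
            default=None,
            help="Comma-separated list of gas values to parametrize benchmark tests with.",
        )


    @pytest.fixture
    def gas_benchmark_value(request: pytest.FixtureRequest) -> int:
        # Explicitly defined so that pytest_generate_tests() can parametrize it
        # indirectly. Behavior when --gas-benchmark-values is not set is TBD; a
        # placeholder default is used here only so the sketch stands on its own.
        return getattr(request, "param", 45_000_000)


    def pytest_generate_tests(metafunc: pytest.Metafunc) -> None:
        # Only parametrize tests that request the fixture, i.e. benchmark tests;
        # non-benchmark tests (and their test IDs) are left untouched.
        raw_values = metafunc.config.getoption("--gas-benchmark-values")
        if raw_values is None or "gas_benchmark_value" not in metafunc.fixturenames:
            return
        values = [int(v) for v in raw_values.split(",")]
        metafunc.parametrize(
            "gas_benchmark_value",
            values,
            indirect=True,
            # Render 45_000_000 as "benchmark-gas-value_45M", mirroring the
            # example test IDs below.
            ids=lambda v: f"benchmark-gas-value_{v // 1_000_000}M",
        )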

Example test ids

The benchmark test with the original id:

  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test]
    would now generate the following tests (the gas values shown are examples only):
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_45M]
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_60M]
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_80M]
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_100M]
  • tests/benchmark/test_worst_stateful_opcodes.py::test_worst_selfbalance[fork_Cancun-state_test-benchmark-gas-value_150M]

A non-benchmark test ID would ideally remain unchanged:

  • tests/istanbul/eip1344_chainid/test_chainid.py::test_chainid[fork_Istanbul-state_test]
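
To make 4. from the Approach section concrete, here is a hedged sketch of a benchmark test consuming the parametrized value (it relies on the gas_benchmark_value fixture sketched above); the per-opcode gas cost and workload sizing are purely illustrative, not EEST's actual test code:

    def test_worst_selfbalance(gas_benchmark_value: int) -> None:
        # Size the workload from the parametrized benchmark gas value rather than
        # from the environment's (now intentionally very large) default gas limit.
        gas_per_selfbalance = 5  # illustrative per-opcode cost assumption
        repetitions = gas_benchmark_value // gas_per_selfbalance
        # Placeholder: the real test would build bytecode/payloads of this size.
        assert repetitions > 0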

Once this is implemented, we can collapse the following configurations into one:

benchmark_1M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 1000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_10M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 10000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_30M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 30000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_60M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 60000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_90M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 90000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
benchmark_120M:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --block-gas-limit 120000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true
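
Hypothetically, the collapsed entry could then look something like the following, assuming the suggested --gas-benchmark-values flag accepts the raw gas values as a comma-separated list (the exact flag syntax is TBD):

benchmark:
  evm-type: benchmark
  fill-params: --from=Cancun --until=Prague --gas-benchmark-values 1000000,10000000,30000000,60000000,90000000,120000000 -m benchmark ./tests
  solc: 0.8.21
  feature_only: true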
