perf(precompiles): characterize BLS12-381 G1 MSM hotspots

Background

Nethermind already has a dedicated accelerated path for the Prague-era BLS12-381 G1 MSM precompile.

Relevant code:
`src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:22-24`
chooses `Mul(...)` for a single item and `Msm(...)` for multiple items.

`src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:61`
calls `Accelerators.Bls12381G1Msm(decoded, (nuint)pairCount, output)`.

`src/Nethermind/Nethermind.Precompiles.Benchmark/Bls12381G1MsmBenchmark.cs:9-13`
already provides a dedicated benchmark entrypoint for this precompile.

Problem

The current accelerated path does not make it obvious where end-to-end time is actually spent.

The `Msm(...)` path has three distinct stages:
1. input decode and layout rewrite into the trimmed internal representation
2. `Accelerators.Bls12381G1Msm(...)`
3. output re-encoding into the EVM return shape

Relevant code:
`src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:54-63`
`src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:84-103`
`src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:107-115`
`src/Nethermind/Nethermind.Evm.Precompiles/Eip2537.zkevm.cs:61-69`
`src/Nethermind/Nethermind.Evm.Precompiles/Eip2537.zkevm.cs:14-18`
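For reference, the EIP-2537 wire format fixes what stages 1 and 3 have to handle. Each MSM input pair is 160 bytes: a 128-byte G1 point, encoded as two 64-byte field elements that each carry 16 leading zero bytes, followed by a 32-byte big-endian scalar. The output is a single 128-byte G1 point. The Python sketch below illustrates only that byte-level rewrite. It assumes the trimmed internal representation simply drops the 16-byte padding, leaving 48-byte coordinates. `decode_pairs` and `encode_g1` are hypothetical names and are not the code in `Bls12381G1MsmPrecompile.cs`. Real decoding also has to reject out-of-range field elements and perform the curve and subgroup checks that EIP-2537 requires.

```python
FP_PADDED = 64                # EIP-2537 field element: 16 zero bytes + 48-byte big-endian value
FP_TRIMMED = 48               # assumed trimmed coordinate size (padding dropped)
PAD = FP_PADDED - FP_TRIMMED  # 16 bytes of mandatory zero padding
G1_PADDED = 2 * FP_PADDED     # 128-byte G1 point: x || y
SCALAR = 32                   # big-endian scalar
PAIR = G1_PADDED + SCALAR     # 160 bytes per (point, scalar) pair


def decode_pairs(data: bytes) -> bytes:
    """Stage 1 (hypothetical): check padding and rewrite pairs into a trimmed layout.

    Assumed trimmed layout per pair: x (48) || y (48) || scalar (32) = 128 bytes.
    Field-range, on-curve and subgroup checks are intentionally omitted here.
    """
    if len(data) == 0 or len(data) % PAIR != 0:
        raise ValueError("input length must be a non-zero multiple of 160")
    out = bytearray()
    for off in range(0, len(data), PAIR):
        # x starts at off, y starts at off + 64; each must begin with 16 zero bytes
        for coord in (off, off + FP_PADDED):
            if any(data[coord:coord + PAD]):
                raise ValueError("field element padding must be zero")
            out += data[coord + PAD:coord + FP_PADDED]
        # the scalar is copied unchanged
        out += data[off + G1_PADDED:off + PAIR]
    return bytes(out)


def encode_g1(trimmed_point: bytes) -> bytes:
    """Stage 3 (hypothetical): re-pad a 96-byte x || y result to the 128-byte EVM shape."""
    if len(trimmed_point) != 2 * FP_TRIMMED:
        raise ValueError("expected a 96-byte trimmed point")
    zero = bytes(PAD)
    return zero + trimmed_point[:FP_TRIMMED] + zero + trimmed_point[FP_TRIMMED:]
```

Even this simplified version shows that decode work grows linearly with `pairCount`, because it makes one pass over `160 * pairCount` bytes. Encode, by contrast, is a fixed-size copy. That suggests stage 3 should be small relative to stage 1 at large batch sizes, though only measurement can confirm it.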

Without stage-level attribution it is hard to tell whether the next optimization should target:
- decode and buffer preparation
- the accelerator boundary itself
- a batch-size threshold effect, where the balance between those two stages shifts as `pairCount` grows
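Once per-stage timings exist, answering the questions above is mostly arithmetic. The following minimal Python sketch assumes timings are collected as nanoseconds per stage for each `pairCount`. The `Timings` shape, the function names and the 1.5x jump factor are illustrative assumptions. `dominant_stage` reports which stage dominates at a given size. `per_pair_jumps` flags sizes where a stage's cost per pair changes sharply, which is roughly what a batch-size threshold (for example, a switch to concurrent decoding) could look like.

```python
from typing import Dict, List, Tuple

Timings = Dict[int, Dict[str, float]]  # pairCount -> stage name -> nanoseconds (hypothetical shape)


def dominant_stage(stages: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its share of the summed stage time."""
    total = sum(stages.values())
    if total <= 0:
        raise ValueError("stage timings must sum to a positive value")
    name = max(stages, key=stages.get)
    return name, stages[name] / total


def per_pair_jumps(timings: Timings, stage: str, factor: float = 1.5) -> List[int]:
    """Pair counts where a stage's ns-per-pair differs from the previous
    measured pair count by more than `factor` (a possible threshold effect)."""
    counts = sorted(timings)
    jumps = []
    for prev, cur in zip(counts, counts[1:]):
        a = timings[prev][stage] / prev
        b = timings[cur][stage] / cur
        if min(a, b) <= 0:
            continue  # cannot compare against a zero or negative measurement
        if max(a, b) / min(a, b) > factor:
            jumps.append(cur)
    return jumps
```

This is a heuristic. A jump only marks a size worth investigating, and it should be checked against run-to-run variance before it is treated as a real threshold.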

Why this matters

`src/Nethermind/Nethermind.Evm.Test/Eip2537Tests.cs:87-98`
verifies the G1 MSM precompile is enabled after Prague.

`src/Nethermind/Nethermind.Evm.Test/Bls12381G1MsmPrecompileTests.cs:10-19`
shows there is already dedicated vector-based coverage for this path.

There is also precedent for performance work in this area:
- `76801d5915` `optimisations and cleanup concurrent g1 msm`
- `5e87830335` `start implementing concurrent decoding for msm`
- `9b16f46e01` `finish concurrent msm decoding`

Desired outcome

Extend the existing benchmark coverage around `Bls12381G1MsmBenchmark` so that we can measure:
- full precompile runtime
- decode and layout cost
- accelerator compute cost
- encode cost

The useful result here would be a repeatable breakdown across representative `pairCount` sizes, so the next optimization can target the actual hotspot instead of guessing.
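The measurement loop is sketched below in Python for brevity and is otherwise language-neutral. It times each stage separately with a monotonic high-resolution clock. For each `pairCount`, it reports the per-stage medians alongside the end-to-end total. The `make_input`, `decode`, `compute` and `encode` callables are hypothetical stand-ins for the three stages described above and are not existing Nethermind APIs.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def stage_breakdown(
    make_input: Callable[[int], bytes],         # builds EVM input for a given pair count
    decode: Callable[[bytes], object],          # stage 1: decode and layout rewrite
    compute: Callable[[object, int], object],   # stage 2: the accelerator call
    encode: Callable[[object], bytes],          # stage 3: re-encode to the EVM shape
    pair_counts: Iterable[int],
    repeats: int = 20,
    warmup: int = 3,
) -> Dict[int, Dict[str, float]]:
    """Median elapsed nanoseconds per stage, plus end-to-end total, per pair count."""
    results: Dict[int, Dict[str, float]] = {}
    for n in pair_counts:
        data = make_input(n)
        # Untimed warmup iterations to reduce first-call effects.
        for _ in range(warmup):
            encode(compute(decode(data), n))
        samples: Dict[str, List[int]] = {"decode": [], "compute": [], "encode": [], "total": []}
        for _ in range(repeats):
            t0 = time.perf_counter_ns()
            decoded = decode(data)
            t1 = time.perf_counter_ns()
            point = compute(decoded, n)
            t2 = time.perf_counter_ns()
            encode(point)
            t3 = time.perf_counter_ns()
            samples["decode"].append(t1 - t0)
            samples["compute"].append(t2 - t1)
            samples["encode"].append(t3 - t2)
            samples["total"].append(t3 - t0)
        results[n] = {stage: statistics.median(v) for stage, v in samples.items()}
    return results
```

The sketch records the end-to-end total as its own median instead of summing the stage medians, so the full-precompile number reflects the combined run rather than an estimate. If the benchmark project uses BenchmarkDotNet, the same split could map onto separate `[Benchmark]` methods parameterised over `pairCount` with `[Params(...)]`, leaving warmup and outlier handling to the framework.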

perf(precompiles): characterize BLS12-381 G1 MSM hotspots #11802