
perf(precompiles): characterize BLS12-381 G1 MSM hotspots #11802

@peter941221


Background

Nethermind already has a dedicated accelerated path for the BLS12-381 G1 MSM precompile that EIP-2537 introduced in Prague.

Relevant code:
  • src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:22-24 chooses Mul(...) for a single item and Msm(...) for multiple items.
  • src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:61 calls Accelerators.Bls12381G1Msm(decoded, (nuint)pairCount, output).
  • src/Nethermind/Nethermind.Precompiles.Benchmark/Bls12381G1MsmBenchmark.cs:9-13 already provides a dedicated benchmark entrypoint for this precompile.
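The Mul/Msm split depends on how many (point, scalar) pairs the input holds. Under EIP-2537 each pair is a 128-byte G1 point followed by a 32-byte scalar, so pairCount is input.Length / 160. Below is a minimal sketch of that dispatch, assuming only those sizes. `CountPairs` and `SelectPath` are hypothetical names, not Nethermind's API, and the real precompile also validates points and charges gas:

```csharp
using System;

// EIP-2537 encoding sizes: a G1 point is two 64-byte padded Fp coordinates,
// and each MSM pair is that point followed by a 32-byte big-endian scalar.
const int G1PointSize = 128;
const int ScalarSize = 32;
const int PairSize = G1PointSize + ScalarSize; // 160 bytes per pair

// Hypothetical helper: number of (point, scalar) pairs in the input, or -1
// when the length is empty or not a multiple of PairSize (invalid input).
int CountPairs(int inputLength) =>
    inputLength > 0 && inputLength % PairSize == 0 ? inputLength / PairSize : -1;

// Hypothetical dispatch mirroring the Mul/Msm split: one pair goes to a
// single scalar multiplication, several pairs go to the MSM path.
// Point validation and gas charging are deliberately left out.
string SelectPath(int inputLength) => CountPairs(inputLength) switch
{
    -1 => "error",
    1 => "Mul",
    _ => "Msm",
};

Console.WriteLine(SelectPath(160)); // Mul
Console.WriteLine(SelectPath(480)); // Msm
Console.WriteLine(SelectPath(100)); // error
```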

Problem

The current accelerated path does not make it obvious where end-to-end time is actually spent.

The Msm(...) path has three distinct stages:

  1. input decode and layout rewrite into the trimmed internal representation
  2. the MSM computation itself, via Accelerators.Bls12381G1Msm(...)
  3. output re-encoding into the EVM return shape

Relevant code:
  • src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:54-63
  • src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:84-103
  • src/Nethermind/Nethermind.Evm.Precompiles/zkevm/Bls12381G1MsmPrecompile.cs:107-115
  • src/Nethermind/Nethermind.Evm.Precompiles/Eip2537.zkevm.cs:14-18
  • src/Nethermind/Nethermind.Evm.Precompiles/Eip2537.zkevm.cs:61-69
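To make stage 1 concrete: EIP-2537 encodes each Fp coordinate in 64 bytes whose top 16 bytes must be zero, so one plausible "trimmed" layout keeps only the 48 value bytes. The sketch below assumes that layout. It is a guess at what the trimmed representation looks like, not a reading of Eip2537.zkevm.cs, and `TrimFieldElements` is a hypothetical helper:

```csharp
using System;

// EIP-2537 encodes each BLS12-381 base field element (Fp) as 64 big-endian
// bytes whose top 16 bytes must be zero; the value itself fits in 48 bytes.
const int PaddedFpSize = 64;
const int FpSize = 48;
const int PadSize = PaddedFpSize - FpSize; // 16

// Hypothetical stage-1 helper: copies the 48-byte value out of each 64-byte
// padded slot. Returns false on a bad length or a non-zero padding byte.
// A real decoder must also check each value is below the field modulus.
bool TrimFieldElements(ReadOnlySpan<byte> padded, Span<byte> trimmed)
{
    if (padded.Length % PaddedFpSize != 0) return false;
    int count = padded.Length / PaddedFpSize;
    if (trimmed.Length < count * FpSize) return false;

    for (int i = 0; i < count; i++)
    {
        ReadOnlySpan<byte> slot = padded.Slice(i * PaddedFpSize, PaddedFpSize);
        foreach (byte b in slot[..PadSize])
        {
            if (b != 0) return false; // padding must be zero
        }
        slot[PadSize..].CopyTo(trimmed.Slice(i * FpSize, FpSize));
    }
    return true;
}

// One G1 point (x then y, 64 bytes each) shrinks from 128 to 96 bytes.
var point = new byte[128];
point[16] = 0xAA;  // first value byte of x
point[127] = 0x01; // last value byte of y
var compact = new byte[96];
Console.WriteLine(TrimFieldElements(point, compact)); // True
Console.WriteLine(compact[0]);  // 170
Console.WriteLine(compact[95]); // 1
```

If the real decode does something like this per pair, its cost scales linearly with pairCount, which is what the per-stage measurement should confirm or refute.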

Without stage-level attribution, it is hard to tell whether the next optimization should target:

  • decode and buffer preparation
  • the accelerator boundary itself
  • a batch-size threshold effect, where the balance between the two shifts as pairCount grows
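One way to get that attribution is to time each stage separately and compare its share of the total. Below is a minimal Stopwatch-based sketch. `MeasureStages` is hypothetical, and the three delegates are placeholders that would wrap the real decode step, the Accelerators.Bls12381G1Msm(...) call and the encode step:

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness: times the decode, compute and encode stages of one
// call separately, sums each over many iterations, and returns each stage's
// share of the total. The stage delegates are placeholders, not Nethermind code.
(double Decode, double Compute, double Encode) MeasureStages(
    Action decode, Action compute, Action encode, int iterations)
{
    long decodeTicks = 0, computeTicks = 0, encodeTicks = 0;
    var sw = new Stopwatch();
    for (int i = 0; i < iterations; i++)
    {
        sw.Restart(); decode();  decodeTicks  += sw.ElapsedTicks;
        sw.Restart(); compute(); computeTicks += sw.ElapsedTicks;
        sw.Restart(); encode();  encodeTicks  += sw.ElapsedTicks;
    }
    // Guard against a zero total when every stage is below timer resolution.
    double total = Math.Max(1L, decodeTicks + computeTicks + encodeTicks);
    return (decodeTicks / total, computeTicks / total, encodeTicks / total);
}

// Placeholder stages over a 16 KiB buffer, standing in for the real ones.
var buffer = new byte[16 * 1024];
var shares = MeasureStages(
    decode: () => Array.Fill(buffer, (byte)1),
    compute: () => { long acc = 0; foreach (byte b in buffer) acc += b; GC.KeepAlive(acc); },
    encode: () => Array.Clear(buffer, 0, buffer.Length),
    iterations: 200);
Console.WriteLine($"decode {shares.Decode:P1}, compute {shares.Compute:P1}, encode {shares.Encode:P1}");
```

Reading a Stopwatch around each stage adds timer overhead that can swamp small pairCount inputs. Separate benchmark methods per stage, fed pre-decoded buffers, would avoid that at the cost of more setup.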

Why this matters

  • src/Nethermind/Nethermind.Evm.Test/Eip2537Tests.cs:87-98 verifies the G1 MSM precompile is enabled after Prague.
  • src/Nethermind/Nethermind.Evm.Test/Bls12381G1MsmPrecompileTests.cs:10-19 shows there is already dedicated vector-based coverage for this path.

There is also precedent for performance work in this area:

  • 76801d5915 optimisations and cleanup concurrent g1 msm
  • 5e87830335 start implementing concurrent decoding for msm
  • 9b16f46e01 finish concurrent msm decoding

Desired outcome

Extend the existing benchmark coverage around Bls12381G1MsmBenchmark so that we can measure:

  • full precompile runtime
  • decode and layout cost
  • accelerator compute cost
  • encode cost

The useful result here would be a repeatable breakdown across representative pairCount sizes, so the next optimization can target the actual hotspot instead of guessing.
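The pairCount sweep could look like the sketch below, assuming powers of two up to 128 are a representative spread. `PairCounts` and `MeanMicros` are hypothetical. The zero-filled input and the Array.Clear workload are placeholders for real test vectors and the real precompile (or single-stage) call:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

const int PairSize = 160; // EIP-2537: 128-byte G1 point + 32-byte scalar

// Hypothetical sweep: powers of two from 1 up to maxPairs.
List<int> PairCounts(int maxPairs)
{
    var counts = new List<int>();
    for (int k = 1; k <= maxPairs; k *= 2) counts.Add(k);
    return counts;
}

// Mean microseconds per call of `run` on a pairCount * PairSize input.
// `run` stands in for the precompile (or one stage of it); the zero-filled
// buffer is a placeholder for real test vectors.
double MeanMicros(Action<byte[]> run, int pairCount, int iterations)
{
    var input = new byte[pairCount * PairSize];
    run(input); // warm-up call, excluded from timing
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++) run(input);
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds * 1000.0 / iterations;
}

foreach (int pairs in PairCounts(128))
{
    double us = MeanMicros(buf => Array.Clear(buf, 0, buf.Length), pairs, 1_000);
    Console.WriteLine($"pairCount={pairs,3} inputBytes={pairs * PairSize,6} mean={us:F3} us");
}
```

Running the same sweep once per stage gives the per-size breakdown, and a change in which stage dominates between adjacent sizes would point at the threshold effect mentioned above.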

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests