
perf: optimize scalar multiplications and multi-scalar multiplications circuits via lattice reductions#1697

Open
yelhousni wants to merge 50 commits into master from perf/ec-mul

Conversation

@yelhousni
Contributor

@yelhousni yelhousni commented Jan 31, 2026

Description

This PR migrates gnark's scalar decomposition hints from the eisenstein package to the new lattice package in Consensys/gnark-crypto#799, following the lattice-based rational reconstruction approach from "Fast elliptic curve scalar multiplications in SN(T)ARK circuits" by Eagen-ElHousni-Masson-Piellard (https://eprint.iacr.org/2025/933.pdf).

The new approach provides proven bounds from LLL lattice reduction theory, replacing heuristic bounds. This allows tighter bit-width bounds for the decomposed scalars, reducing circuit constraints.

The PR also revisits the complete arithmetic path to make it more constraint-optimized.

Changes

Hint Renames

  • halfGCD → rationalReconstruct (2-part decomposition using lattice.RationalReconstruct)
  • halfGCDEisenstein → rationalReconstructExt (4-part decomposition using lattice.RationalReconstructExt)

Tighter Bounds

The number of bits for decomposed scalars has been reduced:

  • Old: r.BitLen()/4 + 9 (heuristic with large safety margin)
  • New: (r.BitLen()+3)/4 + 2 (proven bound from LLL: outputs < 1.25·r^(1/4))

This saves ~7 iterations in the scalar multiplication loop.
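The two bound formulas compare as follows (a quick sketch reproducing the arithmetic above):

```go
package main

import "fmt"

// oldBound is the former heuristic bit bound for decomposed scalars.
func oldBound(rBitLen int) int { return rBitLen/4 + 9 }

// newBound is the LLL-proven bound: sub-scalars are < 1.25·r^(1/4),
// so ceil(rBitLen/4) bits plus 2 guard bits suffice.
func newBound(rBitLen int) int { return (rBitLen+3)/4 + 2 }

func main() {
	for _, n := range []int{254, 256, 377} {
		fmt.Printf("r.BitLen()=%d: old=%d new=%d saved=%d\n",
			n, oldBound(n), newBound(n), oldBound(n)-newBound(n))
	}
}
```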

Affected Packages

  • std/algebra/emulated/sw_emulated - Emulated short Weierstrass curves G1
  • std/algebra/emulated/sw_bls12381, std/algebra/emulated/sw_bn254 and std/algebra/emulated/sw_bw6761 - emulated G2
  • std/algebra/native/sw_bls12377 - Native BLS12-377 G1 and G2
  • std/algebra/native/twistededwards - Native twisted Edwards curves

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization

How has this been tested?

All existing tests pass:

go test -short ./std/algebra/emulated/sw_emulated/...
go test -short ./std/algebra/native/sw_bls12377/...
go test -short ./std/algebra/native/twistededwards/...

How has this been benchmarked?

Constraint Counts (Plonk/SCS)

G1 scalar multiplication:

| Curve/G1 (emulated) | Old (eisenstein) | New (lattice) | Δ | Improvement |
|---|---|---|---|---|
| secp256k1 | 394,959 | 364,495 | -30,464 | 7.7% |
| BN254 | 390,317 | 364,205 | -26,112 | 6.7% |
| BLS12-381 | 550,299 | 512,235 | -38,064 | 6.9% |
| BW6-761 | 1,376,789 | 1,317,113 | -59,676 | 4.3% |

G2 scalar multiplication:

| Curve/G2 (emulated) | Old (2D GLV) | New (4D-lattice GLV+FakeGLV) | Δ | Improvement |
|---|---|---|---|---|
| BN254 | 599,779 | 411,854 | -187,925 | 31.3% |
| BLS12-381 | 913,513 | 584,794 | -328,719 | 36.0% |
| BW6-761 | 1,090,960 | 728,286 | -362,674 | 33.2% |

G1 MSM of size 2:

| Curve | GLV | Method | Old | New | Δ | Improvement |
|---|---|---|---|---|---|---|
| P256 (emulated, short Weierstrass) | No | 2 scalar muls + add | 523,062 | 277,544 | -245,518 | 46.9% |
| BabyJubjub on BN254 (native, twisted Edwards) | No | lattice 3-MSM with LogUp | 9,956 | 6,785 | -3,171 | 32% |
| Jubjub on BLS12-381 (native, twisted Edwards) | No | lattice 3-MSM with LogUp | 9,930 | 6,793 | -3,137 | 32% |
| Bandersnatch on BLS12-381 (native, twisted Edwards) | Yes | lattice 6-MSM with LogUp | 10,185 | 6,820 | -3,365 | 33% |

Applications:

| Precompile | Old | New | Δ | Improvement |
|---|---|---|---|---|
| P256Verify | 666,146 | 533,201 | -132,945 | 20% |
| BLSG2MSM (10 pairs) | 9,304,517 | 7,135,929 | -2,168,588 | 23.3% |
| ECMul (BN254) | 210,369 | 195,663 | -14,706 | 7.0% |
| BLSG1MSM (10 pairs) | 4,397,157 | 4,243,497 | -153,660 | 3.5% |
| KZGPointEval | 2,928,188 | 2,897,456 | -30,732 | 1.0% |

| PLONK recursion | Old | New | Δ | Improvement |
|---|---|---|---|---|
| Emulated (BW6-761 in BN254) | 15,042,004 | 14,713,771 | -328,233 | 2.2% |

| EdDSA | GLV | Old | New | Δ | Improvement |
|---|---|---|---|---|---|
| Jubjub (BLS12-381) | No | 13,570 | 10,680 | -2,890 | 21% |
| Bandersnatch (BLS12-381) | Yes | 13,835 | 10,706 | -3,129 | 23% |

Discussion

1. Hint Computation Time

2-part decomposition

| Method | Time | Notes |
|---|---|---|
| Old (HalfGCD) | 9.5 μs | xgcd (PrecomputeLattice) |
| New (uncached) | 5.0 μs | xgcd |
| New (cached) | 3.8 μs | xgcd (cached reconstructor) |

4-part decomposition (RationalReconstructExt) - GLV curves

| Method | Time | Notes |
|---|---|---|
| Old (HalfGCDEisenstein) | 43 μs | Eisenstein HalfGCD |
| New (uncached) | 43 ms | LLL from scratch |
| New (cached) | 43 ms | LLL (caching doesn't help here) |

The new approach is slower for hint computation (4D) because it runs LLL reduction from scratch rather than using the 2-step Eisenstein half-GCD. However, hint computation happens outside the circuit (at witness generation time) and is negligible compared to proof generation time. The constraint reduction provides a net benefit.

3-part decomposition (MultiRationalReconstruct) - 2 scalars

| Method | Time | Notes |
|---|---|---|
| New (uncached) | 1.7 ms | LLL from scratch |
| New (cached) | 0.6 μs | LLL (huge speedup with caching) |

6-part decomposition (MultiRationalReconstructExt) - 2 scalars

| Method | Time | Notes |
|---|---|---|
| New (uncached) | 563 ms | LLL from scratch |
| New (cached) | 500 ms | Minimal improvement |

2. logup vs Mux

For G1 we can do a 4-MSM. For G2 we can additionally leverage the Frobenius ψ as a second endomorphism: applying it to all sub-scalars gives an 8-MSM, and applying it to half of them a 6-MSM. But with big tables the Mux becomes the bottleneck, so we tried logup as well.

  • G1 BLS12-381:

    | Method | Mux | Logup | Δ |
    |---|---|---|---|
    | G1 4D GLV+FakeGLV | 272,520 | 340,273 | +24.9% (worse) |

  • G2 BLS12-381:

    | Method | Mux | Logup | Δ |
    |---|---|---|---|
    | G2 4D GLV+FakeGLV | 584,794 | 741,975 | +26.9% (worse) |
    | G2 6D GLV+GLS+FakeGLV | 1,645,337 | 1,512,349 | -8.1% (better) |
    | G2 8D GLV+GLS+FakeGLV | 6,239,417 | 5,786,450 | -7.3% (better) |

The 4D GLV+FakeGLV method with Mux remains optimal for single scalar multiplication on both G1 and G2. Higher-dimensional methods (6D, 8D) using the ψ endomorphism don't reduce constraints because the Mux/logup overhead outweighs the benefits of fewer loop iterations, even with logup optimization.
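The exponential growth driving this trade-off is easy to see: with a d-way decomposition, each loop iteration selects from a table of precomputed point combinations whose size grows as 2^d (a sketch under the standard one-entry-per-bit-combination table layout):

```go
package main

import "fmt"

// tableSize returns the number of precomputed point combinations a
// d-way decomposition must select from at each loop iteration.
func tableSize(d int) int { return 1 << d }

func main() {
	for _, d := range []int{2, 4, 6, 8} {
		fmt.Printf("%d-way decomposition: %d-entry lookup per iteration\n",
			d, tableSize(d))
	}
}
```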

3. MSM

According to [EEMP25], we can turn an MSM(2,n) verification (i.e. an MSM of size 2 with scalars of n bits) into an MSM(3,2n/3) or MSM(6,n/3) verification. We implemented this for the native (SW and tEd) and emulated (SW) cases, with Mux and with logup (for native). In all scenarios the existing algorithms were better, except for:

  • native non-GLV tEd MSM(3,2n/3) with LogUp
  • emulated non-GLV SW MSM(3,2n/3) with Mux
  • native GLV tEd MSM(6,n/3) with Mux (Bandersnatch).
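As a rough sketch of the iteration counts these variants trade against table size (n here is illustrative; real costs also include the per-iteration table selection discussed above):

```go
package main

import "fmt"

// iters returns the main-loop length for sub-scalars of ceil(n*num/den) bits.
func iters(n, num, den int) int { return (n*num + den - 1) / den }

func main() {
	n := 253 // illustrative scalar bit length (e.g. BLS12-381's r)
	fmt.Println("MSM(2, n):    ", iters(n, 1, 1), "iterations, 2 points per add")
	fmt.Println("MSM(3, 2n/3): ", iters(n, 2, 3), "iterations, 3 points per add")
	fmt.Println("MSM(6, n/3):  ", iters(n, 1, 3), "iterations, 6 points per add")
}
```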

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • golangci-lint does not output errors locally
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules (gnark-crypto lattice package)

Note

High Risk
High-risk because it rewrites core scalar multiplication/MSM verification logic and hint decomposition across multiple curves (GLV and non-GLV), where subtle arithmetic/edge-case mistakes can break proof correctness despite passing tests.

Overview
Replaces the scalar-decomposition hint machinery used by emulated and native elliptic-curve circuits from eisenstein/half-GCD style routines to lattice-based rational reconstruction (lattice.RationalReconstruct/RationalReconstructExt), and tightens sub-scalar bit bounds to reduce loop iterations and constraints.

Updates G2 scalar multiplication for emulated bls12-381, bn254, and bw6-761 to a new GLV+fakeGLV verification flow using 4-way decomposition (u1,u2,v1,v2), adds precomputed generator bias points (g2Gen, g2GenNbits) to avoid incomplete additions, introduces new hints (scalarMulG2Hint, rationalReconstructExtG2), and extends tests to cover complete-arithmetic edge cases.

Refactors emulated short-Weierstrass MSM paths: jointScalarMulFakeGLV now prefers two ScalarMul calls plus Add/AddUnified, and scalarMulFakeGLV/scalarMulGLVAndFakeGLV adopt the new rational-reconstruction hints, add denominator non-zero checks, adjust complete-arithmetic edge-case handling (including ±1/±3 collisions), and add extensive new edge-case tests.

Enhances native circuits: BLS12-377 G1 joint scalar multiplication gains a complete-arithmetic path backed by a new jointScalarMulG1Hint; BLS12-377 G2 adds a new scalarMulGLVAndFakeGLV implementation and hint (scalarMulGLVG2Hint); twisted Edwards DoubleBaseScalarMul now selects between 6-MSM (with endomorphism) and 3-MSM implementations and adds a constraint-count benchmark. Also updates UnsatisfiedConstraintError to support Go error unwrapping via Unwrap(), and refreshes internal/stats/latest_stats.csv with new measurements.

Written by Cursor Bugbot for commit bbc3000.

Contributor

Copilot AI left a comment

Pull request overview

This PR migrates gnark's scalar decomposition hints from the eisenstein package to the new lattice package in gnark-crypto, implementing lattice-based rational reconstruction following the approach from "Fast elliptic curve scalar multiplications in SN(T)ARK circuits" (EEMP25). The new approach provides proven bounds from LLL lattice reduction theory instead of heuristic bounds, enabling tighter bit-width bounds for decomposed scalars.

Changes:

  • Renamed hint functions: halfGCD → rationalReconstruct and halfGCDEisenstein → rationalReconstructExt
  • Reduced bit bounds from r.BitLen()/4 + 9 to (r.BitLen()+3)/4 + 2, saving ~7 iterations in scalar multiplication loops
  • Updated imports to use github.com/consensys/gnark-crypto/algebra/lattice instead of the eisenstein package

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
File Description
std/algebra/native/twistededwards/hints.go Reimplemented rationalReconstruct hint using lattice.RationalReconstruct with proper sign handling and overflow computation
std/algebra/native/twistededwards/point.go Updated hint call from halfGCD to rationalReconstruct
std/algebra/native/twistededwards/curve_test.go Added benchmark for constraint counting
std/algebra/native/sw_bls12377/hints.go Reimplemented rationalReconstructExt using lattice.RationalReconstructExt for 4-part decomposition
std/algebra/native/sw_bls12377/g1.go Updated hint call, bounds calculation, and comments to reflect new LLL-proven bounds
std/algebra/native/sw_bls12377/g1_test.go Added benchmark for constraint counting
std/algebra/emulated/sw_emulated/hints.go Reimplemented both rationalReconstruct and rationalReconstructExt for emulated field arithmetic
std/algebra/emulated/sw_emulated/point.go Updated hint calls, bounds calculation, and comments
std/algebra/emulated/sw_emulated/point_test.go Added benchmarks for multiple curve configurations


@yelhousni yelhousni self-assigned this Feb 2, 2026
@yelhousni yelhousni added the dep: linea, type: perf and feat: ECC labels Feb 2, 2026
@yelhousni yelhousni added this to the v0.14.N milestone Feb 2, 2026
@yelhousni yelhousni changed the title perf: use lattice reduction instead of eisenstein gcd for tighter bounds perf: optimize scalar multiplications and multi-scalar multiplications Feb 5, 2026
@yelhousni yelhousni changed the title perf: optimize scalar multiplications and multi-scalar multiplications perf: optimize scalar multiplications and multi-scalar multiplications circuits Feb 5, 2026
@yelhousni yelhousni changed the title perf: optimize scalar multiplications and multi-scalar multiplications circuits perf: optimize scalar multiplications and multi-scalar multiplications circuits via lattice reductions Feb 8, 2026
Collaborator

@ivokub ivokub left a comment

I had a quick pass but imo we have soundness issues right now as we don't check the scalar decomposition and trust the hinted joint scalarmul result in case of edge cases.

Also, a few tests are failing, possibly because we hit the edge cases already and the hint doesn't cover it?

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

std/algebra/native/twistededwards/point.go:213

  • scalarMulFakeGLV soundness: the current checks allow the hint to return s1=0, s2=0, k=0 (which satisfies the modular equation and makes the MSM accumulator trivially (0,1)), leaving the hinted point q unconstrained. This lets a malicious prover pick an arbitrary q while still satisfying constraints. Consider adding a non-triviality constraint (e.g., enforce s2 != 0 when scalar != 0, and separately constrain the scalar==0 case to return the identity (0,1)).
	// the hints allow to decompose the scalar s into s1 and s2 such that
	// s1 + s * s2 == 0 mod Order,
	s, err := api.NewHint(rationalReconstruct, 4, scalar, curve.Order)
	if err != nil {
		// err is non-nil only for invalid number of inputs
		panic(err)
	}
	s1, s2, bit, k := s[0], s[1], s[2], s[3]

	// check that s1 + s2 * s == k*Order
	_s2 := api.Mul(s2, scalar)
	_k := api.Mul(k, curve.Order)
	lhs := api.Select(bit, s1, api.Add(s1, _s2))
	rhs := api.Select(bit, api.Add(_k, _s2), _k)
	api.AssertIsEqual(lhs, rhs)


@ivokub
Collaborator

ivokub commented Mar 16, 2026

Hmm, something strange is happening right now -- testing with -short passes on twistededwards package, but not with -tags prover_checks (using actual solver).

And I think the issue is that we're using DivUnchecked(0,0) in

func (p *Point) phi(api frontend.API, p1 *Point, curve *CurveParams, endo *EndoParams) *Point {

	xy := api.Mul(p1.X, p1.Y)
	yy := api.Mul(p1.Y, p1.Y)
	f := api.Sub(1, yy)
	f = api.Mul(f, endo.Endo[1])
	g := api.Add(yy, endo.Endo[0])
	g = api.Mul(g, endo.Endo[0])
	h := api.Sub(yy, endo.Endo[0])

	p.X = api.DivUnchecked(f, xy) // <---- here
	p.Y = api.DivUnchecked(g, h)

	return p
}

which is unconstrained by the frontend.API:

	// DivUnchecked returns i1 / i2
	// If i1 == i2 == 0, the return value (0) is unconstrained.
	DivUnchecked(i1, i2 Variable) Variable

Here the test engine silently returns 0, and the R1CS solver does as well (though it could return anything), but the PLONK solver explicitly fails.

So I think there is still the issue that the GLV path in twistededwards doesn't handle edge cases.

Additionally, imo in another PR we should make the test engine stricter and panic explicitly when we hit DivUnchecked(0,0), to avoid having unconstrained circuits during development.

@ivokub
Collaborator

ivokub commented Mar 16, 2026


Made test engine stricter in #1734. It is merged now and could merge master into this branch for helping to debug.

@yelhousni
Contributor Author

yelhousni commented Mar 16, 2026

Thanks for the stricter test engine in #1734 — merged master and it immediately surfaced the issue.

The problem was in phi: when the input is the identity (0,1), both f and xy are 0, leading to DivUnchecked(0,0). Fixed by selecting xy=1 when p1.X=0 (since f=0 too, 0/1=0 gives the correct X coordinate). Tests pass now. Mathematically, phi is defined over the prime-order subgroup, which the identity (0,1) belongs to -- but we need to handle the map (0,1) --> (0,1) under phi explicitly in-circuit.
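The fix can be modeled on concrete values (a toy sketch, not the circuit code; selectVar stands in for api.Select): at the identity (0,1) the numerator is already 0, so forcing the denominator to 1 yields the correct X = 0 instead of an unconstrained 0/0.

```go
package main

import "fmt"

// selectVar mirrors the behaviour of api.Select on concrete values.
func selectVar(cond bool, ifTrue, ifFalse int) int {
	if cond {
		return ifTrue
	}
	return ifFalse
}

// phiX computes the X coordinate of phi(p1) with the patched denominator.
// Endomorphism constants are dropped: at the identity both the numerator
// and xy are 0 regardless, which is the only case this sketch is about.
func phiX(x, y int) int {
	f := 1 - y*y                    // numerator (times an endo constant)
	xy := x * y                     // original denominator
	den := selectVar(x == 0, 1, xy) // the fix: avoid 0/0
	return f / den
}

func main() {
	fmt.Println(phiX(0, 1)) // identity (0,1) maps to X = 0
}
```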

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Collaborator

@ivokub ivokub left a comment

I have reviewed the emulated cases and am still reviewing the 2-chains; I'm posting my comments for now. I'm not confident that the changes are correct: in particular, we seem to trust the hinted scalar mul result before we constrain it, and later we may also switch back to the hinted result without constraining it for particular edge cases (scalar=0 for example).

@@ -24,6 +25,8 @@ func GetHints() []solver.Hint {
pairingCheckHint,
millerLoopAndCheckFinalExpHint,
decomposeScalarG1,
Collaborator

Hint not used anymore?

u1, u2, v1, v2 := sd[0], sd[1], sd[2], sd[3]
isNegu1, isNegu2, isNegv1, isNegv2 := signs[0], signs[1], signs[2], signs[3]

// Check that: s*(v1 + λ*v2) + u1 + λ*u2 = 0
Collaborator

Could use g2.fr.Eval perhaps? But can keep as is for brevity.

// if R.X == Q.X (happens when s=±1, so R=±Q), the incomplete addition fails
// We check this BEFORE potentially modifying R
_selector1 = g2.Ext2.IsZero(g2.Ext2.Sub(&Q.P.X, &R.P.X))
// if s=0/s=-1 (selector0), Q=(0,0) (_selector0), or R.X==Q.X (_selector1),
Collaborator

The selectorAny is correct, but the comment imo is incorrect -- selector0 handles the case s=0 and _selector1 the case s=±1. But I think we can keep the comment as is.

}

// TestScalarMulG2EdgeCases tests edge cases: s=0, s=1, s=-1, Q=(0,0)
func TestScalarMulG2EdgeCases(t *testing.T) {
Collaborator

Doesn't test for Q=(0,0)

}

// TestScalarMulG2EdgeCases tests edge cases: s=0, s=1, s=-1, Q=(0,0)
func TestScalarMulG2EdgeCases(t *testing.T) {
Collaborator

Doesn't test for Q=(0,0)

}

// TestScalarMulG2EdgeCases tests edge cases: s=0, s=1, s=-1, Q=(0,0)
func TestScalarMulG2EdgeCases(t *testing.T) {
Collaborator

Doesn't test for Q=(0,0)

_Q = g2.Select(_selector0, &G2Affine{P: g2AffP{X: *one, Y: *one}}, Q)
// if R.X == Q.X (happens when s=±1, so R=±Q), the incomplete addition fails
// We check this BEFORE potentially modifying R
_selector1 = g2.Ext2.IsZero(g2.Ext2.Sub(&Q.P.X, &R.P.X))
Collaborator

Is this sound here? We choose _selector1 based on the hinted R. Later, on line 663, we switch back to R when _selector1 is 1, but then we always trust the hinted (unconstrained) result.

_Q = g2.Select(_selector0, &G2Affine{P: g2AffP{X: *one, Y: *one}}, Q)
// if R.X == Q.X (happens when s=±1, so R=±Q), the incomplete addition fails
// We check this BEFORE potentially modifying R
_selector1 = g2.Ext2.IsZero(g2.Ext2.Sub(&Q.P.X, &R.P.X))
Collaborator

Similar comment as for BN254, we set _selector1 based on hinted result and later on line 810 switch back to it in case we have s=0 or Q=(0,0)

_Q = g2.Select(_selector0, &G2Affine{P: g2AffP{X: *one, Y: *one}}, Q)
// if R.X == Q.X (happens when s=±1, so R=±Q), the incomplete addition fails
// We check this BEFORE potentially modifying R
_selector1 = g2.curveF.IsZero(g2.curveF.Sub(&Q.P.X, &R.P.X))
Collaborator

Same issue with hinted R

// for complete arithmetic
var selector1, selector2, selector3 frontend.Variable
_Q := Q
if cfg.CompleteArithmetic {
Collaborator

Hmm - I remember that for some reason the change from c.Add to a switched c.Add/c.AddUnified was important, as it allowed avoiding some edge cases.

Here again we seem to rely on the hinted result R for the selection and the dummy point addition, but imo it is not sound. I cannot pinpoint the soundness issue right now, but I have a strange feeling. I'll need to think about it a bit more.


Labels

dep: linea (Issues affecting Linea downstream) · feat: ECC · type: perf
