
Conversation

terencechain (Collaborator)

This is an alternative to #15609. It changes KZG proof batch verification from timer-based batching to a responsive worker pool model that processes messages immediately when idle and buffers them when busy.
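At a high level, the model looks like the following (an illustrative sketch only, not the PR's exact code; runKzgWorker is a hypothetical name, while kzgVerifier, kzgChan, and verifyKzgBatch mirror names that appear in the diff below):

// The worker blocks until a request arrives, drains whatever queued up
// while it was busy, then verifies the whole batch in one call.
func (s *Service) runKzgWorker() {
	for {
		select {
		case <-s.ctx.Done():
			return
		case req := <-s.kzgChan:
			batch := []*kzgVerifier{req}
		drain:
			for {
				select {
				case r := <-s.kzgChan:
					batch = append(batch, r)
				default:
					break drain
				}
			}
			verifyKzgBatch(batch)
		}
	}
}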

@nalepae (Contributor) left a comment

I did not run the code itself.

@terencechain force-pushed the verify-kzg-proofs1 branch 2 times, most recently from 0df4dde to 427d0ad on September 2, 2025 14:48
@terencechain marked this pull request as ready for review on September 2, 2025 14:51
verificationSet := &kzgVerifier{dataColumns: dataColumns, resChan: resChan}
s.kzgChan <- verificationSet

resErr := <-resChan
Collaborator

Should this be wrapped in a select statement with ctx.Done()?

Collaborator

I think so too, and actually I would suggest using context.WithTimeout here to limit how long this potentially blocks on batch verification. This would simplify the other side of the channel because it would no longer need to write ctx.Err() back to all the channels (which is racy anyway and can leave these goroutines wedged).
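A minimal sketch of that caller-side pattern (illustrative; waitTimeout is a placeholder for whatever bound is chosen, and the other names mirror the diff):

ctx, cancel := context.WithTimeout(s.ctx, waitTimeout)
defer cancel()

// Guard the send as well, in case the worker is busy and the channel is full.
select {
case s.kzgChan <- verificationSet:
case <-ctx.Done():
	return validationResult, ctx.Err()
}

select {
case resErr := <-resChan:
	// Handle the batch verification result as before.
	err = resErr
case <-ctx.Done():
	// Stop waiting; the worker no longer needs to fan ctx.Err()
	// out to every pending result channel.
	return validationResult, ctx.Err()
}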

return validationResult, err
}
// Mark KZG verification as satisfied since we did it via batch verifier
verifier.SatisfyRequirement(verification.RequireSidecarKzgProofVerified)
Contributor

Why not use

defer dv.recordResult(RequireSidecarKzgProofVerified, &err)

directly in s.validateWithKzgBatchVerifier as done for all other verification functions?

nalepae previously approved these changes Sep 7, 2025
verifyKzgBatch(kzgBatch)
}
}
}
@kasey (Collaborator) Sep 10, 2025

I think you could simplify the pair of methods kzgVerifierRoutine + pullKzgChan into a single method like the following. Also note that I have dropped this kzg.resChan <- s.ctx.Err() loop - I think this is not necessary if the callers use a select and a context.WithTimeout (which they should do anyway for safe concurrency).

// A routine that runs in the background to perform batch
// KZG verifications by draining the channel and processing all pending requests.
func (s *Service) kzgVerifierRoutine() {
	kzgBatch := make([]*kzgVerifier, 0, 1)
	for {
		select {
		case <-s.ctx.Done():
			return
		case kzg := <-s.kzgChan:
			kzgBatch = append(kzgBatch, kzg)
			continue
		default:
			if len(kzgBatch) == 0 {
				// Nothing pending: block for the next request rather
				// than spinning on an empty batch.
				select {
				case <-s.ctx.Done():
					return
				case kzg := <-s.kzgChan:
					kzgBatch = append(kzgBatch, kzg)
				}
				continue
			}
			verifyKzgBatch(kzgBatch)
			kzgBatch = make([]*kzgVerifier, 0, 1)
		}
	}
}

(edit: fixed a bug in the suggestion)

s.kzgChan <- verificationSet

resErr := <-resChan
close(resChan)
@kasey (Collaborator) Sep 10, 2025

You don't really need to close channels; they get cleaned up by the garbage collector. You should only close a channel on the sending side, to unblock any receivers who could be waiting on more values from a sender that isn't going to produce any more. Receiving from a closed channel is fine (it instantly unblocks any goroutine trying to receive), but sending to one causes a panic. This feedback will be more critical if you take the other piece of feedback to check ctx.Done() above, because in that case you won't have read from resChan, so the other side could easily send to it after it is closed here and panic.
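The semantics above can be demonstrated in isolation (a standalone snippet, separate from the PR's code):

package main

import "fmt"

func main() {
	ch := make(chan int)
	close(ch)

	// Receiving from a closed channel never blocks: it yields the
	// zero value and ok == false.
	v, ok := <-ch
	fmt.Println(v, ok) // 0 false

	// Sending to a closed channel panics.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // send on closed channel
		}
	}()
	ch <- 1
}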

// the individual data columns might still be valid.
err := peerdas.VerifyDataColumnsSidecarKZGProofs(dataColumns)
if err != nil {
verErr := errors.Wrapf(err, "Could not verify")
@kasey (Collaborator) Sep 10, 2025

nitpick: Why a new variable and not just err = errors.Wrapf(err, "Could not verify")?

s.newColumnsVerifier = newDataColumnsVerifierFromInitializer(v)

go s.verifierRoutine()
go s.kzgVerifierRoutine()
Collaborator

Should we have more than 1 goroutine doing this?

@terencechain force-pushed the verify-kzg-proofs1 branch 5 times, most recently from 593feda to 16110b5 on September 10, 2025 03:13
Before this commit:
After 100 ms, an un-batched verification is launched concurrently with the batched one. As a result, a stressed node could be stressed even further by the duplicate verifications. It is also hard to choose a correct timeout value: 100 ms may be fine for a given node with a given BPO version, but not for the same node with a BPO version carrying 10x more blobs.

However, we know this gossip validation is no longer useful after a full slot duration.

After this commit:
After a full slot duration, we simply ignore the incoming gossip message. It's important to ignore it rather than reject it, since rejecting would downscore the peer that sent the message.
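A sketch of that behavior, assuming a libp2p-pubsub-style validator; secondsPerSlot and resChan are stand-ins for the actual chain configuration and the batch verifier's result channel:

ctx, cancel := context.WithTimeout(s.ctx, time.Duration(secondsPerSlot)*time.Second)
defer cancel()

select {
case err := <-resChan:
	if err != nil {
		// A genuine verification failure: reject, which downscores the peer.
		return pubsub.ValidationReject, err
	}
	return pubsub.ValidationAccept, nil
case <-ctx.Done():
	// Past a full slot the result is no longer useful, but the sidecar
	// may still be valid: ignore rather than reject so the peer is not
	// penalized.
	return pubsub.ValidationIgnore, ctx.Err()
}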
nalepae previously approved these changes Sep 11, 2025
@terencechain added this pull request to the merge queue on Sep 11, 2025
Merged via the queue into develop with commit 9e40551 on Sep 11, 2025
17 checks passed
@terencechain deleted the verify-kzg-proofs1 branch on September 11, 2025 16:00
fernantho pushed a commit to fernantho/prysm that referenced this pull request Sep 26, 2025
…ffchainLabs#15617)

* Implement KZG proof batch verification for data column gossip validation

* Manu's feedback

* Add tests

* Update beacon-chain/sync/batch_verifier.go

Co-authored-by: Manu NALEPA <[email protected]>

* Update beacon-chain/sync/batch_verifier.go

Co-authored-by: Manu NALEPA <[email protected]>

* Update beacon-chain/sync/kzg_batch_verifier_test.go

Co-authored-by: Manu NALEPA <[email protected]>

* Update beacon-chain/sync/kzg_batch_verifier_test.go

Co-authored-by: Manu NALEPA <[email protected]>

* Update beacon-chain/sync/kzg_batch_verifier_test.go

Co-authored-by: Manu NALEPA <[email protected]>

* Update beacon-chain/sync/kzg_batch_verifier_test.go

Co-authored-by: Manu NALEPA <[email protected]>

* Update beacon-chain/sync/kzg_batch_verifier_test.go

Co-authored-by: Manu NALEPA <[email protected]>

* Fix tests

* Kasey's feedback

* `validateWithKzgBatchVerifier`: Give up after a full slot.

Before this commit:
After 100 ms, an un-batched verification is launched concurrently to the batched one.
As a result, a stressed node could start to be even more stressed by the multiple verifications.
Also, it is always hard to choose a correct timeout value.
100ms may be OK for a given node with a given BPO version, and not ok for the same node with a BPO version with 10x more blobs.

However, we know this gossip validation won't be useful after a full slot duration.

After this commit:
After a full slot duration, we just ignore the incoming gossip message.
It's important to ignore it and not to reject it, since rejecting it would downscore the peer sending this message.

---------

Co-authored-by: Manu NALEPA <[email protected]>