Implement KZG proof batch verification (option 2) - uses worker pool #15617
Conversation
Force-pushed from 54e4f63 to 5f13126.
I did not run the code itself.
Force-pushed from 0df4dde to 427d0ad.
Force-pushed from 427d0ad to 839b011.
Force-pushed from 839b011 to 6c6c119.
beacon-chain/sync/batch_verifier.go (outdated)

```go
verificationSet := &kzgVerifier{dataColumns: dataColumns, resChan: resChan}
s.kzgChan <- verificationSet

resErr := <-resChan
```
Should this be wrapped in a `select` statement with `ctx.Done()`?
I think so too, and actually I would suggest using `context.WithTimeout` here to limit how long this potentially blocks on batch verification. This would simplify the other side of the channel, because it would no longer need to write `ctx.Err()` back to all the channels (which is racy anyway and can leave these goroutines wedged).
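For illustration, a minimal sketch of the caller side with both suggestions applied. The field names (`s.kzgChan`, `kzgVerifier`, `resChan`) come from the snippet above; the timeout value, the buffered result channel, and the `pubsub` return values are assumptions, not the PR's actual code:

```go
// Bound how long we can block on batch verification.
ctx, cancel := context.WithTimeout(ctx, time.Second) // illustrative timeout
defer cancel()

resChan := make(chan error, 1) // buffered: the verifier's send never blocks
verificationSet := &kzgVerifier{dataColumns: dataColumns, resChan: resChan}

// Submit the request, but give up if the context expires first.
select {
case s.kzgChan <- verificationSet:
case <-ctx.Done():
	return pubsub.ValidationIgnore, ctx.Err()
}

// Wait for the result, again bounded by the context.
select {
case resErr := <-resChan:
	if resErr != nil {
		return pubsub.ValidationReject, resErr
	}
case <-ctx.Done():
	return pubsub.ValidationIgnore, ctx.Err()
}
```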
```go
	return validationResult, err
}
// Mark KZG verification as satisfied since we did it via batch verifier.
verifier.SatisfyRequirement(verification.RequireSidecarKzgProofVerified)
```
Why not use `defer dv.recordResult(RequireSidecarKzgProofVerified, &err)` directly in `s.validateWithKzgBatchVerifier`, as is done for all the other verification functions?
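For comparison, a hedged sketch of that shape. Only `recordResult` and `RequireSidecarKzgProofVerified` are taken from the comment above; the signature, receiver type, and helper are hypothetical:

```go
// Hypothetical sketch: a named error return lets the deferred call record
// whatever error the function ultimately returns.
func (s *Service) validateWithKzgBatchVerifier(ctx context.Context, dv *verifier, dataColumns []blocks.RODataColumn) (err error) {
	defer dv.recordResult(RequireSidecarKzgProofVerified, &err)
	err = s.submitKzgBatch(ctx, dataColumns) // hypothetical helper
	return err
}
```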
beacon-chain/sync/batch_verifier.go (outdated)

```go
			verifyKzgBatch(kzgBatch)
		}
	}
}
```
I think you could simplify the pair of methods `kzgVerifierRoutine` + `pullKzgChan` into a single method like the following. Also note that I have dropped the `kzg.resChan <- s.ctx.Err()` loop; I think it is not necessary if the callers use a `select` and a `context.WithTimeout` (which they should do anyway for safe concurrency).
```go
// A routine that runs in the background to perform batch KZG verifications
// by draining the channel and processing all pending requests.
func (s *Service) kzgVerifierRoutine() {
	kzgBatch := make([]*kzgVerifier, 0, 1)
	for {
		select {
		case <-s.ctx.Done():
			return
		case kzg := <-s.kzgChan:
			kzgBatch = append(kzgBatch, kzg)
			continue
		default:
			if len(kzgBatch) == 0 {
				// Nothing buffered and nothing pending: block for the next
				// request instead of spinning through the empty default case.
				select {
				case <-s.ctx.Done():
					return
				case kzg := <-s.kzgChan:
					kzgBatch = append(kzgBatch, kzg)
				}
				continue
			}
			verifyKzgBatch(kzgBatch)
			kzgBatch = make([]*kzgVerifier, 0, 1)
		}
	}
}
```
(edit: fixed a bug in the suggestion)
beacon-chain/sync/batch_verifier.go (outdated)

```go
s.kzgChan <- verificationSet

resErr := <-resChan
close(resChan)
```
You don't really need to close channels; they get cleaned up by the garbage collector. You should only close a channel on the sending side, to unblock any receivers who could be waiting on more values from a sender that isn't going to produce any more. Receiving from a closed channel is fine (it instantly unblocks any goroutine trying to receive), but sending to one *causes a panic*. This feedback becomes more critical if you take the other piece of feedback to check `ctx.Done()` above, because in that case you won't have read from `resChan`, so the other side could easily send to it after it is closed here and panic.
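A self-contained illustration of that asymmetry, runnable on its own:

```go
package main

func main() {
	ch := make(chan error, 1)
	close(ch)

	// Receiving from a closed channel is fine: it returns immediately
	// with the zero value and ok == false.
	v, ok := <-ch
	_, _ = v, ok

	// Sending to a closed channel is not: this line panics with
	// "send on closed channel".
	ch <- nil
}
```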
beacon-chain/sync/batch_verifier.go (outdated)

```go
// the individual data columns might still be valid.
err := peerdas.VerifyDataColumnsSidecarKZGProofs(dataColumns)
if err != nil {
	verErr := errors.Wrapf(err, "Could not verify")
```
nitpick: Why a new variable and not just `err = errors.Wrapf(err, "Could not verify")`?
```go
s.newColumnsVerifier = newDataColumnsVerifierFromInitializer(v)

go s.verifierRoutine()
go s.kzgVerifierRoutine()
```
Should we have more than one goroutine doing this?
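If so, a sketch of what that could look like. The pool size is an illustrative knob, not something the PR defines; the PR starts a single routine:

```go
// Illustrative sketch: start a small pool of identical verifier routines.
const kzgVerifierWorkers = 2 // hypothetical tuning knob

for i := 0; i < kzgVerifierWorkers; i++ {
	go s.kzgVerifierRoutine()
}
```

Since every routine drains the shared `s.kzgChan`, each request is still handled exactly once; more workers would reduce latency under load at the cost of smaller, less amortized batches.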
Force-pushed from 593feda to 16110b5.
Force-pushed from 16110b5 to 0feae98.
Before this commit: after 100 ms, an un-batched verification is launched concurrently with the batched one. As a result, a stressed node could become even more stressed by the multiple verifications. Also, it is always hard to choose a correct timeout value: 100 ms may be fine for a given node with a given BPO version, and not fine for the same node with a BPO version with 10x more blobs. However, we know this gossip validation won't be useful after a full slot duration.

After this commit: after a full slot duration, we simply ignore the incoming gossip message. It's important to ignore it and not reject it, since rejecting it would downscore the peer sending the message.
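In code, the described behavior could look roughly like this. The slot-start helper, the sidecar accessor, and the return values are assumptions for illustration, not the commit's exact implementation:

```go
// Rough sketch of "give up after a full slot": derive the deadline from the
// start of the next slot instead of a fixed timeout.
deadline := s.clock.SlotStart(sidecar.Slot() + 1) // assumed clock helper
ctx, cancel := context.WithDeadline(ctx, deadline)
defer cancel()

select {
case resErr := <-resChan:
	if resErr != nil {
		return pubsub.ValidationReject, resErr
	}
	return pubsub.ValidationAccept, nil
case <-ctx.Done():
	// Ignore rather than reject: rejecting would downscore the sending peer.
	return pubsub.ValidationIgnore, nil
}
```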
…ffchainLabs#15617)

* Implement KZG proof batch verification for data column gossip validation
* Manu's feedback
* Add tests
* Update beacon-chain/sync/batch_verifier.go (Co-authored-by: Manu NALEPA <[email protected]>)
* Update beacon-chain/sync/batch_verifier.go (Co-authored-by: Manu NALEPA <[email protected]>)
* Update beacon-chain/sync/kzg_batch_verifier_test.go (Co-authored-by: Manu NALEPA <[email protected]>)
* Update beacon-chain/sync/kzg_batch_verifier_test.go (Co-authored-by: Manu NALEPA <[email protected]>)
* Update beacon-chain/sync/kzg_batch_verifier_test.go (Co-authored-by: Manu NALEPA <[email protected]>)
* Update beacon-chain/sync/kzg_batch_verifier_test.go (Co-authored-by: Manu NALEPA <[email protected]>)
* Update beacon-chain/sync/kzg_batch_verifier_test.go (Co-authored-by: Manu NALEPA <[email protected]>)
* Fix tests
* Kasey's feedback
* `validateWithKzgBatchVerifier`: Give up after a full slot (see the commit description above).

---------

Co-authored-by: Manu NALEPA <[email protected]>
This is an alternative to #15609. It changes KZG proof batch verification from timer-based batching to a responsive worker-pool model that processes messages immediately when idle and buffers them when busy.