[KLC-2395] fix parallel subtest data race in HonestyScore test #47
fbsobreira wants to merge 1 commit into
TestSubslotSignature_ReceivedSignature_HonestyScore declared the parent test parallel and ran its two table-driven subtests in parallel as well, while they all shared the outer-scope container and subslotSignature. Each subtest re-registered the container's PeerHonestyHandler with a closure that asserts against its own table row. When the subtests run concurrently, one subtest's handler fires inside the other's sr.ReceivedSignature call, the wrong-row assertions trip, and the eventual t.Fail on a *testing.T whose subtest has already returned produces "panic: Fail in goroutine after [subtest] has completed". That recovered panic shows up in CI as the unnamed null FAIL.

Construct container + subslotSignature inside each subtest so the parallel runs no longer share mutable state. Table now stores a ConsensusGroup index so each subtest resolves its own pubKey from its own sr.

Reproduced reliably on develop with `go test -count=500` (or `-race -count=200`) within the first ~200 iterations; with this change the same stress completes cleanly at 1000 iterations plain and 500 iterations under -race.
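The handler overwrite described above can be replayed deterministically. The following is a minimal sketch, not the repository's code: `container`, `register`, and `receivedSignature` are hypothetical stand-ins for the shared consensus container, the PeerHonestyHandler registration, and sr.ReceivedSignature, and the two "subtests" are run sequentially in one of the interleavings that parallel execution allows.

```go
package main

import "fmt"

// container is a hypothetical stand-in for the shared consensus container.
// It holds a single handler, so the last registration wins.
type container struct {
	onHonesty func(pubKey string)
}

// receivedSignature is a hypothetical stand-in for sr.ReceivedSignature:
// it reports the signer's key to whichever handler is currently registered.
func (c *container) receivedSignature(pubKey string) {
	c.onHonesty(pubKey)
}

// register installs a handler that checks the reported key against one
// table row and records any mismatch in that row's own error list.
func register(c *container, row string, errs *[]string) {
	c.onHonesty = func(pubKey string) {
		if pubKey != row {
			*errs = append(*errs, fmt.Sprintf("row %s saw %s", row, pubKey))
		}
	}
}

// sharedInterleaving replays, sequentially, one ordering that two parallel
// subtests can produce when they share a single container.
func sharedInterleaving() (errsA, errsB []string) {
	shared := &container{}
	register(shared, "A", &errsA) // subtest A registers its handler
	register(shared, "B", &errsB) // subtest B overwrites it
	shared.receivedSignature("A") // A's call now runs B's assertions
	shared.receivedSignature("B")
	return errsA, errsB
}

func main() {
	errsA, errsB := sharedInterleaving()
	fmt.Println("row A failures:", errsA)
	fmt.Println("row B failures:", errsB)
}
```

In this replay the mismatch is attributed to row B even though it was triggered by row A's call, which is the wrong-row assertion the commit message describes.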
Pull request overview
Fixes an intermittent CI flake in TestSubslotSignature_ReceivedSignature_HonestyScore caused by parallel subtests sharing mutable test state (a consensus container and SubslotSignature) and overwriting the PeerHonestyHandler closure.
Changes:
- Move `container := mock.InitConsensusCore()` and `sr := *initSubslotSignatureWithContainer(container)` initialization into each parallel subtest to avoid shared mutable state.
- Replace table-driven test’s stored `pubKey` with `pubKeyIndex` (resolved from each subtest’s own freshly created `sr`).
- Update assertions/message construction to use the per-subtest `pubKey`.
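The shape of these changes can be sketched as follows, assuming a simplified model rather than the real test: `fixture`, `newFixture`, `testCase`, and `runCases` are hypothetical names, `newFixture` plays the role that `mock.InitConsensusCore()` and `initSubslotSignatureWithContainer(container)` play in the actual test, and goroutines stand in for parallel subtests.

```go
package main

import (
	"fmt"
	"sync"
)

// fixture is a hypothetical stand-in for the container + SubslotSignature pair.
type fixture struct {
	consensusGroup []string
	onHonesty      func(pubKey string)
}

// newFixture builds fresh, unshared state for one case.
func newFixture() *fixture {
	return &fixture{consensusGroup: []string{"A", "B", "C"}}
}

// receivedSignature forwards the signer's key to this fixture's handler.
func (f *fixture) receivedSignature(pubKey string) { f.onHonesty(pubKey) }

// testCase stores an index into the consensus group, not a resolved key.
type testCase struct {
	name        string
	pubKeyIndex int
}

// runCases runs every case concurrently, each on its own fixture, and
// returns the key each case's handler observed, in table order.
func runCases(cases []testCase) []string {
	seen := make([]string, len(cases))
	var wg sync.WaitGroup
	for i, tc := range cases {
		wg.Add(1)
		go func(i int, tc testCase) {
			defer wg.Done()
			f := newFixture()                          // per-case state
			pubKey := f.consensusGroup[tc.pubKeyIndex] // resolved from own fixture
			f.onHonesty = func(got string) { seen[i] = got }
			f.receivedSignature(pubKey)
		}(i, tc)
	}
	wg.Wait()
	return seen
}

func main() {
	cases := []testCase{{name: "first", pubKeyIndex: 0}, {name: "last", pubKeyIndex: 2}}
	fmt.Println(runCases(cases))
}
```

Because each goroutine owns its fixture and writes only its own slot of `seen`, no handler can fire inside another case's call, whatever the scheduling order.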
n/a - to be handled in KLC-2382
Summary
`TestSubslotSignature_ReceivedSignature_HonestyScore` (`core/consensus/slot/bls/subslotSignature_test.go`) failed intermittently in CI. The parent test calls `t.Parallel()` and both of its table-driven subtests also call `t.Parallel()` while sharing the outer-scope `container` and `subslotSignature`. Each subtest re-registers the container's `PeerHonestyHandler` with a closure that asserts against its own table row, so when the subtests run concurrently one subtest's handler fires inside the other's `sr.ReceivedSignature` call; the wrong-row assertions trip and the eventual `t.Fail` on a `*testing.T` whose subtest has already returned produces `panic: Fail in goroutine after [subtest] has completed`. That recovered panic shows up in CI as the unnamed `null` FAIL.

The fix constructs `container` and `subslotSignature` inside each subtest so the parallel runs no longer share mutable state. The table now stores a `ConsensusGroup()` index instead of a pre-resolved pubKey, since the pubKey is derived from each subtest's own fresh `sr`.

Verification
Reliably reproduced on `develop`:
Post-fix on this branch:
No panics, no race-detector warnings.
Related
integrationTest/consensus; same `null` CI artifact pattern from a recovered `panic` during test teardown).

Test plan
- `go build ./...`
- `golangci-lint run ./core/consensus/slot/bls/...`: 0 issues
- -race): pass

Summary
This PR fixes a critical data race in the `TestSubslotSignature_ReceivedSignature_HonestyScore` test within the BLS (Boneh-Lynn-Shacham) consensus slot signature verification component. The fix ensures proper test isolation for parallel execution and restores reliable validation of peer honesty score updates.

Blockchain-Critical Components Affected
The test validates the BLS subslot signature consensus component (`core/consensus/slot/bls/subslotSignature.go`), which is responsible for:
- `PeerHonestyHandler` callbacks (defined in `core/consensus/interface.go`)

These are core to consensus stability and validator reputation tracking, which affects node peer selection and consensus reliability.
Issue & Resolution
The Problem: The test and its subtests both called `t.Parallel()` while sharing mutable state (a `container` and `subslotSignature` instance) across parallel test cases. Each subtest registered a `PeerHonestyHandler` closure that captured test case-specific assertions. When subtests ran concurrently, a handler registered by one subtest could execute during another subtest's `sr.ReceivedSignature()` call, triggering assertion failures against wrong test data and causing panics like "Fail in goroutine after [subtest] has completed."

The Fix: Each parallel subtest now creates its own fresh `container` and `subslotSignature` instance, eliminating shared mutable state. The test cases now store a `pubKeyIndex` (index into the consensus group) instead of a precomputed `pubKey` string; each subtest derives its public key dynamically from its isolated instance.

Impact on Node Stability & Data Integrity
- `-race` flag) showing no panics or race detector warnings

Testing Validation
- `go test -count=500` and `-race -count=200`