Skip to content

[KLC-1920] fix: force not-synced when gossip is ahead of probable highest nonce#69

Open
nickgs1337 wants to merge 4 commits into
developfrom
klc-1920-klc-2389-track-gossip-in-probable-highest
Open

[KLC-1920] fix: force not-synced when gossip is ahead of probable highest nonce#69
nickgs1337 wants to merge 4 commits into
developfrom
klc-1920-klc-2389-track-gossip-in-probable-highest

Conversation

@nickgs1337
Copy link
Copy Markdown
Contributor

@nickgs1337 nickgs1337 commented Jun 3, 2026

Both KLC-1920 (fallback desync after election period, 2+ years open) and KLC-2389 (fallback stalls in ~15-block bursts after mid-epoch restart) stem from the same invariant violation in the fork detector / synced-state computation. This PR fixes both with one change.

The fallback's heartbeat uses a freshly-generated observer key (KLC-2388 sibling work). When peer churn after an election (KLC-1920) or a mid-epoch restart (KLC-2389) breaks the BHReceived path while gossip (BHProposed) keeps flowing, probableHighestNonce freezes at the last processed nonce while highestNonceReceived climbs. computeNodeState then compares probableHighestNonce == currentBlockNonce and reports isNodeSynchronized=true — even though the chain is silently advancing past the node. The Slack-thread log captured this exactly: setHighestNonceReceived firing continuously, zero forkDetector.AddHeader state=0 lines, node declaring itself synced while 7+ blocks behind.

Fix

computeNodeState already uses probableHighestNonce <= currentBlockNonce for hasLastBlock. We add one extra condition: if the gossip-derived HighestNonceReceived is more than BlockFinality ahead of currentBlockNonce, force hasLastBlock=false. The fork detector's update semantics (processReceivedBlock, bfd.headers, probableHighestNonce) are untouched — only the synced-state interpretation changes. Consensus rounds, proposal/commit timing, and downstream listeners behave identically in healthy operation.

What changed

  • core/process/sync/baseSync.go::computeNodeState — one new if after the existing hasLastBlock line, plus updated debug log.
  • core/process/sync/baseForkDetector.go — expose HighestNonceReceived() uint64 (the private getter was already there).
  • core/process/interface.go — add HighestNonceReceived() uint64 to the ForkDetector interface.
  • common/mock/forkDetectorMock.go — corresponding mock impl.
  • core/process/sync/klc1920_repro_test.go — two regression tests pinning down the invariant the fix relies on.

5 files, 130 / 3 lines.

Validation

Unit tests: go test ./core/process/sync/... -run KLC green.

Empirical reproduction. Because the bug needs the asymmetric gossip-vs-fetch pattern, a healthy 3-node localnet cannot fire it organically. We use a controlled -infected build that drops BHReceived events on the fallback after 60s (env-var-gated, byte-equivalent production binaries when unset). Two clean 6-minute runs, same image base (cf9f612c), same infection patch, only the fix differs:

PRE-FIX infected POST-FIX infected
node0 final nonce 90 89
fallback final nonce 15 (permanently stalled) 87 (kept up with chain)
gap to chain 75 blocks behind 2 blocks (normal)
fallback klv_is_syncing 0 — lies, says synced while 75 behind 0 — correct, actually synced
infection drops applied 75+ BHReceived dropped 75 BHReceived dropped (identical pressure)
consensus on primaries producing producing

Same infection pressure, same scenario. Pre-fix exhibits the production bug shape (fallback stuck, false-synced). Post-fix the fallback keeps up despite the same disrupted-fetch pressure, and the synced metric is honest.

Full artifacts at sprint-97/KLC-1920/validation-artifacts/:

  • README.md — narrative, repro instructions, ruled-out alternatives
  • AB-A-prefix-infected/ — bug-reproduction run (poll.tsv, snapshots, full container logs)
  • AB-B-postfix-infected/ — fix-prevents-bug run (same shape)
  • run-AB-infected.sh — scenario runner (full docker reset → fresh boot → poll → capture)

Caveats

  • The localnet validation needs the -infected images because the production failure conditions don't occur naturally at 3-node scale. The infection is env-var-gated and only built into -infected tags; production binaries are byte-equivalent.
  • The original Slack-log evidence from the production fallback (sprint-97/KLC-1920/slack-thread/log.txt) matches the pre-fix infected run line-for-line — same symptom pattern, same metric lies.

Why not other shapes

Earlier iterations of this work bumped probableHighestNonce inline from BHProposed events in processReceivedBlock itself. That shape also catches the bug but mutates fork-detector internals and changes a counter's update timing. This PR keeps the fork detector alone and answers the synced-state question where it's actually asked. The discarded alternative is preserved on branch klc-1920-klc-2389-old-broken-attempt for reference.

Draft status: opened for review before squashing the regression-tests commit history. Will mark ready once a maintainer signs off on the approach.

Consensus and Synchronization Stability Fix

Blockchain-Critical Impact: Fixes a sync-state invariant in the bootstrap/fork-detection layer that could let a node report itself synchronized while gossiped headers (BHProposed) advanced the chain. Affects consensus/synchronization, networking/gossip handling, and node sync metrics; does not change transaction processing, KVM, block acceptance, or persistent chain data.

Root Cause: computeNodeState determined hasLastBlock/isNodeSynchronized solely from ProbableHighestNonce vs currentBlockNonce. If BHReceived/BHProcessed stalled but BHProposed gossip continued, highestNonceReceived (from gossip) could advance while probableHighestNonce stayed frozen, allowing a false synced state.

What changed

  • core/process/sync/baseSync.go
    • computeNodeState now reads ForkDetector.HighestNonceReceived() and forces hasLastBlock = false when HighestNonceReceived > currentBlockNonce + process.BlockFinality. Adds debug logging for probableHighestNonce, highestNonceReceived, and currentBlockNonce. This is a defensive sync-state guard only — it does not mutate consensus state or fork-detector behavior.
  • core/process/sync/baseForkDetector.go
    • Added exported HighestNonceReceived() uint64 accessor. Tightened mutex usage in setHighestNonceReceived to avoid an early-return race path.
  • core/process/interface.go
    • Added HighestNonceReceived() uint64 to the ForkDetector interface.
  • common/mock/forkDetectorMock.go
    • Mock updated to implement HighestNonceReceived callback.
  • Tests
    • core/process/sync/klc1920_repro_test.go: regression tests showing HighestNonceReceived advances under BHProposed-only delivery and that the gossip/probable gap can exceed process.BlockFinality.
    • core/process/sync/klc1920_node_state_test.go and core/process/sync/export_test.go: tests validating the new synced-state gate at the BlockFinality boundary (strict > behavior) and metric (klv_is_syncing) changes.

Node Stability & Data Integrity: Improves correctness of sync-state reporting and related metrics. No changes to block/header acceptance, consensus rules, transaction execution, or persistent chain data.

Concurrency & Error Handling: Small concurrency adjustment in setHighestNonceReceived (holding mutFork during compare/update to prevent missed early-return). No broad error-handling changes.

Validation: Unit tests for the sync package (including new regression tests) passed. An empirical repro using an instrumented build shows pre-fix nodes could report synced while falling ~75 blocks behind; post-fix behavior keeps fallback close (2-block gap) and reports sync accurately. Repro artifacts and runner script included in the PR.

Cross-cutting Notes: Monitoring and tooling that depend on probable/highest-nonce or isNodeSynchronized metrics will observe more accurate sync state. The change is defensive and backwards-compatible with consensus/state integrity.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5843c653-f1ff-49ef-817f-0716d05a73b2

📥 Commits

Reviewing files that changed from the base of the PR and between b860d55 and e97e103.

📒 Files selected for processing (1)
  • core/process/sync/klc1920_node_state_test.go
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: setup-and-lint / setup-and-lint
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (Custom checks)

**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.

Files:

  • core/process/sync/klc1920_node_state_test.go
**/*_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)

Files:

  • core/process/sync/klc1920_node_state_test.go
🧠 Learnings (4)
📚 Learning: 2026-04-07T14:36:46.394Z
Learnt from: RomuloSiebra
Repo: klever-io/klever-go PR: 35
File: network/p2p/libp2p/peerid_stability_test.go:100-116
Timestamp: 2026-04-07T14:36:46.394Z
Learning: In `network/p2p/libp2p/peerid_stability_test.go` (Go, klever-go repo), the empty-seed tests (`TestCreateP2PPrivKey_EmptySeed_NoError` and `TestCreateP2PPrivKey_EmptySeed_LegacySeed_NoError`) intentionally only assert `NoError/NotNil`. A previous `NotEqual` assertion across two `crypto/rand`-backed calls was deliberately removed because it is a probabilistic assertion that re-verifies OS/stdlib entropy rather than project logic. Do not suggest adding `NotEqual` comparisons for empty-seed / `crypto/rand` paths in this codebase.

Applied to files:

  • core/process/sync/klc1920_node_state_test.go
📚 Learning: 2024-11-19T20:43:36.454Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 879
File: core/statistics/tpsBenchmark.go:167-170
Timestamp: 2024-11-19T20:43:36.454Z
Learning: In the `core/statistics/tpsBenchmark.go` file, within the `updateStatistics` method, note that `Header.Nonce` can be zero (e.g., for the genesis block), so adding a zero nonce check is unnecessary.

Applied to files:

  • core/process/sync/klc1920_node_state_test.go
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.

Applied to files:

  • core/process/sync/klc1920_node_state_test.go
📚 Learning: 2026-05-23T22:52:58.065Z
Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 65
File: data/blockchain/blockchain.go:170-172
Timestamp: 2026-05-23T22:52:58.065Z
Learning: In Go, the pattern `append([]byte(nil), src...)` should be treated as preserving nil identity when `src` is a nil `[]byte`: spreading a nil slice contributes zero variadic arguments, so `append` performs no allocation and returns the original nil destination slice unchanged (i.e., result is nil, not an empty non-nil slice). Do not flag this as an incorrect empty-slice conversion; it intentionally maintains `nil`.

Applied to files:

  • core/process/sync/klc1920_node_state_test.go
🔇 Additional comments (4)
core/process/sync/klc1920_node_state_test.go (4)

33-35: LGTM!


93-119: LGTM!


121-144: LGTM!


145-166: LGTM!


Walkthrough

Adds HighestNonceReceived() to the ForkDetector API and mock, implements it in baseForkDetector with a guarded setter, updates computeNodeState to mark the node not synced when gossip is ahead by > BlockFinality, and adds regression tests and test harness reproducing KLC-1920.

Changes

Fork detector highest nonce tracking

Layer / File(s) Summary
ForkDetector contract and mock support
core/process/interface.go, common/mock/forkDetectorMock.go
ForkDetector interface gains HighestNonceReceived() uint64; ForkDetectorMock adds HighestNonceReceivedCalled callback and a mock HighestNonceReceived() method.
baseForkDetector implementation and setter guard
core/process/sync/baseForkDetector.go
Implements exported HighestNonceReceived() delegating to internal state; updates setHighestNonceReceived to perform comparison while holding mutFork and early-return to avoid races.
Node state computation with finality guard
core/process/sync/baseSync.go
computeNodeState reads probableHighestNonce and highestNonceReceived, computes gap relative to currentBlockNonce, and forces hasLastBlock=false when gap > process.BlockFinality; debug logging updated.
KLC-1920 regression tests and test harness
core/process/sync/klc1920_repro_test.go, core/process/sync/export_test.go, core/process/sync/klc1920_node_state_test.go
Adds tests asserting HighestNonceReceived advances under BHProposed-only delivery while ProbableHighestNonce stays at the last processed nonce, verifies gap > BlockFinality triggers not-synced behavior, and provides exported test bootstrap and observable metrics helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

consensus-critical

Suggested reviewers

  • RomuloSiebra
🚥 Pre-merge checks | ✅ 8
✅ Passed checks (8 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title follows the required format [KLC-XXXX] type: description with KLC-1920 JIRA key, 'fix' type, and a clear description of the change.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Concurrency Safety ✅ Passed All new and modified concurrent code is free of race conditions: proper lock/unlock pairing, no goroutine leaks, correct early returns with explicit unlocks, and no nested locks.
Error Handling ✅ Passed All error returns are properly checked (assert.Nil), no bare panic() calls, no unchecked errors, and computeNodeState only invokes non-error-returning methods.
State Consistency ✅ Passed PR modifies only synchronization state (boolean flags, uint64 nonces), not blockchain ledger state. Proper mutex protection with lock/defer ensures thread safety.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch klc-1920-klc-2389-track-gossip-in-probable-highest

Comment @coderabbitai help to get the list of available commands and usage tips.

@nickgs1337 nickgs1337 marked this pull request as ready for review June 3, 2026 11:52
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/process/sync/baseSync.go (1)

295-301: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply the gossip-gap guard when currentHeader is nil too.

Line 296 can still mark boot.hasLastBlock=true using only ProbableHighestNonce, so a genesis-only node can be misclassified as synced while gossip is already ahead by more than BlockFinality.

Proposed fix
 	lastNonce := genesisNonce
 	lastSlot := boot.chainHandler.GetGenesisHeader().GetSlot()
-	if check.IfNil(currentHeader) {
-		boot.hasLastBlock = boot.forkDetector.ProbableHighestNonce() == genesisNonce
-		log.Debug("computeNodeState",
-			"probableHighestNonce", boot.forkDetector.ProbableHighestNonce(),
-			"currentBlockNonce", nil,
-			"boot.hasLastBlock", boot.hasLastBlock)
-	} else {
+	currentBlockNonce := genesisNonce
+	if check.IfNil(currentHeader) {
+		// keep genesis nonce
+	} else {
 		lastNonce = currentHeader.GetNonce()
 		lastSlot = currentHeader.GetSlot()
-		currentBlockNonce := boot.chainHandler.GetCurrentBlockHeader().GetNonce()
-		probableHighestNonce := boot.forkDetector.ProbableHighestNonce()
-		highestNonceReceived := boot.forkDetector.HighestNonceReceived()
-		boot.hasLastBlock = probableHighestNonce <= currentBlockNonce
-		// KLC-1920: gossip-derived ceiling is the source of truth that
-		// probableHighestNonce can lag behind when the BHReceived path is
-		// disrupted (peer churn after an election, fallback observer not
-		// receiving fetched headers). If gossip reports the network ahead
-		// by more than the normal proposal/commit window, the node is not
-		// really synced even if probableHighestNonce equals currentBlockNonce.
-		if highestNonceReceived > currentBlockNonce+process.BlockFinality {
-			boot.hasLastBlock = false
-		}
-		log.Debug("computeNodeState",
-			"probableHighestNonce", probableHighestNonce,
-			"highestNonceReceived", highestNonceReceived,
-			"currentBlockNonce", currentBlockNonce,
-			"boot.hasLastBlock", boot.hasLastBlock)
+		currentBlockNonce = currentHeader.GetNonce()
 	}
+	probableHighestNonce := boot.forkDetector.ProbableHighestNonce()
+	highestNonceReceived := boot.forkDetector.HighestNonceReceived()
+	boot.hasLastBlock = probableHighestNonce <= currentBlockNonce
+	if highestNonceReceived > currentBlockNonce &&
+		highestNonceReceived-currentBlockNonce > uint64(process.BlockFinality) {
+		boot.hasLastBlock = false
+	}
+	log.Debug("computeNodeState",
+		"probableHighestNonce", probableHighestNonce,
+		"highestNonceReceived", highestNonceReceived,
+		"currentBlockNonce", currentBlockNonce,
+		"boot.hasLastBlock", boot.hasLastBlock)

Also applies to: 304-316

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/process/sync/baseSync.go` around lines 295 - 301, When currentHeader is
nil you still set boot.hasLastBlock based only on
boot.forkDetector.ProbableHighestNonce() which can misclassify a genesis-only
node as synced; change the check so boot.hasLastBlock is set true only if the
gossip gap is within BlockFinality (e.g.
boot.forkDetector.ProbableHighestNonce() - genesisNonce <= BlockFinality). Apply
the same gossip-gap guard to the existing branch that handles non-nil
currentHeader (the logic around check.IfNil(currentHeader), boot.hasLastBlock,
boot.forkDetector.ProbableHighestNonce(), and genesisNonce) so both paths use
the BlockFinality comparison before marking hasLastBlock.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/process/sync/baseForkDetector.go`:
- Around line 307-314: The highestNonceReceived updates can regress under
concurrency; update the implementation so reads/writes are atomic and monotonic:
replace the mutable field usage in baseForkDetector with an atomic uint64 (use
atomic.LoadUint64 for HighestNonceReceived() and an atomic CAS loop to set the
new value only if greater) or protect the update path in highestNonceReceived
(and anywhere that assigns to that field, e.g., the writer path currently doing
a compare outside the lock) by moving the compare-and-set inside the same mutex.
In short, ensure highestNonceReceived uses atomic.LoadUint64 for reads and an
atomic CompareAndSwap/Store-max loop (or a locked section) to only increase the
stored value.

In `@core/process/sync/klc1920_repro_test.go`:
- Line 86: The test currently ignores the error returned by bfd.AddHeader in the
repro loop; update the loop to check and fail on that error (e.g., using
t.Fatalf/t.Fatal or the test assertion helper in use) so BHProposed insertion
failures cause the test to fail. Locate the call to
bfd.AddHeader(&block.Block{Header: hdr}, hash, process.BHProposed, nil, nil) and
replace the silent discard with error handling that reports the error and aborts
the test.

---

Outside diff comments:
In `@core/process/sync/baseSync.go`:
- Around line 295-301: When currentHeader is nil you still set boot.hasLastBlock
based only on boot.forkDetector.ProbableHighestNonce() which can misclassify a
genesis-only node as synced; change the check so boot.hasLastBlock is set true
only if the gossip gap is within BlockFinality (e.g.
boot.forkDetector.ProbableHighestNonce() - genesisNonce <= BlockFinality). Apply
the same gossip-gap guard to the existing branch that handles non-nil
currentHeader (the logic around check.IfNil(currentHeader), boot.hasLastBlock,
boot.forkDetector.ProbableHighestNonce(), and genesisNonce) so both paths use
the BlockFinality comparison before marking hasLastBlock.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e6680200-113a-4926-8790-4fe62c354ba8

📥 Commits

Reviewing files that changed from the base of the PR and between 785b77c and 240a6e5.

📒 Files selected for processing (5)
  • common/mock/forkDetectorMock.go
  • core/process/interface.go
  • core/process/sync/baseForkDetector.go
  • core/process/sync/baseSync.go
  • core/process/sync/klc1920_repro_test.go
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (Custom checks)

**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.

Files:

  • common/mock/forkDetectorMock.go
  • core/process/sync/baseForkDetector.go
  • core/process/interface.go
  • core/process/sync/klc1920_repro_test.go
  • core/process/sync/baseSync.go
**/*_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)

Files:

  • core/process/sync/klc1920_repro_test.go
🧠 Learnings (7)
📓 Common learnings
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 879
File: core/statistics/tpsBenchmark.go:167-170
Timestamp: 2024-11-19T20:43:36.454Z
Learning: In the `core/statistics/tpsBenchmark.go` file, within the `updateStatistics` method, note that `Header.Nonce` can be zero (e.g., for the genesis block), so adding a zero nonce check is unnecessary.
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.

Applied to files:

  • common/mock/forkDetectorMock.go
  • core/process/sync/baseForkDetector.go
  • core/process/interface.go
  • core/process/sync/klc1920_repro_test.go
  • core/process/sync/baseSync.go
📚 Learning: 2026-05-23T22:52:58.065Z
Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 65
File: data/blockchain/blockchain.go:170-172
Timestamp: 2026-05-23T22:52:58.065Z
Learning: In Go, the pattern `append([]byte(nil), src...)` should be treated as preserving nil identity when `src` is a nil `[]byte`: spreading a nil slice contributes zero variadic arguments, so `append` performs no allocation and returns the original nil destination slice unchanged (i.e., result is nil, not an empty non-nil slice). Do not flag this as an incorrect empty-slice conversion; it intentionally maintains `nil`.

Applied to files:

  • common/mock/forkDetectorMock.go
  • core/process/sync/baseForkDetector.go
  • core/process/interface.go
  • core/process/sync/klc1920_repro_test.go
  • core/process/sync/baseSync.go
📚 Learning: 2024-11-19T20:43:36.454Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 879
File: core/statistics/tpsBenchmark.go:167-170
Timestamp: 2024-11-19T20:43:36.454Z
Learning: In the `core/statistics/tpsBenchmark.go` file, within the `updateStatistics` method, note that `Header.Nonce` can be zero (e.g., for the genesis block), so adding a zero nonce check is unnecessary.

Applied to files:

  • core/process/sync/baseForkDetector.go
  • core/process/sync/klc1920_repro_test.go
  • core/process/sync/baseSync.go
📚 Learning: 2026-04-07T14:36:46.394Z
Learnt from: RomuloSiebra
Repo: klever-io/klever-go PR: 35
File: network/p2p/libp2p/peerid_stability_test.go:100-116
Timestamp: 2026-04-07T14:36:46.394Z
Learning: In `network/p2p/libp2p/peerid_stability_test.go` (Go, klever-go repo), the empty-seed tests (`TestCreateP2PPrivKey_EmptySeed_NoError` and `TestCreateP2PPrivKey_EmptySeed_LegacySeed_NoError`) intentionally only assert `NoError/NotNil`. A previous `NotEqual` assertion across two `crypto/rand`-backed calls was deliberately removed because it is a probabilistic assertion that re-verifies OS/stdlib entropy rather than project logic. Do not suggest adding `NotEqual` comparisons for empty-seed / `crypto/rand` paths in this codebase.

Applied to files:

  • core/process/sync/klc1920_repro_test.go
📚 Learning: 2024-11-18T18:23:56.020Z
Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 877
File: core/process/transaction/txProcess_test.go:2513-2516
Timestamp: 2024-11-18T18:23:56.020Z
Learning: In the test function `TestTxProcessor_ProcessClaimAllowanceKFIShouldFail` in `core/process/transaction/txProcess_test.go`, both validation and processing steps should be tested even when validation is expected to fail, to ensure that the processing correctly handles invalid inputs.

Applied to files:

  • core/process/sync/klc1920_repro_test.go
📚 Learning: 2024-11-18T18:25:28.208Z
Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 877
File: core/process/transaction/txProcess_test.go:2481-2484
Timestamp: 2024-11-18T18:25:28.208Z
Learning: In the `TestTxProcessor_ProcessClaimAllowanceOkValsShouldWork` function in `core/process/transaction/txProcess_test.go`, the transaction processing and assertions are already included.

Applied to files:

  • core/process/sync/klc1920_repro_test.go
🔇 Additional comments (2)
core/process/interface.go (1)

65-65: LGTM!

common/mock/forkDetectorMock.go (1)

16-16: LGTM!

Also applies to: 65-70

Comment thread core/process/sync/baseForkDetector.go
Comment thread core/process/sync/klc1920_repro_test.go Outdated
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 3, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/process/sync/klc1920_node_state_test.go`:
- Around line 122-141: The test
TestKLC1920_ComputeNodeState_GossipWithinFinalityStaysSynced currently hardcodes
the `highest` value (51) to model `gap == BlockFinality`; change the test to
compute `highest` from the real constant by using `process.BlockFinality` (e.g.
set `highest := probable + process.BlockFinality`) when calling
buildKLC1920Bootstrap so the boundary remains correct if BlockFinality changes,
and optionally add a sibling test that sets `highest := probable +
process.BlockFinality + 1` to exercise the `gap == BlockFinality+1` case; update
references around TestKLC1920_ComputeNodeState_GossipWithinFinalityStaysSynced,
buildKLC1920Bootstrap and ComputeNodeState accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 767bcc76-5b47-468f-bb17-fe7a51543940

📥 Commits

Reviewing files that changed from the base of the PR and between 875fbfc and b860d55.

📒 Files selected for processing (2)
  • core/process/sync/export_test.go
  • core/process/sync/klc1920_node_state_test.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: setup-and-lint / setup-and-lint
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (Custom checks)

**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.

Files:

  • core/process/sync/export_test.go
  • core/process/sync/klc1920_node_state_test.go
**/*_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)

Files:

  • core/process/sync/export_test.go
  • core/process/sync/klc1920_node_state_test.go
🧠 Learnings (3)
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.

Applied to files:

  • core/process/sync/export_test.go
  • core/process/sync/klc1920_node_state_test.go
📚 Learning: 2026-05-23T22:52:58.065Z
Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 65
File: data/blockchain/blockchain.go:170-172
Timestamp: 2026-05-23T22:52:58.065Z
Learning: In Go, the pattern `append([]byte(nil), src...)` should be treated as preserving nil identity when `src` is a nil `[]byte`: spreading a nil slice contributes zero variadic arguments, so `append` performs no allocation and returns the original nil destination slice unchanged (i.e., result is nil, not an empty non-nil slice). Do not flag this as an incorrect empty-slice conversion; it intentionally maintains `nil`.

Applied to files:

  • core/process/sync/export_test.go
  • core/process/sync/klc1920_node_state_test.go
📚 Learning: 2026-04-07T14:36:46.394Z
Learnt from: RomuloSiebra
Repo: klever-io/klever-go PR: 35
File: network/p2p/libp2p/peerid_stability_test.go:100-116
Timestamp: 2026-04-07T14:36:46.394Z
Learning: In `network/p2p/libp2p/peerid_stability_test.go` (Go, klever-go repo), the empty-seed tests (`TestCreateP2PPrivKey_EmptySeed_NoError` and `TestCreateP2PPrivKey_EmptySeed_LegacySeed_NoError`) intentionally only assert `NoError/NotNil`. A previous `NotEqual` assertion across two `crypto/rand`-backed calls was deliberately removed because it is a probabilistic assertion that re-verifies OS/stdlib entropy rather than project logic. Do not suggest adding `NotEqual` comparisons for empty-seed / `crypto/rand` paths in this codebase.

Applied to files:

  • core/process/sync/klc1920_node_state_test.go
🔇 Additional comments (3)
core/process/sync/klc1920_node_state_test.go (1)

28-53: LGTM!

core/process/sync/export_test.go (2)

36-46: ⚡ Quick win

Remove/adjust concern: no accessor name collision on baseBootstrap
func (boot *baseBootstrap) IsNodeSynchronized() and func (boot *baseBootstrap) HasLastBlock() only exist in core/process/sync/export_test.go; there are no non-_test.go receiver methods on baseBootstrap with those names, so this won’t trigger a duplicate-method compile error.


25-34: ⚡ Quick win

Resolve: slotManager.BeforeGenesis() gates the spawned goroutine, so it won’t nil-deref in these tests.

computeNodeState() only starts go boot.requestHeadersIfSyncIsStuck() when shouldTryToRequestHeaders() is true, and shouldTryToRequestHeaders() returns false immediately when boot.slotManager.BeforeGenesis() is true. In core/process/sync/klc1920_node_state_test.go, the consensusMock.SlotManagerMock passed into NewBaseBootstrapForKLC1920Test has BeforeGenesisCalled: func() bool { return true } (explicitly suppressing the requestHeadersIfSyncIsStuck path), so the partially-initialized baseBootstrap won’t hit nil dependencies in requestHeaders().

Comment thread core/process/sync/klc1920_node_state_test.go Outdated
Copy link
Copy Markdown
Member

@fbsobreira fbsobreira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check comments and the tests file is an outlier vs. the package conventions. Pease fold them into the existing metaForkDetector_test.go, keeping the TestKLC1920_* names as regression markers. The computeNodeState boundary test (coverage comment) targets baseSync.go, so it should go in a new baseSync_test.go — next to the code it actually guards.

lastNonce = currentHeader.GetNonce()
lastSlot = currentHeader.GetSlot()
boot.hasLastBlock = boot.forkDetector.ProbableHighestNonce() <= boot.chainHandler.GetCurrentBlockHeader().GetNonce()
currentBlockNonce := boot.chainHandler.GetCurrentBlockHeader().GetNonce()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the refetch of lastNonce from L#302

assert.Equal(t, uint64(10), bfd.ProbableHighestNonce(),
"probableHighestNonce intentionally stays at last processed — BHProposed must not advance it (would break consensus during proposal rounds)")

gap := bfd.HighestNonceReceived() - bfd.ProbableHighestNonce()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test pins HighestNonceReceived − ProbableHighestNonce, but fix compares against currentBlockNonce. Coincide here only because no BHReceived headers were added.
Reframe the assertion around HighestNonceReceived − currentBlockNonce so the guard tracks what the fix actually evaluates.

// receiving fetched headers). If gossip reports the network ahead
// by more than the normal proposal/commit window, the node is not
// really synced even if probableHighestNonce equals currentBlockNonce.
if highestNonceReceived > currentBlockNonce+process.BlockFinality {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlockFinality is hardcoded to 1, so this guard trips whenever the gossiped nonce ceiling (highestNonceReceived) runs ≥ 2 blocks ahead of the committed tip. That gap is reached during normal propagation/commit latency or a single missed round — i.e. when the node is briefly one block behind while the next proposal is already gossiping in — which would cause transient false not-synced flapping. Suggest widening the tolerance (e.g. tie it to the existing "max rounds without a new block" value, or BlockFinality + k) so benign one-block lag doesn't flip the state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants