[KLC-1920] fix: force not-synced when gossip is ahead of probable highest nonce by nickgs1337 · Pull Request #69 · klever-io/klever-go

nickgs1337 · 2026-06-03T11:39:06Z

Both KLC-1920 (fallback desync after election period, 2+ years open) and KLC-2389 (fallback stalls in ~15-block bursts after mid-epoch restart) stem from the same invariant violation in the fork detector / synced-state computation. This PR fixes both with one change.

The fallback's heartbeat uses a freshly-generated observer key (KLC-2388 sibling work). When peer churn after an election (KLC-1920) or a mid-epoch restart (KLC-2389) breaks the BHReceived path while gossip (BHProposed) keeps flowing, probableHighestNonce freezes at the last processed nonce while highestNonceReceived climbs. computeNodeState then compares probableHighestNonce == currentBlockNonce and reports isNodeSynchronized=true — even though the chain is silently advancing past the node. The Slack-thread log captured this exactly: setHighestNonceReceived firing continuously, zero forkDetector.AddHeader state=0 lines, node declaring itself synced while 7+ blocks behind.

Fix

computeNodeState already uses probableHighestNonce <= currentBlockNonce for hasLastBlock. We add one extra condition: if the gossip-derived HighestNonceReceived is more than BlockFinality ahead of currentBlockNonce, force hasLastBlock=false. The fork detector's update semantics (processReceivedBlock, bfd.headers, probableHighestNonce) are untouched — only the synced-state interpretation changes. Consensus rounds, proposal/commit timing, and downstream listeners behave identically in healthy operation.

What changed

core/process/sync/baseSync.go::computeNodeState — one new if after the existing hasLastBlock line, plus updated debug log.
core/process/sync/baseForkDetector.go — expose HighestNonceReceived() uint64 (the private getter was already there).
core/process/interface.go — add HighestNonceReceived() uint64 to the ForkDetector interface.
common/mock/forkDetectorMock.go — corresponding mock impl.
core/process/sync/klc1920_repro_test.go — two regression tests pinning down the invariant the fix relies on.

5 files, 130 / 3 lines.

Validation

Unit tests: go test ./core/process/sync/... -run KLC green.

Empirical reproduction. Because the bug needs the asymmetric gossip-vs-fetch pattern, a healthy 3-node localnet cannot fire it organically. We use a controlled -infected build that drops BHReceived events on the fallback after 60s (env-var-gated, byte-equivalent production binaries when unset). Two clean 6-minute runs, same image base (cf9f612c), same infection patch, only the fix differs:

	PRE-FIX infected	POST-FIX infected
node0 final nonce	90	89
fallback final nonce	15 (permanently stalled)	87 (kept up with chain)
gap to chain	75 blocks behind	2 blocks (normal)
fallback `klv_is_syncing`	0 — lies, says synced while 75 behind	0 — correct, actually synced
infection drops applied	75+ BHReceived dropped	75 BHReceived dropped (identical pressure)
consensus on primaries	producing	producing

Same infection pressure, same scenario. Pre-fix exhibits the production bug shape (fallback stuck, false-synced). Post-fix the fallback keeps up despite the same disrupted-fetch pressure, and the synced metric is honest.

Full artifacts at sprint-97/KLC-1920/validation-artifacts/:

README.md — narrative, repro instructions, ruled-out alternatives
AB-A-prefix-infected/ — bug-reproduction run (poll.tsv, snapshots, full container logs)
AB-B-postfix-infected/ — fix-prevents-bug run (same shape)
run-AB-infected.sh — scenario runner (full docker reset → fresh boot → poll → capture)

Caveats

The localnet validation needs the -infected images because the production failure conditions don't occur naturally at 3-node scale. The infection is env-var-gated and only built into -infected tags; production binaries are byte-equivalent.
The original Slack-log evidence from the production fallback (sprint-97/KLC-1920/slack-thread/log.txt) matches the pre-fix infected run line-for-line — same symptom pattern, same metric lies.

Why not other shapes

Earlier iterations of this work bumped probableHighestNonce inline from BHProposed events in processReceivedBlock itself. That shape also catches the bug but mutates fork-detector internals and changes a counter's update timing. This PR keeps the fork detector alone and answers the synced-state question where it's actually asked. The discarded alternative is preserved on branch klc-1920-klc-2389-old-broken-attempt for reference.

Draft status: opened for review before squashing the regression-tests commit history. Will mark ready once a maintainer signs off on the approach.

Consensus and Synchronization Stability Fix

Blockchain-Critical Impact: Fixes a sync-state invariant in the bootstrap/fork-detection layer that could let a node report itself synchronized while gossiped headers (BHProposed) advanced the chain. Affects consensus/synchronization, networking/gossip handling, and node sync metrics; does not change transaction processing, KVM, block acceptance, or persistent chain data.

Root Cause: computeNodeState determined hasLastBlock/isNodeSynchronized solely from ProbableHighestNonce vs currentBlockNonce. If BHReceived/BHProcessed stalled but BHProposed gossip continued, highestNonceReceived (from gossip) could advance while probableHighestNonce stayed frozen, allowing a false synced state.

What changed

core/process/sync/baseSync.go
- computeNodeState now reads ForkDetector.HighestNonceReceived() and forces hasLastBlock = false when HighestNonceReceived > currentBlockNonce + process.BlockFinality. Adds debug logging for probableHighestNonce, highestNonceReceived, and currentBlockNonce. This is a defensive sync-state guard only — it does not mutate consensus state or fork-detector behavior.
core/process/sync/baseForkDetector.go
- Added exported HighestNonceReceived() uint64 accessor. Tightened mutex usage in setHighestNonceReceived to avoid an early-return race path.
core/process/interface.go
- Added HighestNonceReceived() uint64 to the ForkDetector interface.
common/mock/forkDetectorMock.go
- Mock updated to implement HighestNonceReceived callback.
Tests
- core/process/sync/klc1920_repro_test.go: regression tests showing HighestNonceReceived advances under BHProposed-only delivery and that the gossip/probable gap can exceed process.BlockFinality.
- core/process/sync/klc1920_node_state_test.go and core/process/sync/export_test.go: tests validating the new synced-state gate at the BlockFinality boundary (strict > behavior) and metric (klv_is_syncing) changes.

Node Stability & Data Integrity: Improves correctness of sync-state reporting and related metrics. No changes to block/header acceptance, consensus rules, transaction execution, or persistent chain data.

Concurrency & Error Handling: Small concurrency adjustment in setHighestNonceReceived (holding mutFork during compare/update to prevent missed early-return). No broad error-handling changes.

Validation: Unit tests for the sync package (including new regression tests) passed. An empirical repro using an instrumented build shows pre-fix nodes could report synced while falling ~75 blocks behind; post-fix behavior keeps fallback close (2-block gap) and reports sync accurately. Repro artifacts and runner script included in the PR.

Cross-cutting Notes: Monitoring and tooling that depend on probable/highest-nonce or isNodeSynchronized metrics will observe more accurate sync state. The change is defensive and backwards-compatible with consensus/state integrity.

coderabbitai · 2026-06-03T11:39:14Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5843c653-f1ff-49ef-817f-0716d05a73b2

📥 Commits

Reviewing files that changed from the base of the PR and between b860d55 and e97e103.

📒 Files selected for processing (1)

core/process/sync/klc1920_node_state_test.go

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: setup-and-lint / setup-and-lint
GitHub Check: Analyze (go)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.go

📄 CodeRabbit inference engine (Custom checks)

**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.

Files:

core/process/sync/klc1920_node_state_test.go

**/*_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)

Files:

core/process/sync/klc1920_node_state_test.go

🧠 Learnings (4)

📚 Learning: 2026-04-07T14:36:46.394Z

Learnt from: RomuloSiebra
Repo: klever-io/klever-go PR: 35
File: network/p2p/libp2p/peerid_stability_test.go:100-116
Timestamp: 2026-04-07T14:36:46.394Z
Learning: In `network/p2p/libp2p/peerid_stability_test.go` (Go, klever-go repo), the empty-seed tests (`TestCreateP2PPrivKey_EmptySeed_NoError` and `TestCreateP2PPrivKey_EmptySeed_LegacySeed_NoError`) intentionally only assert `NoError/NotNil`. A previous `NotEqual` assertion across two `crypto/rand`-backed calls was deliberately removed because it is a probabilistic assertion that re-verifies OS/stdlib entropy rather than project logic. Do not suggest adding `NotEqual` comparisons for empty-seed / `crypto/rand` paths in this codebase.

Applied to files:

core/process/sync/klc1920_node_state_test.go

📚 Learning: 2024-11-19T20:43:36.454Z

Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 879
File: core/statistics/tpsBenchmark.go:167-170
Timestamp: 2024-11-19T20:43:36.454Z
Learning: In the `core/statistics/tpsBenchmark.go` file, within the `updateStatistics` method, note that `Header.Nonce` can be zero (e.g., for the genesis block), so adding a zero nonce check is unnecessary.

Applied to files:

core/process/sync/klc1920_node_state_test.go

📚 Learning: 2026-04-21T20:12:22.959Z

Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.

Applied to files:

core/process/sync/klc1920_node_state_test.go

📚 Learning: 2026-05-23T22:52:58.065Z

Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 65
File: data/blockchain/blockchain.go:170-172
Timestamp: 2026-05-23T22:52:58.065Z
Learning: In Go, the pattern `append([]byte(nil), src...)` should be treated as preserving nil identity when `src` is a nil `[]byte`: spreading a nil slice contributes zero variadic arguments, so `append` performs no allocation and returns the original nil destination slice unchanged (i.e., result is nil, not an empty non-nil slice). Do not flag this as an incorrect empty-slice conversion; it intentionally maintains `nil`.

Applied to files:

core/process/sync/klc1920_node_state_test.go

🔇 Additional comments (4)

core/process/sync/klc1920_node_state_test.go (4)

33-35: LGTM!

93-119: LGTM!

121-144: LGTM!

145-166: LGTM!

Walkthrough

Adds HighestNonceReceived() to the ForkDetector API and mock, implements it in baseForkDetector with a guarded setter, updates computeNodeState to mark the node not synced when gossip is ahead by > BlockFinality, and adds regression tests and test harness reproducing KLC-1920.

Changes

Fork detector highest nonce tracking

Layer / File(s)	Summary
ForkDetector contract and mock support `core/process/interface.go`, `common/mock/forkDetectorMock.go`	`ForkDetector` interface gains `HighestNonceReceived() uint64`; `ForkDetectorMock` adds `HighestNonceReceivedCalled` callback and a mock `HighestNonceReceived()` method.
baseForkDetector implementation and setter guard `core/process/sync/baseForkDetector.go`	Implements exported `HighestNonceReceived()` delegating to internal state; updates `setHighestNonceReceived` to perform comparison while holding `mutFork` and early-return to avoid races.
Node state computation with finality guard `core/process/sync/baseSync.go`	`computeNodeState` reads `probableHighestNonce` and `highestNonceReceived`, computes gap relative to `currentBlockNonce`, and forces `hasLastBlock=false` when gap > `process.BlockFinality`; debug logging updated.
KLC-1920 regression tests and test harness `core/process/sync/klc1920_repro_test.go`, `core/process/sync/export_test.go`, `core/process/sync/klc1920_node_state_test.go`	Adds tests asserting `HighestNonceReceived` advances under `BHProposed`-only delivery while `ProbableHighestNonce` stays at the last processed nonce, verifies gap > `BlockFinality` triggers not-synced behavior, and provides exported test bootstrap and observable metrics helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

consensus-critical

Suggested reviewers

RomuloSiebra

🚥 Pre-merge checks | ✅ 8

✅ Passed checks (8 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title follows the required format [KLC-XXXX] type: description with KLC-1920 JIRA key, 'fix' type, and a clear description of the change.
Docstring Coverage	✅ Passed	Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Concurrency Safety	✅ Passed	All new and modified concurrent code is free of race conditions: proper lock/unlock pairing, no goroutine leaks, correct early returns with explicit unlocks, and no nested locks.
Error Handling	✅ Passed	All error returns are properly checked (assert.Nil), no bare panic() calls, no unchecked errors, and computeNodeState only invokes non-error-returning methods.
State Consistency	✅ Passed	PR modifies only synchronization state (boolean flags, uint64 nonces), not blockchain ledger state. Proper mutex protection with lock/defer ensures thread safety.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch klc-1920-klc-2389-track-gossip-in-probable-highest

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

core/process/sync/baseSync.go (1)

295-301: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply the gossip-gap guard when currentHeader is nil too.

Line 296 can still mark boot.hasLastBlock=true using only ProbableHighestNonce, so a genesis-only node can be misclassified as synced while gossip is already ahead by more than BlockFinality.

Proposed fix

 	lastNonce := genesisNonce
 	lastSlot := boot.chainHandler.GetGenesisHeader().GetSlot()
-	if check.IfNil(currentHeader) {
-		boot.hasLastBlock = boot.forkDetector.ProbableHighestNonce() == genesisNonce
-		log.Debug("computeNodeState",
-			"probableHighestNonce", boot.forkDetector.ProbableHighestNonce(),
-			"currentBlockNonce", nil,
-			"boot.hasLastBlock", boot.hasLastBlock)
-	} else {
+	currentBlockNonce := genesisNonce
+	if check.IfNil(currentHeader) {
+		// keep genesis nonce
+	} else {
 		lastNonce = currentHeader.GetNonce()
 		lastSlot = currentHeader.GetSlot()
-		currentBlockNonce := boot.chainHandler.GetCurrentBlockHeader().GetNonce()
-		probableHighestNonce := boot.forkDetector.ProbableHighestNonce()
-		highestNonceReceived := boot.forkDetector.HighestNonceReceived()
-		boot.hasLastBlock = probableHighestNonce <= currentBlockNonce
-		// KLC-1920: gossip-derived ceiling is the source of truth that
-		// probableHighestNonce can lag behind when the BHReceived path is
-		// disrupted (peer churn after an election, fallback observer not
-		// receiving fetched headers). If gossip reports the network ahead
-		// by more than the normal proposal/commit window, the node is not
-		// really synced even if probableHighestNonce equals currentBlockNonce.
-		if highestNonceReceived > currentBlockNonce+process.BlockFinality {
-			boot.hasLastBlock = false
-		}
-		log.Debug("computeNodeState",
-			"probableHighestNonce", probableHighestNonce,
-			"highestNonceReceived", highestNonceReceived,
-			"currentBlockNonce", currentBlockNonce,
-			"boot.hasLastBlock", boot.hasLastBlock)
+		currentBlockNonce = currentHeader.GetNonce()
 	}
+	probableHighestNonce := boot.forkDetector.ProbableHighestNonce()
+	highestNonceReceived := boot.forkDetector.HighestNonceReceived()
+	boot.hasLastBlock = probableHighestNonce <= currentBlockNonce
+	if highestNonceReceived > currentBlockNonce &&
+		highestNonceReceived-currentBlockNonce > uint64(process.BlockFinality) {
+		boot.hasLastBlock = false
+	}
+	log.Debug("computeNodeState",
+		"probableHighestNonce", probableHighestNonce,
+		"highestNonceReceived", highestNonceReceived,
+		"currentBlockNonce", currentBlockNonce,
+		"boot.hasLastBlock", boot.hasLastBlock)

Also applies to: 304-316

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/process/sync/baseSync.go` around lines 295 - 301, When currentHeader is
nil you still set boot.hasLastBlock based only on
boot.forkDetector.ProbableHighestNonce() which can misclassify a genesis-only
node as synced; change the check so boot.hasLastBlock is set true only if the
gossip gap is within BlockFinality (e.g.
boot.forkDetector.ProbableHighestNonce() - genesisNonce <= BlockFinality). Apply
the same gossip-gap guard to the existing branch that handles non-nil
currentHeader (the logic around check.IfNil(currentHeader), boot.hasLastBlock,
boot.forkDetector.ProbableHighestNonce(), and genesisNonce) so both paths use
the BlockFinality comparison before marking hasLastBlock.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/process/sync/baseForkDetector.go`:
- Around line 307-314: The highestNonceReceived updates can regress under
concurrency; update the implementation so reads/writes are atomic and monotonic:
replace the mutable field usage in baseForkDetector with an atomic uint64 (use
atomic.LoadUint64 for HighestNonceReceived() and an atomic CAS loop to set the
new value only if greater) or protect the update path in highestNonceReceived
(and anywhere that assigns to that field, e.g., the writer path currently doing
a compare outside the lock) by moving the compare-and-set inside the same mutex.
In short, ensure highestNonceReceived uses atomic.LoadUint64 for reads and an
atomic CompareAndSwap/Store-max loop (or a locked section) to only increase the
stored value.

In `@core/process/sync/klc1920_repro_test.go`:
- Line 86: The test currently ignores the error returned by bfd.AddHeader in the
repro loop; update the loop to check and fail on that error (e.g., using
t.Fatalf/t.Fatal or the test assertion helper in use) so BHProposed insertion
failures cause the test to fail. Locate the call to
bfd.AddHeader(&block.Block{Header: hdr}, hash, process.BHProposed, nil, nil) and
replace the silent discard with error handling that reports the error and aborts
the test.

---

Outside diff comments:
In `@core/process/sync/baseSync.go`:
- Around line 295-301: When currentHeader is nil you still set boot.hasLastBlock
based only on boot.forkDetector.ProbableHighestNonce() which can misclassify a
genesis-only node as synced; change the check so boot.hasLastBlock is set true
only if the gossip gap is within BlockFinality (e.g.
boot.forkDetector.ProbableHighestNonce() - genesisNonce <= BlockFinality). Apply
the same gossip-gap guard to the existing branch that handles non-nil
currentHeader (the logic around check.IfNil(currentHeader), boot.hasLastBlock,
boot.forkDetector.ProbableHighestNonce(), and genesisNonce) so both paths use
the BlockFinality comparison before marking hasLastBlock.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e6680200-113a-4926-8790-4fe62c354ba8

📥 Commits

Reviewing files that changed from the base of the PR and between 785b77c and 240a6e5.

📒 Files selected for processing (5)

common/mock/forkDetectorMock.go
core/process/interface.go
core/process/sync/baseForkDetector.go
core/process/sync/baseSync.go
core/process/sync/klc1920_repro_test.go

📜 Review details

🧰 Additional context used

📓 Path-based instructions (2)

**/*.go

📄 CodeRabbit inference engine (Custom checks)

**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.

Files:

common/mock/forkDetectorMock.go
core/process/sync/baseForkDetector.go
core/process/interface.go
core/process/sync/klc1920_repro_test.go
core/process/sync/baseSync.go

**/*_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)

Files:

core/process/sync/klc1920_repro_test.go

🧠 Learnings (7)

📓 Common learnings

Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 879
File: core/statistics/tpsBenchmark.go:167-170
Timestamp: 2024-11-19T20:43:36.454Z
Learning: In the `core/statistics/tpsBenchmark.go` file, within the `updateStatistics` method, note that `Header.Nonce` can be zero (e.g., for the genesis block), so adding a zero nonce check is unnecessary.

📚 Learning: 2026-04-21T20:12:22.959Z

Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.

Applied to files:

common/mock/forkDetectorMock.go
core/process/sync/baseForkDetector.go
core/process/interface.go
core/process/sync/klc1920_repro_test.go
core/process/sync/baseSync.go

📚 Learning: 2026-05-23T22:52:58.065Z

Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 65
File: data/blockchain/blockchain.go:170-172
Timestamp: 2026-05-23T22:52:58.065Z
Learning: In Go, the pattern `append([]byte(nil), src...)` should be treated as preserving nil identity when `src` is a nil `[]byte`: spreading a nil slice contributes zero variadic arguments, so `append` performs no allocation and returns the original nil destination slice unchanged (i.e., result is nil, not an empty non-nil slice). Do not flag this as an incorrect empty-slice conversion; it intentionally maintains `nil`.

Applied to files:

common/mock/forkDetectorMock.go
core/process/sync/baseForkDetector.go
core/process/interface.go
core/process/sync/klc1920_repro_test.go
core/process/sync/baseSync.go

📚 Learning: 2024-11-19T20:43:36.454Z

Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 879
File: core/statistics/tpsBenchmark.go:167-170
Timestamp: 2024-11-19T20:43:36.454Z
Learning: In the `core/statistics/tpsBenchmark.go` file, within the `updateStatistics` method, note that `Header.Nonce` can be zero (e.g., for the genesis block), so adding a zero nonce check is unnecessary.

Applied to files:

core/process/sync/baseForkDetector.go
core/process/sync/klc1920_repro_test.go
core/process/sync/baseSync.go

📚 Learning: 2026-04-07T14:36:46.394Z

Learnt from: RomuloSiebra
Repo: klever-io/klever-go PR: 35
File: network/p2p/libp2p/peerid_stability_test.go:100-116
Timestamp: 2026-04-07T14:36:46.394Z
Learning: In `network/p2p/libp2p/peerid_stability_test.go` (Go, klever-go repo), the empty-seed tests (`TestCreateP2PPrivKey_EmptySeed_NoError` and `TestCreateP2PPrivKey_EmptySeed_LegacySeed_NoError`) intentionally only assert `NoError/NotNil`. A previous `NotEqual` assertion across two `crypto/rand`-backed calls was deliberately removed because it is a probabilistic assertion that re-verifies OS/stdlib entropy rather than project logic. Do not suggest adding `NotEqual` comparisons for empty-seed / `crypto/rand` paths in this codebase.

Applied to files:

core/process/sync/klc1920_repro_test.go

📚 Learning: 2024-11-18T18:23:56.020Z

Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 877
File: core/process/transaction/txProcess_test.go:2513-2516
Timestamp: 2024-11-18T18:23:56.020Z
Learning: In the test function `TestTxProcessor_ProcessClaimAllowanceKFIShouldFail` in `core/process/transaction/txProcess_test.go`, both validation and processing steps should be tested even when validation is expected to fail, to ensure that the processing correctly handles invalid inputs.

Applied to files:

core/process/sync/klc1920_repro_test.go

📚 Learning: 2024-11-18T18:25:28.208Z

Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 877
File: core/process/transaction/txProcess_test.go:2481-2484
Timestamp: 2024-11-18T18:25:28.208Z
Learning: In the `TestTxProcessor_ProcessClaimAllowanceOkValsShouldWork` function in `core/process/transaction/txProcess_test.go`, the transaction processing and assertions are already included.

Applied to files:

core/process/sync/klc1920_repro_test.go

🔇 Additional comments (2)

core/process/interface.go (1)

65-65: LGTM!

common/mock/forkDetectorMock.go (1)

16-16: LGTM!

Also applies to: 65-70

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/process/sync/klc1920_node_state_test.go`:
- Around line 122-141: The test
TestKLC1920_ComputeNodeState_GossipWithinFinalityStaysSynced currently hardcodes
the `highest` value (51) to model `gap == BlockFinality`; change the test to
compute `highest` from the real constant by using `process.BlockFinality` (e.g.
set `highest := probable + process.BlockFinality`) when calling
buildKLC1920Bootstrap so the boundary remains correct if BlockFinality changes,
and optionally add a sibling test that sets `highest := probable +
process.BlockFinality + 1` to exercise the `gap == BlockFinality+1` case; update
references around TestKLC1920_ComputeNodeState_GossipWithinFinalityStaysSynced,
buildKLC1920Bootstrap and ComputeNodeState accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 767bcc76-5b47-468f-bb17-fe7a51543940

📥 Commits

Reviewing files that changed from the base of the PR and between 875fbfc and b860d55.

📒 Files selected for processing (2)

core/process/sync/export_test.go
core/process/sync/klc1920_node_state_test.go

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: setup-and-lint / setup-and-lint
GitHub Check: Analyze (go)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.go

📄 CodeRabbit inference engine (Custom checks)

**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.

Files:

core/process/sync/export_test.go
core/process/sync/klc1920_node_state_test.go

**/*_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)

Files:

core/process/sync/export_test.go
core/process/sync/klc1920_node_state_test.go

🧠 Learnings (3)

📚 Learning: 2026-04-21T20:12:22.959Z

Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.

Applied to files:

core/process/sync/export_test.go
core/process/sync/klc1920_node_state_test.go

📚 Learning: 2026-05-23T22:52:58.065Z

Learnt from: fbsobreira
Repo: klever-io/klever-go PR: 65
File: data/blockchain/blockchain.go:170-172
Timestamp: 2026-05-23T22:52:58.065Z
Learning: In Go, the pattern `append([]byte(nil), src...)` should be treated as preserving nil identity when `src` is a nil `[]byte`: spreading a nil slice contributes zero variadic arguments, so `append` performs no allocation and returns the original nil destination slice unchanged (i.e., result is nil, not an empty non-nil slice). Do not flag this as an incorrect empty-slice conversion; it intentionally maintains `nil`.

Applied to files:

core/process/sync/export_test.go
core/process/sync/klc1920_node_state_test.go

📚 Learning: 2026-04-07T14:36:46.394Z

Learnt from: RomuloSiebra
Repo: klever-io/klever-go PR: 35
File: network/p2p/libp2p/peerid_stability_test.go:100-116
Timestamp: 2026-04-07T14:36:46.394Z
Learning: In `network/p2p/libp2p/peerid_stability_test.go` (Go, klever-go repo), the empty-seed tests (`TestCreateP2PPrivKey_EmptySeed_NoError` and `TestCreateP2PPrivKey_EmptySeed_LegacySeed_NoError`) intentionally only assert `NoError/NotNil`. A previous `NotEqual` assertion across two `crypto/rand`-backed calls was deliberately removed because it is a probabilistic assertion that re-verifies OS/stdlib entropy rather than project logic. Do not suggest adding `NotEqual` comparisons for empty-seed / `crypto/rand` paths in this codebase.

Applied to files:

core/process/sync/klc1920_node_state_test.go

🔇 Additional comments (3)

core/process/sync/klc1920_node_state_test.go (1)

28-53: LGTM!

core/process/sync/export_test.go (2)

36-46: ⚡ Quick win

Remove/adjust concern: no accessor name collision on baseBootstrap
func (boot *baseBootstrap) IsNodeSynchronized() and func (boot *baseBootstrap) HasLastBlock() only exist in core/process/sync/export_test.go; there are no non-_test.go receiver methods on baseBootstrap with those names, so this won’t trigger a duplicate-method compile error.

25-34: ⚡ Quick win

Resolve: slotManager.BeforeGenesis() gates the spawned goroutine, so it won’t nil-deref in these tests.

computeNodeState() only starts go boot.requestHeadersIfSyncIsStuck() when shouldTryToRequestHeaders() is true, and shouldTryToRequestHeaders() returns false immediately when boot.slotManager.BeforeGenesis() is true. In core/process/sync/klc1920_node_state_test.go, the consensusMock.SlotManagerMock passed into NewBaseBootstrapForKLC1920Test has BeforeGenesisCalled: func() bool { return true } (explicitly suppressing the requestHeadersIfSyncIsStuck path), so the partially-initialized baseBootstrap won’t hit nil dependencies in requestHeaders().

fbsobreira

Check comments and the tests file is an outlier vs. the package conventions. Pease fold them into the existing metaForkDetector_test.go, keeping the TestKLC1920_* names as regression markers. The computeNodeState boundary test (coverage comment) targets baseSync.go, so it should go in a new baseSync_test.go — next to the code it actually guards.

fbsobreira · 2026-06-03T17:37:56Z

 		lastNonce = currentHeader.GetNonce()
 		lastSlot = currentHeader.GetSlot()
-		boot.hasLastBlock = boot.forkDetector.ProbableHighestNonce() <= boot.chainHandler.GetCurrentBlockHeader().GetNonce()
+		currentBlockNonce := boot.chainHandler.GetCurrentBlockHeader().GetNonce()


this is the refetch of lastNonce from L#302

fbsobreira · 2026-06-03T17:40:45Z

+	assert.Equal(t, uint64(10), bfd.ProbableHighestNonce(),
+		"probableHighestNonce intentionally stays at last processed — BHProposed must not advance it (would break consensus during proposal rounds)")
+
+	gap := bfd.HighestNonceReceived() - bfd.ProbableHighestNonce()


Test pins HighestNonceReceived − ProbableHighestNonce, but fix compares against currentBlockNonce. Coincide here only because no BHReceived headers were added.
Reframe the assertion around HighestNonceReceived − currentBlockNonce so the guard tracks what the fix actually evaluates.

fbsobreira · 2026-06-03T17:48:04Z

+		// receiving fetched headers). If gossip reports the network ahead
+		// by more than the normal proposal/commit window, the node is not
+		// really synced even if probableHighestNonce equals currentBlockNonce.
+		if highestNonceReceived > currentBlockNonce+process.BlockFinality {


BlockFinality is hardcoded to 1, so this guard trips whenever the gossiped nonce ceiling (highestNonceReceived) runs ≥ 2 blocks ahead of the committed tip. That gap is reached during normal propagation/commit latency or a single missed round — i.e. when the node is briefly one block behind while the next proposal is already gossiping in — which would cause transient false not-synced flapping. Suggest widening the tolerance (e.g. tie it to the existing "max rounds without a new block" value, or BlockFinality + k) so benign one-block lag doesn't flip the state.

Force not-synced when gossip is ahead of probable highest nonce

240a6e5

nickgs1337 marked this pull request as ready for review June 3, 2026 11:52

coderabbitai Bot added the breaking-change label Jun 3, 2026

coderabbitai Bot requested changes Jun 3, 2026

View reviewed changes

Comment thread core/process/sync/baseForkDetector.go

Comment thread core/process/sync/klc1920_repro_test.go Outdated

Fix highestNonceReceived race + check AddHeader errors in repro test

875fbfc

coderabbitai Bot previously approved these changes Jun 3, 2026

View reviewed changes

Add computeNodeState coverage for KLC-1920 gossip-ahead branch

b860d55

nickgs1337 dismissed coderabbitai[bot]’s stale review via b860d55 June 3, 2026 12:46

coderabbitai Bot requested changes Jun 3, 2026

View reviewed changes

Comment thread core/process/sync/klc1920_node_state_test.go Outdated

Anchor synced-gate boundary tests to process.BlockFinality

e97e103

coderabbitai Bot added consensus-critical and removed breaking-change labels Jun 3, 2026

coderabbitai Bot approved these changes Jun 3, 2026

View reviewed changes

fbsobreira requested changes Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[KLC-1920] fix: force not-synced when gossip is ahead of probable highest nonce#69

[KLC-1920] fix: force not-synced when gossip is ahead of probable highest nonce#69
nickgs1337 wants to merge 4 commits into
developfrom
klc-1920-klc-2389-track-gossip-in-probable-highest

nickgs1337 commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

fbsobreira left a comment

Uh oh!

fbsobreira Jun 3, 2026

Uh oh!

fbsobreira Jun 3, 2026

Uh oh!

fbsobreira Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nickgs1337 commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fix

What changed

Validation

Caveats

Why not other shapes

Consensus and Synchronization Stability Fix

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fbsobreira left a comment

Choose a reason for hiding this comment

Uh oh!

fbsobreira Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

fbsobreira Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

fbsobreira Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nickgs1337 commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading