fix(leaderlog): verify and enforce chain-derived epoch nonces before calculating schedules by wcatz · Pull Request #71 · wcatz/goduckbot

wcatz · 2026-02-12T13:45:30Z

Motivation

Leader schedule output could be wrong when the DB contained stale or incorrect final_nonce values, causing cached schedules to be served with the wrong nonce.
The system needs a canonical, self-derived nonce for full mode (no Koios dependency) so schedule generation is guaranteed correct.

Description

Added GetVerifiedNonceForEpoch(epoch int) to NonceTracker, which in full mode recomputes the epoch nonce from local chain data (ComputeEpochNonce) and repairs the DB cache if it differs, while delegating to the existing path in lite mode.
Switched the nonce lookups used for schedule generation and validation to the verified path. This covers manual /leaderlog, /nextblock, automatic calculateAndPostLeaderlog, schedule cache validation (scheduleNonceMatches), and /nonce output.
Fixed historical backfill schedule logic to use the correct nonce epoch (schedule for epoch N uses the nonce from epoch N-1) and made the code skip cached schedules when their nonces no longer match the verified value.
Minor formatting cleanup (gofmt) and small struct/variable layout normalization.

Testing

Formatted modified files with gofmt -w nonce.go commands.go main.go and the formatter completed successfully.
Ran targeted tests with go test ./... -run 'Test(Nonce|Store)' -count=1 which completed and returned OK (no tests to run in this environment).
Ran full test suite with go test ./... under a timeout, which completed successfully (exit 0).

Summary by CodeRabbit

Bug Fixes
- Enhanced validation of cached leader schedules against current nonce states to prevent stale data usage
- Improved nonce verification consistency and accuracy across schedule calculation workflows
- Strengthened cache invalidation mechanisms to ensure live computation when necessary for schedule recalculation

wcatz · 2026-02-12T13:52:49Z

@CodeRabbit review

coderabbitai · 2026-02-12T13:52:59Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-02-12T13:53:07Z

Walkthrough

This PR tightens nonce verification. It introduces a GetVerifiedNonceForEpoch method that checks cached epoch nonces against values recomputed from chain data, and adds a scheduleNonceMatches helper to guard cached leader schedules. Nonce retrieval calls throughout the command and leaderlog flows now use the verified path, so stale cached data is not served.

Changes

Cohort / File(s)	Summary
Nonce Verification Core `nonce.go`	Introduces `GetVerifiedNonceForEpoch` method that computes epoch nonce from local chain data, compares against cached value, and persists corrected nonce with source label. Full mode performs verification; lite mode delegates to existing `GetNonceForEpoch`.
Command Flows `commands.go`	Adds internal helper `scheduleNonceMatches` to validate cached LeaderSchedule against currently expected epoch nonce. Replaces `GetNonceForEpoch` calls with `GetVerifiedNonceForEpoch` and guards cached schedules with early nonce validation to avoid stale data.
Leaderlog Trigger & Calculation `main.go`	Updates `checkLeaderlogTrigger` to require `scheduleNonceMatches` check for nextEpoch schedules; changes nonce retrieval to verified path in `calculateAndPostLeaderlog` and `backfillSchedules`; aligns epoch-1 nonce fetching with verified nonce for backfill logic.

Sequence Diagram

sequenceDiagram
    participant Client
    participant CommandHandler as Command<br/>Handler
    participant NonceTracker as NonceTracker
    participant Cache as Local<br/>Cache
    participant Ledger as Ledger/Chain
    
    Client->>CommandHandler: Request (e.g., nextblock/leaderlog)
    CommandHandler->>Cache: Check cached LeaderSchedule
    alt Schedule exists in cache
        CommandHandler->>NonceTracker: GetVerifiedNonceForEpoch(epoch)
        NonceTracker->>Ledger: Compute nonce from chain data
        NonceTracker->>Cache: Compare computed vs cached final_nonce
        alt Nonces match
            NonceTracker-->>CommandHandler: Return verified nonce
            CommandHandler->>CommandHandler: scheduleNonceMatches = true
            CommandHandler-->>Client: Use cached schedule
        else Nonces mismatch (stale)
            NonceTracker->>Cache: Update cache with correction
            NonceTracker-->>CommandHandler: Return verified nonce
            CommandHandler->>CommandHandler: scheduleNonceMatches = false
            CommandHandler->>NonceTracker: Recompute schedule live
            CommandHandler-->>Client: Use fresh schedule
        end
    else No cached schedule
        CommandHandler->>NonceTracker: GetVerifiedNonceForEpoch(epoch)
        NonceTracker-->>CommandHandler: Return verified nonce
        CommandHandler->>NonceTracker: Compute schedule with verified nonce
        CommandHandler-->>Client: Return computed schedule
    end
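
The guard in the diagram can be sketched roughly as follows. This is an illustration only, not the PR's code: the `LeaderSchedule` fields `Epoch` and `EpochNonce` and the `nonceVerifier` type are assumptions made for the example, and treating a lookup error as a mismatch is a choice of this sketch rather than confirmed behavior.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// LeaderSchedule is a stand-in for the real type. The Epoch and EpochNonce
// fields are assumptions made for this sketch.
type LeaderSchedule struct {
	Epoch      int
	EpochNonce []byte
}

// nonceVerifier abstracts the verified lookup (GetVerifiedNonceForEpoch in the PR).
type nonceVerifier func(epoch int) ([]byte, error)

// scheduleNonceMatches reports whether a cached schedule was built with the
// nonce that verification currently yields. A nil schedule, a missing nonce,
// or a lookup error all count as a mismatch, so the caller recomputes.
func scheduleNonceMatches(verify nonceVerifier, s *LeaderSchedule) bool {
	if s == nil || len(s.EpochNonce) == 0 {
		return false
	}
	verified, err := verify(s.Epoch)
	if err != nil {
		return false
	}
	return bytes.Equal(verified, s.EpochNonce)
}

func main() {
	verify := func(epoch int) ([]byte, error) {
		return []byte{0xaa, byte(epoch)}, nil
	}
	failing := func(epoch int) ([]byte, error) {
		return nil, errors.New("chain data unavailable")
	}

	fresh := &LeaderSchedule{Epoch: 5, EpochNonce: []byte{0xaa, 5}}
	stale := &LeaderSchedule{Epoch: 5, EpochNonce: []byte{0xbb, 5}}

	fmt.Println(scheduleNonceMatches(verify, fresh))  // true
	fmt.Println(scheduleNonceMatches(verify, stale))  // false
	fmt.Println(scheduleNonceMatches(failing, fresh)) // false
}
```

A caller would discard the cached schedule (for example by setting it to nil) whenever the guard returns false and fall through to a live computation.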

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related PRs

fix(nonce): recompute era-aware nonce from raw VRF output #59: Recomputes epoch nonces from VRF outputs (ComputeEpochNonce/BackfillNonces), which are directly validated by the new GetVerifiedNonceForEpoch method introduced in this PR.
fix(nonce): remove wrong hardcoded epoch nonces #64: Modifies epoch nonce sources by removing hardcoded knownEpochNonces, which directly impacts the nonce verification logic added in this PR.
fix(core): v2.3.10 Koios epoch retry + leaderlog guard #28: Updates checkLeaderlogTrigger to add stronger guards against redundant scheduling; this PR extends that trigger with nonce verification checks.

Poem

🐰 Hop through the epochs, verify as you go,
Cache-checked nonces ensure schedules don't grow stale,
No more stale schedules lurking in the shadows—
Verified nonces stand guard, fresh and true! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: verification and enforcement of chain-derived epoch nonces before calculating leader schedules, which is the core motivation and primary change across commands.go, main.go, and nonce.go.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

main.go (1)
1256-1263: ⚠️ Potential issue | 🟡 Minor

Canceled context passed to scheduleNonceMatches.

Line 1258 calls cancel(), then line 1259 passes the now-canceled ctx to scheduleNonceMatches. Currently this is harmless because scheduleNonceMatches never uses the ctx parameter (it creates its own contexts internally), but it's fragile — if someone later adds a DB call using ctx inside scheduleNonceMatches, it will silently fail.
Suggested fix: defer the cancel or create a fresh context
 		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		defer cancel()
 		existing, err := i.store.GetLeaderSchedule(ctx, nextEpoch)
-		cancel()
 		if err == nil && existing != nil && i.scheduleNonceMatches(ctx, existing) {
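
To make the hazard concrete, here is a small standalone sketch. It does not use the project's types; `lookup` is a hypothetical stand-in for any context-aware call, such as a DB query. It shows that a call made after `cancel()` observes `context.Canceled`, while deferring the cancel keeps the context usable until the function returns.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// lookup stands in for any context-aware call. It refuses to run once its
// context is done, as context-aware libraries typically do.
func lookup(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	return nil
}

// cancelEarly mirrors the flagged pattern: cancel() runs before the second call.
func cancelEarly() error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	_ = lookup(ctx) // first call succeeds
	cancel()
	return lookup(ctx) // second call sees context.Canceled
}

// cancelDeferred keeps the context alive until the function returns.
func cancelDeferred() error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	_ = lookup(ctx)
	return lookup(ctx)
}

func main() {
	fmt.Println(errors.Is(cancelEarly(), context.Canceled)) // true
	fmt.Println(cancelDeferred() == nil)                    // true
}
```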

🤖 Fix all issues with AI agents

In `@commands.go`:
- Around line 922-927: The ctxShort scope is canceled via cancelShort() before
being passed to i.scheduleNonceMatches, which is fragile; either postpone
calling cancelShort() until after i.scheduleNonceMatches(...) completes or call
i.scheduleNonceMatches with a fresh context (e.g., context.Background() or a
newly created short-lived context) so you never pass a canceled ctx to
scheduleNonceMatches; update the code around ctxShort/cancelShort,
i.store.GetLeaderSchedule(ctxShort, currentEpoch) and the subsequent call to
i.scheduleNonceMatches to use one of these two approaches.

In `@nonce.go`:
- Around line 319-356: GetVerifiedNonceForEpoch currently always recomputes the
epoch nonce in fullMode which can block hot paths like
scheduleNonceMatches/checkLeaderlogTrigger; add an in-memory cache on
NonceTracker (e.g., verifiedNonceCache map[int][]byte) protected by the existing
mutex so GetVerifiedNonceForEpoch first checks the cache and returns immediately
if present, otherwise proceeds to ComputeEpochNonce and then stores the verified
result into verifiedNonceCache before returning; ensure you update/evict entries
from verifiedNonceCache when final nonces are updated from SetFinalNonce or
other write paths so cache stays consistent.

🧹 Nitpick comments (2)

commands.go (2)
1070-1087: scheduleNonceMatches accepts ctx but never uses it.

The ctx parameter is unused — GetVerifiedNonceForEpoch creates its own internal contexts. This is confusing for callers (some of whom pass already-canceled contexts). Either remove the parameter or propagate it into the nonce verification call.
Option A: drop the unused ctx parameter
-func (i *Indexer) scheduleNonceMatches(ctx context.Context, schedule *LeaderSchedule) bool {
+func (i *Indexer) scheduleNonceMatches(schedule *LeaderSchedule) bool {
Then update all call sites to remove the ctx argument.
637-637: /nonce command now returns verified nonce — note latency impact in full mode.

For /nonce next, this calls GetVerifiedNonceForEpoch(epoch+1) which in full mode triggers ComputeEpochNonce (streaming all blocks from genesis). The user-facing Telegram command could take several minutes to respond. Consider showing a "computing..." message before the call, similar to the leaderlog commands.
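
One possible shape for the "computing..." suggestion is sketched below. The `notifier` type and `replyWithProgress` function are hypothetical and do not reflect the bot's actual Telegram API; the point is only the ordering: send an interim message, run the slow lookup, then send the result or the error.

```go
package main

import "fmt"

// notifier is a hypothetical stand-in for the bot's reply function.
type notifier func(msg string)

// replyWithProgress sends an interim message before a potentially slow
// nonce lookup, then reports the nonce or the failure.
func replyWithProgress(send notifier, epoch int, lookup func(int) ([]byte, error)) {
	send(fmt.Sprintf("Computing verified nonce for epoch %d, this may take a while...", epoch))
	nonce, err := lookup(epoch)
	if err != nil {
		send(fmt.Sprintf("Nonce lookup failed: %v", err))
		return
	}
	send(fmt.Sprintf("Epoch %d nonce: %x", epoch, nonce))
}

// demo runs replyWithProgress against a fake lookup and collects the messages.
func demo() []string {
	var msgs []string
	send := func(msg string) { msgs = append(msgs, msg) }
	lookup := func(epoch int) ([]byte, error) { return []byte{0xab, 0xcd}, nil }
	replyWithProgress(send, 613, lookup)
	return msgs
}

func main() {
	for _, m := range demo() {
		fmt.Println(m)
	}
}
```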

coderabbitai · 2026-02-12T13:56:35Z

commands.go

 	ctxShort, cancelShort := context.WithTimeout(context.Background(), 10*time.Second)
 	schedule, err := i.store.GetLeaderSchedule(ctxShort, currentEpoch)
 	cancelShort()
+	if err == nil && schedule != nil && !i.scheduleNonceMatches(ctxShort, schedule) {
+		schedule = nil
+	}


⚠️ Potential issue | 🟡 Minor

Same canceled-context pattern as checkLeaderlogTrigger.

ctxShort is canceled on line 924, then passed to scheduleNonceMatches on line 925. Currently safe because the ctx isn't used inside that function, but fragile.

Suggested fix

 	ctxShort, cancelShort := context.WithTimeout(context.Background(), 10*time.Second)
 	schedule, err := i.store.GetLeaderSchedule(ctxShort, currentEpoch)
-	cancelShort()
-	if err == nil && schedule != nil && !i.scheduleNonceMatches(ctxShort, schedule) {
+	if err == nil && schedule != nil && !i.scheduleNonceMatches(ctxShort, schedule) {
 		schedule = nil
 	}
+	cancelShort()

coderabbitai · 2026-02-12T13:56:36Z

nonce.go

+// GetVerifiedNonceForEpoch returns a nonce that is verified against canonical
+// data for that epoch, repairing stale DB cache entries if needed.
+//
+// Full mode: always recompute from local chain data and upsert DB cache.
+// Lite mode: use existing lookup priority (DB -> Koios).
+func (nt *NonceTracker) GetVerifiedNonceForEpoch(epoch int) ([]byte, error) {
+	if !nt.fullMode {
+		return nt.GetNonceForEpoch(epoch)
+	}
+
+	computeCtx, computeCancel := context.WithTimeout(context.Background(), 10*time.Minute)
+	defer computeCancel()
+	computed, err := nt.ComputeEpochNonce(computeCtx, epoch)
+	if err != nil {
+		return nil, fmt.Errorf("failed to verify nonce for epoch %d: %w", epoch, err)
+	}
+
+	checkCtx, checkCancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer checkCancel()
+	cached, cacheErr := nt.store.GetFinalNonce(checkCtx, epoch)
+	if cacheErr == nil && cached != nil && bytes.Equal(cached, computed) {
+		return cached, nil
+	}
+
+	source := "computed-verified"
+	if cacheErr == nil && cached != nil && !bytes.Equal(cached, computed) {
+		log.Printf("Correcting stale cached nonce for epoch %d: cached %x != computed %x", epoch, cached, computed)
+		source = "computed-correction"
+	}
+
+	storeCtx, storeCancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer storeCancel()
+	if err := nt.store.SetFinalNonce(storeCtx, epoch, computed, source); err != nil {
+		log.Printf("Failed to persist verified nonce for epoch %d: %v", epoch, err)
+	}
+
+	return computed, nil
+}


⚠️ Potential issue | 🟠 Major

GetVerifiedNonceForEpoch always recomputes from genesis in full mode — consider callers on hot paths.

In full mode this streams every block from Shelley genesis (up to 10 minutes). This is fine for on-demand commands (/leaderlog, /nonce) but problematic when called from scheduleNonceMatches inside checkLeaderlogTrigger, which runs synchronously on the block-processing goroutine. After a restart with an existing schedule in DB, the first block past the stability window will block the adder pipeline's event loop for the entire recomputation duration.

Consider caching the verified result per epoch in-memory (e.g., a map[int][]byte guarded by the existing mutex) so repeated calls for the same epoch return instantly after the first verification pass.

Sketch: add in-memory verified nonce cache

 type NonceTracker struct {
 	mu             sync.Mutex
 	store          Store
 	koiosClient    *koios.Client
 	evolvingNonce  []byte
 	currentEpoch   int
 	blockCount     int
 	candidateFroze bool
 	networkMagic   int
 	fullMode       bool
+	verifiedNonces map[int][]byte // cache: epoch -> verified nonce
 }

Then in GetVerifiedNonceForEpoch:

 func (nt *NonceTracker) GetVerifiedNonceForEpoch(epoch int) ([]byte, error) {
 	if !nt.fullMode {
 		return nt.GetNonceForEpoch(epoch)
 	}
+	nt.mu.Lock()
+	if cached, ok := nt.verifiedNonces[epoch]; ok {
+		nt.mu.Unlock()
+		return cached, nil
+	}
+	nt.mu.Unlock()
 	computeCtx, computeCancel := context.WithTimeout(context.Background(), 10*time.Minute)
 	defer computeCancel()
 	computed, err := nt.ComputeEpochNonce(computeCtx, epoch)
 	if err != nil {
 		return nil, fmt.Errorf("failed to verify nonce for epoch %d: %w", epoch, err)
 	}
+	nt.mu.Lock()
+	nt.verifiedNonces[epoch] = computed
+	nt.mu.Unlock()
 	// ... rest of DB cache check/repair
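
A standalone version of the cache idea, with the eviction hook the review asks for, might look like the following. It is a sketch under assumptions: `verifiedCache`, `newVerifiedCache`, `Get`, and `Evict` are hypothetical names, and `compute` stands in for ComputeEpochNonce. Two details worth noting: the map has to be initialized before first use, because writing to a nil map panics in Go, and the lock is released while computing, so two concurrent callers for the same epoch may both recompute.

```go
package main

import (
	"fmt"
	"sync"
)

// verifiedCache is a hypothetical per-epoch cache for verified nonces.
type verifiedCache struct {
	mu      sync.Mutex
	nonces  map[int][]byte
	compute func(epoch int) ([]byte, error) // stands in for ComputeEpochNonce
}

// newVerifiedCache initializes the map so that later writes do not panic.
func newVerifiedCache(compute func(int) ([]byte, error)) *verifiedCache {
	return &verifiedCache{nonces: make(map[int][]byte), compute: compute}
}

// Get returns the cached nonce for an epoch, computing and storing it on a
// miss. The lock is not held during compute, so a slow recomputation does
// not block lookups for other epochs.
func (c *verifiedCache) Get(epoch int) ([]byte, error) {
	c.mu.Lock()
	if n, ok := c.nonces[epoch]; ok {
		c.mu.Unlock()
		return n, nil
	}
	c.mu.Unlock()

	n, err := c.compute(epoch)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.nonces[epoch] = n
	c.mu.Unlock()
	return n, nil
}

// Evict drops an epoch so the next Get recomputes it. A write path such as
// SetFinalNonce could call this to keep the cache consistent.
func (c *verifiedCache) Evict(epoch int) {
	c.mu.Lock()
	delete(c.nonces, epoch)
	c.mu.Unlock()
}

func main() {
	calls := 0
	c := newVerifiedCache(func(epoch int) ([]byte, error) {
		calls++
		return []byte{byte(epoch)}, nil
	})

	c.Get(7)
	c.Get(7)
	fmt.Println(calls) // 1: the second Get is served from the cache

	c.Evict(7)
	c.Get(7)
	fmt.Println(calls) // 2: eviction forces a recompute
}
```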

1. Fix canceled context in /nextblock command:
   - Move cancelShort() after scheduleNonceMatches() call
   - Prevents passing canceled context to function

2. Add in-memory verified nonce cache:
   - Add verifiedNonces map[int][]byte to NonceTracker
   - Check cache before recomputing from genesis (10min operation)
   - Prevents blocking adder pipeline on hot paths
   - Cache guards with existing mutex

Resolves CodeRabbit findings from PR #71.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

wcatz · 2026-02-12T14:03:47Z

CodeRabbit issues addressed in #72:

Fixed canceled context in /nextblock command
Added in-memory cache to prevent repeated genesis recomputation (10min operation blocking hot paths)

Please review and merge #72 into this PR before final merge to master.

fix(leaderlog): verify nonce from chain before serving schedules

f697f7c

wcatz added the codex label Feb 12, 2026 — with ChatGPT Codex Connector

wcatz removed the codex label Feb 12, 2026

coderabbitai bot reviewed Feb 12, 2026

wcatz mentioned this pull request Feb 12, 2026

fix: address CodeRabbit performance and context issues from PR #71 #72

Merged

3 tasks

wcatz closed this Feb 12, 2026

wcatz wants to merge 1 commit into master from codex/identify-issue-with-leaderlog-output-for-epoch-612
