fix(replay): pre-balance divergence and get rid of some panics #218

Open
sonicfromnewyoke wants to merge 1 commit into Overclock-Validator:dev from sonicfromnewyoke:sonic/replay-panic

@sonicfromnewyoke

Problem

An incorrect leader schedule computation at epoch boundaries in the previous replay session (the stale vote account bug, fixed in 3bb4b15) caused fee distributions to go to the wrong validators.

No bankhash verification existed to catch this.
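
To illustrate the kind of check that was missing, here is a minimal sketch of comparing a locally computed bankhash against an expected one in a background goroutine. All names (`BankHash`, `slotResult`, `verifyBankHash`, `verifyAsync`) are hypothetical and chosen for this example; they are not taken from the mithril codebase, and the actual verification in this PR may be structured differently.

```go
package main

import "fmt"

// BankHash is a hypothetical 32-byte hash type used only in this sketch.
type BankHash [32]byte

// slotResult pairs a replayed slot with the hash computed locally and the
// hash expected from the cluster. Both fields are assumptions.
type slotResult struct {
	Slot     uint64
	Computed BankHash
	Expected BankHash
}

// verifyBankHash returns an error when the two hashes differ.
// Go arrays are comparable, so != compares all 32 bytes.
func verifyBankHash(r slotResult) error {
	if r.Computed != r.Expected {
		return fmt.Errorf("bankhash mismatch in slot %d: computed=%x expected=%x",
			r.Slot, r.Computed[:4], r.Expected[:4])
	}
	return nil
}

// verifyAsync checks results in a background goroutine so replay is not
// blocked. Mismatches are sent on the returned channel, which is closed
// once the input channel has been drained.
func verifyAsync(results <-chan slotResult) <-chan error {
	errs := make(chan error)
	go func() {
		defer close(errs)
		for r := range results {
			if err := verifyBankHash(r); err != nil {
				errs <- err
			}
		}
	}()
	return errs
}

func main() {
	results := make(chan slotResult, 2)
	results <- slotResult{Slot: 1, Computed: BankHash{1}, Expected: BankHash{1}}
	results <- slotResult{Slot: 2, Computed: BankHash{2}, Expected: BankHash{3}}
	close(results)

	for err := range verifyAsync(results) {
		fmt.Println(err)
	}
}
```

Reporting mismatches over a channel, rather than panicking inside the worker, leaves the decision about whether to halt replay to the caller.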

Summary of Changes

  • return a zeroed TxFeeInfo when fee deduction fails, so the accumulator doesn't count uncollected fees
  • add async bankhash verification to catch such bugs
  • fix topsort panic on V0 + ALUT txns
  • add nil txMeta guard in CU divergence check
  • fix nil txMeta deref
  • add tests covering the real-world case, based on the txn reported in "Panic on dev (again)" #217
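
Two of the changes above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the diff: `TxFeeInfo` is the name used in this PR, but its fields here are invented, and `deductFee`, `TxMeta`, and `cuDiverged` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// TxFeeInfo uses the name from the PR; the fields are assumptions.
type TxFeeInfo struct {
	BaseFee     uint64
	PriorityFee uint64
}

var errInsufficientFunds = errors.New("insufficient funds for fee")

// deductFee (hypothetical) subtracts the fee from the payer balance.
// On failure it returns a zeroed TxFeeInfo, so a caller that sums the
// returned values never counts a fee that was not actually collected.
func deductFee(balance *uint64, fee TxFeeInfo) (TxFeeInfo, error) {
	total := fee.BaseFee + fee.PriorityFee
	if *balance < total {
		return TxFeeInfo{}, errInsufficientFunds
	}
	*balance -= total
	return fee, nil
}

// TxMeta is a hypothetical stand-in for onchain transaction metadata.
type TxMeta struct {
	ComputeUnitsConsumed uint64
}

// cuDiverged (hypothetical) compares local compute units with the onchain
// value. A nil meta means there is nothing to compare against, so the
// check is skipped instead of dereferencing a nil pointer.
func cuDiverged(localCU uint64, meta *TxMeta) bool {
	if meta == nil {
		return false
	}
	return meta.ComputeUnitsConsumed != localCU
}

func main() {
	balance := uint64(5000)
	var collected uint64

	fees := []TxFeeInfo{{BaseFee: 5000}, {BaseFee: 5000, PriorityFee: 100}}
	for _, f := range fees {
		info, err := deductFee(&balance, f)
		if err != nil {
			fmt.Println("fee not collected:", err)
		}
		// Safe to accumulate unconditionally: info is zeroed on failure.
		collected += info.BaseFee + info.PriorityFee
	}

	fmt.Println("collected:", collected, "balance:", balance)
	fmt.Println("diverged with nil meta:", cuDiverged(100, nil))
}
```

Returning the zero value on the error path keeps the accumulator correct even if a caller forgets to branch on the error before adding.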

closes #217

cc @smcio @ananthb

@ananthb

ananthb commented Mar 15, 2026

Mar 15 10:28:45 enterprise bash[2573393]: ┌──────────────────────────────────────────────────────────────────────────────┐
Mar 15 10:28:45 enterprise bash[2573393]: │ ACCOUNTSDB BEHIND CHAIN TIP                                                  │
Mar 15 10:28:45 enterprise bash[2573393]: ├──────────────────────────────────────────────────────────────────────────────┤
Mar 15 10:28:45 enterprise bash[2573393]: │ AccountsDB last slot:   405,551,839                                          │
Mar 15 10:28:45 enterprise bash[2573393]: │ Chain tip slot:         406,506,304                                          │
Mar 15 10:28:45 enterprise bash[2573393]: │ Slots behind:           954,465                                              │
Mar 15 10:28:45 enterprise bash[2573393]: ├──────────────────────────────────────────────────────────────────────────────┤
Mar 15 10:28:45 enterprise bash[2573393]: │ OPTIONS:                                                                     │
Mar 15 10:28:45 enterprise bash[2573393]: │  [1] Continue from AccountsDB (replay 954,465 slots)                         │
Mar 15 10:28:45 enterprise bash[2573393]: │  [2] Start fresh from latest snapshot (faster to catch up)                   │
Mar 15 10:28:45 enterprise bash[2573393]: │                                                                              │
Mar 15 10:28:45 enterprise bash[2573393]: └──────────────────────────────────────────────────────────────────────────────┘
Mar 15 10:28:47 enterprise bash[2573393]: Enter choice (1 or 2): (+    4s) AccountsDB is 954465 slots behind chain tip
Mar 15 10:28:47 enterprise bash[2573393]: (+    4s) mode=auto: Resuming from existing AccountsDB at slot 405551839
Mar 15 10:28:47 enterprise bash[2573393]: (+    4s) StoreAsync=false
Mar 15 10:28:49 enterprise bash[2573393]: (+    7s) Started RPC server on port 8899
Mar 15 10:28:49 enterprise bash[2573393]: (+    7s) Block fetching configured with 2 RPC endpoints (primary + 1 backups)
Mar 15 10:28:51 enterprise bash[2573393]: (+    8s) Loading vote and stake caches (aggregate-only mode, 1486631 stake accounts)
Mar 15 10:29:10 enterprise bash[2573393]: === Replay Start ===
Mar 15 10:29:11 enterprise bash[2573393]: (+   29s) startup stake check: rawScanTotal=634361130941316969 epochEffectiveTotal=421071332193356228 delta=213289798747960741 (expected difference from warmup/cooldown)
Mar 15 10:29:11 enterprise bash[2573393]: (+   29s) loaded RecentBlockhashes sysvar: 150 entries, newest=b0b0778f5f795341, oldest=1ed4cc25e0f6a432
Mar 15 10:29:11 enterprise bash[2573393]: (+   29s) ERROR: [run:3a7aaf54] DIVERGENCE in slot 405551840: tx 3qcpe2xySzdn48KBq7Zff4e1wZHuj9kcqqJABVQ8w9VFuP1H9r7h3d5xHEPnXQbwwF4qNmfjNzmMDSH66GBGxyAT pre-balance mismatch for 76rcGHdPvgs8G1XrzCXUTWtwgT59AFDvpB4VbTS2TBBJ: mithril=51492474108, onchain=51492669108
Mar 15 10:29:11 enterprise bash[2573393]: (+   29s) ERROR: [run:3a7aaf54] DIVERGENCE in slot 405551840: tx EZBnYQC1qZ4Ra6F64wZo3fgA4vFtxW2yVV6eX1U8ESaRxQPJXpwhLHL54efScyHDHFU5a4cQava7nZGR2ob3ALB pre-balance mismatch for BtsmiEEvnSuUnKxqXj2PZRYpPJAc7C34mGz8gtJ1DAaH: mithril=102353086495, onchain=102353286495
Mar 15 10:29:11 enterprise bash[2573393]: panic: tx 3qcpe2xySzdn48KBq7Zff4e1wZHuj9kcqqJABVQ8w9VFuP1H9r7h3d5xHEPnXQbwwF4qNmfjNzmMDSH66GBGxyAT pre-balance divergence: lamport balance for 76rcGHdPvgs8G1XrzCXUTWtwgT59AFDvpB4VbTS2TBBJ was 51492474108 but onchain lamport balance was 51492669108
Mar 15 10:29:11 enterprise bash[2573393]:         acct - slot: 405551839, pubkey: 76rcGHdPvgs8G1XrzCXUTWtwgT59AFDvpB4VbTS2TBBJ, owner: 11111111111111111111111111111111, lamports: 51492474108, executable: false, rent epoch: 18446744073709551615, data len: 0, data hash: CeeM54NJ6EoxLi4VGXLDC1jvNL9SGmppXXUMbAQmoaZw
Mar 15 10:29:11 enterprise bash[2573393]:
Mar 15 10:29:11 enterprise bash[2573393]: goroutine 5623 [running]:
Mar 15 10:29:11 enterprise bash[2573393]: github.com/Overclock-Validator/mithril/pkg/replay.ProcessTransaction(0xc020392680, 0xc0910eb890, 0xc07bf590e0, 0xc07bf7e420, 0xc000388510, 0xc00e217f20)
Mar 15 10:29:11 enterprise bash[2573393]:         github.com/Overclock-Validator/mithril/pkg/replay/transaction.go:410 +0x31b4
Mar 15 10:29:11 enterprise bash[2573393]: github.com/Overclock-Validator/mithril/pkg/replay.parallelTxLoop.func2()
Mar 15 10:29:11 enterprise bash[2573393]:         github.com/Overclock-Validator/mithril/pkg/replay/block.go:2101 +0x21c
Mar 15 10:29:11 enterprise bash[2573393]: created by github.com/Overclock-Validator/mithril/pkg/replay.parallelTxLoop in goroutine 1
Mar 15 10:29:11 enterprise bash[2573393]:         github.com/Overclock-Validator/mithril/pkg/replay/block.go:2096 +0x7f9
Mar 15 10:29:11 enterprise bash[2573393]: panic: tx EZBnYQC1qZ4Ra6F64wZo3fgA4vFtxW2yVV6eX1U8ESaRxQPJXpwhLHL54efScyHDHFU5a4cQava7nZGR2ob3ALB pre-balance divergence: lamport balance for BtsmiEEvnSuUnKxqXj2PZRYpPJAc7C34mGz8gtJ1DAaH was 102353086495 but onchain lamport balance was 102353286495
Mar 15 10:29:11 enterprise bash[2573393]:         acct - slot: 405551839, pubkey: BtsmiEEvnSuUnKxqXj2PZRYpPJAc7C34mGz8gtJ1DAaH, owner: 11111111111111111111111111111111, lamports: 102353086495, executable: false, rent epoch: 18446744073709551615, data len: 0, data hash: 8JANRduKgz9pkcBFoucNVNHKe3zYcQBbzgFzMzxjfu2c
Mar 15 10:29:11 enterprise bash[2573393]:
Mar 15 10:29:11 enterprise bash[2573393]: goroutine 5610 [running]:
Mar 15 10:29:11 enterprise bash[2573393]: github.com/Overclock-Validator/mithril/pkg/replay.ProcessTransaction(0xc020392680, 0xc0910eb890, 0xc07bf58960, 0xc07bf4f1e0, 0xc000388510, 0xc00e217cb0)
Mar 15 10:29:11 enterprise bash[2573393]:         github.com/Overclock-Validator/mithril/pkg/replay/transaction.go:410 +0x31b4
Mar 15 10:29:11 enterprise bash[2573393]: github.com/Overclock-Validator/mithril/pkg/replay.parallelTxLoop.func2()
Mar 15 10:29:11 enterprise bash[2573393]:         github.com/Overclock-Validator/mithril/pkg/replay/block.go:2101 +0x21c
Mar 15 10:29:11 enterprise bash[2573393]: created by github.com/Overclock-Validator/mithril/pkg/replay.parallelTxLoop in goroutine 1
Mar 15 10:29:11 enterprise bash[2573393]:         github.com/Overclock-Validator/mithril/pkg/replay/block.go:2096 +0x7f9

Got this crash
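
For triage, the size of each gap can be read straight off the two DIVERGENCE lines in the log above: in both cases the local (mithril) balance is lower than the onchain one. A small sketch that computes the gaps; the `mismatch` type and `delta` helper exist only for this illustration, and the balances are copied from the log.

```go
package main

import "fmt"

// mismatch holds one pre-balance pair copied from a DIVERGENCE log line.
type mismatch struct {
	Account string
	Mithril int64 // lamports according to local state
	Onchain int64 // lamports reported onchain
}

// delta returns onchain minus local; a positive value means the local
// balance is too low.
func delta(m mismatch) int64 {
	return m.Onchain - m.Mithril
}

func main() {
	ms := []mismatch{
		{"76rcGHdPvgs8G1XrzCXUTWtwgT59AFDvpB4VbTS2TBBJ", 51492474108, 51492669108},
		{"BtsmiEEvnSuUnKxqXj2PZRYpPJAc7C34mGz8gtJ1DAaH", 102353086495, 102353286495},
	}
	for _, m := range ms {
		fmt.Printf("%s: local balance short by %d lamports\n", m.Account, delta(m))
	}
}
```

The gaps come out to 195000 and 200000 lamports respectively.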
