Refresh snap-sync account storage root via verified GetAccountRange by asdacap · Pull Request #12082 · NethermindEth/nethermind

asdacap · 2026-06-22T09:42:12Z

Changes

When a storage root mismatches during snap sync, the account was refreshed by fetching its storage-trie root node via GetTrieNodes and trusting the returned node's hash blindly. That refresh path never verified the new storage root against the pivot state root, and it is being removed in snaps/2.

Refresh now sends a single-account GetAccountRange (startingHash = path, limitHash = path + 1) and adopts the storage root only after verifying the account against the pivot state root.
Verification reconstructs the account range the same way account-range sync does — fill the boundary proof nodes, set the returned account(s), then check the recomputed root. Crucially the account leaf is set from the response, so verification does not rely on the proof containing the leaf (the range proof can omit it).
The reconstruction runs in an isolated, empty-backed trie (a fresh MemDb-backed PatriciaSnapTrieFactory) and does not commit, so it writes nothing to the client state DB and cannot race the partition account-range sync. An incomplete proof fails verification instead of being completed from live client data.
The shared verification core is factored into SnapProviderHelper.BuildAndVerifyRoot, used by both the existing CommitRange and the new no-commit VerifyAccountRange, so refresh verification is identical to account-range verification.
Outcomes (SnapProvider.RefreshAccounts now returns AddRangeResult, so refresh responses feed AnalyzeResponsePerPeer): verified → adopt the storage root and re-queue storage; not found (deleted account) → terminal, no retry; expired (peer lacks the root) → retry; invalid proof → retry and score the peer.
ProgressTracker.DequeAccountToRefresh dequeues one account per request (one refresh per GetAccountRange). Refreshes are rare, so this is fine.
The now-unused refresh-specific GetTrieNodes(AccountsToRefreshRequest) overload is intentionally left in place for the snaps/2 cleanup; the generic GetTrieNodes(GetTrieNodesRequest) used by state healing is untouched.
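
The four refresh outcomes listed above can be read as a small decision table. The following is a minimal sketch of that table only; `RefreshOutcome`, `RefreshAction`, and `RefreshPolicy.Decide` are hypothetical names introduced for illustration and are not the types used in this PR.

```csharp
using System;

// Hypothetical names for illustration; not the Nethermind API.
public enum RefreshOutcome { Verified, NotFound, Expired, InvalidProof }

// What the caller does next for the refreshed account.
public readonly record struct RefreshAction(bool AdoptStorageRoot, bool Retry, bool ScorePeer);

public static class RefreshPolicy
{
    public static RefreshAction Decide(RefreshOutcome outcome) => outcome switch
    {
        // Account proven against the pivot state root: adopt its storage root and re-queue storage.
        RefreshOutcome.Verified => new RefreshAction(AdoptStorageRoot: true, Retry: false, ScorePeer: false),
        // Account proven absent (deleted): terminal, nothing to retry.
        RefreshOutcome.NotFound => new RefreshAction(AdoptStorageRoot: false, Retry: false, ScorePeer: false),
        // Peer could not serve the requested root: try again.
        RefreshOutcome.Expired => new RefreshAction(AdoptStorageRoot: false, Retry: true, ScorePeer: false),
        // Response did not reconstruct the state root: retry and score the peer.
        RefreshOutcome.InvalidProof => new RefreshAction(AdoptStorageRoot: false, Retry: true, ScorePeer: true),
        _ => throw new ArgumentOutOfRangeException(nameof(outcome)),
    };
}
```

Only `Verified` ever adopts a storage root, and only `InvalidProof` scores the peer, which matches the outcome list in the description.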

Types of changes

What types of changes does your code introduce?

Bugfix (a non-breaking change that fixes an issue)
New feature (a non-breaking change that adds functionality)
Breaking change (a change that causes existing functionality not to work as expected)
Optimization
Refactoring
Documentation update
Build-related changes
Other: Description

Testing

Requires testing

Yes
No

If yes, did you write tests?

Yes
No

Notes on testing

Added parameterized regression tests in SnapServerTest: TestRefreshAccount (correct root verifies and propagates the real storage root; wrong root is rejected with DifferentRootHash) and TestRefreshAccount_LeafMissingFromProof (proof with leaf nodes stripped still verifies, since the account is set from the response — this fails under a proof-walk that assumes the leaf is present). Full Nethermind.Synchronization.Test SnapSync + state-sync-feed suites pass (940 passed, 8 fixture-conditional skips), including the end-to-end SnapSync_* integration tests and the account/storage range tests that exercise the refactored CommitRange.

Documentation

Requires documentation update

Yes
No

Requires explanation in Release Notes

Yes
No

claude · 2026-06-22T09:48:07Z

Claude finished @asdacap's task in 7m 23s.

Code Review

2 Medium · 2 Low · 0 Critical · 0 High

Summary

The design is sound. Replacing the blind-trust GetTrieNodes refresh path with a GetAccountRange + Merkle-proof verification approach is a clear correctness improvement. The in-memory throwaway trie for verification is well-isolated and cannot corrupt client state. The ProgressTracker simplification to one-account-per-request is a clean side effect that matches the new one-shot GetAccountRange shape.

Two issues need attention before merge.

Medium

1. Exception catch is too narrow in VerifyRefreshedAccount (SnapProvider.cs:337)
The filter when (e is TrieException or TrieNodeException or RlpException) does not cover NullReferenceException, ArgumentException, IndexOutOfRangeException, etc. that can arise from malformed proof bytes supplied by a malicious peer. Since SnapSyncFeed.HandleResponse has no catch block, an unhandled exception here would crash the sync task. The fix is a plain catch (Exception) — this is a throwaway trie so swallowing all exceptions and returning InvalidProof is correct. See inline comment.

2. null decode from non-empty RLP returns NotFound instead of InvalidProof (SnapProvider.cs:335)
The NotFound path is reached only after accountRlp.IsEmpty is already false (line 331). If _accountDecoder.Decode returns null for non-empty RLP, the account data was in the verified trie but was not well-formed — that is InvalidProof (score the peer, allow retry), not a deleted-account signal. See inline comment.

Low

3. Misleading comment in SnapSyncDownloader.cs:46
"the protocol returns at least one account past the limit, so start == limit must be avoided" is inaccurate. The real concern is that some peers treat start == limit as an empty range. See inline comment.

4. IncrementPath no-op on Keccak.MaxValue (SnapSyncDownloader.cs:49)
When path == Keccak.MaxValue, IncrementPath() wraps back to MaxValue, producing startingHash == limitHash. Practically impossible for a keccak output, but the behaviour is worth a note. See inline comment.

Other observations (no action needed)

TrieNodeException and TrieException are not in the same hierarchy — the two-part filter is intentional and correct (TrieNodeException does not extend TrieException).
response is not disposed in RefreshAccounts — intentional; SnapSyncFeed.HandleResponse owns the lifecycle via batch.Dispose(). Consistent with the old pattern.
Test mutates stale in-place — PathWithAccount is a class, so requestedPath.PathAndAccount.Account = ... correctly mutates stale. The assertion stale.Account.StorageRoot == storageRoot is valid.
NotFound for deleted accounts — the comment "It remains tracked for healing" is accurate because EnqueueAccountRefresh → TrackAccountToHeal was called at the point of original enqueue.

claude · 2026-06-22T09:54:46Z

+                account = _accountDecoder.Decode(ref ctx);
+                return account is null ? RefreshVerifyResult.NotFound : RefreshVerifyResult.Verified;
+            }
+            catch (Exception e) when (e is TrieException or TrieNodeException or RlpException)


Medium — exception scope too narrow for untrusted P2P data

The filter catches TrieException, TrieNodeException, and RlpException, but malformed proof bytes from a malicious peer could also trigger NullReferenceException, ArgumentException, or IndexOutOfRangeException inside trie traversal or RLP decode — none of which are caught here. These would propagate through RefreshAccounts into SnapSyncFeed.HandleResponse (which has no catch block) and crash the sync task.

Per robustness.md: "Validate data from untrusted sources (P2P peers, …) before use." The safest fix is to broaden the catch to cover all exceptions from this throwaway trie:

Suggested change

-            catch (Exception e) when (e is TrieException or TrieNodeException or RlpException)
+            catch (Exception)
+            {
+                return RefreshVerifyResult.InvalidProof;
+            }

Done in 920ce83 — broadened to catch (Exception). The verification trie is a throwaway over untrusted P2P proof data, so any failure assembling/decoding it now returns InvalidProof instead of propagating into HandleResponse.

(agent comment — Claude Opus 4.8 (1M context))

claude · 2026-06-22T09:54:56Z

+
+                Rlp.ValueDecoderContext ctx = new(accountRlp);
+                account = _accountDecoder.Decode(ref ctx);
+                return account is null ? RefreshVerifyResult.NotFound : RefreshVerifyResult.Verified;


Medium — null decode from non-empty RLP should be InvalidProof, not NotFound

accountRlp.IsEmpty was already checked on line 331 (returns early with NotFound), so by this point the trie returned non-empty RLP for this path. If _accountDecoder.Decode returns null from non-empty data, the account entry exists in the verified trie but is not a well-formed Account. That is a tampered/corrupt proof, not a "deleted account" scenario.

The current NotFound result is a "terminal success" that silently drops the storage sync for this account. If this ever happens, it is safer to return InvalidProof so the peer is scored and the sync retries. Suggest:

Suggested change

-                return account is null ? RefreshVerifyResult.NotFound : RefreshVerifyResult.Verified;
+                account = _accountDecoder.Decode(ref ctx);
+                return account is null ? RefreshVerifyResult.InvalidProof : RefreshVerifyResult.Verified;

Resolved by the current approach: refresh no longer hashes or decodes a proof node itself. The account is taken from the verified GetAccountRange response (accounts[i].Account) only after VerifyAccountRange confirms the range reconstructs the state root; a malformed/null account makes that reconstruction fail and is caught as InvalidProof. So the non-empty-RLP→null-decode path no longer exists.

(agent comment — Claude Opus 4.8 (1M context))

claude · 2026-06-22T09:55:03Z

-                        batch.AccountsToRefreshResponse = await handler.GetTrieNodes(batch.AccountsToRefreshRequest, cancellationToken);
+                        // Refresh a single account via GetAccountRange so its storage root is verified
+                        // against the state root. Limit = path + 1 (the protocol returns at least one
+                        // account past the limit, so start == limit must be avoided).


Low — misleading comment

"the protocol returns at least one account past the limit" reads as if peers are obligated to overshoot the limit. The actual concern is that some peers treat start == limit as an empty range and return nothing. Suggest:

Suggested change

-                        // account past the limit, so start == limit must be avoided).
+                        // Use limit = path + 1 to avoid start == limit, which some peer
+                        // implementations interpret as an empty range and return nothing.

Done in 920ce83 — reworded to: "Use limit = path + 1 to avoid start == limit, which some peers treat as an empty range."

(agent comment — Claude Opus 4.8 (1M context))

claude · 2026-06-22T09:55:11Z

+                        // account past the limit, so start == limit must be avoided).
+                        AccountWithStorageStartingHash account = batch.AccountsToRefreshRequest.Paths[0];
+                        ValueHash256 path = account.PathAndAccount.Path;
+                        AccountRange range = new(batch.AccountsToRefreshRequest.RootHash, path, path.IncrementPath());


Low — IncrementPath overflow when path == Keccak.MaxValue

If path is Keccak.MaxValue (all 0xFF bytes), path.IncrementPath() overflows and returns Keccak.MaxValue again, producing startingHash == limitHash == MaxValue. This is astronomically unlikely for a keccak path, but worth a guard or comment so the behaviour is explicit:

Suggested change

-                        AccountRange range = new(batch.AccountsToRefreshRequest.RootHash, path, path.IncrementPath());
+                        AccountRange range = new(batch.AccountsToRefreshRequest.RootHash, path, path.IncrementPath()); // IncrementPath is a no-op only for the astronomically-unlikely MaxValue path

Done in 920ce83 — added a note that IncrementPath is a no-op only for the unreachable MaxValue path.

(agent comment — Claude Opus 4.8 (1M context))
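
To make the MaxValue edge case in this thread concrete, here is a minimal sketch of a saturating "path + 1" over a byte string. It assumes the path is compared as a big-endian value and that the all-0xFF input is returned unchanged, as the review describes; `PathMath.IncrementSaturating` is a hypothetical helper, not the actual `IncrementPath` implementation.

```csharp
using System;

// Hypothetical helper for illustration; not Nethermind's IncrementPath.
public static class PathMath
{
    // Returns path + 1, treating the bytes as one big-endian integer.
    // An all-0xFF input is returned unchanged (saturates instead of wrapping to zero).
    public static byte[] IncrementSaturating(ReadOnlySpan<byte> path)
    {
        byte[] result = path.ToArray();
        for (int i = result.Length - 1; i >= 0; i--)
        {
            if (result[i] != 0xFF)
            {
                result[i]++;      // no carry needed past this byte
                return result;
            }
            result[i] = 0x00;     // carry into the next more significant byte
        }

        // Every byte was 0xFF: restore the maximum value.
        Array.Fill(result, (byte)0xFF);
        return result;
    }
}
```

With this behaviour the single-account request has start < limit for every path except the maximum one, where start == limit.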

When a storage root mismatches during snap sync, the account was refreshed by fetching its storage-trie root node via GetTrieNodes and trusting the returned node's hash blindly. That refresh path is being removed in snaps/2 and never verified the new storage root against the pivot state root.

Refresh now sends a single-account GetAccountRange (limit = path + 1) and verifies the returned account against the pivot state root by reconstructing the account range the same way account-range sync does: fill the boundary proof nodes, set the returned account(s), then check the recomputed root. The reconstruction runs in an isolated, empty-backed trie (no writes to the client state DB, so it cannot race the partition sync), and because the leaf is set from the response it does not rely on the proof containing it.

Outcomes: verified (adopt storage root, re-queue storage), not found (deleted account, terminal), expired (stale pivot), or invalid proof (retry and score the peer). Refreshes are rare, so one account per request is fine.

The shared verification is factored into SnapProviderHelper.BuildAndVerifyRoot, used by both CommitRange and the new no-commit VerifyAccountRange. The now-unused GetTrieNodes(AccountsToRefreshRequest) overload is left for the snaps/2 cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

batrr · 2026-06-22T13:39:27Z

+            account = null;
+            IReadOnlyList<PathWithAccount> accounts = response.PathAndAccounts;
+            if (accounts.Count == 0)
+                return response.Proofs.Count == 0 ? RefreshVerifyResult.Expired : RefreshVerifyResult.NotFound;


Should NotFound require verification?

That is true; NotFound still needs a proof.

Fixed in 920ce83. NotFound is now only returned after VerifyAccountRange succeeds — i.e. the verified range proves the account is absent at the state root. An empty response (no accounts to verify) returns Expired and retries instead of concluding deletion without proof. Added TestRefreshAccount_VerifiedNotFound covering the proven-absent path.

(agent comment — Claude Opus 4.8 (1M context), on behalf of @asdacap)
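
The rule agreed in this thread (no NotFound without a verified range, and fail closed on malformed data) can be sketched as follows. This is an illustrative sketch under stated assumptions: the enum below is a local stand-in that reuses the result names quoted in the diff, `RefreshClassifier.Classify` and the `verifyRange` delegate are hypothetical, and deciding absence by checking whether the requested path appears among the verified accounts is an assumption about how a proven-absent account is detected.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in reusing the result names from the diff; not the real type.
public enum RefreshVerifyResult { Verified, NotFound, Expired, InvalidProof }

public static class RefreshClassifier
{
    // verifyRange is a hypothetical stand-in for range verification: it returns true
    // when the returned accounts plus proofs reconstruct the expected state root,
    // and may throw on malformed input from an untrusted peer.
    public static RefreshVerifyResult Classify(
        IReadOnlyList<string> returnedPaths,
        string requestedPath,
        Func<bool> verifyRange)
    {
        // Nothing to verify: do not conclude deletion without proof; retry instead.
        if (returnedPaths.Count == 0) return RefreshVerifyResult.Expired;

        try
        {
            if (!verifyRange()) return RefreshVerifyResult.InvalidProof;
        }
        catch (Exception)
        {
            // Fail closed: any failure while checking untrusted data is an invalid proof.
            return RefreshVerifyResult.InvalidProof;
        }

        // The range is verified. The account is either present in it or proven absent.
        foreach (string path in returnedPaths)
        {
            if (path == requestedPath) return RefreshVerifyResult.Verified;
        }
        return RefreshVerifyResult.NotFound;
    }
}
```

The ordering is the point of the sketch: NotFound is reachable only after verification has succeeded, so an empty or unverifiable response can never be mistaken for a deleted account.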

- Require proof for NotFound: an empty GetAccountRange response no longer concludes the account was deleted; it retries (Expired) instead, so a terminal NotFound is only reached via a verified range proof.
- Fail closed on malformed proofs: VerifyRefreshedAccount now catches all exceptions from the throwaway verification trie (untrusted P2P data) and returns InvalidProof rather than letting them crash the sync task.
- Clarify the limit = path + 1 comment (avoids start == limit, which some peers treat as an empty range) and note the MaxValue no-op.
- Add a regression test for the verified-NotFound (proven-absent) path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

asdacap requested review from LukaszRozmej, benaadams, flcl42 and marcindsobczak as code owners June 22, 2026 09:42

github-actions Bot added the refactoring, snap sync, and sync labels Jun 22, 2026

claude Bot reviewed Jun 22, 2026

asdacap force-pushed the getaccountrange-storage-root-refresh branch from a6072fc to bea732f Compare June 22, 2026 10:07

AnkushinDaniil approved these changes Jun 22, 2026

benaadams approved these changes Jun 22, 2026

batrr approved these changes Jun 22, 2026

asdacap wants to merge 2 commits into master from getaccountrange-storage-root-refresh

4 participants
