Refresh snap-sync account storage root via verified GetAccountRange #12082

Open
asdacap wants to merge 2 commits into master from getaccountrange-storage-root-refresh

Conversation

@asdacap asdacap commented Jun 22, 2026

Contributor

Changes

Previously, when a storage root mismatched during snap sync, the account was refreshed by fetching its storage-trie root node via GetTrieNodes and trusting the returned node's hash blindly. That refresh path never verified the new storage root against the pivot state root, and it is being removed in snaps/2.

  • Refresh now sends a single-account GetAccountRange (startingHash = path, limitHash = path + 1) and adopts the storage root only after verifying the account against the pivot state root.
  • Verification reconstructs the account range the same way account-range sync does — fill the boundary proof nodes, set the returned account(s), then check the recomputed root. Crucially the account leaf is set from the response, so verification does not rely on the proof containing the leaf (the range proof can omit it).
  • The reconstruction runs in an isolated, empty-backed trie (a fresh MemDb-backed PatriciaSnapTrieFactory) and does not commit, so it writes nothing to the client state DB and cannot race the partition account-range sync. An incomplete proof fails verification instead of being completed from live client data.
  • The shared verification core is factored into SnapProviderHelper.BuildAndVerifyRoot, used by both the existing CommitRange and the new no-commit VerifyAccountRange, so refresh verification is identical to account-range verification.
  • Outcomes (SnapProvider.RefreshAccounts now returns AddRangeResult, so refresh responses feed AnalyzeResponsePerPeer): verified → adopt the storage root and re-queue storage; not found (deleted account) → terminal, no retry; expired (peer lacks the root) → retry; invalid proof → retry and score the peer.
  • ProgressTracker.DequeAccountToRefresh dequeues one account per request (one refresh per GetAccountRange). Refreshes are rare, so this is fine.
  • The now-unused refresh-specific GetTrieNodes(AccountsToRefreshRequest) overload is intentionally left in place for the snaps/2 cleanup; the generic GetTrieNodes(GetTrieNodesRequest) used by state healing is untouched.
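
The outcome handling listed above can be sketched as a small lookup. This is a minimal illustration rather than the code in this PR: `RefreshOutcome`, `RefreshAction`, and `Decide` are hypothetical names, and only the four outcomes and their consequences are taken from the description.

```csharp
using System;

// Hypothetical names for illustration; only the four outcomes come from the PR description.
public enum RefreshOutcome { Verified, NotFound, Expired, InvalidProof }

// What the caller does next for a given outcome.
public readonly record struct RefreshAction(bool AdoptStorageRoot, bool Retry, bool PenalizePeer);

public static class RefreshOutcomeSketch
{
    public static RefreshAction Decide(RefreshOutcome outcome) => outcome switch
    {
        // Verified: adopt the storage root and re-queue storage sync.
        RefreshOutcome.Verified => new RefreshAction(AdoptStorageRoot: true, Retry: false, PenalizePeer: false),
        // Not found (deleted account): terminal, no retry.
        RefreshOutcome.NotFound => new RefreshAction(AdoptStorageRoot: false, Retry: false, PenalizePeer: false),
        // Expired (peer lacks the root): retry without penalty.
        RefreshOutcome.Expired => new RefreshAction(AdoptStorageRoot: false, Retry: true, PenalizePeer: false),
        // Invalid proof: retry and score the peer.
        RefreshOutcome.InvalidProof => new RefreshAction(AdoptStorageRoot: false, Retry: true, PenalizePeer: true),
        _ => throw new ArgumentOutOfRangeException(nameof(outcome)),
    };
}
```

Keeping the mapping in one place makes it easy to see that only a verified response ever changes the stored storage root.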

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Added parameterized regression tests in SnapServerTest: TestRefreshAccount (correct root verifies and propagates the real storage root; wrong root is rejected with DifferentRootHash) and TestRefreshAccount_LeafMissingFromProof (proof with leaf nodes stripped still verifies, since the account is set from the response — this fails under a proof-walk that assumes the leaf is present). Full Nethermind.Synchronization.Test SnapSync + state-sync-feed suites pass (940 passed, 8 fixture-conditional skips), including the end-to-end SnapSync_* integration tests and the account/storage range tests that exercise the refactored CommitRange.

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

claude Bot commented Jun 22, 2026

Contributor

Claude finished @asdacap's task in 7m 23s


Code Review

2 Medium · 2 Low · 0 Critical · 0 High


Summary

The design is sound. Replacing the blind-trust GetTrieNodes refresh path with a GetAccountRange + Merkle-proof verification approach is a clear correctness improvement. The in-memory throwaway trie for verification is well-isolated and cannot corrupt client state. The ProgressTracker simplification to one-account-per-request is a clean side effect that matches the new one-shot GetAccountRange shape.

Two issues need attention before merge.


Medium

1. Exception catch is too narrow in VerifyRefreshedAccount (SnapProvider.cs:337)
The filter when (e is TrieException or TrieNodeException or RlpException) does not cover NullReferenceException, ArgumentException, IndexOutOfRangeException, etc. that can arise from malformed proof bytes supplied by a malicious peer. Since SnapSyncFeed.HandleResponse has no catch block, an unhandled exception here would crash the sync task. The fix is a plain catch (Exception) — this is a throwaway trie so swallowing all exceptions and returning InvalidProof is correct. See inline comment.

2. null decode from non-empty RLP returns NotFound instead of InvalidProof (SnapProvider.cs:335)
The NotFound path is reached only after accountRlp.IsEmpty is already false (line 331). If _accountDecoder.Decode returns null for non-empty RLP, the account data was in the verified trie but was not well-formed — that is InvalidProof (score the peer, allow retry), not a deleted-account signal. See inline comment.


Low

3. Misleading comment in SnapSyncDownloader.cs:46
"the protocol returns at least one account past the limit, so start == limit must be avoided" is inaccurate. The real concern is that some peers treat start == limit as an empty range. See inline comment.

4. IncrementPath no-op on Keccak.MaxValue (SnapSyncDownloader.cs:49)
When path == Keccak.MaxValue, IncrementPath() returns MaxValue unchanged, producing startingHash == limitHash. This is practically impossible for a keccak output, but the behaviour is worth a note. See inline comment.
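
The edge case in item 4 can be illustrated with a standalone big-endian increment. This is a sketch under assumptions, not Nethermind's `IncrementPath`: `PathSketch.Increment` is a hypothetical helper that mirrors the behaviour described here, where an all-0xFF input comes back unchanged.

```csharp
using System;

public static class PathSketch
{
    // Hypothetical helper: returns a copy of a big-endian hash incremented by one.
    // An all-0xFF input is returned unchanged (saturating), matching the no-op
    // behaviour described for the MaxValue path.
    public static byte[] Increment(ReadOnlySpan<byte> path)
    {
        byte[] result = path.ToArray();
        for (int i = result.Length - 1; i >= 0; i--)
        {
            if (result[i] != 0xFF)
            {
                result[i]++;      // no carry needed past this byte
                return result;
            }
            result[i] = 0x00;     // carry into the next more significant byte
        }

        // Every byte was 0xFF: saturate instead of wrapping to zero.
        Array.Fill(result, (byte)0xFF);
        return result;
    }
}
```

With this shape, start == limit can only occur for the all-0xFF path, which is the case the review asks to document.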


Other observations (no action needed)

  • TrieNodeException and TrieException are not in the same hierarchy — the two-part filter is intentional and correct (TrieNodeException does not extend TrieException).
  • response is not disposed in RefreshAccounts — intentional; SnapSyncFeed.HandleResponse owns the lifecycle via batch.Dispose(). Consistent with the old pattern.
  • Test mutates stale in-place: PathWithAccount is a class, so requestedPath.PathAndAccount.Account = ... correctly mutates stale. The assertion stale.Account.StorageRoot == storageRoot is valid.
  • NotFound for deleted accounts: the comment "It remains tracked for healing" is accurate because EnqueueAccountRefresh → TrackAccountToHeal was called at the point of original enqueue.

account = _accountDecoder.Decode(ref ctx);
return account is null ? RefreshVerifyResult.NotFound : RefreshVerifyResult.Verified;
}
catch (Exception e) when (e is TrieException or TrieNodeException or RlpException)

Contributor

Medium — exception scope too narrow for untrusted P2P data

The filter catches TrieException, TrieNodeException, and RlpException, but malformed proof bytes from a malicious peer could also trigger NullReferenceException, ArgumentException, or IndexOutOfRangeException inside trie traversal or RLP decode — none of which are caught here. These would propagate through RefreshAccounts into SnapSyncFeed.HandleResponse (which has no catch block) and crash the sync task.

Per robustness.md: "Validate data from untrusted sources (P2P peers, …) before use." The safest fix is to broaden the catch to cover all exceptions from this throwaway trie:

Suggested change
- catch (Exception e) when (e is TrieException or TrieNodeException or RlpException)
+ catch (Exception)
+ {
+     return RefreshVerifyResult.InvalidProof;
+ }

Contributor Author

Done in 920ce83 — broadened to catch (Exception). The verification trie is a throwaway over untrusted P2P proof data, so any failure assembling/decoding it now returns InvalidProof instead of propagating into HandleResponse.

(agent comment — Claude Opus 4.8 (1M context))


Rlp.ValueDecoderContext ctx = new(accountRlp);
account = _accountDecoder.Decode(ref ctx);
return account is null ? RefreshVerifyResult.NotFound : RefreshVerifyResult.Verified;

Contributor

Medium — null decode from non-empty RLP should be InvalidProof, not NotFound

accountRlp.IsEmpty was already checked on line 331 (returns early with NotFound), so by this point the trie returned non-empty RLP for this path. If _accountDecoder.Decode returns null from non-empty data, the account entry exists in the verified trie but is not a well-formed Account. That is a tampered/corrupt proof, not a "deleted account" scenario.

The current NotFound result is a "terminal success" that silently drops the storage sync for this account. If this ever happens, it is safer to return InvalidProof so the peer is scored and the sync retries. Suggest:

Suggested change
- return account is null ? RefreshVerifyResult.NotFound : RefreshVerifyResult.Verified;
+ account = _accountDecoder.Decode(ref ctx);
+ return account is null ? RefreshVerifyResult.InvalidProof : RefreshVerifyResult.Verified;

Contributor Author

Resolved by the current approach: refresh no longer hashes or decodes a proof node itself. The account is taken from the verified GetAccountRange response (accounts[i].Account) only after VerifyAccountRange confirms the range reconstructs the state root; a malformed/null account makes that reconstruction fail and is caught as InvalidProof. So the non-empty-RLP→null-decode path no longer exists.

(agent comment — Claude Opus 4.8 (1M context))

batch.AccountsToRefreshResponse = await handler.GetTrieNodes(batch.AccountsToRefreshRequest, cancellationToken);
// Refresh a single account via GetAccountRange so its storage root is verified
// against the state root. Limit = path + 1 (the protocol returns at least one
// account past the limit, so start == limit must be avoided).

Contributor

Low — misleading comment

"the protocol returns at least one account past the limit" reads as if peers are obligated to overshoot the limit. The actual concern is that some peers treat start == limit as an empty range and return nothing. Suggest:

Suggested change
- // account past the limit, so start == limit must be avoided).
+ // Use limit = path + 1 to avoid start == limit, which some peer
+ // implementations interpret as an empty range and return nothing.

Contributor Author

Done in 920ce83 — reworded to: "Use limit = path + 1 to avoid start == limit, which some peers treat as an empty range."

(agent comment — Claude Opus 4.8 (1M context))

// account past the limit, so start == limit must be avoided).
AccountWithStorageStartingHash account = batch.AccountsToRefreshRequest.Paths[0];
ValueHash256 path = account.PathAndAccount.Path;
AccountRange range = new(batch.AccountsToRefreshRequest.RootHash, path, path.IncrementPath());

Contributor

Low — IncrementPath overflow when path == Keccak.MaxValue

If path is Keccak.MaxValue (all 0xFF bytes), path.IncrementPath() overflows and returns Keccak.MaxValue again, producing startingHash == limitHash == MaxValue. This is astronomically unlikely for a keccak path, but worth a guard or comment so the behaviour is explicit:

Suggested change
- AccountRange range = new(batch.AccountsToRefreshRequest.RootHash, path, path.IncrementPath());
+ AccountRange range = new(batch.AccountsToRefreshRequest.RootHash, path, path.IncrementPath()); // IncrementPath is a no-op only for the astronomically-unlikely MaxValue path

Contributor Author

Done in 920ce83 — added a note that IncrementPath is a no-op only for the unreachable MaxValue path.

(agent comment — Claude Opus 4.8 (1M context))

When a storage root mismatches during snap sync, the account was refreshed
by fetching its storage-trie root node via GetTrieNodes and trusting the
returned node's hash blindly. That refresh path is being removed in snaps/2
and never verified the new storage root against the pivot state root.

Refresh now sends a single-account GetAccountRange (limit = path + 1) and
verifies the returned account against the pivot state root by reconstructing
the account range the same way account-range sync does: fill the boundary
proof nodes, set the returned account(s), then check the recomputed root. The
reconstruction runs in an isolated, empty-backed trie (no writes to the client
state DB, so it cannot race the partition sync), and because the leaf is set
from the response it does not rely on the proof containing it. Outcomes:
verified (adopt storage root, re-queue storage), not found (deleted account,
terminal), expired (stale pivot), or invalid proof (retry and score the peer).
Refreshes are rare, so one account per request is fine.

The shared verification is factored into SnapProviderHelper.BuildAndVerifyRoot,
used by both CommitRange and the new no-commit VerifyAccountRange. The
now-unused GetTrieNodes(AccountsToRefreshRequest) overload is left for the
snaps/2 cleanup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@asdacap asdacap force-pushed the getaccountrange-storage-root-refresh branch from a6072fc to bea732f Compare June 22, 2026 10:07
account = null;
IReadOnlyList<PathWithAccount> accounts = response.PathAndAccounts;
if (accounts.Count == 0)
return response.Proofs.Count == 0 ? RefreshVerifyResult.Expired : RefreshVerifyResult.NotFound;

Contributor

Should NotFound require verification?

Contributor Author

That is true; NotFound still needs proof.

Contributor Author

Fixed in 920ce83. NotFound is now only returned after VerifyAccountRange succeeds — i.e. the verified range proves the account is absent at the state root. An empty response (no accounts to verify) returns Expired and retries instead of concluding deletion without proof. Added TestRefreshAccount_VerifiedNotFound covering the proven-absent path.

(agent comment — Claude Opus 4.8 (1M context), on behalf of @asdacap)
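
The ordering agreed in this thread (an empty response never proves deletion; NotFound only follows a verified range) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `RefreshVerdict`, `RefreshClassifierSketch`, and the `verifyRange` callback are hypothetical, paths are modelled as hex strings, and treating "requested path missing from a verified range" as the proven-absent case is an assumption about how absence is detected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the refresh verification result discussed in this thread.
public enum RefreshVerdict { Verified, NotFound, Expired, InvalidProof }

public static class RefreshClassifierSketch
{
    // requestedPath: the account path being refreshed.
    // returnedPaths: account paths in the GetAccountRange response.
    // verifyRange:   hypothetical callback that rebuilds the range in a throwaway
    //                trie and reports whether it matches the pivot state root.
    public static RefreshVerdict Classify(
        string requestedPath,
        IReadOnlyList<string> returnedPaths,
        Func<bool> verifyRange)
    {
        // Nothing to verify: do not conclude deletion without proof, retry instead.
        if (returnedPaths.Count == 0) return RefreshVerdict.Expired;

        bool verified;
        try
        {
            verified = verifyRange();
        }
        catch (Exception)
        {
            // Fail closed: any failure on untrusted proof data counts as an invalid proof.
            return RefreshVerdict.InvalidProof;
        }

        if (!verified) return RefreshVerdict.InvalidProof;

        // Only a verified range may yield Verified or the terminal NotFound.
        foreach (string path in returnedPaths)
        {
            if (string.Equals(path, requestedPath, StringComparison.OrdinalIgnoreCase))
                return RefreshVerdict.Verified;
        }
        return RefreshVerdict.NotFound;
    }
}
```

The point of the ordering is that the terminal NotFound branch is unreachable unless verification has already succeeded.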

- Require proof for NotFound: an empty GetAccountRange response no longer
  concludes the account was deleted; it retries (Expired) instead, so a
  terminal NotFound is only reached via a verified range proof.
- Fail closed on malformed proofs: VerifyRefreshedAccount now catches all
  exceptions from the throwaway verification trie (untrusted P2P data) and
  returns InvalidProof rather than letting them crash the sync task.
- Clarify the limit = path + 1 comment (avoids start == limit, which some
  peers treat as an empty range) and note the MaxValue no-op.
- Add a regression test for the verified-NotFound (proven-absent) path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4 participants