Skip to content

triedb/pathdb: fix lookup sentinel collision with zero disk layer root#34680

Open
CPerezz wants to merge 2 commits intoethereum:masterfrom
CPerezz:pathdb-fix-lookup-zero-root
Open

triedb/pathdb: fix lookup sentinel collision with zero disk layer root#34680
CPerezz wants to merge 2 commits intoethereum:masterfrom
CPerezz:pathdb-fix-lookup-zero-root

Conversation

@CPerezz
Copy link
Copy Markdown
Contributor

@CPerezz CPerezz commented Apr 7, 2026

Summary

lookup.accountTip and storageTip use common.Hash{} as the "state is stale" sentinel while also returning common.Hash{} when the disk layer's root happens to be zero. lookupAccount/lookupStorage then compare the returned hash directly against common.Hash{} and misreport a legitimate disk-layer fallback as errSnapshotStale.

For a merkle path database this sentinel collision is invisible: an empty merkle trie hashes to types.EmptyRootHash (a concrete non-zero keccak), so the disk layer's root is never the zero hash in practice. The collision only shows up once the disk layer root can legitimately be zero — for example, a database whose empty root is itself common.Hash{}. In that configuration, any Account/Storage lookup for a key that was never written takes the "disk layer fallback" branch of accountTip, which returns base = common.Hash{}, which lookupAccount then misreads as "stale".

Fix

Return (common.Hash, bool) from accountTip/storageTip so the "found" signal is carried out of band from the hash value. lookupAccount/lookupStorage consult the boolean rather than comparing the returned hash to zero. The returned hash itself may still be common.Hash{} — callers no longer treat it as a sentinel.

lookup.accountTip and storageTip used common.Hash{} as the "state is
stale" sentinel while ALSO returning common.Hash{} when the disk layer
itself happened to be keyed by the zero hash. lookupAccount/Storage
then blindly compared the returned value against common.Hash{} and
misreported a legitimate disk-layer fallback as errSnapshotStale.

For a merkle path database this sentinel collision is invisible: an
empty merkle trie hashes to types.EmptyRootHash which is a concrete
non-zero keccak, so the disk layer's root never equals common.Hash{}.
The collision only shows up once the disk layer root can legitimately
be zero — for example, a fresh verkle/bintrie database where the empty
binary trie hashes to EmptyVerkleHash = common.Hash{}. In that
configuration, any Account/Storage lookup for a key that has never
been written ends up taking the disk-layer fallback branch, which
correctly returns base=common.Hash{}, which lookupAccount then
misreads as "stale" and bubbles an error up to the caller.

Fix: change accountTip/storageTip to return (common.Hash, bool) so the
"found the tip" signal is carried out of band from the hash value.
lookupAccount/Storage now consult the boolean rather than comparing
the returned hash to zero. The returned hash itself may still be zero
(that is a valid disk-layer root on the bintrie path) and callers
must not treat it as a sentinel.

Noticed while wiring the bintrie flat-state reader in a separate
branch; the fix is scheme-agnostic and lands here so it can flow into
master independently of that work.
@CPerezz CPerezz requested a review from rjl493456442 as a code owner April 7, 2026 22:23
TestLookupZeroBaseRootFallback constructs a layer tree whose disk-layer
root is common.Hash{} (mirroring the bintrie/verkle configuration where
an empty trie hashes to EmptyVerkleHash), stacks diff layers on top,
and exercises four cases:

  1. lookupAccount on a never-written key must fall through to the disk
     layer and return (diskLayer, nil). Before the previous commit this
     returned errSnapshotStale because the disk-layer fallback hash
     collided with the stale sentinel.
  2. Symmetric case for lookupStorage.
  3. lookupAccount on a written account must still return the diff
     layer that holds it — pins the normal resolution path.
  4. lookupAccount/lookupStorage for an unknown state root must still
     return errSnapshotStale. This pins the other half of the contract
     so a future refactor that always returned ok=true would be caught
     here rather than in production.

Verified by reverse-applying the previous commit: the test fails with
the exact pre-fix error ("layer stale") on cases 1 and 2, and passes
once the fix is restored.

The existing TestAccountLookup/TestStorageLookup tests use
newTestLayerTree which hard-codes common.Hash{0x1} as the disk-layer
root, so none of them could cover the zero-root case without a tailored
helper; this test inlines newDiskLayer(common.Hash{}, …) directly
rather than parameterize the shared helper.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant