db/snaptype: add cache-busting hash particle to snapshot filenames #19150
Conversation
Force-pushed from 9a059b8 to 5468163
Manually dispatched two additional workflow runs to validate the snapshot filename changes against real-world data:
These don't trigger automatically on this PR since it targets a non-default branch.

The 3 failing checks (…)
…enames

Support an optional hash particle in snapshot filenames for cache busting. The particle sits as a dot-separated component before the file extension:

- V2: `v1.0-000000-001000-headers.abc123def0.seg`
- V3: `v12.13-accounts.100-164.abc123def0.efi`

Filenames without the particle parse identically to before. The existing extension-stripping loop in `ParseFileName` already handled extra dot components; this change captures the first one as `FileInfo.Hash`. Adds `WithHash`/`As` hash preservation, construction helpers, and hash-tolerant glob masks.
…fter compression

Compute a truncated SHA256 hash of `.seg` file content after compression and rename the file to include it as a cache-busting particle. This ensures snapshot filenames change when content changes, preventing stale BitTorrent downloads.

Changes:
- Add `ApplyContentHash`/`computeFileHash` helpers in db/snaptype/files.go
- Add hash field to `DirtySegment`, update `FileName()` and `FileInfo()`
- Call `ApplyContentHash` after `Compress()` in all snapshot generation paths: dumpRange, ExtractRange, merge, caplin beacon/blob/state dumps
- Update `merge()` to return updated `FileInfo` for correct error cleanup
- Fix `FileInfo.As()` to strip hash (content-specific per type)
- Fix `ReplaceVersionWithMask` to match optional hash in glob patterns
Force-pushed from 89a48cc to 7c21bf8
Pull request overview
This PR adds an optional cache-busting content-hash particle to snapshot filenames and propagates it through snapshot generation and snapshot-sync code paths to avoid stale BitTorrent artifacts when snapshot content changes.
Changes:
- Added `FileInfo.Hash`, `WithHash()`, and helpers/masks to construct and match hashed snapshot filenames.
- Applied content hashing after `Compress()` across multiple snapshot generation paths (ExtractRange, dumpRange, merge, Caplin dumps) and carried the hash through `DirtySegment`.
- Updated tests to validate parsing/round-tripping of hashed filenames and to tolerate hashed outputs in merge tests.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| db/version/file_version.go | Expands version-masked patterns to also match optional hash particles. |
| db/snaptype/type.go | Applies content hash after compression in ExtractRange before index building. |
| db/snaptype/files.go | Adds hash-aware filename helpers, parsing into FileInfo.Hash, and file hashing/rename logic. |
| db/snaptype/files_test.go | Adds unit tests for hash parsing, naming helpers, masks, and ApplyContentHash. |
| db/snapshotsync/snapshots.go | Extends DirtySegment to carry and emit hashed snapshot filenames. |
| db/snapshotsync/snapshots_test.go | Updates merge test to locate merged output via a hash-tolerant glob mask. |
| db/snapshotsync/merger.go | Applies content hash after merge compression and propagates into DirtySegment. |
| db/snapshotsync/freezeblocks/caplin_snapshots.go | Applies content hash after compression for Caplin block/blob dumps. |
| db/snapshotsync/freezeblocks/block_snapshots.go | Applies content hash after compression for execution-layer snapshot dumps. |
| db/snapshotsync/caplin_state_snapshots.go | Applies content hash after compression for Caplin state dumps. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Agent-Logs-Url: https://github.com/erigontech/erigon/sessions/a0810048-9378-4978-b03b-4f118e34968b Co-authored-by: anacrolix <988750+anacrolix@users.noreply.github.com>
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
```go
for ext := filepath.Ext(croppedFileName); ext != "" && !strings.Contains(ext, "-"); ext = filepath.Ext(croppedFileName) {
	croppedFileName = strings.TrimSuffix(croppedFileName, ext)
	if res.Hash == "" {
		res.Hash = ext[1:] // strip leading dot
	}
}
```
The hash-extraction loop sets `res.Hash` to the first stripped dotted suffix unconditionally. This will mis-classify non-hash suffixes (e.g. `.torrent` in `...ef.torrent.tmp...`, or `.seg`/`.idx` when parsing `.torrent*` wrappers) as the content hash. A more robust approach is to only accept a hash particle when the suffix looks like a hex digest (and/or when it is not a known snapshot extension like `.seg`/`.idx`/`.efi`).
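As a rough illustration of the reviewer's suggestion, a guard like the following would accept only plausible hex digests before assigning `res.Hash`. The minimum length and lowercase-hex character set are assumptions, not values taken from the PR:

```go
package main

import "fmt"

// looksLikeHexDigest reports whether a dotted suffix component (without
// the leading dot) is plausibly a truncated hex content hash, rather
// than a known extension such as "seg", "idx", "efi", or "torrent".
// The minimum length of 8 is an assumption; the PR examples use 10.
func looksLikeHexDigest(s string) bool {
	if len(s) < 8 {
		return false
	}
	for _, c := range s {
		if !(('0' <= c && c <= '9') || ('a' <= c && c <= 'f')) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(looksLikeHexDigest("abc123def0")) // plausible hash particle
	fmt.Println(looksLikeHexDigest("torrent"))    // known extension, rejected
}
```

All of the known snapshot extensions fail either the length check or the hex check (`s`, `i`, and `t` are not hex digits), so the parse loop would keep stripping them without mistaking one for the content hash.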
```go
f, err = ApplyContentHash(f)
if err != nil {
	return lastKeyValue, err
}
```
Applying the content hash to the segment filename here means the `.seg` name changes when content changes, but the index-building code still writes `.idx` files using `IdxFileName(...)` (which does not incorporate `FileInfo.Hash`). If `.idx` files/torrents are distributed (they are treated as seedable extensions elsewhere), this can still allow stale `.idx`/`.idx.torrent` downloads and even mismatched index+segment pairs. Consider propagating the same hash particle into index filenames as well (or otherwise tying index identity to the segment hash).
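One way to tie index identity to the segment hash, as the comment suggests, is to derive the index name from the segment's hash particle so the pair changes names together. `IdxFileName` is named in the comment above, but this variant and its signature are hypothetical, not Erigon's API:

```go
package main

import "fmt"

// idxFileNameWithHash builds an index filename carrying the same
// content-hash particle as its segment, so a hashed segment and its
// index stay tied together. Hypothetical helper for illustration.
func idxFileNameWithHash(base, hash string) string {
	if hash == "" {
		return base + ".idx" // unhashed filenames keep their old form
	}
	return base + "." + hash + ".idx"
}

func main() {
	fmt.Println(idxFileNameWithHash("v1.0-000000-001000-headers", "abc123def0"))
	fmt.Println(idxFileNameWithHash("v1.0-000000-001000-headers", ""))
}
```

With this shape, a stale `.idx.torrent` can never pair with a newer segment, because the index name embeds the segment's content hash.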
Summary
- Adds an optional cache-busting content-hash particle to snapshot filenames (e.g. `v1.0-000000-001000-headers.abc123def0.seg`)
- Computes a truncated SHA256 hash of `.seg` file content after compression and renames the file to include it, ensuring snapshot filenames change when content changes and preventing stale BitTorrent downloads
- Adds a `FileInfo.Hash` field, `WithHash()`/`As()` hash preservation, construction helpers, and hash-tolerant glob masks
- Applies the hash after `Compress()` in all snapshot generation paths: `dumpRange`, `ExtractRange`, `merge`, caplin beacon/blob/state dumps
- Updates `DirtySegment` to carry the hash through `FileName()` and `FileInfo()`