downloader, snapshots: add erigon snapshots verify subcommand#19972
downloader, snapshots: add erigon snapshots verify subcommand#19972
Conversation
Adds `erigon snapshots verify` (also accessible as `erigon seg verify`) which checks the snapshot download directory against .torrent files and an optional preverified.toml. Checks performed: - Infohash of each .torrent matches preverified.toml (if present) - File paths within torrents are safe (no directory traversal) - info.Name matches the torrent filename - Data files exist, have correct size, and appropriate permissions - Piece hashes verified using the mmap/sparse-read storage layer (SEEK_DATA/SEEK_HOLE) from anacrolix/torrent - Preverified.toml entries with no corresponding .torrent file Closes #16660
Extracts the torrent file storage setup from newTorrentClient into a shared newSnapStorage function so that the snapshot verify command uses identical storage settings (UsePartFiles=true, mmap-based fileIo) as the main downloader.
Torrents are now verified in a worker pool (--workers flag, defaults to GOMAXPROCS/2). Each finding is emitted immediately as a JSON line to stdout as it is discovered, so progress is visible in real time. A summary JSON object (torrents/errors/warnings counts) is written last. The command handler logs the summary via the normal logger and exits non-zero if errors were found.
Snapshot discovery now takes the union of:
- All names in preverified.toml
- All files on disk with ".torrent" stripped
- All files on disk with ".part" stripped
This lets the verifier detect missing .torrent files, infohash mismatches
between preverified.toml and .torrent, .part-vs-full disagreements, and
data files that are absent entirely.
After each snapshot finishes verification, a {"type":"progress",...} line
is written to stdout with the snapshot name, overall % complete, and the
preverified status ("matches" | "doesn't match" | "not in preverified" |
"no preverified.toml"). Issue and summary lines retain their own types.
Replace JSON encoder output with plain fmt.Fprintf lines: - Issues: "ERROR <name>: <message>" - Progress: "[ 14.3%] <name padded to 60> <pv status>" - Summary: "verified N snapshots: E error(s), W warning(s)" Remove SnapshotVerifyIssue and SnapshotVerifyProgress types; keep SnapshotVerifySummary for programmatic use by the CLI action.
There was a problem hiding this comment.
Pull request overview
Adds a dedicated CLI path to verify Erigon snapshot downloads in-place (without --downloader.verify), reusing the downloader’s snapshot storage configuration so piece-hash verification matches runtime behavior.
Changes:
- Introduces
db/downloader.VerifySnapshotDownloads, which scanssnap/for snapshot-related artifacts and verifies against.torrent+preverified.toml. - Extracts
newSnapStoragefromnewTorrentClientso both the downloader and verifier use identical storage settings. - Adds
erigon snapshots verify(also available viaerigon seg verify) CLI subcommand.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
db/downloader/verify-snapshot-downloads.go |
New verifier implementation: preverified checks, on-disk presence/size/permissions checks, and piece-hash verification. |
db/downloader/downloader.go |
Extracts newSnapStorage helper to share snapshot storage configuration across downloader + verifier. |
cmd/utils/app/snapshots_cmd.go |
Adds snapshots verify CLI subcommand that runs the new verifier and fails the command on detected errors. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if expectedSize >= 0 { | ||
| if fi, statErr := os.Stat(filepath.Join("", name)); statErr == nil && fi.Size() != expectedSize { | ||
| o.issue("error", name, | ||
| fmt.Sprintf("size mismatch: on-disk=%d torrent=%d", fi.Size(), expectedSize)) | ||
| } |
There was a problem hiding this comment.
In the size check, os.Stat(filepath.Join("", name)) stats the snapshot path relative to the current working directory (and doesn't convert from slash-style). This will commonly skip size validation or validate the wrong file. Use os.Stat(filepath.Join(snapDir, filepath.FromSlash(name))) (or pass snapDir into checkPartAndData).
| o *output, | ||
| ) { | ||
| client := newSnapStorage(snapDir, nil) | ||
| defer client.Close() |
There was a problem hiding this comment.
newSnapStorage returns a storage.ClientImplCloser whose Close() returns an error (see newTorrentClient in downloader.go which joins m.Close()). Here defer client.Close() drops that error. Consider capturing the close error and reporting it (e.g., as a warning/error via o.issue).
| defer client.Close() | |
| defer func() { | |
| if err := client.Close(); err != nil { | |
| o.issue("warn", snapshotName, fmt.Sprintf("failed to close torrent storage: %v", err)) | |
| } | |
| }() |
| //nolint:gosec | ||
| h := sha1.New() | ||
| piece := t.Piece(p) |
There was a problem hiding this comment.
sha1.New() is allocated for every piece. For large snapshots with many pieces this adds significant allocation/GC overhead during verification. Prefer allocating the hasher once outside the loop and calling Reset() each iteration.
| }, | ||
| { | ||
| Name: "verify", | ||
| Description: "verify snapshot downloads against .torrent files and preverified.toml; findings are written as JSON lines to stdout", |
There was a problem hiding this comment.
This subcommand help text says findings are written as "JSON lines", but VerifySnapshotDownloads prints human-readable formatted text (and a final summary line). Update the description (or change output format) so CLI help matches actual behavior.
| Description: "verify snapshot downloads against .torrent files and preverified.toml; findings are written as JSON lines to stdout", | |
| Description: "verify snapshot downloads against .torrent files and preverified.toml; findings are written as human-readable text to stdout", |
| o.issue("error", snapshotName, fmt.Sprintf("failed to open torrent storage: %v", err)) | ||
| return | ||
| } | ||
| defer t.Close() |
There was a problem hiding this comment.
defer t.Close() ignores any error returned by the torrent handle close. If Close() returns an error, consider capturing and reporting it (at least as a warning) so verification doesn't silently miss I/O/storage failures.
| defer t.Close() | |
| defer func() { | |
| if closeErr := t.Close(); closeErr != nil { | |
| o.issue("warning", snapshotName, fmt.Sprintf("failed to close torrent storage: %v", closeErr)) | |
| } | |
| }() |
| if baseName := path.Base(name); infoName != baseName { | ||
| o.issue("warn", name, | ||
| fmt.Sprintf("info.Name %q doesn't match torrent filename %q", infoName, baseName)) |
There was a problem hiding this comment.
This warning compares info.BestName() to path.Base(name), but Erigon's torrent builder sets info.Name to the full relative snapshot path (e.g. "history/v1...", see BuildTorrentIfNeed in db/downloader/util.go). As written, correct torrents for snapshots stored in subdirectories will likely warn. Consider comparing path.Clean(infoName) to path.Clean(name) (or comparing basenames on both sides).
| if baseName := path.Base(name); infoName != baseName { | |
| o.issue("warn", name, | |
| fmt.Sprintf("info.Name %q doesn't match torrent filename %q", infoName, baseName)) | |
| cleanInfoName := path.Clean(infoName) | |
| cleanName := path.Clean(name) | |
| if cleanInfoName != cleanName { | |
| o.issue("warn", name, | |
| fmt.Sprintf("info.Name %q doesn't match torrent path %q", infoName, cleanName)) |
| } | ||
|
|
||
| var expectedSize int64 | ||
| for _, fi := range info.UpvertedFiles() { |
There was a problem hiding this comment.
The PR description says per-file paths inside each torrent are validated as safe, but the verifier currently only checks info.Name. If a multi-file torrent is encountered, validate each file path (e.g. fi.Path/fi.BestPath() from info.UpvertedFiles*) with storage.ToSafeFilePath to prevent traversal via per-file paths.
| for _, fi := range info.UpvertedFiles() { | |
| for _, fi := range info.UpvertedFiles() { | |
| filePath := path.Join(fi.BestPath()...) | |
| if _, pathErr := storage.ToSafeFilePath(filePath); pathErr != nil { | |
| o.issue("error", name, fmt.Sprintf("torrent file path %q is unsafe: %v", filePath, pathErr)) | |
| return | |
| } |
| func VerifySnapshotDownloads(ctx context.Context, dirs datadir.Dirs, workers int, out io.Writer) (summary SnapshotVerifySummary, err error) { | ||
| if workers <= 0 { | ||
| workers = max(1, runtime.GOMAXPROCS(0)/2) | ||
| } |
There was a problem hiding this comment.
VerifySnapshotDownloads introduces non-trivial new verification behavior (preverified mismatch handling, permissions checks, size checks, piece hashing). The db/downloader package already has unit tests, but this verifier currently has none; adding a small tempdir-based test (e.g., via BuildTorrentIfNeed) would help prevent regressions.
Closes #16660
Adds
erigon snapshots verify(alsoerigon seg verify) to verify snapshot downloads in place without going through--downloader.verify.What it checks
.torrentfile's infohash is matched against the recorded hash; entries with no.torrenton disk are reportedinfo.Nameand per-file paths within each torrent are validated as safe (no directory traversal)errorif the file is in preverified.toml,warnotherwisestorage.NewFileOptsconfiguration as the main downloader (part-file-aware, mmap-based fileIo withSEEK_DATA/SEEK_HOLEsparse read support)Implementation notes
newSnapStorageis extracted fromnewTorrentClientso both the running downloader and the verify command use identical storage settings.