Skip to content

downloader, snapshots: add erigon snapshots verify subcommand#19972

Draft
anacrolix wants to merge 5 commits intomainfrom
anacrolix/snapshots-verify
Draft

downloader, snapshots: add erigon snapshots verify subcommand#19972
anacrolix wants to merge 5 commits intomainfrom
anacrolix/snapshots-verify

Conversation

@anacrolix
Copy link
Copy Markdown
Contributor

Closes #16660

Adds erigon snapshots verify (also erigon seg verify) to verify snapshot downloads in place without going through --downloader.verify.

What it checks

  • preverified.toml (if present): each .torrent file's infohash is matched against the recorded hash; entries with no .torrent on disk are reported
  • File paths: info.Name and per-file paths within each torrent are validated as safe (no directory traversal)
  • Data file presence and size: missing or wrong-size files are reported; severity is error if the file is in preverified.toml, warn otherwise
  • File permissions: files must be owner-readable; writable completed snapshots are warned about; world-writable files are an error
  • Piece hashes: verified using the same storage.NewFileOpts configuration as the main downloader (part-file-aware, mmap-based fileIo with SEEK_DATA/SEEK_HOLE sparse read support)

Implementation notes

newSnapStorage is extracted from newTorrentClient so both the running downloader and the verify command use identical storage settings.

Adds `erigon snapshots verify` (also accessible as `erigon seg verify`)
which checks the snapshot download directory against .torrent files and
an optional preverified.toml.

Checks performed:
- Infohash of each .torrent matches preverified.toml (if present)
- File paths within torrents are safe (no directory traversal)
- info.Name matches the torrent filename
- Data files exist, have correct size, and appropriate permissions
- Piece hashes verified using the mmap/sparse-read storage layer
  (SEEK_DATA/SEEK_HOLE) from anacrolix/torrent
- Preverified.toml entries with no corresponding .torrent file

Closes #16660
Extracts the torrent file storage setup from newTorrentClient into a
shared newSnapStorage function so that the snapshot verify command uses
identical storage settings (UsePartFiles=true, mmap-based fileIo) as
the main downloader.
Torrents are now verified in a worker pool (--workers flag, defaults to
GOMAXPROCS/2). Each finding is emitted immediately as a JSON line to
stdout as it is discovered, so progress is visible in real time. A
summary JSON object (torrents/errors/warnings counts) is written last.

The command handler logs the summary via the normal logger and exits
non-zero if errors were found.
Snapshot discovery now takes the union of:
- All names in preverified.toml
- All files on disk with ".torrent" stripped
- All files on disk with ".part" stripped

This lets the verifier detect missing .torrent files, infohash mismatches
between preverified.toml and .torrent, .part-vs-full disagreements, and
data files that are absent entirely.

After each snapshot finishes verification, a {"type":"progress",...} line
is written to stdout with the snapshot name, overall % complete, and the
preverified status ("matches" | "doesn't match" | "not in preverified" |
"no preverified.toml"). Issue and summary lines retain their own types.
Replace JSON encoder output with plain fmt.Fprintf lines:
- Issues: "ERROR  <name>: <message>"
- Progress: "[ 14.3%]  <name padded to 60>  <pv status>"
- Summary: "verified N snapshots: E error(s), W warning(s)"

Remove SnapshotVerifyIssue and SnapshotVerifyProgress types; keep
SnapshotVerifySummary for programmatic use by the CLI action.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a dedicated CLI path to verify Erigon snapshot downloads in-place (without --downloader.verify), reusing the downloader’s snapshot storage configuration so piece-hash verification matches runtime behavior.

Changes:

  • Introduces db/downloader.VerifySnapshotDownloads, which scans snap/ for snapshot-related artifacts and verifies against .torrent + preverified.toml.
  • Extracts newSnapStorage from newTorrentClient so both the downloader and verifier use identical storage settings.
  • Adds erigon snapshots verify (also available via erigon seg verify) CLI subcommand.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
db/downloader/verify-snapshot-downloads.go New verifier implementation: preverified checks, on-disk presence/size/permissions checks, and piece-hash verification.
db/downloader/downloader.go Extracts newSnapStorage helper to share snapshot storage configuration across downloader + verifier.
cmd/utils/app/snapshots_cmd.go Adds snapshots verify CLI subcommand that runs the new verifier and fails the command on detected errors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +336 to +340
if expectedSize >= 0 {
if fi, statErr := os.Stat(filepath.Join("", name)); statErr == nil && fi.Size() != expectedSize {
o.issue("error", name,
fmt.Sprintf("size mismatch: on-disk=%d torrent=%d", fi.Size(), expectedSize))
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the size check, os.Stat(filepath.Join("", name)) stats the snapshot path relative to the current working directory (and doesn't convert from slash-style). This will commonly skip size validation or validate the wrong file. Use os.Stat(filepath.Join(snapDir, filepath.FromSlash(name))) (or pass snapDir into checkPartAndData).

Copilot uses AI. Check for mistakes.
o *output,
) {
client := newSnapStorage(snapDir, nil)
defer client.Close()
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newSnapStorage returns a storage.ClientImplCloser whose Close() returns an error (see newTorrentClient in downloader.go which joins m.Close()). Here defer client.Close() drops that error. Consider capturing the close error and reporting it (e.g., as a warning/error via o.issue).

Suggested change
defer client.Close()
defer func() {
if err := client.Close(); err != nil {
o.issue("warn", snapshotName, fmt.Sprintf("failed to close torrent storage: %v", err))
}
}()

Copilot uses AI. Check for mistakes.
Comment on lines +392 to +394
//nolint:gosec
h := sha1.New()
piece := t.Piece(p)
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sha1.New() is allocated for every piece. For large snapshots with many pieces this adds significant allocation/GC overhead during verification. Prefer allocating the hasher once outside the loop and calling Reset() each iteration.

Copilot uses AI. Check for mistakes.
},
{
Name: "verify",
Description: "verify snapshot downloads against .torrent files and preverified.toml; findings are written as JSON lines to stdout",
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subcommand help text says findings are written as "JSON lines", but VerifySnapshotDownloads prints human-readable formatted text (and a final summary line). Update the description (or change output format) so CLI help matches actual behavior.

Suggested change
Description: "verify snapshot downloads against .torrent files and preverified.toml; findings are written as JSON lines to stdout",
Description: "verify snapshot downloads against .torrent files and preverified.toml; findings are written as human-readable text to stdout",

Copilot uses AI. Check for mistakes.
o.issue("error", snapshotName, fmt.Sprintf("failed to open torrent storage: %v", err))
return
}
defer t.Close()
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer t.Close() ignores any error returned by the torrent handle close. If Close() returns an error, consider capturing and reporting it (at least as a warning) so verification doesn't silently miss I/O/storage failures.

Suggested change
defer t.Close()
defer func() {
if closeErr := t.Close(); closeErr != nil {
o.issue("warning", snapshotName, fmt.Sprintf("failed to close torrent storage: %v", closeErr))
}
}()

Copilot uses AI. Check for mistakes.
Comment on lines +293 to +295
if baseName := path.Base(name); infoName != baseName {
o.issue("warn", name,
fmt.Sprintf("info.Name %q doesn't match torrent filename %q", infoName, baseName))
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning compares info.BestName() to path.Base(name), but Erigon's torrent builder sets info.Name to the full relative snapshot path (e.g. "history/v1...", see BuildTorrentIfNeed in db/downloader/util.go). As written, correct torrents for snapshots stored in subdirectories will likely warn. Consider comparing path.Clean(infoName) to path.Clean(name) (or comparing basenames on both sides).

Suggested change
if baseName := path.Base(name); infoName != baseName {
o.issue("warn", name,
fmt.Sprintf("info.Name %q doesn't match torrent filename %q", infoName, baseName))
cleanInfoName := path.Clean(infoName)
cleanName := path.Clean(name)
if cleanInfoName != cleanName {
o.issue("warn", name,
fmt.Sprintf("info.Name %q doesn't match torrent path %q", infoName, cleanName))

Copilot uses AI. Check for mistakes.
}

var expectedSize int64
for _, fi := range info.UpvertedFiles() {
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says per-file paths inside each torrent are validated as safe, but the verifier currently only checks info.Name. If a multi-file torrent is encountered, validate each file path (e.g. fi.Path/fi.BestPath() from info.UpvertedFiles*) with storage.ToSafeFilePath to prevent traversal via per-file paths.

Suggested change
for _, fi := range info.UpvertedFiles() {
for _, fi := range info.UpvertedFiles() {
filePath := path.Join(fi.BestPath()...)
if _, pathErr := storage.ToSafeFilePath(filePath); pathErr != nil {
o.issue("error", name, fmt.Sprintf("torrent file path %q is unsafe: %v", filePath, pathErr))
return
}

Copilot uses AI. Check for mistakes.
Comment on lines +119 to +122
func VerifySnapshotDownloads(ctx context.Context, dirs datadir.Dirs, workers int, out io.Writer) (summary SnapshotVerifySummary, err error) {
if workers <= 0 {
workers = max(1, runtime.GOMAXPROCS(0)/2)
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VerifySnapshotDownloads introduces non-trivial new verification behavior (preverified mismatch handling, permissions checks, size checks, piece hashing). The db/downloader package already has unit tests, but this verifier currently has none; adding a small tempdir-based test (e.g., via BuildTorrentIfNeed) would help prevent regressions.

Copilot uses AI. Check for mistakes.
@anacrolix anacrolix self-assigned this Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Command to verify torrent file data

2 participants