@AJStonewee
Contributor

We're downloading 2TB+ mainnet snapshots that take 48+ hours, but until now we had zero visibility into what's happening.

Before:

Downloading and extracting... 42.31% (904.23 GB / 2.08 TB)

That's it. No way to track via Prometheus, no failure stats, nothing.

After:

// Now exposed as Prometheus metrics:
cli_download_downloads_started_total: 3
cli_download_downloads_success_total: 2  
cli_download_downloads_failed_total: 1
cli_download_download_speed_bytes_per_second: 125829120  // ~120MB/s
cli_download_download_progress_percent: 42.31
.....

Can now properly monitor downloads:

  • Track success rates across retries
  • Alert on slow speeds (e.g. rate(cli_download_downloaded_bytes_total[1m]) < 10 * 1024 * 1024, i.e. under 10MB/s)
  • Predict completion time
  • Identify if extraction or download is the bottleneck
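For the "predict completion time" point, a minimal sketch of the arithmetic, assuming the total size, the bytes downloaded so far, and a recent speed sample are available. The function name `estimate_remaining` is hypothetical and not part of this PR:

```rust
use std::time::Duration;

/// Hypothetical helper (not from the PR): estimates the time left from the
/// remaining byte count and the current download speed.
/// Returns `None` when the speed is zero, negative, or not finite,
/// because no meaningful estimate exists in that case.
fn estimate_remaining(
    total_bytes: u64,
    downloaded_bytes: u64,
    speed_bytes_per_second: f64,
) -> Option<Duration> {
    if !speed_bytes_per_second.is_finite() || speed_bytes_per_second <= 0.0 {
        return None;
    }
    // saturating_sub guards against a progress counter that overshoots the total.
    let remaining = total_bytes.saturating_sub(downloaded_bytes);
    Some(Duration::from_secs_f64(
        remaining as f64 / speed_bytes_per_second,
    ))
}
```

On the Prometheus side the same estimate would come from dividing the remaining bytes by the rate of the downloaded-bytes counter, so the helper above is only useful for the CLI progress line.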

Member

@Rjected Rjected left a comment

I think adding metrics for this is reasonable, I have a few suggestions

@github-project-automation github-project-automation bot moved this from Backlog to In Progress in Reth Tracker Nov 21, 2025
Collaborator

@mattsse mattsse left a comment

unsure if we have prometheus support for this command,
so we should check this as well

let metrics = DownloadMetrics::global();
let overall_start = Instant::now();

metrics.downloads_started_total.increment(1);
Collaborator

this metric is only useful if we make use of this command a lot?

because this command impl only ever does one download

Contributor Author

yep removed both - global static is gone and dropped downloads_started_total since it doesn't make sense here

good point about prometheus though, this command doesn't actually expose metrics (no --metrics flag). so kinda pointless right now?

should i just remove them or add the endpoint like stage run has?

Collaborator

> should i just remove them or add the endpoint like stage run has?

if the motivation for this pr is to make them observable via prometheus then we should add support for this via a metrics arg imo and launch the prometheus listener if set
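A rough sketch of the "metrics arg, launch the listener if set" idea. It uses hand-rolled argument parsing from the standard library only, which is an assumption made to keep the example dependency-free. The helper names `parse_metrics_addr` and `describe_metrics_mode` are hypothetical, and the real command would presumably reuse whatever argument handling and Prometheus listener the other commands already have:

```rust
use std::net::{AddrParseError, SocketAddr};

/// Hypothetical helper: returns the socket address that follows `--metrics`,
/// or `None` when the flag is absent (or given without a value).
fn parse_metrics_addr(args: &[&str]) -> Result<Option<SocketAddr>, AddrParseError> {
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg == "--metrics" {
            return match iter.next() {
                // Propagate a parse error instead of silently disabling metrics.
                Some(value) => value.parse::<SocketAddr>().map(Some),
                None => Ok(None),
            };
        }
    }
    Ok(None)
}

/// Hypothetical stand-in for the launch decision: the listener is started
/// only when an address was supplied.
fn describe_metrics_mode(addr: Option<SocketAddr>) -> String {
    match addr {
        Some(addr) => format!("serve Prometheus metrics on {addr}"),
        None => String::from("metrics disabled"),
    }
}
```

Keeping the address optional means that without the flag the download behaves exactly as before, and the metrics only cost anything when someone asks for them.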

- no more global static, just pass it around
- removed unused _started_at field
- simplified the blocking_download function (no closure needed)
- dropped downloads_started_total metric, doesn't make sense here
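To illustrate the "no more global static, just pass it around" change, here is a minimal sketch that uses standard-library atomics as a stand-in for the real metric types. The struct layout and the `download_chunks` function are assumptions for illustration, not the code in this PR. Only the field names mirror metric names mentioned in the description:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the PR's metrics struct; the real one presumably wraps
/// Prometheus counters rather than bare atomics.
#[derive(Default)]
struct DownloadMetrics {
    downloaded_bytes_total: AtomicU64,
    downloads_failed_total: AtomicU64,
}

impl DownloadMetrics {
    fn record_chunk(&self, bytes: u64) {
        self.downloaded_bytes_total.fetch_add(bytes, Ordering::Relaxed);
    }

    fn record_failure(&self) {
        self.downloads_failed_total.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical download loop: the metrics handle is an explicit parameter,
/// so callers (and tests) decide which instance gets updated.
fn download_chunks(
    chunks: &[Result<u64, &str>],
    metrics: &DownloadMetrics,
) -> Result<u64, String> {
    let mut total: u64 = 0;
    for chunk in chunks {
        match chunk {
            Ok(bytes) => {
                metrics.record_chunk(*bytes);
                total += *bytes;
            }
            Err(err) => {
                metrics.record_failure();
                return Err(err.to_string());
            }
        }
    }
    Ok(total)
}
```

Passing the handle explicitly also makes the function easy to test, because each test can create its own `DownloadMetrics::default()` instead of sharing process-wide state.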