[BACKPORT 2025.1][build] Improve error reporting and retry for archive downloads (#31364)#31554
[BACKPORT 2025.1][build] Improve error reporting and retry for archive downloads (#31364)#31554hari90 wants to merge 2 commits into
Conversation
…e downloads (yugabyte#31364) ## Summary When the third-party archive checksum download returned an HTML error page instead of the expected `.sha256` file, `download_and_extract_archive.py` only reported the file size, which made the failure hard to diagnose. There was also no retry, so a single transient failure (e.g. a 5xx from GitHub) would fail the build. Example failure: Checksum file size is too big: 55118 bytes ([failing job](https://github.com/yugabyte/yugabyte-db/actions/runs/25144698357/job/73701866705?pr=31359)) ## Changes - `download_url` now passes `-f` to curl so HTTP error responses no longer get written to disk as the requested artifact, and uses `--retry` / `--retry-delay` to retry transient failures (5xx, connection errors). - The "checksum file size is too big" error now includes the first 1024 bytes of the file so the underlying error (e.g. an HTML page) is visible in build logs. --------- Co-authored-by: Claude <noreply@anthropic.com> Original commit: 3ce79c6 / yugabyte#31364
There was a problem hiding this comment.
Code Review
This pull request enhances the download process by adding retry logic to the curl command and improving error diagnostics for invalid checksum files. It introduces constants for maximum download attempts and retry delays, and updates the checksum verification to include a content preview when file size limits are exceeded. I have no feedback to provide.
|
Trigger Jenkins |
|
Jenkins build has been triggered. Results will be posted once it completes. CSI JenkinsBot |
…t the Python level on any curl failure (yugabyte#31427) ## Summary Thirdparty archive downloads occasionally fail with curl exit status 22 on GitHub Actions because curl's `--retry` only retries timeouts and HTTP 408/429/5xx, not transient 403s on GitHub's signed release-asset redirects. `--retry-all-errors` would cover this but requires curl >= 7.71.0, which is unavailable on AlmaLinux 8 / RHEL 8 (curl 7.61.1) and similar runners. Wrap the curl invocation in a Python retry loop instead, so any curl failure is retried regardless of curl version. Fixes yugabyte#31426. ## Test Plan Jenkins: compile only Original commit: 9afef01 / yugabyte#31427
|
Trigger Jenkins |
|
Jenkins build has been triggered. Results will be posted once it completes. CSI JenkinsBot |
|
❌ Jenkins build for commit Errors:
Checking test failure count per build versus limit of 20 (0 on mac).
🔨 DB Build/Test Job Summary
JenkinsBot |
Summary
When the third-party archive checksum download returned an HTML error
page
instead of the expected
.sha256file,download_and_extract_archive.pyonly reported the file size, which made the failure hard to diagnose.
There
was also no retry, so a single transient failure (e.g. a 5xx from
GitHub)
would fail the build.
Example failure:
Checksum file size is too big: 55118 bytes
(failing
job)
Changes
download_urlnow passes-fto curl so HTTP error responses nolonger
get written to disk as the requested artifact, and uses
--retry/--retry-delayto retry transient failures (5xx, connection errors).bytes of the file so the underlying error (e.g. an HTML page) is visible
in build logs.
Co-authored-by: Claude noreply@anthropic.com
Original commit: 3ce79c6 / #31364, 9afef01 / #31427
CSI