[BACKPORT 2025.2][build] Improve error reporting and retry for archive downloads (#31364) #31553
Open
hari90 wants to merge 2 commits into
…e downloads (yugabyte#31364)

## Summary

When the third-party archive checksum download returned an HTML error page instead of the expected `.sha256` file, `download_and_extract_archive.py` only reported the file size, which made the failure hard to diagnose. There was also no retry, so a single transient failure (e.g. a 5xx from GitHub) would fail the build.

Example failure: `Checksum file size is too big: 55118 bytes` ([failing job](https://github.com/yugabyte/yugabyte-db/actions/runs/25144698357/job/73701866705?pr=31359))

## Changes

- `download_url` now passes `-f` to curl so HTTP error responses are no longer written to disk as the requested artifact, and uses `--retry` / `--retry-delay` to retry transient failures (5xx, connection errors).
- The "checksum file size is too big" error now includes the first 1024 bytes of the file so the underlying error (e.g. an HTML page) is visible in build logs.

---------

Co-authored-by: Claude <noreply@anthropic.com>

Original commit: 3ce79c6 / yugabyte#31364
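For readers who have not opened the diff, the curl change described in this commit message could look roughly like the sketch below. It is not the code from `download_and_extract_archive.py`: the helper name `build_curl_command`, the constant values, and the extra `-L`, `-sS`, and `-o` flags are assumptions made for illustration. Only `-f`, `--retry`, and `--retry-delay` come from the description.

```python
from typing import List

# Hypothetical values; this page names the constants but does not show their values.
MAX_DOWNLOAD_ATTEMPTS = 3
RETRY_DELAY_SEC = 5


def build_curl_command(url: str, dest_path: str) -> List[str]:
    """Build a curl argument list that fails on HTTP errors and retries transient ones."""
    return [
        'curl',
        '-f',                    # Exit non-zero on HTTP >= 400 instead of saving the error page.
        '-sS',                   # Assumption: quiet output, but still print errors.
        '-L',                    # Assumption: follow redirects to the release asset.
        '--retry', str(MAX_DOWNLOAD_ATTEMPTS - 1),  # Retries after the first attempt.
        '--retry-delay', str(RETRY_DELAY_SEC),      # Seconds to wait between retries.
        '-o', dest_path,
        url,
    ]


if __name__ == '__main__':
    print(' '.join(build_curl_command('https://example.com/archive.tar.gz.sha256',
                                      '/tmp/archive.tar.gz.sha256')))
```

With `-f`, an HTML error page never reaches `dest_path`, so the later checksum-size check no longer sees it as a valid download.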
Code Review
This pull request implements retry logic for archive downloads by adding MAX_DOWNLOAD_ATTEMPTS and RETRY_DELAY_SEC constants and updating the curl command with retry flags. It also improves error reporting in verify_sha256sum by providing a preview of the file content when the checksum file size is unexpectedly large. I have no feedback to provide.
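As a rough illustration of the error-reporting change summarized in this review, a size check with a content preview might be written as follows. This is a sketch under stated assumptions: the function name `check_checksum_file_size` and the size limit are hypothetical (the actual check lives in `verify_sha256sum`, whose code is not shown on this page). Only the 1024-byte preview and the error wording come from the PR description.

```python
import os

# Hypothetical limit; the real threshold is not shown on this page.
MAX_CHECKSUM_FILE_SIZE_BYTES = 4096
# From the PR description: the error includes the first 1024 bytes of the file.
PREVIEW_BYTES = 1024


def check_checksum_file_size(path: str) -> None:
    """Raise IOError with a content preview if the checksum file is unexpectedly large."""
    size = os.path.getsize(path)
    if size <= MAX_CHECKSUM_FILE_SIZE_BYTES:
        return
    with open(path, 'rb') as checksum_file:
        preview = checksum_file.read(PREVIEW_BYTES)
    # Decode leniently: the file may be an HTML page or arbitrary binary data.
    preview_text = preview.decode('utf-8', errors='replace')
    raise IOError(
        "Checksum file size is too big: %d bytes. First %d bytes:\n%s" % (
            size, len(preview), preview_text))
```

Decoding with `errors='replace'` keeps the error path from raising a second exception when the unexpected content is not valid UTF-8.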
Trigger Jenkins
Jenkins build has been triggered. Results will be posted once it completes. CSI JenkinsBot
…t the Python level on any curl failure (yugabyte#31427)

## Summary

Thirdparty archive downloads occasionally fail with curl exit status 22 on GitHub Actions because curl's `--retry` only retries timeouts and HTTP 408/429/5xx, not transient 403s on GitHub's signed release-asset redirects. `--retry-all-errors` would cover this but requires curl >= 7.71.0, which is unavailable on AlmaLinux 8 / RHEL 8 (curl 7.61.1) and similar runners.

Wrap the curl invocation in a Python retry loop instead, so any curl failure is retried regardless of curl version.

Fixes yugabyte#31426.

## Test Plan

Jenkins: compile only

Original commit: 9afef01 / yugabyte#31427
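The Python-level retry described in this commit message can be sketched as below. This is an illustrative version, not the backported code: the function name `run_with_retries`, the constant values, and the injectable `run` / `sleep` parameters (added so the loop can be exercised without the network) are all assumptions.

```python
import subprocess
import time
from typing import Callable, List

# Hypothetical values; the PR names these constants but this page does not show them.
MAX_DOWNLOAD_ATTEMPTS = 3
RETRY_DELAY_SEC = 5


def run_with_retries(
        cmd: List[str],
        max_attempts: int = MAX_DOWNLOAD_ATTEMPTS,
        delay_sec: float = RETRY_DELAY_SEC,
        run: Callable[[List[str]], object] = subprocess.check_call,
        sleep: Callable[[float], None] = time.sleep) -> int:
    """Run cmd, retrying on any non-zero exit status. Returns the attempt that succeeded."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            run(cmd)
            return attempt
        except subprocess.CalledProcessError as ex:
            if attempt == max_attempts:
                # Out of attempts: surface the last failure to the caller.
                raise
            print("Attempt %d of %d failed with exit status %d, retrying in %s sec" % (
                attempt, max_attempts, ex.returncode, delay_sec))
            sleep(delay_sec)
    raise AssertionError("unreachable")
```

Because the loop retries on any `CalledProcessError`, it also covers exit status 22 from a transient 403, which curl's own `--retry` does not retry, and it does not depend on the installed curl version.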
Trigger Jenkins
Jenkins build has been triggered. Results will be posted once it completes. CSI JenkinsBot
❌ Jenkins build for commit Errors:
Checking for number of tests planned versus executed.
🔨 DB Build/Test Job Summary
JenkinsBot
## Summary

When the third-party archive checksum download returned an HTML error page instead of the expected `.sha256` file, `download_and_extract_archive.py` only reported the file size, which made the failure hard to diagnose. There was also no retry, so a single transient failure (e.g. a 5xx from GitHub) would fail the build.

Example failure: `Checksum file size is too big: 55118 bytes` ([failing job](https://github.com/yugabyte/yugabyte-db/actions/runs/25144698357/job/73701866705?pr=31359))

## Changes

- `download_url` now passes `-f` to curl so HTTP error responses are no longer written to disk as the requested artifact, and uses `--retry` / `--retry-delay` to retry transient failures (5xx, connection errors).
- The "checksum file size is too big" error now includes the first 1024 bytes of the file so the underlying error (e.g. an HTML page) is visible in build logs.

Co-authored-by: Claude <noreply@anthropic.com>

Original commit: 3ce79c6 / #31364, 9afef01 / #31427