[BACKPORT 2026.1][build] Improve error reporting and retry for archive downloads (#31364) #31552
hari90 wants to merge 2 commits into
…e downloads (yugabyte#31364)

## Summary

When the third-party archive checksum download returned an HTML error page instead of the expected `.sha256` file, `download_and_extract_archive.py` only reported the file size, which made the failure hard to diagnose. There was also no retry, so a single transient failure (e.g. a 5xx from GitHub) would fail the build.

Example failure: `Checksum file size is too big: 55118 bytes` ([failing job](https://github.com/yugabyte/yugabyte-db/actions/runs/25144698357/job/73701866705?pr=31359))

## Changes

- `download_url` now passes `-f` to curl so HTTP error responses no longer get written to disk as the requested artifact, and uses `--retry` / `--retry-delay` to retry transient failures (5xx, connection errors).
- The "checksum file size is too big" error now includes the first 1024 bytes of the file so the underlying error (e.g. an HTML page) is visible in build logs.

---------

Co-authored-by: Claude <noreply@anthropic.com>

Original commit: 3ce79c6 / yugabyte#31364
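The second change could look roughly like the sketch below. This is a minimal illustration, not the code from `download_and_extract_archive.py`: the function name `read_checksum_file`, the `MAX_CHECKSUM_FILE_SIZE_BYTES` limit, and the choice of `ValueError` are assumptions. Only the 1024-byte preview size and the error wording come from the commit message.

```python
import os

# Assumed limit for illustration; the real script's threshold may differ.
MAX_CHECKSUM_FILE_SIZE_BYTES = 4096
# The commit message says the first 1024 bytes are included in the error.
PREVIEW_SIZE_BYTES = 1024


def read_checksum_file(path: str) -> str:
    """Return the checksum file's text, or raise with a preview if it is too big.

    Hypothetical helper: a real .sha256 file is tiny, so an oversized file is
    most likely an error page saved in place of the checksum.
    """
    size = os.path.getsize(path)
    if size > MAX_CHECKSUM_FILE_SIZE_BYTES:
        with open(path, 'rb') as f:
            # Decode leniently so a binary or mis-encoded body cannot mask the error.
            preview = f.read(PREVIEW_SIZE_BYTES).decode('utf-8', errors='replace')
        raise ValueError(
            f"Checksum file size is too big: {size} bytes. "
            f"First {PREVIEW_SIZE_BYTES} bytes:\n{preview}")
    with open(path, 'r', encoding='utf-8') as f:
        return f.read().strip()
```

With a preview in the message, an HTML error page shows up directly in the build log instead of only a byte count.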
Trigger Jenkins
Jenkins build has been triggered. Results will be posted once it completes. CSI JenkinsBot
Code Review
This pull request enhances the download and archive extraction utility by implementing retry logic and improved error reporting. Key changes include the addition of download attempt constants, a preview mechanism for oversized checksum files to aid in debugging server errors, and the inclusion of retry and failure flags in the curl command. Feedback was provided regarding the compatibility of the --retry-connrefused flag with older versions of curl present in some supported environments like CentOS 7.
    'curl', '-LsSf',
    '--retry', str(MAX_DOWNLOAD_ATTEMPTS - 1),
    '--retry-delay', str(RETRY_DELAY_SEC),
    '--retry-connrefused',
The --retry-connrefused flag was introduced in curl 7.52.0. Some older environments supported by YugabyteDB, such as CentOS 7, typically come with an older version of curl (e.g., 7.29.0) that does not support this flag. Please verify if the build environment is guaranteed to have a sufficiently recent curl version, or consider making this flag conditional to avoid breaking builds on older platforms.
References
- Focus on substantive issues including correctness and portability across supported environments.
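One way to make the flag conditional, as the review comment above suggests, is to check the installed curl version before building the command. The sketch below is an assumption about how that could be done, not code from this PR: `parse_curl_version`, `get_curl_version`, and `get_retry_flags` are hypothetical helpers, and the 7.52.0 threshold is taken from the review comment.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# Per the review comment, --retry-connrefused was introduced in curl 7.52.0.
RETRY_CONNREFUSED_MIN_VERSION = (7, 52, 0)


def parse_curl_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Parse the leading 'curl X.Y.Z' of `curl --version` output (hypothetical helper)."""
    match = re.match(r'curl (\d+)\.(\d+)\.(\d+)', version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def get_curl_version() -> Optional[Tuple[int, ...]]:
    """Return the installed curl version, or None if it cannot be determined."""
    try:
        output = subprocess.check_output(['curl', '--version'], text=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_curl_version(output)


def get_retry_flags(
        version: Optional[Tuple[int, ...]],
        max_attempts: int,
        delay_sec: int) -> List[str]:
    """Build curl retry flags, adding --retry-connrefused only when supported."""
    flags = ['--retry', str(max_attempts - 1), '--retry-delay', str(delay_sec)]
    # An unknown version is treated as old, so the flag is omitted to stay safe.
    if version is not None and version >= RETRY_CONNREFUSED_MIN_VERSION:
        flags.append('--retry-connrefused')
    return flags
```

Calling `get_retry_flags(get_curl_version(), MAX_DOWNLOAD_ATTEMPTS, RETRY_DELAY_SEC)` would then omit the flag on a host with curl 7.29.0 and include it on newer hosts.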
…t the Python level on any curl failure (yugabyte#31427)

## Summary

Thirdparty archive downloads occasionally fail with curl exit status 22 on GitHub Actions because curl's `--retry` only retries timeouts and HTTP 408/429/5xx, not transient 403s on GitHub's signed release-asset redirects. `--retry-all-errors` would cover this but requires curl >= 7.71.0, which is unavailable on AlmaLinux 8 / RHEL 8 (curl 7.61.1) and similar runners. Wrap the curl invocation in a Python retry loop instead, so any curl failure is retried regardless of curl version.

Fixes yugabyte#31426.

## Test Plan

Jenkins: compile only

Original commit: 9afef01 / yugabyte#31427
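A Python-level retry loop of the kind this commit describes might look like the sketch below. It is an illustration under assumptions, not the PR's code: `run_with_retries` and its injectable `run` / `sleep` parameters are hypothetical, and the values given to `MAX_DOWNLOAD_ATTEMPTS` and `RETRY_DELAY_SEC` are placeholders (only the constant names appear in the diff).

```python
import subprocess
import time
from typing import Callable, List

# Constant names appear in the diff; the values here are assumed.
MAX_DOWNLOAD_ATTEMPTS = 3
RETRY_DELAY_SEC = 5


def run_with_retries(
        cmd: List[str],
        max_attempts: int = MAX_DOWNLOAD_ATTEMPTS,
        delay_sec: float = RETRY_DELAY_SEC,
        run: Callable[[List[str]], object] = subprocess.check_call,
        sleep: Callable[[float], object] = time.sleep) -> int:
    """Run cmd, retrying on any non-zero exit. Returns the attempt that succeeded."""
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    for attempt in range(1, max_attempts + 1):
        try:
            run(cmd)
            return attempt
        except subprocess.CalledProcessError as ex:
            # Any curl failure is retried, including exit status 22 from -f on a 403.
            if attempt == max_attempts:
                raise
            print(f"Attempt {attempt}/{max_attempts} failed with exit status "
                  f"{ex.returncode}; retrying in {delay_sec} seconds")
            sleep(delay_sec)
    raise AssertionError('unreachable')


def download_url(url: str, dest_path: str) -> None:
    """Download url to dest_path with curl, retrying at the Python level."""
    run_with_retries(['curl', '-LsSf', '-o', dest_path, url])
```

Because the loop reacts to curl's exit status and does not rely on `--retry-all-errors`, it behaves the same on curl 7.61.1 as on newer versions.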
Trigger Jenkins
Jenkins build has been triggered. Results will be posted once it completes. CSI JenkinsBot
❌ Jenkins build for commit Errors:
🔨 DB Build/Test Job Summary
JenkinsBot
## Summary

When the third-party archive checksum download returned an HTML error page instead of the expected `.sha256` file, `download_and_extract_archive.py` only reported the file size, which made the failure hard to diagnose. There was also no retry, so a single transient failure (e.g. a 5xx from GitHub) would fail the build.

Example failure: `Checksum file size is too big: 55118 bytes` (failing job)

## Changes

- `download_url` now passes `-f` to curl so HTTP error responses no longer get written to disk as the requested artifact, and uses `--retry` / `--retry-delay` to retry transient failures (5xx, connection errors).
- The "checksum file size is too big" error now includes the first 1024 bytes of the file so the underlying error (e.g. an HTML page) is visible in build logs.

Co-authored-by: Claude <noreply@anthropic.com>

Original commit: 3ce79c6 / #31364, 9afef01 / #31427
CSI