
[BACKPORT 2026.1][build] Improve error reporting and retry for archive downloads (#31364)#31552

Open
hari90 wants to merge 2 commits into
yugabyte:2026.1from
hari90:backport-3ce79c6e6-2026.1


@hari90 hari90 commented May 11, 2026

Summary

When the third-party archive checksum download returned an HTML error page
instead of the expected .sha256 file, download_and_extract_archive.py only
reported the file size, which made the failure hard to diagnose. There was
also no retry, so a single transient failure (e.g. a 5xx from GitHub) would
fail the build.

Example failure:

Checksum file size is too big: 55118 bytes

(failing job: https://github.com/yugabyte/yugabyte-db/actions/runs/25144698357/job/73701866705?pr=31359)

Changes

  • download_url now passes -f to curl so HTTP error responses no longer
    get written to disk as the requested artifact, and uses --retry /
    --retry-delay to retry transient failures (5xx, connection errors).
  • The "checksum file size is too big" error now includes the first 1024
    bytes of the file so the underlying error (e.g. an HTML page) is visible
    in build logs.
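
The two changes above can be sketched roughly as follows. This is an illustrative approximation, not the code from the PR: the constant values, the size threshold, and the helper names `build_curl_command` and `check_checksum_file_size` are assumptions made for the example. Only the curl flags and the 1024-byte preview come from the description.

```python
import os
import subprocess

# Assumed values: the PR mentions download attempt constants but this page
# does not show what they are set to.
MAX_DOWNLOAD_ATTEMPTS = 3
RETRY_DELAY_SEC = 5
# Assumed threshold for a plausible .sha256 file (not taken from the PR).
MAX_CHECKSUM_FILE_SIZE_BYTES = 4096
# The description says the first 1024 bytes are included in the error.
CHECKSUM_PREVIEW_BYTES = 1024


def build_curl_command(url, dest_path):
    """Build the curl argument list (hypothetical helper).

    -L follows redirects, -s silences progress output, -S still prints
    errors, and -f makes curl exit non-zero on HTTP errors instead of
    saving the error page as the requested file.
    """
    return [
        'curl', '-LsSf',
        '--retry', str(MAX_DOWNLOAD_ATTEMPTS - 1),
        '--retry-delay', str(RETRY_DELAY_SEC),
        url, '-o', dest_path,
    ]


def download_url(url, dest_path):
    """Download url to dest_path, raising CalledProcessError on failure."""
    subprocess.check_call(build_curl_command(url, dest_path))


def check_checksum_file_size(path):
    """Raise IOError with a content preview if the checksum file is too big."""
    size = os.path.getsize(path)
    if size <= MAX_CHECKSUM_FILE_SIZE_BYTES:
        return
    with open(path, 'rb') as f:
        # Decode leniently: the file may be an HTML page or arbitrary bytes.
        preview = f.read(CHECKSUM_PREVIEW_BYTES).decode('utf-8', errors='replace')
    raise IOError(
        "Checksum file size is too big: %d bytes. First %d bytes:\n%s" % (
            size, CHECKSUM_PREVIEW_BYTES, preview))
```

With a check like this, a 55118-byte HTML error page would show the start of its markup in the build log instead of just its size.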

Co-authored-by: Claude <noreply@anthropic.com>

Original commit: 3ce79c6 / #31364, 9afef01 / #31427



…e downloads (yugabyte#31364)

## Summary

When the third-party archive checksum download returned an HTML error page
instead of the expected `.sha256` file, `download_and_extract_archive.py`
only reported the file size, which made the failure hard to diagnose. There
was also no retry, so a single transient failure (e.g. a 5xx from GitHub)
would fail the build.

Example failure:

Checksum file size is too big: 55118 bytes

([failing job](https://github.com/yugabyte/yugabyte-db/actions/runs/25144698357/job/73701866705?pr=31359))

## Changes

- `download_url` now passes `-f` to curl so HTTP error responses no longer
  get written to disk as the requested artifact, and uses `--retry` /
  `--retry-delay` to retry transient failures (5xx, connection errors).
- The "checksum file size is too big" error now includes the first 1024
  bytes of the file so the underlying error (e.g. an HTML page) is visible
  in build logs.

---------

Co-authored-by: Claude <noreply@anthropic.com>

Original commit: 3ce79c6 / yugabyte#31364
@hari90 hari90 requested review from es1024 and svarnau May 11, 2026 23:56

hari90 commented May 11, 2026

Trigger Jenkins


hari90 commented May 11, 2026

Jenkins build has been triggered. Results will be posted once it completes. CSI


JenkinsBot


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enhances the download and archive extraction utility by implementing retry logic and improved error reporting. Key changes include the addition of download attempt constants, a preview mechanism for oversized checksum files to aid in debugging server errors, and the inclusion of retry and failure flags in the curl command. Feedback was provided regarding the compatibility of the --retry-connrefused flag with older versions of curl present in some supported environments like CentOS 7.

'curl', '-LsSf',
'--retry', str(MAX_DOWNLOAD_ATTEMPTS - 1),
'--retry-delay', str(RETRY_DELAY_SEC),
'--retry-connrefused',

medium

The --retry-connrefused flag was introduced in curl 7.52.0. Some older environments supported by YugabyteDB, such as CentOS 7, typically come with an older version of curl (e.g., 7.29.0) that does not support this flag. Please verify if the build environment is guaranteed to have a sufficiently recent curl version, or consider making this flag conditional to avoid breaking builds on older platforms.

References
  1. Focus on substantive issues including correctness and portability across supported environments.
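
One way to act on this suggestion would be to add the flag only when the local curl is new enough. The sketch below is an assumption about how that could look, not what the PR does: `parse_curl_version`, `get_curl_version`, and `optional_retry_flags` are hypothetical helpers, and the only fact taken from the review is that `--retry-connrefused` requires curl 7.52.0 or later.

```python
import re
import subprocess

# From the review comment: --retry-connrefused was introduced in curl 7.52.0.
RETRY_CONNREFUSED_MIN_VERSION = (7, 52, 0)


def parse_curl_version(version_output):
    """Extract (major, minor, patch) from the first line of `curl --version`.

    Returns None if the output does not start with the usual
    'curl X.Y.Z' prefix.
    """
    match = re.match(r'curl (\d+)\.(\d+)\.(\d+)', version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def get_curl_version():
    """Run `curl --version` and parse its output (hypothetical helper)."""
    output = subprocess.check_output(['curl', '--version'])
    return parse_curl_version(output.decode('utf-8', errors='replace'))


def optional_retry_flags(curl_version):
    """Return extra retry flags that the given curl version supports."""
    if curl_version is not None and curl_version >= RETRY_CONNREFUSED_MIN_VERSION:
        return ['--retry-connrefused']
    # Unknown or old curl: omit the flag rather than risk an
    # unrecognized-option error.
    return []
```

A caller could append `optional_retry_flags(get_curl_version())` to the base curl arguments. On a curl 7.29.0 system the list would be empty, so the command would still run.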

…t the Python level on any curl failure (yugabyte#31427)

## Summary

Thirdparty archive downloads occasionally fail with curl exit status 22 on
GitHub Actions because curl's `--retry` only retries timeouts and HTTP
408/429/5xx, not transient 403s on GitHub's signed release-asset redirects.
`--retry-all-errors` would cover this but requires curl >= 7.71.0, which is
unavailable on AlmaLinux 8 / RHEL 8 (curl 7.61.1) and similar runners.

Wrap the curl invocation in a Python retry loop instead, so any curl failure
is retried regardless of curl version.

Fixes yugabyte#31426.
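
A Python-level retry loop of the kind described above might look like the following. This is a sketch under assumptions, not the committed code: the function name `run_with_retries`, the default attempt count and delay, and the injectable `run` / `sleep` parameters (added here so the loop can be exercised without a network) are all hypothetical.

```python
import subprocess
import time

# Assumed defaults; the actual constant values are not shown on this page.
MAX_DOWNLOAD_ATTEMPTS = 3
RETRY_DELAY_SEC = 5


def run_with_retries(cmd,
                     max_attempts=MAX_DOWNLOAD_ATTEMPTS,
                     delay_sec=RETRY_DELAY_SEC,
                     run=subprocess.check_call,
                     sleep=time.sleep):
    """Run cmd, retrying on any non-zero exit status.

    Returns the 1-based attempt number that succeeded. Re-raises the last
    CalledProcessError once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    for attempt in range(1, max_attempts + 1):
        try:
            run(cmd)
            return attempt
        except subprocess.CalledProcessError as error:
            if attempt == max_attempts:
                raise
            print('Attempt %d/%d failed with exit status %d; retrying in %d s' % (
                attempt, max_attempts, error.returncode, delay_sec))
            sleep(delay_sec)
```

Because the loop reacts to the exit status rather than to a specific HTTP code, a curl exit status 22 caused by a transient 403 is retried the same way as a 5xx, without needing `--retry-all-errors`.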

## Test Plan
Jenkins: compile only

Original commit: 9afef01 / yugabyte#31427

hari90 commented May 12, 2026

Trigger Jenkins


hari90 commented May 12, 2026

Jenkins build has been triggered. Results will be posted once it completes. CSI


JenkinsBot


hari90 commented May 12, 2026

Jenkins build for commit 3b6fb731: Fail
CSI
Reason: CSI status: FAIL

Errors:


🔨 DB Build/Test Job Summary

| Build | Total | Passed | Failed | Failed After Retries |
| --- | --- | --- | --- | --- |
| PR31552-alma8-clang21-tsan | 11897 | 10231 | 12 | 6 |
| PR31552-arm-alma8-clang21-release | 11371 | 10948 | 10 | 7 |
| PR31552-mac14-clang-release | 2 | 2 | 0 | 0 |
| PR31552-ubuntu22.04-clang21-debug | 2 | 2 | 0 | 0 |
| PR31552-alma8-clang21-release | 11373 | 10951 | 9 | 6 |
| PR31552-arm-mac14-clang-release | 17 | 17 | 0 | 0 |
| PR31552-alma9-clang21-asan | 11990 | 11253 | 6 | 6 |
| PR31552-alma8-gcc12-fastdebug | 12115 | 11651 | 10 | 4 |

JenkinsBot
