ci: pin oasdiff version to avoid GitHub API rate-limit flake #3739

Merged
VascoSch92 merged 1 commit into main from vasco/fix-oasdiff-rate-limit-flake
Jun 15, 2026

@VascoSch92 VascoSch92 commented Jun 15, 2026

Member

HUMAN:

API breakage tests are sometimes flaky. This is an effort to make them more reliable.

  • A human has tested these changes.

AGENT:


Why

The REST API breakage checks workflow (.github/workflows/agent-server-rest-api-breakage.yml) is intermittently red in the Install oasdiff step with:

Error: Failed to get oasdiff version.
This could be due to GitHub API rate limiting or network issues.
##[error]Process completed with exit code 1.

This is a CI infra flake, not a real API-breakage finding — the actual Run agent server REST API breakage check step never runs when the install fails.

Root cause (verified): the step pipes the upstream install.sh into sh with no version pin. With no version set, install.sh calls get_latest_release(), which requests https://api.github.com/repos/oasdiff/oasdiff/releases/latest to resolve "latest". Unauthenticated GitHub API requests are capped at 60/hr per IP; on shared GitHub-hosted runners that limit is easily exhausted, so the lookup fails intermittently.
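
The version-resolution branch described here can be modeled in a few lines of shell. This is a simplified sketch, not the upstream install.sh: lookup_latest and FAKE_LATEST are hypothetical stand-ins for the GitHub API call, and only the version variable and the error text are taken from this description.

```shell
#!/bin/sh
# Simplified model of the version-resolution branch; NOT the upstream script.
# lookup_latest is a hypothetical stand-in for the GitHub API request.
lookup_latest() {
  # Prints nothing unless FAKE_LATEST is set, mimicking a rate-limited reply.
  printf '%s' "${FAKE_LATEST:-}"
}

resolve_version() {
  if [ -n "${version:-}" ]; then
    # Pinned: the version is known, so no API request is made.
    printf '%s\n' "$version"
    return 0
  fi
  latest=$(lookup_latest)
  if [ -z "$latest" ]; then
    echo "Error: Failed to get oasdiff version." >&2
    return 1
  fi
  printf '%s\n' "$latest"
}
```

With version=1.19.1 set, resolve_version prints the pin without calling lookup_latest at all; with version unset and an empty lookup result, it returns a non-zero status, which is the failure mode seen in the Install oasdiff step.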

Evidence (all from run logs on 2026-06-15, gh CLI):

  • main push (unrelated code), run 27550782864 — failed in Install oasdiff with the exact error above. A failure on main with unrelated code confirms it is environmental, not PR-specific.
  • run 27551171682 — failed in Install oasdiff.
  • run 27551356083 — now success on run_attempt: 2, i.e. a plain re-run cleared it.

Resolves #3738.

Summary

  • Pin oasdiff to version=1.19.1 in the Install oasdiff step so install.sh short-circuits before the GitHub API "latest release" lookup (if [ -n "$version" ]) and downloads the release asset directly — removing the rate-limited API call that caused the flake.
  • Switch curl -L to curl -fsSL so a transient HTTP error fails fast with a clear message instead of piping an error page into sh.

Note: the linked issue suggested version=1.11.7, but that is only the placeholder in install.sh's error message. The workflow currently tracks "latest" (v1.19.1), so this PR pins to 1.19.1 to keep behavior identical rather than silently downgrading ~8 minor versions.

Issue Number

#3738

How to Test

This is a CI-only change (single workflow step). Verified the pinned install command locally on macOS into a temp dir, observing that it skips the API lookup and installs the pinned version:

$ curl -fsSL https://raw.githubusercontent.com/oasdiff/oasdiff/main/install.sh \
    | INSTALL_DIR=/tmp/oasdiff-test version=1.19.1 sh
Downloading https://github.com/oasdiff/oasdiff/releases/download/v1.19.1/oasdiff_1.19.1_darwin_all.tar.gz
Validating checksum
Extracting tar file
Installed oasdiff to /tmp/oasdiff-test

$ /tmp/oasdiff-test/oasdiff --version
oasdiff version 1.19.1

The download comes from the releases/download/... asset URL (a CDN/release download, not the rate-limited API), so no api.github.com/.../releases/latest request is made. The workflow YAML was also validated with yaml.safe_load.

On CI, the Install oasdiff step should now print oasdiff version 1.19.1 deterministically and never hit the rate-limit error.

Video/Screenshots

N/A — CI workflow change; install output and version captured above.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

  • Bump the pin when adopting newer oasdiff features; a comment on the step explains the rationale.
  • A GITHUB_TOKEN on the step (to raise the API limit to 5000/hr) is intentionally not added: with the version pinned, install.sh makes no GitHub API calls at all, so the token would be unused.

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:7393101-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-7393101-python \
  ghcr.io/openhands/agent-server:7393101-python

All tags pushed for this build

ghcr.io/openhands/agent-server:7393101-golang-amd64
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-golang-amd64
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-golang-amd64
ghcr.io/openhands/agent-server:7393101-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:7393101-golang-arm64
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-golang-arm64
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-golang-arm64
ghcr.io/openhands/agent-server:7393101-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:7393101-java-amd64
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-java-amd64
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-java-amd64
ghcr.io/openhands/agent-server:7393101-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:7393101-java-arm64
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-java-arm64
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-java-arm64
ghcr.io/openhands/agent-server:7393101-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:7393101-python-amd64
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-python-amd64
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-python-amd64
ghcr.io/openhands/agent-server:7393101-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:7393101-python-arm64
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-python-arm64
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-python-arm64
ghcr.io/openhands/agent-server:7393101-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:7393101-golang
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-golang
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-golang
ghcr.io/openhands/agent-server:7393101-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:7393101-java
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-java
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-java
ghcr.io/openhands/agent-server:7393101-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:7393101-python
ghcr.io/openhands/agent-server:7393101f65a438ef9b4290e8aacb0fd358ad5a50-python
ghcr.io/openhands/agent-server:vasco-fix-oasdiff-rate-limit-flake-python
ghcr.io/openhands/agent-server:7393101-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 7393101-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 7393101-python-amd64) are also available if needed

The "Install oasdiff" step runs upstream install.sh, which queries the
GitHub API to resolve the latest oasdiff release. Unauthenticated API
requests are capped at 60/hr per IP, so on shared GitHub-hosted runners
the lookup intermittently fails with "Failed to get oasdiff version. This
could be due to GitHub API rate limiting or network issues", failing the
whole check before the breakage step runs.

Pin the version so install.sh skips the API lookup (per the script's own
error-message guidance) and downloads the release asset directly. This
also makes the check deterministic and bumpable via a PR.
@github-actions

Contributor

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions

Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

all-hands-bot commented Jun 15, 2026

Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Collaborator

Code Review — PR #3739

🟢 Good taste — Elegant, simple solution.


This is a textbook example of a minimal, well-documented fix for a real CI infrastructure problem.

Analysis

Problem Solved: The PR correctly identifies and addresses a genuine GitHub API rate-limit flake in the CI pipeline. The evidence provided (multiple failed runs with timestamps, root cause traced to get_latest_release()) is compelling and well-documented.

Why Version 1.19.1: The PR description correctly notes that version 1.19.1 was chosen to match "latest" rather than downgrading. This is the right call — pinning to a specific version should match the current behavior until the team intentionally wants to upgrade.

Change Quality:

  • -fsSL flags are appropriate: -f fails fast on HTTP errors (better than piping error pages to shell), -s silences progress, -S shows errors.
  • The inline comments explaining the "why" are justified here — the rate-limit problem is CI infrastructure knowledge that isn't obvious from reading the code alone.

Risk Assessment:

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
  • This is a CI-only change affecting workflow reliability. No impact on production code paths. Actually improves reliability by eliminating an intermittent failure mode.

VERDICT: Worth merging. Clean, focused fix with excellent documentation. No blocking issues.

KEY INSIGHT: Pinning tool versions in CI is a pragmatic pattern for avoiding external API dependencies. The -f flag addition is a nice hardening touch that prevents silent failures.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

@all-hands-bot all-hands-bot left a comment

Collaborator

✅ QA Report: PASS

Verified the workflow’s pinned oasdiff install path installs oasdiff 1.19.1 and removes the unauthenticated GitHub releases/latest API lookup that caused the flake.

Does this PR achieve its stated goal?

Yes. The PR’s goal is to make the REST API breakage workflow’s Install oasdiff step deterministic by pinning oasdiff and avoiding the rate-limited latest-release API call. In a before/after execution of the install script, the old unpinned invocation called https://api.github.com/repos/oasdiff/oasdiff/releases/latest once, while the pinned version=1.19.1 invocation made 0 such calls, downloaded the v1.19.1 release asset directly, and produced oasdiff version 1.19.1. The exact PR workflow command also completed successfully locally.

Phase | Result
Environment Setup | ✅ Checked out PR branch at 7393101; no project dependency install needed for this CI-only install-step change.
CI Status | 🟡 Relevant REST API (OpenAPI) check is green; some broader Agent Server / QA jobs were still in progress when checked.
Functional Verification | ✅ Exercised old and new oasdiff install flows, exact pinned workflow command, YAML parsing, and curl HTTP-error behavior.

Functional Verification

Test 1: Baseline unpinned install performs the rate-limited lookup

Step 1 — Reproduce / establish baseline without the fix:
Ran an isolated equivalent of the old unpinned install command:

curl -L https://raw.githubusercontent.com/oasdiff/oasdiff/main/install.sh \
  | INSTALL_DIR=/tmp/oasdiff-qa-z65aun/old-curlL sh -x
/tmp/oasdiff-qa-z65aun/old-curlL/oasdiff --version

Observed:

+ get_latest_release oasdiff/oasdiff
+ curl --silent -L https://api.github.com/repos/oasdiff/oasdiff/releases/latest
+ version=1.19.1
Downloading https://github.com/oasdiff/oasdiff/releases/download/v1.19.1/oasdiff_1.19.1_linux_amd64.tar.gz
Installed oasdiff to /tmp/oasdiff-qa-z65aun/old-curlL
oasdiff version 1.19.1
old-curlL: api.github.com latest-release occurrences = 1

This confirms the old behavior depended on the unauthenticated GitHub latest-release API endpoint before downloading the release asset.

Step 2 — Apply the PR's changes:
Used the PR’s pinned version=1.19.1 environment variable while keeping installation isolated to a temp directory.

Step 3 — Re-run with the fix in place:
Ran:

curl -fsSL https://raw.githubusercontent.com/oasdiff/oasdiff/main/install.sh \
  | INSTALL_DIR=/tmp/oasdiff-qa-z65aun/new-env version=1.19.1 sh -x
/tmp/oasdiff-qa-z65aun/new-env/oasdiff --version

Observed:

+ [ -n 1.19.1 ]
+ version=1.19.1
Downloading https://github.com/oasdiff/oasdiff/releases/download/v1.19.1/oasdiff_1.19.1_linux_amd64.tar.gz
Installed oasdiff to /tmp/oasdiff-qa-z65aun/new-env
oasdiff version 1.19.1
new-env: api.github.com latest-release occurrences = 0

This shows the pin makes install.sh short-circuit the latest-release lookup and still installs the intended 1.19.1 binary.

Test 2: Exact PR workflow command succeeds

Ran the exact install step command from the PR:

curl -fsSL https://raw.githubusercontent.com/oasdiff/oasdiff/main/install.sh \
  | version=1.19.1 sh -s -- -b /usr/local/bin
oasdiff --version

Observed:

Downloading https://github.com/oasdiff/oasdiff/releases/download/v1.19.1/oasdiff_1.19.1_linux_amd64.tar.gz
Validating checksum
Extracting tar file
Installed oasdiff to /usr/local/bin
oasdiff version 1.19.1

This confirms the workflow command a runner would execute completes and leaves oasdiff available at the expected version.

Test 3: curl -fsSL fails fast on HTTP errors

Ran old vs new curl flags against a missing raw GitHub file under bash -o pipefail:

OLD curl -L: downloaded the 404 body, then sh failed with `sh: 1: 404:: not found` and rc=127.
NEW curl -fsSL: failed at curl with `curl: (22) The requested URL returned error: 404` and rc=22.

This confirms the new curl flags produce a clearer failure before piping an HTTP error body into sh.

Test 4: Workflow YAML still parses

Ran:

python - <<'PY'
import pathlib, yaml
path = pathlib.Path('.github/workflows/agent-server-rest-api-breakage.yml')
with path.open() as f:
    data = yaml.safe_load(f)
print(type(data).__name__)
print(data['name'])
print('jobs:', ', '.join(data['jobs'].keys()))
PY

Observed:

dict
REST API breakage checks
jobs: agent-server-rest-api

This confirms the changed workflow file remains valid YAML.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@VascoSch92 VascoSch92 marked this pull request as ready for review June 15, 2026 14:32

@all-hands-bot all-hands-bot left a comment

Collaborator

✅ QA Report: PASS

The workflow install step was exercised under a simulated GitHub API rate-limit failure; the pinned command avoided the failing API lookup, installed oasdiff 1.19.1, and the installed binary completed a real OpenAPI comparison.

Does this PR achieve its stated goal?

Yes. The stated goal is to remove the REST API breakage workflow flake caused by oasdiff install.sh resolving latest through unauthenticated api.github.com. In a simulated rate-limit scenario, the unpinned baseline failed with Failed to get oasdiff version after calling https://api.github.com/repos/oasdiff/oasdiff/releases/latest, while the PR command with version=1.19.1 completed successfully and made only release-asset/checksum requests.

Phase | Result
Environment Setup | ✅ PR checkout confirmed at 7393101f65a438ef9b4290e8aacb0fd358ad5a50; exercised the CI shell step with real curl, sh, and oasdiff install behavior.
CI Status | 🟡 Observed 39 successful checks, 9 skipped checks, and 1 qa-changes check in progress at review time.
Functional Verification | ✅ Baseline failure reproduced under simulated API rate limit; pinned install succeeded and the installed CLI performed a real OpenAPI breaking-change check.

Functional Verification

Test 1: Baseline unpinned install fails when the latest-release API lookup is rate-limited

Step 1 — Reproduce / establish baseline without the fix:
I ran the unpinned install command with a PATH-local curl wrapper that returned an API-rate-limit style JSON response only for https://api.github.com/repos/oasdiff/oasdiff/releases/latest and delegated all other URLs to real curl:

curl -fsSL https://raw.githubusercontent.com/oasdiff/oasdiff/main/install.sh \
  | PATH="$QA_DIR/fakebin:$PATH" sh -s -- -b "$QA_DIR/base-bin"

Observed output:

exit_code=1
Error: Failed to get oasdiff version.
This could be due to GitHub API rate limiting or network issues.

internal curl calls:
--silent -L https://api.github.com/repos/oasdiff/oasdiff/releases/latest

This confirms the old behavior is vulnerable to the reported flake: without version, the installer depends on the unauthenticated GitHub latest-release API call before it can download oasdiff.
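
A PATH-local wrapper of the kind described in Step 1 might look like the sketch below. It is a hypothetical reconstruction: the log file name, the JSON body, and the way the real curl is located are assumptions, and the wrapper used in this QA run may have differed.

```shell
#!/bin/sh
# Hypothetical reconstruction of a PATH-local curl wrapper.
# QA_DIR, curl-calls.log and the JSON body are illustrative assumptions.
QA_DIR=$(mktemp -d)
mkdir -p "$QA_DIR/fakebin"
# Resolve the real curl before the fake one is placed on PATH.
REAL_CURL=$(command -v curl || echo /usr/bin/curl)

cat > "$QA_DIR/fakebin/curl" <<EOF
#!/bin/sh
# Log every invocation so the internal calls can be inspected afterwards.
echo "\$*" >> "$QA_DIR/curl-calls.log"
for arg in "\$@"; do
  case "\$arg" in
    https://api.github.com/repos/oasdiff/oasdiff/releases/latest)
      # Simulated rate-limit style reply: a JSON body with no release tag.
      echo '{"message": "API rate limit exceeded"}'
      exit 0
      ;;
  esac
done
# Every other URL is delegated to the real curl.
exec "$REAL_CURL" "\$@"
EOF
chmod +x "$QA_DIR/fakebin/curl"
```

Prepending $QA_DIR/fakebin to PATH only for the sh process, as in the command above, routes the installer's internal curl calls through the wrapper while the outer download of install.sh still uses the real curl.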

Step 2 — Apply the PR's changes:
I reran the same installer path with the PR's version=1.19.1 environment variable while keeping the same simulated API-rate-limit wrapper active.

Step 3 — Re-run with the fix in place:

curl -fsSL https://raw.githubusercontent.com/oasdiff/oasdiff/main/install.sh \
  | PATH="$QA_DIR/fakebin:$PATH" version=1.19.1 sh -s -- -b "$QA_DIR/pr-bin"

Observed output:

exit_code=0
Downloading https://github.com/oasdiff/oasdiff/releases/download/v1.19.1/oasdiff_1.19.1_linux_amd64.tar.gz
Validating checksum
Extracting tar file
Installed oasdiff to /usr/local/bin

internal curl calls:
-sL -O https://github.com/oasdiff/oasdiff/releases/download/v1.19.1/oasdiff_1.19.1_linux_amd64.tar.gz
-sL https://github.com/oasdiff/oasdiff/releases/download/v1.19.1/checksums.txt

api lookup present in PR path? no

This shows the PR path bypasses the releases/latest API lookup entirely and downloads the pinned release asset directly, so the simulated rate-limit condition no longer blocks installation.

Test 2: Installed oasdiff binary performs a real OpenAPI operation

After the pinned install, I invoked the installed CLI against two small OpenAPI specs where the revision adds a non-breaking endpoint:

oasdiff --version
oasdiff breaking "$QA_DIR/specs/base.yaml" "$QA_DIR/specs/revision.yaml"

Observed output:

oasdiff version 1.19.1
No breaking changes to report, but the specs are different.
Run 'oasdiff diff' to see structural differences.
exit_code=0

This verifies more than argument parsing: the pinned install produced an executable oasdiff 1.19.1 binary that can run the workflow-relevant OpenAPI comparison operation successfully.
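
A pair of specs of the kind described could look like the following sketch. The file contents, the SPEC_DIR variable, and the /health and /version paths are illustrative assumptions, not the files used in this QA run; only the oasdiff breaking invocation is taken from the report.

```shell
#!/bin/sh
# Hypothetical minimal specs; SPEC_DIR and both paths are made up.
SPEC_DIR=$(mktemp -d)

cat > "$SPEC_DIR/base.yaml" <<'EOF'
openapi: 3.0.0
info:
  title: Example API
  version: 1.0.0
paths:
  /health:
    get:
      responses:
        '200':
          description: OK
EOF

# The revision keeps /health unchanged and adds one endpoint,
# which is an additive change to the API surface.
cat > "$SPEC_DIR/revision.yaml" <<'EOF'
openapi: 3.0.0
info:
  title: Example API
  version: 1.0.0
paths:
  /health:
    get:
      responses:
        '200':
          description: OK
  /version:
    get:
      responses:
        '200':
          description: OK
EOF

# Run the comparison only when oasdiff is actually installed.
if command -v oasdiff >/dev/null 2>&1; then
  oasdiff breaking "$SPEC_DIR/base.yaml" "$SPEC_DIR/revision.yaml"
fi
```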

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

@enyst enyst left a comment

Member

Thank you!

@VascoSch92 VascoSch92 merged commit c494211 into main Jun 15, 2026
53 checks passed
@VascoSch92 VascoSch92 deleted the vasco/fix-oasdiff-rate-limit-flake branch June 15, 2026 16:21

Development

Successfully merging this pull request may close these issues.

CI: 'REST API breakage checks' flaky — oasdiff install fails on GitHub API rate limit

3 participants