[KLC-1927] Skip actions/cache on self-hosted runners to fix tar "Cannot open: File exists" errors #71

Open: fbsobreira wants to merge 3 commits into develop from chore/KLC-1927-cache-errors

fbsobreira commented Jun 3, 2026

Problem

Workflows on the self-hosted klever-pipe runners flood the Actions UI with /usr/bin/tar: Cannot open: File exists annotations during cache restore (hundreds of thousands per run), masking real failures. The runners persist ~/go/pkg/mod and ./vendor between runs, so actions/cache@v4 extracts its tarball over files that already exist and errors on every collision.

Change

  • Gate the Cache Go dependencies step to ephemeral GitHub-hosted runners only:
    if: inputs.cache-enabled == 'true' && runner.environment == 'github-hosted'.
    On self-hosted runners the step is skipped; the existing cache-hit-gated
    download/vendor steps regenerate from the persistent dirs.
  • Remove the legacy enableCrossOsArchive: true flag.
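
As a rough sketch of what the gated step might look like inside the composite action: the step name, the `if:` condition, and the two cached paths come from this description, while the step id, the cache key, and the restore-keys are illustrative assumptions rather than the contents of the real action.yml.

```yaml
# Sketch of the gated cache step in a composite action.
# Assumed for illustration: the step id, key, and restore-keys.
- name: Cache Go dependencies
  id: go-cache  # hypothetical id
  # Runs only on ephemeral GitHub-hosted runners; skipped on self-hosted.
  if: inputs.cache-enabled == 'true' && runner.environment == 'github-hosted'
  uses: actions/cache@v5
  with:
    path: |
      ~/go/pkg/mod
      ./vendor
    key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}  # assumed key scheme
    restore-keys: |
      ${{ runner.os }}-go-
```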

Why runner.environment

The root cause is self-hosted persistence, not the klever-pipe label specifically. Gating on runner.environment covers any self-hosted runner and fails safe (a missing value still skips the cache on self-hosted). Ephemeral ubuntu/macos jobs keep caching unchanged.
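
To confirm which value a given job sees, a throwaway diagnostic step along these lines could help. It is a hypothetical step, not part of this PR; `runner.environment` is expected to print `github-hosted` or `self-hosted`.

```yaml
# Hypothetical debug step: print which kind of runner picked up the job.
- name: Show runner environment
  shell: bash
  run: echo "runner.environment=${{ runner.environment }}"
```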

Also: Node.js 24 runtime bumps

Node.js 20 is deprecated on GitHub Actions runners (forced to Node 24 from 2026-06-16, removed 2026-09-16). Bumped every action still on node20 to its latest Node 24 major. Inputs we use are unchanged across the bumps.

GitHub-owned:

| Action | From | To |
| --- | --- | --- |
| actions/setup-go | v5 | v6 |
| actions/cache | v4 | v5 |
| actions/upload-artifact | v5 | v7 |
| actions/download-artifact | v6 | v8 |

Third-party:

| Action | From | To |
| --- | --- | --- |
| sonarsource/sonarqube-scan-action | v6 | v8 |
| softprops/action-gh-release | v2 | v3 |
| google-github-actions/auth | v2 | v3 |
| google-github-actions/upload-cloud-storage | v2 | v3 |
| docker/setup-qemu-action | v3 | v4 |
| docker/setup-buildx-action | v3 | v4 |
| docker/login-action | v3 | v4 |

These majors require Actions Runner ≥ 2.327.1; klever-pipe is on 2.334.0. Already Node 24 and left as-is: actions/checkout@v6, golangci/golangci-lint-action@v9. Composite (unaffected): sonarsource/sonarqube-quality-gate-action, twingate/github-action.

Notable behavior changes verified safe for our usage: sonarqube-scan-action@v8 enables scanner signature verification by default; auth@v3 retains credentials_json; upload-cloud-storage@v3 retains predefinedAcl/parent; setup-buildx@v4 retains driver-opts.

Notes for reviewers

klever-pipe is a multi-runner pool, so standalone creds-less jobs (release-docker → build-multi-arch, push → validate-and-build, pr-qa-sec → test) rely on each runner's persistent module dirs after the cache change. The common case is unaffected. If a new private klever-io/* module is added and a creds-less job lands on a runner that lacks it, pass GIT_USER/GIT_PASS to that job's go-setup-action step.

Verification

  • Trigger release-docker.yaml on klever-pipe → 0 Cannot open: File exists annotations
  • Trigger go-setup-lint.yaml (PR / workflow_call) → 0 annotations
  • No Node.js 20 deprecation warnings remain
  • Go module / vendor resolution still succeeds (cache hit on ephemeral, cold rebuild on self-hosted)
  • No CI duration regression

Summary

This PR updates CI to skip the GitHub Actions cache restore on self-hosted runners and bumps several actions to Node.js 24–compatible majors.

What changed

  • .github/actions/go-setup-action/action.yml
    • "Cache Go dependencies" step now runs only when:
      if: inputs.cache-enabled == 'true' && runner.environment == 'github-hosted'
      (previously only checked inputs.cache-enabled).
    • actions/cache upgraded v4 → v5; removed enableCrossOsArchive: true.
    • actions/setup-go upgraded v5 → v6.
    • Resolve/download/vendor steps remain and run whenever the cache step reports no hit (a cache miss, or the step was skipped).
  • Workflows updated to newer action majors (no logic changes):
    • actions/upload-artifact v5 → v7 (go-setup-lint.yaml, push.yaml, pr-qa-sec.yaml)
    • actions/download-artifact v6 → v8 (pr-qa-sec.yaml)
    • sonarsource/sonarqube-scan-action v6 → v8
    • docker/setup-qemu-action, docker/setup-buildx-action, docker/login-action v3 → v4 (release-docker.yaml)
    • softprops/action-gh-release v2 → v3; google-github-actions/auth v2 → v3; google-github-actions/upload-cloud-storage v2 → v3 (release.yaml)
    • Other noted bumps to align with Node.js 24 and Actions Runner ≥ 2.327.1 (klever-pipe runs 2.334.0).

Rationale

Self-hosted klever-pipe runners persist ~/go/pkg/mod and ./vendor between runs. actions/cache was extracting tarballs over existing files on those runners, producing many "/usr/bin/tar: Cannot open: File exists" annotations that hid real failures. Restricting the cache restore to GitHub-hosted runners avoids these noisy tar errors while preserving caching behavior on ephemeral hosted runners. On self-hosted runners, persistent directories continue to be used and the download/vendor steps regenerate dependencies when needed.
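
A minimal sketch of how a cache-hit-gated download step behaves once the cache step is skipped. The step id `go-cache`, the step name, and the exact commands are assumptions for illustration; the real action may differ.

```yaml
# Sketch only: the step id "go-cache" and the commands are assumptions.
- name: Download Go modules
  # A skipped step exposes no outputs, so cache-hit is empty on
  # self-hosted runners and this step runs against the persistent dirs.
  if: steps.go-cache.outputs.cache-hit != 'true'
  shell: bash
  run: |
    go mod download
    go mod vendor
```

On a runner that already holds the modules, `go mod download` has little to fetch, which is consistent with the expectation above that CI duration does not regress.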

Verification checklist (kept)

  • release-docker.yaml on klever-pipe → 0 "Cannot open: File exists" annotations
  • go-setup-lint.yaml → 0 such annotations
  • No Node.js 20 deprecation warnings
  • Go module/vendor resolution: cache hits on ephemeral runners; cold rebuilds on self-hosted
  • No CI duration regression

Impact on blockchain-critical components

  • No changes to application source code touching consensus, transaction processing, state management, KVM, or networking.
  • Build and packaging commands (make build-*, tar creation, upload to GCS) are unchanged; produced artifacts (including KVM/wasmer2 libs) and their packaging remain identical.
  • No impact to node stability, runtime behavior, or on-disk data integrity.
  • Cross-cutting: improves CI signal by removing spurious tar-related annotations; does not change production concurrency or error-handling logic.

Reviewer notes

  • klever-pipe is a multi-runner pool; some creds-less jobs rely on persisted module dirs. If a new private klever-io/* module is added and a creds-less job runs on a runner lacking it, pass GIT_USER/GIT_PASS to that job’s go-setup-action step.

…sions

The klever-pipe self-hosted runners persist ~/go/pkg/mod and ./vendor
across runs, so actions/cache restores its tarball over files that already
exist and emits "/usr/bin/tar: Cannot open: File exists" for every
collision, flooding the Actions UI with error annotations even on success.

Gate the cache step to GitHub-hosted runners only (runner.environment ==
'github-hosted'), where the runner is ephemeral and the cache still helps.
On self-hosted runners the step is skipped and the existing
cache-hit-gated download/vendor steps regenerate from the persistent dirs.

Also drop the legacy enableCrossOsArchive flag.

Copilot AI review requested due to automatic review settings June 3, 2026 20:28

coderabbitai Bot commented Jun 3, 2026

Walkthrough

Upgrade GitHub Actions step major versions across CI and release workflows; refine the Go setup composite action by switching to actions/setup-go@v6, requiring GitHub-hosted runners for dependency caching, updating cache to actions/cache@v5 and removing enableCrossOsArchive, and bumping upload/download and other third-party action majors (upload-artifact, download-artifact, sonarsource, docker tooling, softprops, google auth/upload).

Changes

Actions and caching updates

| Layer | File(s) | Summary |
| --- | --- | --- |
| Go setup: setup-go and cache refinements | .github/actions/go-setup-action/action.yml | Use actions/setup-go@v6; run cache step only when inputs.cache-enabled == 'true' AND runner.environment == 'github-hosted'; update cache action to actions/cache@v5; remove enableCrossOsArchive: true. |
| Workflow artifact action upgrades | .github/workflows/go-setup-lint.yaml, .github/workflows/pr-qa-sec.yaml, .github/workflows/push.yaml | Bump actions/upload-artifact usages to @v7 and actions/download-artifact usages to @v8 where present; artifact names/paths unchanged. |
| PR QA: SonarQube scan action updates | .github/workflows/pr-qa-sec.yaml | Upgrade sonarsource/sonarqube-scan-action steps from @v6 to @v8; download/upload artifact steps already updated. |
| Release Docker: build tooling action upgrades | .github/workflows/release-docker.yaml | Upgrade Docker/QEMU/Buildx/login actions from @v3 to @v4 in build-multi-arch job; parameters unchanged. |
| Release pipeline: GitHub release and GCS upload action upgrades | .github/workflows/release.yaml | Upgrade softprops/action-gh-release to @v3, google-github-actions/auth to @v3, and google-github-actions/upload-cloud-storage to @v3; upload paths and inputs unchanged. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

needs-tests

🚥 Pre-merge checks | ✅ 8
✅ Passed checks (8 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows required format [KLC-XXXX] type: description with 'chore' type, and accurately describes the core change: skipping cache on self-hosted runners to fix tar errors. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Concurrency Safety | ✅ Passed | PR modifies only GitHub Actions workflow configuration files (YAML), not concurrent Go source code. No goroutines, channels, mutexes, or sync primitives are affected. |
| Error Handling | ✅ Passed | PR contains only GitHub Actions version updates and a conditional runner check; no new error suppression patterns, unchecked errors, or bare panic calls introduced. |
| State Consistency | ✅ Passed | PR only modifies .github/ (workflows, actions) and build configs. No blockchain state, accounts, balances, or storage code changes detected. Check not applicable. |


Copilot AI left a comment

Pull request overview

This PR adjusts the shared Go composite action to avoid noisy tar: Cannot open: File exists cache-restore annotations on persistent self-hosted runners by skipping actions/cache when the runner isn’t GitHub-hosted.

Changes:

  • Gate the actions/cache@v4 “Cache Go dependencies” step to GitHub-hosted runners only using runner.environment == 'github-hosted'.
  • Remove the legacy enableCrossOsArchive: true cache option.

coderabbitai Bot previously approved these changes Jun 3, 2026

Node.js 20 is deprecated on GitHub Actions runners (forced to Node 24 from
2026-06-16, removed 2026-09-16). Bump the actions still on node20 to their
latest Node 24 majors:

  actions/setup-go          v5 -> v6
  actions/cache             v4 -> v5
  actions/upload-artifact   v5 -> v7
  actions/download-artifact v6 -> v8

The klever-pipe self-hosted runner is 2.334.0, above the 2.327.1 minimum
these majors require. No params used here changed across the bumps.
coderabbitai Bot previously approved these changes Jun 3, 2026

Remaining node20 JS actions flagged by the deprecation notice. Bumped to
their latest Node 24 majors (inputs we use are unchanged):

  sonarsource/sonarqube-scan-action          v6 -> v8
  softprops/action-gh-release                v2 -> v3
  google-github-actions/auth                 v2 -> v3
  google-github-actions/upload-cloud-storage v2 -> v3
  docker/setup-qemu-action                   v3 -> v4
  docker/setup-buildx-action                 v3 -> v4
  docker/login-action                        v3 -> v4

sonarqube-scan-action v8 enables scanner signature verification by default
(skipSignatureVerification now false); auth v3 keeps credentials_json;
upload-cloud-storage v3 keeps predefinedAcl/parent; setup-buildx v4 keeps
driver-opts. quality-gate-action and twingate are composite (unaffected);
golangci-lint-action and actions/checkout are already Node 24.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/pr-qa-sec.yaml (1)

126-142: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set skipSignatureVerification (or install gpg/dirmngr) for sonarqube-scan-action@v8 on klever-pipe

pr-qa-sec.yaml runs sonarsource/sonarqube-scan-action@v8 on runs-on: klever-pipe (two scan steps) without skipSignatureVerification. In v8 the default is false, so the scan enforces Scanner CLI signature verification and requires gpg + dirmngr on self-hosted runners; if either is missing on any runner, the job can hard-fail.

🛡️ Fallback if the pool can’t guarantee gpg/dirmngr (apply to both Sonar scan steps)
       - name: SonarQube Scan
         uses: sonarsource/sonarqube-scan-action@v8
         env:
           SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
           SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
         with:
+          skipSignatureVerification: true
           args: >

Prefer installing gpg and dirmngr on all klever-pipe runners and only use skipSignatureVerification: true as a compatibility fallback.
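
If the tools cannot be baked into the runner image, a per-job prep step is one possible stopgap. The sketch below assumes Debian/Ubuntu-based klever-pipe runners with passwordless sudo, neither of which is stated in this PR; the step name is hypothetical.

```yaml
# Assumes Debian/Ubuntu-based runners with passwordless sudo.
# Place before both sonarqube-scan-action@v8 steps.
- name: Install gpg and dirmngr
  shell: bash
  run: |
    sudo apt-get update
    sudo apt-get install -y gnupg dirmngr
```

Installing on every run adds time to each job, so provisioning the packages once per runner remains the preferable route.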

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/pr-qa-sec.yaml around lines 126 - 142, The SonarQube scan
steps using sonarsource/sonarqube-scan-action@v8 can hard-fail on the
self-hosted klever-pipe runners because signature verification requires
gpg/dirmngr; update both Sonar scan steps to either install gpg and dirmngr on
the runner before the action (e.g., add a prep step that installs gpg + dirmngr)
or add the action input skipSignatureVerification: true to both
sonarqube-scan-action@v8 usages (preferred only as a fallback) so signature
verification won’t hard-fail when gpg/dirmngr are absent.
.github/workflows/release.yaml (1)

52-56: ⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff

Pass git credentials to the build job’s go-setup-action (cache miss can break private module fetch).

In .github/workflows/release.yaml (Go Setup, ~lines 52-56), the build matrix runs ./.github/actions/go-setup-action without git-user/git-pass. The composite action configures authenticated Git access only when both inputs are set, so go mod tidy / go mod download on GitHub-hosted runners can fail when private github.com/klever-io/* modules are required and the cache is cold.

Although setup-and-lint passes credentials, it runs on runs-on: klever-pipe, and the action’s cache step is gated to runner.environment == 'github-hosted', so it won’t populate the cache that the build job relies on.

🔐 Proposed fix
     - name: Go Setup
       uses: ./.github/actions/go-setup-action
       with:
         go-version: ${{ vars.GO_VERSION || '1.25.7' }}
+        git-user: ${{ secrets.GIT_USER }}
+        git-pass: ${{ secrets.GIT_PASS }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/release.yaml around lines 52 - 56, The Go setup composite
action is invoked without credentials, so private module fetches can fail on
cache miss; update the step that uses ./.github/actions/go-setup-action to pass
git-user and git-pass inputs (e.g., git-user: ${{ secrets.GIT_USER }} and
git-pass: ${{ secrets.GIT_PASS }}, matching the proposed fix) so the action’s
authenticated Git configuration is enabled when running the build; modify the
step that contains uses: ./.github/actions/go-setup-action to include these two
inputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/release-docker.yaml:
- Around line 80-82: The current workflow uses docker/setup-qemu-action@v4 with
platforms: linux/amd64,linux/arm64 which is valid because the action still
accepts a comma-separated platforms input; leave the platforms line as-is but
for supply-chain hardening replace the floating tag docker/setup-qemu-action@v4
with a specific commit SHA (i.e., pin the action) while keeping the platforms
input unchanged.

In @.github/workflows/release.yaml:
- Line 85: Replace mutable action tags with immutable commit SHAs for each
"uses:" entry (e.g., change softprops/action-gh-release@v3 to the specific
commit SHA) and add the human-readable tag as an inline comment for clarity;
update all occurrences called out in the review (the softprops/action-gh-release
usage and the other listed "uses:" lines) so the workflow pins to the exact
commit SHA while keeping the original tag as a comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e346c22-bc65-4f80-93de-4d28e0a2bd84

📥 Commits

Reviewing files that changed from the base of the PR and between 43ed415 and a94cd5a.

📒 Files selected for processing (3)
  • .github/workflows/pr-qa-sec.yaml
  • .github/workflows/release-docker.yaml
  • .github/workflows/release.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🧰 Additional context used
🪛 zizmor (1.25.2)
.github/workflows/release.yaml

[error] 85-85: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
[info] 85-85: action functionality is already included by the runner (superfluous-actions): use gh release in a script step
[error] 106-106: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
[error] 111-111: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
[error] 118-118: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
[error] 125-125: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
[error] 132-132: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

.github/workflows/release-docker.yaml

[error] 80-80: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
[error] 86-86: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
[error] 94-94: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

.github/workflows/pr-qa-sec.yaml

[error] 127-127: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
[error] 145-145: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
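
For reference, resolving an unpinned-uses finding typically means replacing the floating tag with a full commit SHA and keeping the tag as a trailing comment. In the sketch below the step name is hypothetical and `<full-commit-sha>` is a deliberate placeholder, since the actual commit behind the tag is not given in this PR.

```yaml
# <full-commit-sha> is a placeholder: substitute the 40-character commit
# that the v3 tag points to, after verifying it in the action's repository.
- name: Create release
  uses: softprops/action-gh-release@<full-commit-sha> # v3
```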

🔇 Additional comments (8)
.github/workflows/release-docker.yaml (2)

94-97: ⚡ Quick win

Validate docker/login-action@v4 input compatibility (username/password).

docker/login-action@v4 exists and continues to support the with.username and with.password inputs for authenticating to a container registry—your workflow snippet is aligned with the expected interface.


86-90: ⚡ Quick win

docker/setup-buildx-action@v4 is compatible with driver-opts. driver-opts remains a supported input in v4 and is passed through to docker buildx create as a newline-delimited list, so the multiline image=moby/buildkit:latest / network=host configuration is still valid.
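
For illustration, the configuration described above would look roughly like this; the two option values are the ones quoted in the comment, and the step name is an assumption.

```yaml
# Each driver-opts line is passed through to `docker buildx create`.
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v4
  with:
    driver-opts: |
      image=moby/buildkit:latest
      network=host
```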

.github/workflows/pr-qa-sec.yaml (3)

144-162: Same v8 signature-verification breaking change applies here.

The SonarQube Go detailed scan PR step has the identical exposure as the SonarQube Scan step above; whatever resolution is chosen (install gpg/dirmngr on the pool, or set skipSignatureVerification: true) must be applied to both.


84-94: LGTM!


112-120: download-artifact@v8 is compatible with the upload-artifact@v7 producers.

The test-artifacts upload (Line 86) and the lint-results upload in go-setup-lint.yaml both use the default archive behavior (zipped), and to support direct uploads, download-artifact@v8 no longer attempts to unzip all downloaded files; instead it checks the Content-Type header ahead of unzipping and skips non-zipped files. Since the producers stay zipped, decompression still works. Note that v8 lets you configure behavior on a download hash mismatch via the digest-mismatch parameter, and the default is now error which will fail the workflow run — fail-fast on corruption, which is the desired behavior here.
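
If the fail-fast behavior should be visible in the workflow rather than inherited from the default, the input could be set explicitly. This is a sketch: the `digest-mismatch` input name and its `error` value are taken from the comment above rather than independently verified, and the step name is hypothetical.

```yaml
- name: Download test artifacts
  uses: actions/download-artifact@v8
  with:
    name: test-artifacts
    digest-mismatch: error  # described above as the v8 default; explicit for clarity
```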

.github/workflows/release.yaml (3)

106-108: ⚡ Quick win

google-github-actions/auth@v3 keeps credentials_json as a supported input.
The official v3 docs/README and the v3.0.0 release notes still document credentials_json as the required Service Account Key JSON input for google-github-actions/auth@v3 (same with: credentials_json: ... pattern) with no documented breaking change for this field, so the workflow snippet should remain compatible.


85-85: ⚡ Quick win

softprops/action-gh-release@v3 input compatibility is OK
files, generate_release_notes, draft, and prerelease are supported inputs in softprops/action-gh-release@v3 and align with the workflow’s usage. The v2→v3 breaking change is the Node runtime upgrade (Node 20 → Node 24); there are no documented functional breaking changes affecting these inputs.


111-111: ⚡ Quick win

Verify google-github-actions/upload-cloud-storage@v3 input compatibility (predefinedAcl, parent).
The v3.0.0 release notes mainly call out the Node 24 runner requirement, but there’s no evidence here that predefinedAcl (used in all steps) and parent (used only in the last step: parent: false) keep the same schema/semantics in v3. Confirm v3 still supports these inputs and that they apply the same ACL/path behavior; otherwise uploads could end up with incorrect object ACLs and/or different destination path structure.
