fix: pin supervisor image to match openshell CLI version #2795
The OpenShell gateway defaults to pulling ghcr.io/nvidia/openshell/supervisor:latest for the sandbox supervisor binary. When NVIDIA released v0.0.73 (2026-06-30 15:31 UTC), the :latest tag was re-pointed to a supervisor that drops the Linux capability bounding set (NVIDIA/OpenShell#2001), which crashes with EINVAL in rootless Podman on GitHub Actions runners.

Write a gateway.toml that pins supervisor_image to the version from openshell-version.sh so the supervisor always matches the installed gateway and is immune to upstream :latest tag changes.

Fixes #2792

Assisted-by: Claude (investigation, fix)
Signed-off-by: Wayne Sun <gsun@redhat.com>
PR Summary by Qodo: Pin OpenShell supervisor image to installed CLI version via gateway.toml
Site preview: https://4f8247bf-site.fullsend-ai.workers.dev
🤖 Review · ❌ Terminated · Started 6:42 PM UTC · Ended 6:58 PM UTC
Codecov Report: ✅ All modified and coverable lines are covered by tests.
🤖 Finished Review · ❌ Failure · Started 6:42 PM UTC · Completed 6:58 PM UTC
The composite action (action.yml) is the entry point for all agent runs in CI. Changes like pinning the supervisor image (#2795) affect sandbox creation but were not triggering e2e tests. Add action.yml to both the paths trigger and the relevance grep.

Motivated-by: #2792
Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
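A relevance check of the kind this commit describes could look roughly like the sketch below. It is illustrative only: the function name `is_e2e_relevant` and the `.github/workflows/functional-tests.yml` path are assumptions, and the repository's real pattern list is not shown in this PR. Only `action.yml` is taken from the commit message.

```shell
# is_e2e_relevant: read changed file paths on stdin (one per line) and
# succeed if at least one of them should trigger the e2e tests.
# Hypothetical helper; the pattern list below is an assumption apart
# from action.yml, which the commit message names explicitly.
is_e2e_relevant() {
  # Anchored on both ends so that e.g. docs/action.yml does not match.
  grep -qE '^(action\.yml|\.github/workflows/functional-tests\.yml)$'
}
```

A workflow step might feed it the changed files with something like `git diff --name-only origin/main...HEAD | is_e2e_relevant`, where the base ref is again an assumption.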
🤖 Finished Retro · ❌ Failure · Started 7:13 PM UTC · Completed 7:20 PM UTC
Summary
- Pin the supervisor image to ghcr.io/nvidia/openshell/supervisor:${OPENSHELL_VERSION} via gateway.toml so the in-container supervisor always matches the installed gateway/CLI version.
- Apply the pin in both action.yml (production agent runs) and functional-tests.yml (e2e tests).

Root Cause
OpenShell v0.0.73 was released at 15:31 UTC on June 30, re-pointing the supervisor:latest tag to a build containing NVIDIA/OpenShell#2001 (fix(supervisor): drop sandbox child capability bounding set). The cap_drop_bound() call fails with EINVAL in rootless Podman on GitHub Actions runners because CAP_SETPCAP is unavailable in user namespaces. The supervisor crashes immediately.

The gateway defaults to supervisor:latest (not pinned to its own version), so every new runner after 15:31 UTC pulls the broken v0.0.73 supervisor regardless of what CLI version is installed. This is why reverting the CLI to 0.0.63 (PR #2787) did not fix sandbox creation.

Fix
Write $HOME/.config/openshell/gateway.toml in the gateway configuration step, setting supervisor_image to the version-tagged image matching OPENSHELL_VERSION from openshell-version.sh. This eliminates the dependency on the :latest tag and keeps the CLI and supervisor versions locked together.

Test plan
- functional-tests.yml e2e tests pass with the pinned supervisor

Fixes #2792
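The fix described above can be sketched as a small shell helper. This is a minimal sketch, not the PR's actual step: the function name `write_gateway_config` is hypothetical, and placing `supervisor_image` as a top-level key in gateway.toml is an assumption, since the PR text names the key but not its position in the file. The image path, the config location, and the use of `OPENSHELL_VERSION` as the tag come from the description.

```shell
# write_gateway_config VERSION [CONFIG_DIR]
# Hypothetical helper: writes a gateway.toml that pins the supervisor
# image to VERSION. CONFIG_DIR defaults to $HOME/.config/openshell.
write_gateway_config() {
  local version="$1"
  local config_dir="${2:-$HOME/.config/openshell}"

  # Refuse an empty version or "latest" so the pin cannot silently float.
  if [ -z "$version" ] || [ "$version" = "latest" ]; then
    echo "refusing to pin supervisor image to '${version}'" >&2
    return 1
  fi

  mkdir -p "$config_dir" || return 1

  # Assumption: supervisor_image is a top-level key in gateway.toml.
  printf 'supervisor_image = "ghcr.io/nvidia/openshell/supervisor:%s"\n' \
    "$version" > "$config_dir/gateway.toml"
}
```

In a workflow step this would presumably be called as `write_gateway_config "$OPENSHELL_VERSION"` once openshell-version.sh has made that variable available; how the script exposes it is not shown here. Rejecting `latest` up front is a design choice of this sketch, meant to keep a misconfigured version from reintroducing the floating tag.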