BATS: helm-install-rancher: wait for Available instead of helm --wait #10366

Merged
mook-as merged 1 commit into rancher-sandbox:main from jandubois:bats-helm-rancher-progress-deadline on May 27, 2026

@jandubois (Member) commented May 27, 2026

Summary

  • The rancher Deployment uses the default progressDeadlineSeconds (10m), which is shorter than the initial rancher/rancher image pull (~487 MB) on slow networks (notably WSL2 on Windows). When the deadline lapses, Kubernetes sets the Deployment's Progressing condition to False with reason ProgressDeadlineExceeded; helm --wait reports that as status Failed and bails out before its own --timeout would have allowed the pull to complete.
  • Drop helm --wait and wait on the Deployment's Available condition directly with kubectl wait --for=condition=Available --timeout=30m. Available is driven by Ready replica count and is unaffected by ProgressDeadlineExceeded, so once the image pull eventually completes and the pod goes Ready, the wait succeeds.
  • Also pass --set replicas=1, since this is a single-node cluster: running three Rancher replicas adds no test coverage and just consumes scarce VM memory.
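The change described above can be sketched roughly as follows. This is an illustration only, not the actual deploy_rancher helper from the BATS suite: the function name deploy_rancher_sketch, the chart reference rancher-latest/rancher, the hostname rancher.example.test, and the RANCHER_CHART / RANCHER_HOSTNAME variables are assumptions made for this example, and a real install may need further chart values.

```shell
# Hypothetical helper for illustration; the real BATS helper may differ.
deploy_rancher_sketch() {
    # Install WITHOUT --wait, so Helm returns once the resources are created
    # instead of aborting when the Deployment hits ProgressDeadlineExceeded.
    # Chart reference and hostname are assumed defaults, not from the PR.
    helm upgrade --install rancher "${RANCHER_CHART:-rancher-latest/rancher}" \
        --namespace cattle-system \
        --create-namespace \
        --set hostname="${RANCHER_HOSTNAME:-rancher.example.test}" \
        --set replicas=1 || return

    # Block on the Available condition instead; it depends on Ready replicas
    # and is not affected by the progress deadline.
    kubectl wait --namespace cattle-system deployment/rancher \
        --for=condition=Available --timeout=30m
}
```

With this shape, the 30-minute budget applies to the pod actually becoming Ready rather than to the rollout's progress deadline, so a slow image pull no longer fails the install step.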

Symptom before the fix

On a Windows runner, deploy_rancher consistently failed:

Release "rancher" does not exist. Installing it now.
Error: resource Deployment/cattle-system/rancher not ready. status: Failed,
  message: Progress deadline exceeded

while k3s.log showed the rancher/rancher:v2.11.0 image still downloading at ~700 kB/s; the 487 MB pull took ~11 minutes, longer than the chart Deployment's 10-minute progress deadline.
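As a rough sanity check of those numbers, the pull time can be estimated with shell arithmetic. The variable names are made up for this sketch, and it assumes decimal units (1 MB = 1000 kB) and a constant transfer rate, which a real download will not have.

```shell
# Back-of-the-envelope estimate using the figures quoted above.
image_mb=487      # approximate image size in MB
rate_kb_s=700     # observed download rate in kB/s
deadline_s=600    # default progressDeadlineSeconds (10 minutes)

# Integer division; assumes 1 MB = 1000 kB.
pull_s=$(( image_mb * 1000 / rate_kb_s ))

echo "estimated pull: ${pull_s}s, deadline: ${deadline_s}s"
# prints "estimated pull: 695s, deadline: 600s"
```

An estimate of roughly 695 seconds (about 11.6 minutes) is consistent with the observed ~11-minute pull and exceeds the 600-second deadline.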

The rancher Deployment uses the default `progressDeadlineSeconds` (10m),
which is shorter than the initial `rancher/rancher` image pull (~487 MB)
on slow networks (notably WSL2 on Windows). When that deadline lapses,
Kubernetes sets the Deployment's `Progressing` condition to False with
reason `ProgressDeadlineExceeded`; `helm --wait` reports that as Failed
and bails out before its own `--timeout` would have allowed the pull to
complete.

Drop `helm --wait` and wait on `Available` directly with
`kubectl wait --for=condition=Available --timeout=30m`. `Available` is
driven by Ready replica count and is unaffected by
`ProgressDeadlineExceeded`, so once the pull eventually completes and
the pod goes Ready the wait succeeds.

Also `--set replicas=1` since this is a single-node cluster: running
three Rancher replicas adds no test coverage and consumes scarce VM
memory.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
@jandubois force-pushed the bats-helm-rancher-progress-deadline branch from 20d4400 to d742a10 on May 27, 2026 00:35
@mook-as merged commit 45f602a into rancher-sandbox:main on May 27, 2026
19 checks passed