Description
Problem statement
Occasionally, E2E test runs fail without showing any obvious symptoms on the surface, e.g. https://github.com/Kong/gateway-operator/actions/runs/10179601076/job/28155676111
test_helm_install_upgrade.go:390: Deployment 538fc6a5-e3b1-4855-afc8-946870a96092/kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manager has no AvailableReplicas
test_helm_install_upgrade.go:390: Failed to get logs from operator pod 538fc6a5-e3b1-4855-afc8-946870a96092/kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manages77wt: container "manager" in pod "kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manages77wt" is waiting to start: trying and failing to pull image
test_helm_install_upgrade.go:390:
Error Trace: /home/runner/work/gateway-operator/gateway-operator/test/e2e/test_helm_install_upgrade.go:390
Error: Received unexpected error:
timed out waiting for operator deployment in namespace 538fc6a5-e3b1-4855-afc8-946870a96092
Test: TestE2E/TestHelmUpgrade/upgrade_from_nightly_to_current
TestE2E/TestHelmUpgrade/upgrade_from_nightly_to_current 2024-07-31T12:05:46Z logger.go:66: Running command helm with args [uninstall --namespace 538fc6a5-e3b1-4855-afc8-946870a96092 kgo-nightly-to-e2e-4ca1]
TestE2E/TestHelmUpgrade/upgrade_from_nightly_to_current 2024-07-31T12:05:47Z logger.go:66: release "kgo-nightly-to-e2e-4ca1" uninstalled
Closer inspection of the diagnostics shows that the failure is caused by exceeding the Docker Hub pull quota:
- apiVersion: v1
  count: 3
  eventTime: null
  firstTimestamp: "2024-07-31T12:02:48Z"
  involvedObject:
    apiVersion: v1
    fieldPath: spec.containers{manager}
    kind: Pod
    name: kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manages77wt
    namespace: 538fc6a5-e3b1-4855-afc8-946870a96092
    resourceVersion: "2329"
    uid: d713fa65-d7cf-4ab0-b2bf-87fa039e0a57
  kind: Event
  lastTimestamp: "2024-07-31T12:03:36Z"
  message: 'Failed to pull image "docker.io/kong/gateway-operator-oss:nightly": failed
    to pull and unpack image "docker.io/kong/gateway-operator-oss:nightly": failed
    to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/kong/gateway-operator-oss/manifests/sha256:984c624b124aa5d10d5f5c2cf4915e14d5293d930203c8f00228accd736e33ed:
    429 Too Many Requests - Server message: toomanyrequests: You have reached your
    pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit'
  metadata:
    creationTimestamp: "2024-07-31T12:02:48Z"
    name: kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manages77wt.17e74a90ef43e1a6
    namespace: 538fc6a5-e3b1-4855-afc8-946870a96092
    resourceVersion: "2431"
    uid: c31178bb-13d7-45fb-8e33-2f92dd60e75a
  reason: Failed
  reportingComponent: kubelet
  reportingInstance: 13bd010b-62c4-4262-80fe-ab572c485c89-control-plane
  source:
    component: kubelet
    host: 13bd010b-62c4-4262-80fe-ab572c485c89-control-plane
  type: Warning
Proposed solution
- Create a Docker Hub access token
- Use this token to authenticate image pulls in both the integration and E2E test suites
- https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#create-a-secret-by-providing-credentials-on-the-command-line can be followed for detailed instructions.
Acceptance criteria
- Sporadic failures caused by exceeding the Docker Hub pull quota are no longer observed.