Description
What happened:
- Background: investigating zero-downtime upgrades of the ingress-nginx-controller.
- Strategy: I used helm upgrade --reuse-values to perform the upgrade.
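For reference, the upgrade was performed along these lines (the exact invocation is my reconstruction; the release name, namespace, and chart version are taken from the helm output further down):

```shell
# Assumed upgrade invocation: --reuse-values re-applies the previously
# supplied values (shown under "helm get values" below) to the new chart.
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --version 4.11.3 \
  --reuse-values
```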
The system runs smoothly if no requests are sent during the upgrade window. However, when I use Grafana K6 to drive a steady stream of HTTP requests, errors occur at the moment the new controller pod becomes fully initialized and the old pod begins to terminate. The issue lasts only a brief moment, but it is consistently reproducible.
Here is the K6 test log:
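The run.sh shown below is presumably just a thin wrapper around the k6 CLI; a minimal sketch (the script name matches the one in the log, everything else is assumed):

```shell
# Assumed contents of run.sh: run the load script shown at the
# bottom of this report (1024 VUs for 2 minutes).
k6 run script.js
```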
$ sh run.sh
execution: local
script: script.js
output: -
scenarios: (100.00%) 1 scenario, 1024 max VUs, 2m30s max duration (incl. graceful stop):
* default: 1024 looping VUs for 2m0s (gracefulStop: 30s)
WARN[0087] Request Failed error="Post \"http://my-hostname/v1/tests/post\": EOF"
WARN[0087] Request Failed error="Post \"http://my-hostname/v1/tests/post\": read tcp 10.59.89.82:59064->10.47.104.129:80: wsarecv: An existing connection was forcibly closed by the remote host."
WARN[0087] Request Failed error="Post \"http://my-hostname/v1/tests/post\": EOF"
WARN[0087] Request Failed error="Post \"http://my-hostname/v1/tests/post\": read tcp 10.59.89.82:59082->10.47.104.129:80: wsarecv: An existing connection was forcibly closed by the remote host."
WARN[0087] Request Failed error="Post \"http://my-hostname/v1/tests/post\": EOF"
data_received..................: 37 MB 295 kB/s
data_sent......................: 12 MB 93 kB/s
http_req_blocked...............: avg=23.5ms min=0s med=0s max=731.46ms p(90)=0s p(95)=510.49µs
http_req_connecting............: avg=14.79ms min=0s med=0s max=343.54ms p(90)=0s p(95)=0s
http_req_duration..............: avg=2.81s min=3.12ms med=2.8s max=10.18s p(90)=4.82s p(95)=5.07s
{ expected_response:true }...: avg=2.81s min=313.71ms med=2.81s max=10.18s p(90)=4.83s p(95)=5.07s
http_req_failed................: 0.26% 117 out of 43956
http_req_receiving.............: avg=468.21µs min=0s med=0s max=14.93ms p(90)=987µs p(95)=2.21ms
http_req_sending...............: avg=21.26µs min=0s med=0s max=8.52ms p(90)=0s p(95)=0s
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=2.81s min=3.12ms med=2.8s max=10.18s p(90)=4.82s p(95)=5.07s
http_reqs......................: 43956 350.979203/s
iteration_duration.............: avg=2.83s min=13.56ms med=2.82s max=10.18s p(90)=4.85s p(95)=5.09s
iterations.....................: 43956 350.979203/s
vus............................: 10 min=10 max=1024
vus_max........................: 1024 min=1024 max=1024
running (2m05.2s), 0000/1024 VUs, 43956 complete and 0 interrupted iterations
default ✓ [======================================] 1024 VUs 2m0s
During this window I receive numerous empty responses, yet there are no error logs in the ingress-nginx-controller pod. However, any TCP connection established before the upgrade remains uninterrupted (verified with telnet ${my-tcp-service} ${port}).
So I want to confirm: does the upgrade itself cause a short-lived service interruption in the ingress-nginx-controller?
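The connection-persistence check can be reproduced roughly like this (the IP and port come from the LoadBalancer service shown below; using rollout status to watch the pod replacement is my assumption):

```shell
# Open a long-lived TCP connection through the LoadBalancer before upgrading
# (10.47.104.129:31080 is the exposed tcp-services port in this setup).
telnet 10.47.104.129 31080

# In a second terminal, run the helm upgrade and watch the old pod terminate;
# the telnet session above stays connected the whole time.
kubectl -n ingress-nginx rollout status deployment/ingress-nginx-controller
```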
What you expected to happen:
No warnings should occur throughout the upgrade process, and every request should be handled, regardless of whether the returned status code is 200.
NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version): v1.11.2 & v1.11.3
Kubernetes version (use kubectl version):
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.5
Environment:
- Cloud provider or hardware configuration: I use Gardener to manage all clusters, so I have no permission to check it.
- OS (e.g. from /etc/os-release): linux-amd64
- Kernel (e.g. uname -a): -
- Install tools (how/where was the cluster created, e.g. kubeadm/kops/minikube/kind): -
- Basic cluster related info (kubectl get nodes -o wide):

$ kubectl get nodes -o wide
NAME                                                       STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE              KERNEL-VERSION       CONTAINER-RUNTIME
shoot--gtlcdevqa--dylan-test-worker-tt7i7-z1-6db57-25mdw   Ready    <none>   88m   v1.30.5   10.180.0.213   <none>        Garden Linux 1592.3   6.6.62-cloud-amd64   containerd://1.7.20
shoot--gtlcdevqa--dylan-test-worker-tt7i7-z1-6db57-mn8zv   Ready    <none>   89m   v1.30.5   10.180.0.187   <none>        Garden Linux 1592.3   6.6.62-cloud-amd64   containerd://1.7.20
- How was the ingress-nginx-controller installed:
- If helm was used then please show output of helm ls -A | grep -i ingress:

$ helm ls -A | grep -i ingress
ingress-nginx   ingress-nginx   28   2024-11-18 16:34:27.1373854 +0800 CST   deployed   ingress-nginx-4.11.3   1.11.3
- If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>:

$ helm -n ingress-nginx get values ingress-nginx
USER-SUPPLIED VALUES:
controller:
  allowSnippetAnnotations: true
  config:
    client-body-timeout: "360"
    proxy-body-size: 1024m
    proxy-buffer-size: 16k
    proxy-connect-timeout: "30"
    proxy-read-timeout: "3600"
    proxy-send-timeout: "900"
    proxy-set-headers: ingress-nginx/custom-headers
  extraArgs:
    configmap: $(POD_NAMESPACE)/ingress-nginx-controller
    controller-class: k8s.io/ingress-nginx
    default-ssl-certificate: ingress-nginx/gtlconlycert
    enable-ssl-passthrough: "true"
    ingress-class: nginx
    publish-service: $(POD_NAMESPACE)/ingress-nginx-controller
    tcp-services-configmap: $(POD_NAMESPACE)/ingress-nginx-tcp
    validating-webhook: :8443
    validating-webhook-certificate: /usr/local/certificates/cert
    validating-webhook-key: /usr/local/certificates/key
    watch-ingress-without-class: "true"
  metrics:
    enabled: true
    service:
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
    serviceMonitor:
      enabled: true
      namespace: kube-prometheus-stack
      scrapeInterval: 500ms
tcp:
  "31080": prod/blackduck-report:1081
Current State of the controller:
kubectl describe ingressclasses

$ kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.11.3
              helm.sh/chart=ingress-nginx-4.11.3
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>
kubectl -n <ingresscontrollernamespace> get all -A -o wide

$ kubectl -n ingress-nginx get all -o wide
NAME                                            READY   STATUS    RESTARTS   AGE     IP            NODE                                                       NOMINATED NODE   READINESS GATES
pod/ingress-nginx-controller-67fbb67c7b-tpfpt   1/1     Running   0          3d22h   100.64.1.23   shoot--gtlcdevqa--dylan-test-worker-tt7i7-z1-6db57-25mdw   <none>           <none>

NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                                      AGE   SELECTOR
service/ingress-nginx-controller             LoadBalancer   100.111.24.47    10.47.104.129   80:31686/TCP,443:32033/TCP,31080:31568/TCP   25d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission   ClusterIP      100.106.5.80     <none>          443/TCP                                      25d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-metrics     ClusterIP      100.110.133.77   <none>          10254/TCP                                    14d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                                                                                    SELECTOR
deployment.apps/ingress-nginx-controller   1/1     1            1           25d   controller   registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                                  DESIRED   CURRENT   READY   AGE    CONTAINERS   IMAGES                                                                                                                    SELECTOR
replicaset.apps/ingress-nginx-controller-56bcbbf9bc   0         0         0       4d1h   controller   registry.k8s.io/ingress-nginx/controller:v1.11.2@sha256:d5f8217feeac4887cb1ed21f27c2674e58be06bd8f5184cacea2a69abaf78dce   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=56bcbbf9bc
replicaset.apps/ingress-nginx-controller-67fbb67c7b   1         1         1       4d1h   controller   registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=67fbb67c7b
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>

$ kubectl describe po -n ingress-nginx ingress-nginx-controller-67fbb67c7b-tpfpt
Name:             ingress-nginx-controller-67fbb67c7b-tpfpt
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             shoot--gtlcdevqa--dylan-test-worker-tt7i7-z1-6db57-25mdw/10.180.0.213
Start Time:       Fri, 22 Nov 2024 16:11:19 +0800
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.11.3
                  helm.sh/chart=ingress-nginx-4.11.3
                  pod-template-hash=67fbb67c7b
Annotations:      cni.projectcalico.org/containerID: 6b2b57de91e25a2c7dbdac5dc865f7c3c09ae62b4b1a1269a1eb4c3070328020
                  cni.projectcalico.org/podIP: 100.64.1.23/32
                  cni.projectcalico.org/podIPs: 100.64.1.23/32
Status:           Running
IP:               100.64.1.23
IPs:
  IP:           100.64.1.23
Controlled By:  ReplicaSet/ingress-nginx-controller-67fbb67c7b
Containers:
  controller:
    Container ID:    containerd://cd4e18fc7e76caaabc2fed13acd26af7fef665f2e01a645503c3d8661a091831
    Image:           registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7
    Ports:           80/TCP, 443/TCP, 10254/TCP, 8443/TCP, 31080/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --controller-class=k8s.io/ingress-nginx
      --default-ssl-certificate=ingress-nginx/gtlconlycert
      --enable-ssl-passthrough=true
      --ingress-class=nginx
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --watch-ingress-without-class=true
    State:          Running
      Started:      Fri, 22 Nov 2024 16:13:05 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:                 ingress-nginx-controller-67fbb67c7b-tpfpt (v1:metadata.name)
      POD_NAMESPACE:            ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:               /usr/local/lib/libmimalloc.so
      KUBERNETES_SERVICE_HOST:  api.dylan-test.gtlcdevqa.internal.canary.k8s.ondemand.com
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d2v66 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-d2v66:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>

$ kubectl -n ingress-nginx describe svc ingress-nginx-controller
Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.11.3
                          helm.sh/chart=ingress-nginx-4.11.3
Annotations:              loadbalancer.openstack.org/load-balancer-address: 10.47.104.129
                          loadbalancer.openstack.org/load-balancer-id: 54ef842a-05c0-482a-b3bf-255012af91d8
                          meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       100.111.24.47
IPs:                      100.111.24.47
LoadBalancer Ingress:     10.47.104.129
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31686/TCP
Endpoints:                100.64.1.23:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32033/TCP
Endpoints:                100.64.1.23:443
Port:                     31080-tcp  31080/TCP
TargetPort:               31080-tcp/TCP
NodePort:                 31080-tcp  31568/TCP
Endpoints:                100.64.1.23:31080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
- Current state of ingress object, if applicable:
kubectl -n <appnamespace> get all,ing -o wide

$ kubectl -n web-service get ingress -o wide
NAME                      CLASS    HOSTS     ADDRESS         PORTS   AGE
web-service-gin-ingress   <none>   my-host   10.47.104.129   80      8d
kubectl -n <appnamespace> describe ing <ingressname>

$ kubectl describe ingress web-service-gin-ingress -n web-service
Name:             web-service-gin-ingress
Labels:           <none>
Namespace:        web-service
Address:          10.47.104.129
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  my-host
              /   web-service-gin-service:8080 (100.64.1.4:8080,100.64.1.5:8080,100.64.1.6:8080)
Annotations:  nginx.ingress.kubernetes.io/configuration-snippet: more_set_headers "X-Ingress-Pod-Name: $HOSTNAME";
Events:       <none>
- If applicable, then, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
$ GUID=1
$ DATETIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
$ curl -X POST "http://my-host/v1/tests/post" -H "Content-Type: application/json" -d "{
\"id\": \"$GUID\",
\"create_time\": \"$DATETIME\",
\"sleep_time_ms\": 10
}"
{"id":"1","ingress_pod_name_form":"ingress-nginx-controller-67fbb67c7b-tpfpt","create_time":"2024-11-22T09:58:05Z","receive_time":"2024-11-22T09:58:52.425334756Z","finish_time":"2024-11-22T09:58:52.435498011Z","consume_sec":0.010163236}
$ curl -vX POST "http://my-host/v1/tests/post" -H "Content-Type: application/json" -d "{
\"id\": \"$GUID\",
\"create_time\": \"$DATETIME\",
\"sleep_time_ms\": 10
}"
Note: Unnecessary use of -X or --request, POST is already inferred.
* Host my-host:80 was resolved.
* IPv6: (none)
* IPv4: 10.47.104.129
* Trying 10.47.104.129:80...
* Connected to dylan-test.gtlc.only.sap (10.47.104.129) port 80
* using HTTP/1.x
> POST /v1/tests/post HTTP/1.1
> Host: dylan-test.gtlc.only.sap
> User-Agent: curl/8.10.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 87
>
* upload completely sent off: 87 bytes
< HTTP/1.1 200 OK
< Date: Fri, 22 Nov 2024 10:00:18 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 236
< Connection: keep-alive
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Headers: Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Authorization, accept, origin, Cache-Control, X-Requested-With
< Access-Control-Allow-Methods: POST, OPTIONS, GET, PUT, DELETE
< Access-Control-Allow-Origin: *
< X-Ingress-Pod-Name-From: ingress-nginx-controller-67fbb67c7b-tpfpt
< X-Ingress-Pod-Name: ingress-nginx-controller-67fbb67c7b-tpfpt
<
{"id":"1","ingress_pod_name_form":"ingress-nginx-controller-67fbb67c7b-tpfpt","create_time":"2024-11-22T09:58:05Z","receive_time":"2024-11-22T10:00:18.300485598Z","finish_time":"2024-11-22T10:00:18.310760665Z","consume_sec":0.010275065}* Connection #0 to host my-host left intact
- Others:
  - Any other related information, like a copy/paste of the kubectl describe ... output of any custom configmap(s) created and in use, or anything else that may help: -
How to reproduce this issue:
To reproduce it, you just need one web service (any pod that can receive HTTP requests will do). Then use this K6 script:
import http from 'k6/http';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';
export const options = {
vus: 1024,
duration: '120s',
};
function getFormattedDateTimeNow() {
const now = new Date();
const isoString = now.toISOString();
return isoString;
}
function formattedResponseOutput(res) {
const status = res.status;
const statusText = res.status_text;
const to = res.headers['X-Ingress-Pod-Name'];
const from = res.headers['X-Ingress-Pod-Name-From'];
if (res.status != 200) {
console.log(`[${from}] --> [${to}] : { Status: ${status}, Status Text: ${statusText} }`);
} else {
console.log(`[${from}] --> [${to}] : { Status: ${status}, ResponseBody: ${res.body} }`);
}
}
export default function () {
  const url = 'http://my-host/v1/tests/post';
  const sleep_upper_limit_ms = 5000;
  const payload = JSON.stringify({
    "id": uuidv4(),
    "create_time": getFormattedDateTimeNow(),
    // ask the backend to sleep a random 0-5000 ms before responding
    "sleep_time_ms": Math.floor(Math.random() * (sleep_upper_limit_ms + 1)),
  });
  const params = {
    headers: {
      'Content-Type': 'application/json',
    },
  };
  const res = http.post(url, payload, params);
  formattedResponseOutput(res);
}
Anything else we need to know:
You can use my test image, implemented in Go: doublebiao/web-service-gin:v1.0-beta
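A minimal way to stand up that image for reproduction might look like this (the namespace, service name, and replica count are assumptions matched to the ingress output above, not the exact manifests used):

```shell
# Hypothetical setup: deploy the Go test server and expose it on port 8080,
# matching the web-service-gin-service backend referenced by the Ingress.
kubectl create namespace web-service
kubectl -n web-service create deployment web-service-gin \
  --image=doublebiao/web-service-gin:v1.0-beta --replicas=3
kubectl -n web-service expose deployment web-service-gin \
  --name=web-service-gin-service --port=8080
```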