Downtime after upgrading to 1.12.0 - Open "/tmp/nginx/nginx.pid" failed #12645
Description
Keeping this open while our investigation is running; we cannot explain it yet.
We will add more details as soon as we understand it better.
Since it only broke a few environments, it is harder to debug.
But consider this a warning to check your log lines during the upgrade for:
/tmp/nginx/nginx.pid
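For reference, a minimal way to scan the controller logs for this error during or right after an upgrade. The namespace and label selector are assumptions based on a default ingress-nginx Helm install; adjust them to your deployment:

```bash
# Assumed namespace/labels for a default ingress-nginx install; adjust as needed.
kubectl logs -n ingress-nginx \
  -l app.kubernetes.io/name=ingress-nginx \
  --since=30m --tail=-1 | grep -i "nginx.pid"
```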
What happened:
Upgraded our ingress-controller via helm from
version: 4.11.3
to
version: 4.12.0
This caused a major outage on 4 of 10 clusters. We cannot yet understand why.
Kubernetes version 1.31.x
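The exact upgrade command and values are not included here; as a rough sketch, the upgrade was of this shape (release name, namespace, and --reuse-values are assumptions, not our exact invocation):

```bash
# Sketch of the chart upgrade; release name and namespace are placeholders.
helm repo update
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --version 4.12.0 \
  --reuse-values
```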
2025-01-09 07:45:35.583  nginx: [error] open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory)
2025-01-09 07:45:35.583  2025/01/09 07:45:35 [error] 215#215: open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory)
2025-01-09 07:45:35.583  nginx: [error] open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory)
2025-01-09 07:45:35.583  2025/01/09 07:45:35 [error] 215#215: open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory)
2025-01-09 07:45:35.583  nginx: [error] open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory)
2025-01-09 07:45:35.583  2025/01/09 07:45:35 [error] 215#215: open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory)
2025-01-09 07:45:35.000  name=ingress-nginx-general-r6-controller-565c5966f7-8p4rq kind=Pod objectAPIversion=v1 objectRV=2931225444 eventRV=2931226571 reportingcontroller=nginx-ingress-controller sourcecomponent=nginx-ingress-controller reason=RELOAD type=Warning count=1 msg="Error reloading NGINX: exit status 1\n2025/01/09 07:45:35 [notice] 215#215: signal process started\n2025/01/09 07:45:35 [error] 215#215: open() \"/tmp/nginx/nginx.pid\" failed (2: No such file or directory)\nnginx: [error] open() \"/tmp/nginx/nginx.pid\" failed (2: No such file or directory)\n"
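One thing we still want to verify is whether the PID path the reload is looking for actually exists inside the running controller container. A hypothetical check (the pod name is a placeholder; look for the pid directive in the printed config):

```bash
# Pod name is a placeholder; list the directory the reload expects the PID file in,
# then print the rendered nginx.conf and look for its "pid" directive.
kubectl exec -n ingress-nginx ingress-nginx-controller-xxxxx -- ls -l /tmp/nginx/
kubectl exec -n ingress-nginx ingress-nginx-controller-xxxxx -- cat /etc/nginx/nginx.conf
```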
What you expected to happen:
The ingress controller continues to work.
I am not sure yet what else to expect; I am keeping this open while we investigate further.
Kubernetes version (use kubectl version):
v1.31.3-eks-59bf375
Environment: AWS / EKS
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools (how/where the cluster was created, e.g. kubeadm/kops/minikube/kind):
- Basic cluster related info: kubectl version, kubectl get nodes -o wide
More data will follow once we have done a full breakdown.
How to reproduce this issue:
Hard to reproduce, as it is currently happening on nodes that we cannot test against again.
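In the meantime, one step that can be repeated is diffing the rendered manifests of the two chart versions to see which controller flags or config defaults changed between 4.11.3 and 4.12.0. This is only a sketch; the release name and (omitted) values files are placeholders:

```bash
# Render both chart versions with the same (placeholder) values and diff the output.
helm template ingress-nginx ingress-nginx/ingress-nginx --version 4.11.3 > /tmp/chart-4.11.3.yaml
helm template ingress-nginx ingress-nginx/ingress-nginx --version 4.12.0 > /tmp/chart-4.12.0.yaml
diff -u /tmp/chart-4.11.3.yaml /tmp/chart-4.12.0.yaml | less
```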
Update 10.01 - 00:10 - Tested a deployment of the faulty version again. SSL certs were being served as the K8s fake certificate on some domains, while the old version was serving the real Let's Encrypt certs. Looks like a TLS issue after the upgrade.
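A quick way to confirm the fake-cert symptom from outside the cluster (example.com is a placeholder for one of the affected domains):

```bash
# The controller's default fallback certificate shows up with
# "Kubernetes Ingress Controller Fake Certificate" in the subject/issuer.
echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
  | openssl x509 -noout -subject -issuer
```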