Description
What happened:
We continue to hit the (max) timeout on our validation webhook when applying ingress manifests.
failed to call webhook: Post "https://ingress-nginx-controller-admission.ingress.svc:443/networking/v1/ingresses?timeout=30s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
It is consistently high, around the 20s mark, and general load or several ingress applies in quick succession can push it to 30s, where our deploy pipelines start to fail.
The image above shows a graph of validation time, a metric exposed by the controller itself, over 24 hours earlier this week.
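For reference, a rough sketch of how these numbers can also be pulled ad hoc. The metrics port 10254 is taken from the service listed further down; the exact admission metric names may vary between controller versions, hence the broad grep:

# Port-forward the metrics endpoint of one controller pod
kubectl -n ingress port-forward deployment/nginx-ingress-controller 10254:10254 &
# Scrape it and keep only the admission-related metrics
curl -s http://localhost:10254/metrics | grep -i admission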
This is me adding a label, to illustrate one simple update:
torvald@surdeig ~ $ time kubectl patch ing <ingress> --type='json' -p='[{"op": "add", "path": "/metadata/labels/testing", "value": "testing"}]'
ingress.networking.k8s.io/<ingress> patched
real 0m17.724s
user 0m0.396s
sys 0m0.057s
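As a side note: assuming the webhook declares sideEffects: None (the upstream default), the same latency should be reproducible without persisting anything by doing a server-side dry run of the patch:

# A server-side dry run still goes through the admission chain, so it exercises
# the validation webhook without actually changing the ingress
time kubectl patch ing <ingress> --dry-run=server --type='json' -p='[{"op": "add", "path": "/metadata/labels/testing", "value": "testing"}]'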
This is in a medium-sized cluster:
- ~130 nodes
- 270 ingresses
- 3 nginx pods, each with 8GB RAM (request/limit) and 5 CPUs (request)
- ~1000 rps at peak (see graph below)
- 9.9 MB nginx config file (296k lines, 187 server_names, 4778 locations)
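The config-size figures above can be reproduced, roughly, from the rendered config inside one of the controller pods. Pod and container names are taken from the kubectl get output further down, and the grep patterns are approximate:

# Count bytes/lines, server_name directives and location blocks in the rendered config
kubectl -n ingress exec nginx-ingress-controller-5d66477fb7-jttwl -c nginx-ingress-controller -- \
  sh -c 'wc -cl /etc/nginx/nginx.conf; grep -c "server_name " /etc/nginx/nginx.conf; grep -c "location " /etc/nginx/nginx.conf'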
Request rate
Over the same time period as above.
Performance of pods
To comment on this: it looks and feels quite bearable. Spikes in CPU are assumed to be nginx reloads and validation runs. Over the same time period as above.
90-day trends:
The image above shows the number of ingresses over the last 90 days.
The image above shows the validation webhook duration over the last 90 days. This mostly supports organic growth of sorts, except for the quick change marked in the picture above; that has been tracked down to 10 ingresses (serving the same host) that went from 1 host to 3, so the collection of ~60 paths over 1 host became ~180 over 3 hosts.
See an example of such an ingress after the change.
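The actual manifest is attached; for a cluster-wide overview of how hosts and paths are spread over ingresses, something along these lines (assuming jq is available) prints host and path counts per ingress:

# namespace, name, number of hosts, number of paths
kubectl get ingress -A -o json \
  | jq -r '.items[] | [.metadata.namespace, .metadata.name, ([.spec.rules[]?.host] | length), ([.spec.rules[]?.http.paths[]?] | length)] | @tsv'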
What you expected to happen:
I've seen people mention far better performance than 20-30s on their validation webhook in other issues around here, and that with larger clusters and larger nginx config files. So my expectation would be in the 1-5s range.
This PR will probably help us in cases where multiple ingresses get applied at the same time - but one or a few single applies should probably not take 20s?
NGINX Ingress controller version
nginx/1.21.6, release v1.9.5
torvald@surdeig ~ $ kubectl exec -it nginx-ingress-controller-5d66477fb7-jttwl -- /nginx-ingress-controller --version
Defaulted container "nginx-ingress-controller" out of: nginx-ingress-controller, opentelemetry (init), sysctl (init), geoip-database-download (init)
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.9.5
Build: f503c4bb5fa7d857ad29e94970eb550c2bc00b7c
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.21.6
-------------------------------------------------------------------------------
Kubernetes version (use kubectl version):
torvald@surdeig ~ $ kubectl version --short
Client Version: v1.25.0
Kustomize Version: v4.5.7
Server Version: v1.27.10-gke.1055000
Environment:
- Cloud provider or hardware configuration: GCP, managed GKE; e2-custom-16-32768
- OS (e.g. from /etc/os-release): Container-Optimized OS with containerd (cos_containerd)
- Kernel (e.g. uname -a): 5.15.133+
- How was the ingress-nginx-controller installed: It probably originated from a helm chart once, but everything has evolved in our own git repo since then. I'll attach the relevant files.
- cat nginx-ingress-deployment-controller.yaml
- cat configmaps.yaml
- cat validatingwebhookconfiguration.yaml
- kubectl get -n ingress all -o wide:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-ingress-controller-5d66477fb7-8qtfs 1/1 Running 0 20h 10.4.117.224 gke-k8s-prod-k8s-prod-standard-v8-611652ec-5lt5 <none> <none>
pod/nginx-ingress-controller-5d66477fb7-jttwl 1/1 Running 0 3h9m 10.4.71.52 gke-k8s-prod-k8s-prod-standard-v8-68b36906-wgdl <none> <none>
pod/nginx-ingress-controller-5d66477fb7-wlw6t 1/1 Running 0 20h 10.4.1.143 gke-k8s-prod-k8s-prod-standard-v8-7c2e0d29-s7vz <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/ingress-nginx-controller-admission ClusterIP 10.6.12.178 <none> 443/TCP 2y198d app=nginx-ingress,component=controller
service/ingress-nginx-controller-collector-metrics ClusterIP 10.6.4.251 <none> 8888/TCP 574d app=nginx-ingress,component=controller
service/ingress-nginx-controller-metrics ClusterIP 10.6.6.244 <none> 10254/TCP 2y198d app=nginx-ingress,component=controller
service/nginx-ingress-controller LoadBalancer 10.6.8.95 <redacted> 80:31151/TCP,443:30321/TCP 2y198d app=nginx-ingress,component=controller
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-ingress-controller 3/3 3 3 2y198d nginx-ingress-controller registry.k8s.io/ingress-nginx/controller:v1.9.5 app=nginx-ingress,component=controller
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/nginx-ingress-controller-5d66477fb7 3 3 3 20h nginx-ingress-controller registry.k8s.io/ingress-nginx/controller:v1.9.5 app=nginx-ingress,component=controller,pod-template-hash=5d66477fb7
- Current state of ingress object, if applicable:
See an example of an ingress, the same as mentioned above in the «What happened» section.
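For completeness: the admission timeout (the 30s in the error above) and the failure policy are set on the ValidatingWebhookConfiguration and can be read back directly. The object name below is a guess based on the attached file, adjust as needed:

# Print name, timeoutSeconds and failurePolicy for each webhook in the configuration
kubectl get validatingwebhookconfiguration ingress-nginx-admission \
  -o jsonpath='{range .webhooks[*]}{.name}{"\t"}{.timeoutSeconds}{"\t"}{.failurePolicy}{"\n"}{end}'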
How to reproduce this issue:
I think reproducing this outside our environment would be unfeasible, but I'm happy to assist with more details.