Description
The step-issuer workload gets OOMKilled when the number of CertificateRequest objects in a Kubernetes cluster reaches 4334. We experienced this over the past weekend.
We're on:
- step-issuer v0.6.0
- Kubernetes K3s v1.24.6+k3s1
- cert-manager v1.9.1
When we troubleshot the issue, we restarted the Pod by simply deleting it. We then followed its startup flow and saw that, once it comes up healthy, it starts parsing all CertificateRequest objects on the cluster. This naturally uses memory, apparently so much that the step-issuer workload is OOMKilled.
We managed to work around it by raising the memory limit for the step-issuer from the default of 128Mi ( https://github.com/smallstep/helm-charts/blob/master/step-issuer/values.yaml#L34 ) to 500Mi.
This allowed the step-issuer workload to parse all the CertificateRequests and stay healthy.
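For anyone hitting the same problem, the workaround can be sketched as a small Helm values override. This is only a sketch: the file name is hypothetical, and it assumes the chart exposes the usual `resources.limits.memory` key, as the linked values.yaml line suggests.

```yaml
# values-override.yaml (hypothetical file name)
# Assumption: the step-issuer chart reads the standard `resources` block.
resources:
  limits:
    # Raised from the chart default of 128Mi so the controller can
    # work through all CertificateRequest objects at startup.
    memory: 500Mi
```

A file like this would typically be passed to `helm upgrade` with `-f values-override.yaml`. 500Mi was enough for our object count, so larger clusters may need more.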
A more permanent and better solution would be to use the cert-manager Ingress annotation cert-manager.io/revision-history-limit: "5", as this will seriously limit the number of CertificateRequest objects on the cluster.
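A minimal sketch of what that could look like on an Ingress follows. All names, hosts, and the issuer name are hypothetical placeholders. The issuer kind and group are the ones used for a namespaced StepIssuer and should be adjusted to match your setup.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress                    # hypothetical name
  annotations:
    cert-manager.io/issuer: step-issuer    # hypothetical issuer name
    cert-manager.io/issuer-kind: StepIssuer
    cert-manager.io/issuer-group: certmanager.step.sm
    # Keep at most 5 CertificateRequests per Certificate.
    cert-manager.io/revision-history-limit: "5"
spec:
  tls:
    - hosts:
        - app.example.com                  # hypothetical host
      secretName: example-tls              # hypothetical secret name
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-service      # hypothetical service
                port:
                  number: 80
```

As far as we understand, the limit applies per Certificate, so the total number of CertificateRequest objects still grows with the number of Certificates on the cluster.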
With that somewhat long intro, here's my hot take.
1. Why is the `step-issuer` parsing all the `CertificateRequests` on the cluster in the first place?
2. Why not limit it to parsing only the `CertificateRequest` created by the event that triggered the issuance or renewal of a `Certificate`?
What's the reasoning? Or am I misunderstanding how things work under the hood?
Looking forward to some input and replies on this issue.
🙏🏿 you and have ☀️ day.