step-issuer being OOMKilled when there's relatively many CertificateRequests #60

@LarsBingBong

The step-issuer workload gets OOMKilled when the CertificateRequest object count in a Kubernetes cluster reaches 4334. We experienced this over the past weekend.

We're on:

  • step-issuer v0.6.0
  • Kubernetes K3s v1.24.6+k3s1
  • cert-manager v1.9.1

When we troubleshot the issue, we restarted the Pod by simply deleting it. We then followed its startup flow and saw that, once it comes up healthy, it starts parsing through all CertificateRequest objects on the cluster. This naturally uses memory, apparently so much that the step-issuer workload is OOMKilled.


We managed to work around it by raising the memory limit for the step-issuer from the default of 128Mi ( https://github.com/smallstep/helm-charts/blob/master/step-issuer/values.yaml#L34 ) to 500Mi.

This allowed the step-issuer workload to parse all the CertificateRequests and stay healthy.
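
For anyone hitting the same limit, the workaround can be expressed as a Helm values override. This is only a minimal sketch: it assumes the chart reads a top-level resources block, as the linked values.yaml suggests, and the file name is hypothetical. Only the 128Mi default and the 500Mi value come from our setup.

```yaml
# values-override.yaml (hypothetical file name)
# Assumption: the step-issuer chart exposes a top-level `resources` block.
resources:
  limits:
    memory: 500Mi  # the chart default is 128Mi
```

Helm merges this with the chart defaults, so the other resource settings stay as they are. It could be applied with something like helm upgrade <release> <chart> -f values-override.yaml, where the release and chart names depend on how step-issuer was installed.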

A more permanent and better solution would be to use the cert-manager Ingress annotation cert-manager.io/revision-history-limit: "5", as this will seriously limit the number of CertificateRequest objects on the cluster.
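
As a rough sketch of what that looks like on an Ingress: the annotation is the one mentioned above, while the resource names, host, backend Service, and issuer name are all hypothetical placeholders. The issuer-kind and issuer-group annotations assume a namespaced StepIssuer is used.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app                      # hypothetical name
  annotations:
    cert-manager.io/issuer: step-issuer  # hypothetical StepIssuer name
    cert-manager.io/issuer-kind: StepIssuer
    cert-manager.io/issuer-group: certmanager.step.sm
    # Keep at most 5 CertificateRequest revisions per Certificate.
    cert-manager.io/revision-history-limit: "5"
spec:
  tls:
    - hosts:
        - app.example.com                # hypothetical host
      secretName: example-app-tls        # hypothetical Secret name
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app        # hypothetical Service
                port:
                  number: 80
```

The annotation is copied to the generated Certificate as spec.revisionHistoryLimit, so older CertificateRequests for that Certificate are cleaned up by cert-manager as new ones are created.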

With that somewhat long intro, here's my hot take.

  1. Why is the step-issuer parsing all the CertificateRequests on the cluster in the first place?
  2. Why not limit it to parsing only the CertificateRequest created by the event that triggered the issuance or renewal of a Certificate?

What's the reasoning? Or am I misunderstanding how things work under the hood?


Looking forward to some input and replies on this issue.

🙏🏿 you and have a ☀️ day.
