-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Add Helm-managed webhook with kube-webhook-certgen #8870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: omerap12 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/cc @adrianmoisey |
iamzili
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Omer, I checked out your branch and I'm having trouble getting the admission controller to work. I would expect the following command to deploy everything properly:
helm upgrade vpa \
/home/zili/Repos/autoscaler/vertical-pod-autoscaler/charts/vertical-pod-autoscaler \
--install \
--version 0.6.0 \
--namespace vpa
What I have found so far is that the following settings seem to be incorrect:
- The
--webhook-servicevalue in the admission controller's Deployment. - The service name in the
MutatingWebhookConfigurationobject.
I'm also seeing an error in the admission controller when it attempts to perform actuation:
2025/12/04 13:37:32 http: TLS handshake error from 10.244.0.1:25304: remote error: tls: bad certificate
| metadata: | ||
| name: {{ include "vertical-pod-autoscaler.admissionController.certGen.fullname" . }} | ||
| annotations: | ||
| "helm.sh/hook": pre-install,pre-upgrade,post-install,post-upgrade |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a specific reason why this resource (and I see the same pattern in several others, such as ClusterRoleBinding, Role, RoleBinding) defines both pre and post hooks? I think pre-install,pre-upgrade would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, you’re right - pre-install and pre-upgrade are enough, but it really depends on how we want to handle things going forward. As mentioned above, the hook currently creates a Secret containing all certificates, CA bundles, and the mutating webhook configuration. Because this Secret already exists during an upgrade, kube-webhook-certgen will not rotate the certificate values.
To address this, we may want to add post-install and post-upgrade hooks to delete the Secret, ensuring that on the next upgrade kube-webhook-certgen generates a fresh one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But using post-install and post-upgrade hooks to delete the Secret is still not a good idea, I assume because it would mean that after a Helm upgrade or install, there would be no Secret object in the cluster as Helm executes post-install and post-upgrade hooks after all non-hook resources have been deployed to the cluster.
If we want to delete the certificate before running a Helm upgrade or install, then we need to do it in pre-install and pre-upgrade hooks, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more thing: kube-webhook-certgen generates certificates that expire after 100 years, so I assume we don't need to rotate them. We could just ignore the "secret already exists" log when kube-webhook-certgen create runs.
|
Also I'm not sure if we need to bump the chart version every time at this stage of the chart's development here:
|
Thanks for review! I'll check that out |
Yeah, we already discussed it. if I remember correctly we have to do it because of the pre-commit. |
I can't remember, your memory may be correct, since cluster-autoscaler does it. However, I think that may be broken now? |
...al-pod-autoscaler/charts/vertical-pod-autoscaler/templates/admission-controller-webhook.yaml
Show resolved
Hide resolved
|
Let me add my thoughts regarding
|
I agree that we shouldn’t build our own mechanism for generating and renewing self-signed certificates. |
| # Generate certificates using cert-gen job | ||
| generateCertificate: true | ||
|
|
||
| certGen: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be disabled?
I want to create my cert with cert-manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to use generateCertificate: false then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. Missed that. I've expected certGen.enabled=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it might be a good idea to move it below certGen, like certGen.enabled=true. I see this pattern around in Helm charts frequently.
what do you think @omerap12
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed - I need to spend more time on this PR. Hopefully I’ll be able to over the weekend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monotek , adjusted.
91f285e to
b69f511
Compare
iamzili
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect at least a message to be printed after helm install / helm upgrade when registerWebhook: true and admissionController.certGen.enabled: false are set, since the admission controller will not start.
I think pkg/admission-controller/gencerts.sh should be executed by the user. What else needs to be done?
| image: | ||
| # admissionController.certGen.image.repository -- An image that contains certgen for creating certificates. Only used if admissionController.generateCertificate is true | ||
| repository: registry.k8s.io/ingress-nginx/kube-webhook-certgen | ||
| # admissionController.certGen.image.tag -- An image tag for the admissionController.certGen.image.repository image. Only used if admissionController.generateCertificate is true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the admissionController.generateCertificate key no longer exists, so it should be removed from multiple files, including comments and README.md.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I forgot to update this as well, nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| @@ -0,0 +1,49 @@ | |||
| {{- if and .Values.admissionController.enabled (not .Values.admissionController.registerWebhook) (include "vertical-pod-autoscaler.admissionController.webhook.upgradable" .) }} | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
help me understand why the (include "vertical-pod-autoscaler.admissionController.webhook.upgradable" .) expression and the related logic in the _helpers.tpl file are required.
Based on my reading of _helpers.tpl, the intended goal is to upgrade only an object with a specific name and labels to address the following scenario:
- User creates a MutatingWebhookConfiguration object manually (or it was created by the admission-controller, i.e.
--register-webhook=true) - User deploys the helm chart, and the manually created MutatingWebhookConfiguration object is not going to be updated by the chart and the kube-webhook-certgen job
is my understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically yes, but I removed all that part. this is un-needed..
The recommended way will be jut to let helm manage this, so if a user already has MutatingWebhookConfiguration object - just delete it and install the chart and let Helm do the work.
Currently in this PR with helm upgrade we don't create a new MutatingWebhookConfiguration object (and all tls stuff). I am not sure we want to do this, and if so let's tackle this in a different PR.
b69f511 to
cc7ff5b
Compare
b073e15 to
86714ee
Compare
86714ee to
df08f7e
Compare
my comment was meant to warn that when
So technically, we need to solve two issues. First, when we deploy with |
Thanks for checking.
|
0b691d5 to
672f1f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my comments, reflecting your latest updates:
What do you think, @omerap12, can you drop the genCA and genSignedCert Helm template functions that --set admissionController.tls.create=true triggers? This mechanism adds complexity and provides limited value, especially since the certificates regenerate on every helm upgrade.
let me write down what I think would be appropriate to support in this PR. btw some of these are already implemented or partially supported.
- [Helm message is missing] Support https://cert-manager.io/docs/. In other words, allow users to create a CA and an X.509 certificate signed by that CA, and store the certs in a Kubernetes Secret using cert-manager. This approach appears to be already supported, although I have not tested it locally. When a user choose this approach, should use:
--set admissionController.registerWebhook=true \
--set admissionController.certGen.enabled=false
and when using these flags, Helm should print a message based on the NOTES.txt file stating that it is the user's responsibility to create the Kubernetes Secret object, either via cert-manager or by other means. The printed message may also include the expected format of the Secret object, which is (the format of the data stanza is crucial, as the deployment expects these specific keys):
apiVersion: v1
kind: Secret
metadata:
name: something
type: Opaque
data:
ca: "..."
cert: "..."
key: "..."we may also notify the user that if the Secret is created after the Helm install/upgrade, the admission controller pod must be restarted to move it from the ContainerCreating state to Running.
- this approach should create the Kubernetes Secret object via the Helm chart, and should not call ingress-nginx/kube-webhook-certgen:
--set admissionController.registerWebhook=true \
--set admissionController.certGen.enabled=false \
--set-file admissionController.tls.caCert=ca.txt \
--set-file admissionController.tls.cert=cert.txt \
--set-file admissionController.tls.key=key.txt
This approach is useful when the user creates the TLS related components using a mechanism other than cert-manager, such as a shell script, and simply wants to pass the certificates to helm
- [DONE] this is the default and the recommended approach,
ingress-nginx/kube-webhook-certgenis used, i.e.:
--set admissionController.registerWebhook=false \
--set admissionController.certGen.enabled=true
So you are basically saying let's drop this: https://github.com/kubernetes/autoscaler/pull/8870/files?diff=split&w=0 |
672f1f5 to
404d09e
Compare
...al-pod-autoscaler/charts/vertical-pod-autoscaler/templates/admission-controller-webhook.yaml
Show resolved
Hide resolved
| ⚠️ WARNING: No TLS certificate source configured! | ||
| The admission controller may fail to start. Please set one of: | ||
| - admissionController.certGen.enabled: true (recommended) | ||
| - admissionController.tls.create: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - admissionController.tls.create: true |
| Mode: Helm-managed (recommended) | ||
| - Webhook registered by: Helm (MutatingWebhookConfiguration) | ||
| - Certificates managed by: certgen job | ||
| {{- else if .Values.admissionController.tls.create }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please get rid of all Values.admissionController.tls.create related part, we don't need this helm value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We chatted about this on Slack. I am not sure if we wanna get rid of this.
| {{- if .Values.admissionController.registerWebhook }} | ||
| Mode: Application-managed | ||
| - Webhook registered by: admission-controller application | ||
| - Certificates managed by: admission-controller application |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Certificates managed by: admission-controller application | |
| - Be aware that, with this mode, you create the certificates by using a mechanism such as cert-manager or by creating them manually. Store the certificates in a Kubernetes Secret object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here: #8870 (comment)
| ``` | ||
| In this mode: | ||
| - The VPA admission controller creates and manages the webhook itself | ||
| - The application handles its own certificate generation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - The application handles its own certificate generation | |
| - With this mode, the end user creates the certificates using a mechanism such as cert-manager or by creating them manually, and stores the certificates in a Kubernetes Secret. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true. with this mode the admission controller creates the webhook (with all stack).
See here: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/docs/flags.md?plain=1#L32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this locally, and what I observed is that the MutatingWebhookConfiguration is created by the admission-controller only when a Secret exists in the cluster. The Secret must be created by the user (manually or, for example via cert-manager) when the chart is deployed like this:
helm upgrade vpa \
charts/vertical-pod-autoscaler \
--install \
--version 0.8.0 \
--namespace vpa --create-namespace \
--set updater.replicas=1 \
--set updater.extraArgs\[0]\="--min-replicas=1" \
--set recommender.replicas=1 \
--set admissionController.replicas=1 \
--set admissionController.registerWebhook=true \
--set admissionController.certGen.enabled=false
I also mentioned this behavior in a comment (see point 1): #8870 (comment)
. Specifically, if the Secret is created after the helm install, the admission pod needs to be restarted to trigger the creation of the MutatingWebhookConfiguration. That is why I think it is important to warn users that, in this mode, it is their responsibility to create the Secret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right. In the current implementation ( ./hack/vpa-up.sh ) the script is also creating the secret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted in 8432460
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good, small nit: I think the "The application handles its own certificate generation" sentence can be removed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job, just found a small nit.
/lgtm
| ``` | ||
| In this mode: | ||
| - The VPA admission controller creates and manages the webhook itself | ||
| - The application handles its own certificate generation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good, small nit: I think the "The application handles its own certificate generation" sentence can be removed now.
8432460 to
3183883
Compare
Signed-off-by: Omer Aplatony <[email protected]>
3183883 to
0c69630
Compare
|
/hold |
|
/lgtm Thanks for pushing this! |
|
/unhold |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implements Helm-managed
MutatingWebhookConfigurationwith automatic TLS certificate generation usingkube-webhook-certgen. This replaces the application's self-registration logic.Which issue(s) this PR fixes:
Related to #8587
Special notes for your reviewer:
ingress-nginxis dead, so I’m not sure about the future ofkube-webhook-certgen, which is part of the old nginx stack (registry.k8s.io/ingress-nginx/kube-webhook-certgen). Does that mean we should use it?Right now, the hook simply creates a
Secretcontaining all the certificates, CA bundles, and the mutating webhook configuration. To rotate or update those certificates, we would need just to add a Job that deletes the Secret before.This is just an initial proposal, I’d like to hear other opinions on this setup.
Also, I have noticed some RBAC problems so I fixed those.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: