Webhook Race Condition with TLS handshake error: tls: bad certificate #2560

Open
@njtran

Hoping to get some insight on the following issue. Happy to hop on a call or a Slack huddle in the Knative Slack to give more info.

Expected Behavior

The webhook should work on first start, without needing a non-deterministic number of container restarts.

Actual Behavior

We use defaulting and validating webhooks for the Karpenter CRDs. When first installing Karpenter, we get the following error in the webhook container logs. Even after hitting this failure, the webhook container stays ready. Restarting the container sometimes resolves the issue, but the number of restarts needed is non-deterministic.

karpenter-6f84d7c89d-4246h webhook 2022/07/14 17:47:02 http: TLS handshake error from 192.168.139.165:44004: remote error: tls: bad certificate
karpenter-6f84d7c89d-4246h webhook 2022/07/14 17:47:02 http: TLS handshake error from 192.168.139.165:44006: remote error: tls: bad certificate

The webhook is further shown to be broken when it blocks creation of the CRD: the API server rejects the call because the webhook's certificate is signed by an unknown authority.

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "validation.webhook.karpenter.k8s.aws": Post "https://karpenter.karpenter.svc:443/?timeout=10s": x509: certificate signed by unknown authority

Here’s where this webhook was created in code; we started seeing this issue after that change.

Steps to Reproduce the Problem

  1. Install Karpenter with the webhook mentioned.
  2. Watch webhook logs.

Additional Info

karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.145Z	INFO	webhook.DefaultingWebhook	Starting controller and workers	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.145Z	INFO	webhook.DefaultingWebhook	Started workers	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.145Z	DEBUG	webhook.DefaultingWebhook	Processing from queue defaulting.webhook.karpenter.k8s.aws (depth: 1)	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.145Z	DEBUG	webhook.DefaultingWebhook	Processing from queue karpenter/karpenter-cert (depth: 0)	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.145610       1 leaderelection.go:243] attempting to acquire leader lease karpenter/webhook.validationwebhook.00-of-01...
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.146009       1 leaderelection.go:243] attempting to acquire leader lease karpenter/webhook.webhookcertificates.00-of-01...
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.146311       1 leaderelection.go:243] attempting to acquire leader lease karpenter/webhook.defaultingwebhook.00-of-01...
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.146585       1 leaderelection.go:243] attempting to acquire leader lease karpenter/webhook.validationwebhook.00-of-01...
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.146866       1 leaderelection.go:243] attempting to acquire leader lease karpenter/webhook.configmapwebhook.00-of-01...
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.147154       1 leaderelection.go:243] attempting to acquire leader lease karpenter/webhook.defaultingwebhook.00-of-01...
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.158486       1 leaderelection.go:253] successfully acquired lease karpenter/webhook.defaultingwebhook.00-of-01
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.158Z	INFO	webhook	"karpenter-6f84d7c89d-4246h_7349c3d9-0ba3-4bf1-94b3-0952c56f9be9" has started leading "webhook.defaultingwebhook.00-of-01"	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.158Z	DEBUG	webhook.DefaultingWebhook	Adding to the slow queue defaulting.webhook.provisioners.karpenter.sh (depth(total/slow): 1/1)	{"commit": "2c98771", "knative.dev/key": "/defaulting.webhook.provisioners.karpenter.sh"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.158Z	DEBUG	webhook.DefaultingWebhook	Processing from queue defaulting.webhook.provisioners.karpenter.sh (depth: 0)	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.159316       1 leaderelection.go:253] successfully acquired lease karpenter/webhook.configmapwebhook.00-of-01
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.159Z	INFO	webhook	"karpenter-6f84d7c89d-4246h_3422fdfa-830a-42e1-bfe5-323689e5641e" has started leading "webhook.configmapwebhook.00-of-01"	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.159Z	DEBUG	webhook.ConfigMapWebhook	Adding to the slow queue validation.webhook.config.karpenter.sh (depth(total/slow): 1/1)	{"commit": "2c98771", "knative.dev/key": "/validation.webhook.config.karpenter.sh"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.159Z	DEBUG	webhook.ConfigMapWebhook	Processing from queue validation.webhook.config.karpenter.sh (depth: 0)	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook E0714 17:46:46.159762       1 leaderelection.go:361] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "webhook.defaultingwebhook.00-of-01": the object has been modified; please apply your changes to the latest version and try again
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.159844       1 leaderelection.go:253] successfully acquired lease karpenter/webhook.validationwebhook.00-of-01
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.159Z	INFO	webhook	"karpenter-6f84d7c89d-4246h_a0c9c1f8-5016-4ebb-b738-f455dd699c48" has started leading "webhook.validationwebhook.00-of-01"	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.160Z	DEBUG	webhook.ValidationWebhook	Adding to the slow queue validation.webhook.karpenter.k8s.aws (depth(total/slow): 1/1)	{"commit": "2c98771", "knative.dev/key": "/validation.webhook.karpenter.k8s.aws"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.160Z	DEBUG	webhook.ValidationWebhook	Processing from queue validation.webhook.karpenter.k8s.aws (depth: 0)	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook I0714 17:46:46.160306       1 leaderelection.go:253] successfully acquired lease karpenter/webhook.webhookcertificates.00-of-01
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.160Z	INFO	webhook	"karpenter-6f84d7c89d-4246h_dafed312-0c03-4c85-9afc-9ab7842483ed" has started leading "webhook.webhookcertificates.00-of-01"	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.160Z	DEBUG	webhook.WebhookCertificates	Adding to the slow queue karpenter/karpenter-cert (depth(total/slow): 1/1)	{"commit": "2c98771", "knative.dev/key": "karpenter/karpenter-cert"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.160Z	DEBUG	webhook.WebhookCertificates	Processing from queue karpenter/karpenter-cert (depth: 0)	{"commit": "2c98771"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.160Z	INFO	webhook.WebhookCertificates	Reconcile succeeded	{"commit": "2c98771", "knative.dev/traceid": "36a77d14-41da-4ef7-bcaf-585fcbf0e45b", "knative.dev/key": "karpenter/karpenter-cert", "duration": "248.666µs"}
karpenter-6f84d7c89d-4246h webhook E0714 17:46:46.160932       1 leaderelection.go:361] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "webhook.validationwebhook.00-of-01": the object has been modified; please apply your changes to the latest version and try again
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.213Z	INFO	webhook.DefaultingWebhook	Updating webhook	{"commit": "2c98771", "knative.dev/traceid": "17354ae1-5148-439f-bde6-2c0bd820c9a8", "knative.dev/key": "defaulting.webhook.provisioners.karpenter.sh"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.218Z	INFO	webhook.ConfigMapWebhook	Webhook is valid	{"commit": "2c98771", "knative.dev/traceid": "54b1e742-648e-4d9e-ba37-dae50fbdf615", "knative.dev/key": "validation.webhook.config.karpenter.sh"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.218Z	INFO	webhook.ConfigMapWebhook	Reconcile succeeded	{"commit": "2c98771", "knative.dev/traceid": "54b1e742-648e-4d9e-ba37-dae50fbdf615", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "58.836021ms"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.310Z	INFO	webhook.ValidationWebhook	Updating webhook	{"commit": "2c98771", "knative.dev/traceid": "41feaed3-40f0-405c-afea-e44d1dc78c3d", "knative.dev/key": "validation.webhook.karpenter.k8s.aws"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.311Z	INFO	webhook.DefaultingWebhook	Reconcile succeeded	{"commit": "2c98771", "knative.dev/traceid": "17354ae1-5148-439f-bde6-2c0bd820c9a8", "knative.dev/key": "defaulting.webhook.provisioners.karpenter.sh", "duration": "152.813399ms"}
karpenter-6f84d7c89d-4246h webhook 2022-07-14T17:46:46.319Z	INFO	webhook.ValidationWebhook	Reconcile succeeded	{"commit": "2c98771", "knative.dev/traceid": "41feaed3-40f0-405c-afea-e44d1dc78c3d", "knative.dev/key": "validation.webhook.karpenter.k8s.aws", "duration": "159.026253ms"}
karpenter-6f84d7c89d-4246h webhook 2022/07/14 17:47:02 http: TLS handshake error from 192.168.139.165:44004: remote error: tls: bad certificate
karpenter-6f84d7c89d-4246h webhook 2022/07/14 17:47:02 http: TLS handshake error from 192.168.139.165:44006: remote error: tls: bad certificate
