
Conversation

jkremser
Contributor

@jkremser jkremser commented Aug 22, 2025

Certs for triggers that use external scaler pattern can be provided as file paths inside the ScaledObject's definition. These should be mounted and accessible by KEDA. It uses the prepared infrastructure that reloads the certs if there is a change on FS (fsnotify).

Ideally, I'd rename the existing fields from caCert, tlsClientCert and tlsClientKey to caCertPem, tlsClientCertPem and tlsClientKeyPem, so the nomenclature would match the one used here, but this would break backward compatibility. So instead there are 3 new fields called:

  • caCertFile
  • tlsClientCertFile
  • tlsClientKeyFile

Also, if both sets of params are specified, the in-memory ones (caCert, etc.) win.
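For illustration, a trigger using the new fields might look like this; the field names come from the list above, while the mount path and scaler address are made up:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaledobject
spec:
  scaleTargetRef:
    name: my-deployment
  triggers:
    - type: external
      metadata:
        scalerAddress: my-external-scaler.default.svc:9090
        # new file-based fields; the paths must be mounted into the
        # KEDA operator pod, e.g. from a cert-manager managed Secret
        caCertFile: /certs/ca.crt
        tlsClientCertFile: /certs/tls.crt
        tlsClientKeyFile: /certs/tls.key
```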

todo:

  • create issue
  • tests
  • remove this todo ;)


Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link the related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@keda-automation keda-automation requested review from a team August 22, 2025 15:49
@JorTurFer
Member

I'm not sure we should read things from the file system as part of scaler code. It means that if someone needs to add more certs, the operator has to be restarted. I'd prefer some way where we pull the secret from the k8s API directly and use it without going through the file system.
For example, the kafka scaler does it. In general, I prefer not to use the file system for scaler-related things. Also, if the goal of this is to automatically reload the certificates when needed, the current retry system will already reload them: when the cert isn't valid, the upstream will return an error, triggering a refresh that pulls the new values from the API server.

@jkremser
Contributor Author

jkremser commented Aug 25, 2025

if the goal of this is to automatically reload the certificates when needed

yep, that's the goal: allowing cert-manager to update the secrets, which triggers their remounting.

This means that if someone needs to add more certs, the operator has to be restarted.

👍 this is a valid point. Adding a new secret will require a pod restart; however, it's still possible to use the old way and provide the certs as inline strings. So the pod restart is needed only if users care about cert rotation, which they should. I mean, the cert rotation itself doesn't trigger a restart of the operator's pod, but adding a new secret or deleting an old one changes the podSpec and will have this effect. Still, benefits > drawbacks, given that this is an optional feature.

kafka scaler does it (pulls secrets from k8s api)

can you point me to the code? I was able to find this & this. In both cases, the cert, key and CA are read from the SO's trigger metadata -> they can't be rotated, and putting this kind of info into SOs is imho a bad practice

I'm not sure if we should read things from file system as part of scaler code

cough, cough so does the kafka scaler :) I am a KEDA noob, but isn't this code evaluated by the KEDA operator? The KEDA operator code already does this

I don't know; polling a secret or having a watcher/informer for a k8s Secret just in case it gets updated seems less secure to me than mounting it into the pod and reusing the infra you've introduced. (I am trying to avoid get/list/watch RBAC for k8s Secrets if possible.)

If there isn't demand for such a feature, I can close the PR, no hard feelings :) I was just setting up a demo with OTel pipelines where all the certs are rotated, and this piece of the puzzle was missing, so once the cert for my external scaler was rotated, KEDA stopped talking to it. So I opened this draft PR.

@jkremser jkremser force-pushed the tls-scaler-comm-certs-as-files branch from 981aa3a to 3786f4b Compare August 25, 2025 12:03
…external scalers as file paths. These should be mounted and accessible by KEDA. It uses the prepared infrastructure that reloads the certs if there is a change on FS (fsnotify)

Signed-off-by: Jirka Kremser <[email protected]>
@jkremser jkremser force-pushed the tls-scaler-comm-certs-as-files branch from 3786f4b to 9a6ca13 Compare August 25, 2025 13:04
@zroubalik
Member

zroubalik commented Aug 26, 2025

/run-e2e external
Update: You can check the progress here

@JorTurFer
Member

JorTurFer commented Aug 29, 2025

can you point me to the code? I was able to find this & this. In both cases, the cert, key and CA are read from the SO's trigger metadata -> they can't be rotated, and putting this kind of info into SOs is imho a bad practice

That metadata struct isn't the metadata section from the SO but the parsed information from different sources (in this case, only safe sources):

// TLS
TLS string `keda:"name=tls, order=triggerMetadata;authParams, enum=enable;disable, default=disable"`
Cert string `keda:"name=cert, order=authParams, optional"`
Key string `keda:"name=key, order=authParams, optional"`
KeyPassword string `keda:"name=keyPassword, order=authParams, optional"`
CA string `keda:"name=ca, order=authParams, optional"`

cough, cough so does the kafka scaler :) I am a KEDA noob, but isn't this code evaluated by the KEDA operator? The KEDA operator code already does this

you are right, but this is because the kerberos SDK only supports reading values from the filesystem, so the scaler code saves the info from the k8s API into a local temporary file to provide it to the kerberos pkg. The scaler docs also explain that a writable filesystem must be enabled for this, because KEDA's filesystem is read-only by default -> https://keda.sh/docs/2.17/scalers/apache-kafka/#your-kafka-cluster-turns-on-saslgssapi-auth-without-tls

I don't know; polling a secret or having a watcher/informer for a k8s Secret just in case it gets updated seems less secure to me than mounting it into the pod and reusing the infra you've introduced. (I am trying to avoid get/list/watch RBAC for k8s Secrets if possible.)

My concern with using the filesystem is that it works quite well when there are a few certificates or teams, but I'm afraid about large environments, where restarting KEDA every time a new cert has to be added can be disruptive in terms of service level, and the vast majority of the scalers would be stopped just because of this.
Considering how we work, I'd prefer to pull the certs into a temporary file and use it instead of expecting new files mounted externally.

If there isn't demand for such a feature, I can close the PR, no hard feelings :) I was just setting up a demo with OTel pipelines where all the certs are rotated, and this piece of the puzzle was missing, so once the cert for my external scaler was rotated, KEDA stopped talking to it. So I opened this draft PR.

When your cert was rotated, was the k8s Secret that KEDA uses rotated too? If yes, KEDA should have failed pulling metrics, and during the scaler regeneration the new cert should have been pulled and used.

Just to be clear, I'm not trying to block this: if this happened to you, it can happen again to others, and we appreciate any improvement (and fixing this problem is nice!), but I'm afraid about the side effects of making KEDA modifications part of operations teams' daily routine.

Maybe other @kedacore/keda-core-contributors @kedacore/keda-core-maintainers have thoughts that they want to share 😄

@antoncohen

I commented in the original PR that added Temporal support to KEDA. I think being able to handle certificates on the filesystem that rotate automatically is pretty important for a client using mTLS.

The Problem:

You can't revoke mTLS certs (CRLs aren't usually used). If you use a centrally trusted CA for mTLS, which is probably the case for people picking mTLS, and you have the server trust certs signed by that CA, it will trust all non-expired certs from that CA. Once a cert is signed by a CA, and trusted, it can't be revoked like an API key; you'd have to stop trusting the CA entirely.

As a result, if you are using a central CA, the only reasonable thing to do is have short-lived certs that rotate frequently.

There are different mechanisms to do that. For example, you could have automation update a Kubernetes Secret as the cert is rotated. But how do you expose that to the application? The application could use the Kubernetes API to read the Secret, but that would be a pretty strange thing for general applications to do (it makes sense for KEDA, which is all about Kubernetes). So instead you mount the Secret as a directory. You can't expose the Secret as env vars, because those don't get live-updated, but mounted Secrets do get live-updated.
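As a concrete (hypothetical) example of the mounted-Secret approach, the relevant part of a pod spec could look like this; the kubelet refreshes whole-Secret volume mounts in place after its sync delay, while subPath mounts and env vars are not refreshed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
    - name: app
      image: example/app:latest
      volumeMounts:
        - name: tls-certs
          mountPath: /certs   # files here are updated in place on rotation
          readOnly: true
  volumes:
    - name: tls-certs
      secret:
        secretName: rotated-tls-secret  # kept fresh by cert-manager or similar
```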

We actually don't use a Secret; instead we have a sidecar container that writes updated certs/keys to a volume shared with the main application containers. Other mechanisms like csi-driver-spiffe also live-update certs/keys on the filesystem.

The application then needs to start using the newly rotated certs/keys without restarting, the next time it needs to connect (it doesn't need to interrupt existing connections, only pick up the new certs/keys for the next connection). That functionality is built into Go via GetClientCertificate, and Temporal has documented how to do it.

if some needs to add more certs, the operator has to be restarted.

This is probably me being naive, because I don't know KEDA's internal architecture. But why would the operator have to restart?

If the operator can read values from a Secret live without restarting, and those values contain API keys or certs/keys, why can't the values contain paths to where the certs/keys are on the filesystem? Then the automation that people already have for live-updating certs/keys on the filesystem would be responsible for making sure they exist and are rotated.

@JorTurFer
Member

This is probably me being naive, because I don't know KEDA's internal architecture. But why would the operator have to restart?

I don't know exactly how that csi-driver works, but let's say that you need to add another CA because you have a large cluster supporting multiple teams; do you need to change the pod spec for it? If yes (as happens with a configMap or secret), the pod has to be restarted.

@antoncohen

I don't know exactly how that csi-driver works, but let's say that you need to add another CA because you have a large cluster supporting multiple teams; do you need to change the pod spec for it?

There is nothing CA-specific in the pod spec when using csi-driver-spiffe or what we use internally. The pod spec basically says "give me a cert + key and mount them here", and an external service takes care of which CA to use.

@JorTurFer
Copy link
Member

Do I need to restart the pod if I need to request a new certificate with another SAN or signed by another CA?
