-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-4377: Secured Prometheus metrics endpoints #4404
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kannon92 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
/assign @gabesaba |
/unassign As I won't be able to take a look until next week |
/assign @PBundyra |
Sure, on it |
/lgtm |
LGTM label has been added. Git tree hash: 72f730f0bc8f1f6ebf62e4c0c154b538fc6770f0
|
keps/4377-metrics-tls/README.md
Outdated
|
||
### Controller Changes | ||
|
||
To keep configurations minimal, we will require that the certificates be mounted to "/tmp/k8s-metrics-server/metrics-certs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why tmp? Mounts rarely happen in that place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly took the inspiration from the path we use for internal cert rotations.
https://github.com/kubernetes-sigs/kueue/blob/main/pkg/util/cert/cert.go#L31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but the internal certs rotations are temp and localized in principle. The external certs are expected to be "permanent", so indeed the "tmp" folder name is a bit misleading. Wondering if there are some best practices for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see anything in cert docs.
We could put in /etc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed an update to put in /etc rather than /tmp. I think that is a better location for these certs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/etc/kueue/certs
sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/etc/metrics/certs
or
/etc/kueue/metrics/certs?
We should not combine all the certificates into the same directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/etc/metrics/certs or /etc/kueue/metrics/certs?
Probably /etc/kueue/metrics/certs
, because in /etc the files are typically namespaced by the app name. Sure, we have only single app here, so everything works, but it feels more natural. OTOH, this is not a strong opinion.
We should not combine all the certificates into the same directory.
Sure, if users want to use the same certs for webhooks and metrics they would need to create soft link between the two folders or just copy them to both places.
New changes are detected. LGTM label has been removed. |
keps/4377-metrics-tls/README.md
Outdated
// False means that we will allow access to metrics to whoever has access to the ServiceAccount | ||
// Default will be false | ||
// +optional | ||
UseTLS bool `json:"useTLS,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the API should de-couple two things. TLS could be used based on internally generated and self-signed certicicates or external certificates.
This knob seems to do two things at the same time - enable TLS at the monitor level and requires using external cert manager.
I would suggest to rename it to UseCertManager
. And use TLS regardless if the certs are internally generated or external.
EDIT: alternative names coming to my mind: UseExternalCerts
, or just ExternalCerts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And use TLS regardless if the certs are internally generated or external.
That would be a breaking change. This would effectively require Cert Manager for metrics. If Secure option is set and controller runtime generates certificates it doesn't not provide the ability to check TLS.
Controller runtime requires the use of an external certificate solution for TLS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently store the certs generated by OPA in /tmp/k8s-webhook-server/serving-certs
.
Couldn't we use the certs for metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the KEP, there are 2 types of internal certs, 1 is generated by OPA, 2 is generated by controller-runtime.
Which internal do you indicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see about using internal certs but should we create a new secret for metrics?
I think reusing the same secret for multiple endpoints is not a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see about using internal certs but should we create a new secret for metrics?
I'm ok with either way.
I think reusing the same secret for multiple endpoints is not a good idea.
What point of view do you think it is not a good idea? Security or controller-runtime specification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on why this improves security? It is not clear to me, and from the user perspective it requires a bit more overhead, and the setup is more complex.
However, since we already use different (internal) certs for webhooks and metrics, so I'm ok to keep them separate also for the external certs, but to me the argument is more about "consistency with what we currently do".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All examples seem to require creation of a certificate for webhooks and then a certificate for metrics.
If a certificate is compromised then you would effectively give access to all APIs. If you at least have separate certificates per API then it would most likely be more secure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you at least have separate certificates per API then it would most likely be more secure.
Yeah, OTOH, the more certificates you have the higher the chance some would leak. Getting any part of your app compromised increases the risk the other parts will fall soon.
So, I believe there are pros and cons to security for both approaches. By analogy: is it safer to have a single complex master password, or remembering 100 weak passwords or using stickers for them :)
I just imagine being an admin and managing certificates per endpoint is a lot of devops work in principle, and more points of failure, and of leakage, but I see company-wide strategies may differ.
Just would like to make sure the API is flexible to do both ways depending on user preference.
keps/4377-metrics-tls/README.md
Outdated
```golang | ||
// ControllerMetrics defines the metrics configs. | ||
type ControllerMetrics struct { | ||
... | ||
// UseTLS, if true, will provide tls validation for the prometheus endpoint | ||
// False means that we will allow access to metrics to whoever has access to the ServiceAccount | ||
// Default will be false | ||
// +optional | ||
UseTLS bool `json:"useTLS,omitempty"` | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we do not need to expand any APIs.
So, shouldn't metrics server watch /tmp/k8s-webhook-server/serving-certs
directory?
In that case, we can avoid focusing on CertManger cases. They can mount arbitrary certifications in the /tmp/k8s-webhook-server/serving-certs
.
This indicates the metrics endpoints always use HTTPS regardless of certificate manager types.
cc: @mimowo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not say that we should use /tmp/k8s-webhook-server/serving-certs
. I'm ok with any directory. But we can have only one certificate directory for metrics and webhooks regardless of certificate types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mimowo Do you see any problems in your use cases when we enable HTTPS for metrics endpoints by internal-certmanager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This indicates the metrics endpoints always use HTTPS regardless of certificate manager types.
Yeah, I think we only support HTTPS / TLS, so I wouldn't call it UseTLS. My reading of your work is the ability to supply external certs (as mentioned in another comment).
I'm wondering if we do not need to expand any APIs.
Basically there are 3 things which could be configured: CertDir, CertName and KeyName : https://github.com/kubernetes-sigs/kueue/blob/main/vendor/sigs.k8s.io/controller-runtime/pkg/metrics/server/server.go#L85-L102
I can see why they are configurable at the controller-runtime level, but I don't see a need to configure them in Kueue. Having clear defaults will make the docs easier if less parametrized IMO. We can introduce always the parameters when users have good use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mimowo Do you see any problems in your use cases when we enable HTTPS for metrics endpoints by internal-certmanager?
Nope, I thought we already do it at the endpoint level, looking at this test:
kueue/test/e2e/singlecluster/metrics_test.go
Line 577 in f8015cb
"curl -s -k -H \"Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" https://%s.%s.svc.cluster.local:8443/metrics ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but we do not append certs for metrics as they are handled by controller runtime feature.
If we maintain certs for metrics by OPA, what happen? In that case, will metrics server ignores the certs as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would make the helm configuration easier actually. Right now I think I need to add special code to append volumes in case of using external certs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would make the helm configuration easier actually. Right now I think I need to add special code to append volumes in case of using external certs.
I do not think that we respect Helm codes instead of API. API should always be respected; it is used widely regardless of tools.
We are currently working on Helm Chart auto-generation instead of the current shell. I believe it could mitigate Helm generation complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outstanding item:
- Internal Cert Manage with webhooks, controller runtime metrics with https but servicemonitor does not validate tls.
- Supported or not?
@mimowo Proposal:
- Leave this one as is and keep tls no verify on for ServiceMonitor and call this out of scope for this KEP.
- We will support this existing solution as is and not require changes here.
@tenzen-y Proposal:
- Internal Cert Manager for webhooks and metrics, use tls verify for ServiceMonitor
- Drop the middle state where we skip TLS in service monitor
- Internal cert manage would generate and watch secrets for both metrics and webhooks.
We could enhance this so that we no longer rely on controller runtime to generate this certificate and we use the similar opa code to watch/rotate certificates for metrics. We would need to create a secret for that.
We could keep the code as is and assume that if someone actually cares about this, the best approach would be to use an external solution for rotating certificates. This is the recommendation for Kubebuilder that one should use an external certificate solution for production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We followed up via slack.
We will not provide an API for this but will assume that if one wants to use external certificates then they also will want to provide certificates for prometheus.
I updated the design with these findings.
/retitle KEP-4377: Secured Prometheus metrics endpoints |
If you have time, i'd appreciate your insight best practices for metrics security. |
What type of PR is this?
/kind documentation
What this PR does / why we need it:
KEP for securing metrics with TLS.
Which issue(s) this PR fixes:
KEP for #4377
Special notes for your reviewer:
Does this PR introduce a user-facing change?