-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-4785: Resource State Metrics #4811
base: master
Are you sure you want to change the base?
Conversation
rexagod
commented
Aug 27, 2024
•
edited
Loading
edited
- One-line PR description: This KEP proposes the incorporation of the CRDMetrics controller into the Kubernetes organization, similar to its existing counterpart for native metrics, Kube State Metrics.
- Issue link: Resource State Metrics #4785
- Other comments: Please refer to https://github.com/rexagod/resource-state-metrics for more details.
Resource State API from Kube State Metrics, and replace it by the CRDMetrics | ||
controller which, in addition to its own benefits, would allow Kube State | ||
Metrics to drop all Custom Resource State API-specific behaviors that can crash | ||
Kube State Metrics, directly affecting the availability of native metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This KEP proposes the incorporation of the CRDMetrics controller into the Kubernetes organization, similar to its existing counterpart for native metrics, Kube State Metrics. Refer: https://github.com/rexagod/crdmetrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to see this :-)
I will follow along and try to engage and help where possible for me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR.
I'm afraid I do have questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this @rexagod ! I added a few comments.
SGTM |
…Metrics Controller
/approve This looks good for the first alpha release. Thanks for your work on this @rexagod 🚀 |
/cc @richabanker |
/lgtm |
/cc @wojtek-t |
# of http://git.k8s.io/enhancements/OWNERS_ALIASES | ||
kep-number: 4785 | ||
alpha: | ||
approver: "@johnbelamaric" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to reassign to me, given I was cc-ed anyway.
existing tests to make this code solid enough prior to committing the changes necessary | ||
to implement this enhancement. | ||
|
||
##### Prerequisite testing updates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test plan has to be filled in even for Alpha.
cluster required to make on upgrade, in order to make use of the enhancement? | ||
--> | ||
|
||
N/A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upgrade/Downgrade is never N/A. It can be simple (no action required or sth), but it's always there.
But here, I don't think it's that obvious, in particular, how to we handle changes in the API?
This section must be completed when targeting alpha to a release. | ||
--> | ||
|
||
N/A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
of a node? | ||
--> | ||
|
||
N/A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be N/A - I need to be able to disable the feature. How can I do it?
- disable kube-state-metrics?
- remove the appropriate custom resources?
- something else?
previous answers based on experience in the field. | ||
--> | ||
|
||
N/A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
logs or events for this purpose. | ||
--> | ||
|
||
N/A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain that this is not a workload feature - it's an additional telemetry.
This section must be completed when targeting beta to a release. | ||
--> | ||
|
||
The controller __directly__ imports the following `k8s.io` packages: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a question about runtime dependencies, not the code dependencies.
- Impact of its degraded performance or high-error rates on the feature: | ||
--> | ||
|
||
The controller relies on core Kubernetes components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be more specific - it's effectively serving stack only, right (kube-apiserver + etcd + potentially webhooks etc.)
- Estimated amount of new objects: (e.g., new Object X for every existing Pod) | ||
--> | ||
|
||
The controller will **not** create or modify any object on its own. Only when a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit misleading - it actually is modifying managed resources when it's created/updated.
…85: CRDMetrics Controller
New changes are detected. LGTM label has been removed. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mrueg, rexagod The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thank you for the thorough review, @wojtek-t, the out-of-tree nature of this KEP had me a bit confused. I believe this is good for another round of reviews now 🙇🏼 |
cc @wojtek-t for another review here please. |
Friendly ping, @wojtek-t for another look here please. |
Bump. cc @kubernetes/production-readiness if folks have spare cycles to help get this reviewed and merged. |
@@ -0,0 +1,1216 @@ | |||
<!-- | |||
**Note:** When your KEP is complete, all of these comment blocks should be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ are we good to remove all comment blocks? That will make it easier to review I guess? @rexagod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe "complete" here means stable graduation, but I could be wrong.
At the moment, the `spec` houses a single `configuration` field, which defines | ||
the metric generation configuration as follows (please note that the schema is | ||
fast-moving at this point and may be subject to change based on the [feedback | ||
obtained](https://github.com/rexagod/resource-state-metrics/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be considired to be changed in future and have the configuration be a typed api?
WIth having a .spec.configuration
field which is a string, we could also just use a configmap or secret instead 🤔
Also to build tooling on top (e.g. to generate a configuration for a CRD from markers) it would be helpful to have this an typed API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of taking a config parameter reference - eg
configurationReference:
apiVersion: v1 # for alpha, has to be ConfigMap
kind: ConfigMap # for alpha, has to be ConfigMap
namespace: example # optional, but must be set for initial / alpha implementation
name: rsm-config
itemPath: >-
data.config
help: "Information about a MyPlatform instance" # The help text for the | ||
# metric family, plugged | ||
# in as-is. | ||
metrics: # Set of metrics to generate under the current metrics family. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have an example on how this could look like on arrays inside a resurce?
E.g. metrics for .status.conditions
:-)
(let me know if this is too early, as this is technical implementation question)