draft: add `inject-per-series-metadata` #2632
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jacobstr
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
This issue is currently awaiting triage. If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@@ -40,7 +40,12 @@ var (
 	podStatusReasons = []string{"Evicted", "NodeAffinity", "NodeLost", "Shutdown", "UnexpectedAdmissionError"}
 )
 
-func podMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generator.FamilyGenerator {
+func podMetricFamilies(injectPerSeriesMetadata bool, allowAnnotationsList []string, allowLabelsList []string) []generator.FamilyGenerator {
+	mc := &MetricConfig{
Doing this so that instead of adding 3 arguments to each generator, I add one. I suppose it makes it less "pure", as the behavior of the function is dictated by a complex configuration object.
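A minimal sketch of what the bundled configuration could look like, assuming the struct simply carries the flag plus the two allow-lists implied by the signature above (the field names here are my guesses, not necessarily the draft's):

```go
// Sketch only: field names are inferred from the new podMetricFamilies
// signature and may not match the draft exactly.
type MetricConfig struct {
	// Mirrors the --inject-per-series-metadata flag.
	InjectPerSeriesMetadata bool
	// Allow-lists that were previously threaded through as separate arguments.
	AllowAnnotationsList []string
	AllowLabelsList      []string
}
```

With something like this in place, each family generator takes a single `*MetricConfig` instead of three loose parameters.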
LabelKeys:   []string{"phase"},
LabelValues: []string{p.n},
We were shadowing the outer p (Pod).
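For anyone skimming, here is a stripped-down illustration of the kind of shadowing being fixed, with hypothetical names standing in for the real pod.go code:

```go
package main

import "fmt"

type phase struct{ n string } // stand-in for the anonymous phase struct in pod.go

type pod struct{ Name string } // stand-in for *v1.Pod

func main() {
	p := pod{Name: "my-pod"} // outer p: the Pod being processed
	phases := []phase{{n: "Running"}, {n: "Pending"}}
	for _, p := range phases {
		// Inside the loop, p is the phase element and shadows the outer Pod,
		// so the Pod's metadata can no longer be reached via p here.
		fmt.Println(p.n)
	}
	fmt.Println(p.Name) // the outer p is only reachable again after the loop
}
```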
@@ -38,6 +39,12 @@ var (
 	conditionStatuses = []v1.ConditionStatus{v1.ConditionTrue, v1.ConditionFalse, v1.ConditionUnknown}
 )
 
+type MetricConfig struct {
Could be made private since it's built and used internally within this module.
@@ -175,6 +182,17 @@ func isPrefixedNativeResource(name v1.ResourceName) bool {
 	return strings.Contains(string(name), v1.ResourceDefaultNamespacePrefix)
 }
 
+// convenience wrapper to inject allow-listed labels and annotations to a metric if per-series injection is enabled.
+func injectLabelsAndAnnos(m *metric.Metric, metricConfig *MetricConfig, obj *metav1.ObjectMeta) *metric.Metric {
+	if !metricConfig.InjectPerSeriesMetadata {
I think with the pass-by-reference + early guard clause, this should be "0-cost" for those leaving --inject-per-series-metadata at its default of false.
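A sketch of how that guard could look in the wrapper, assuming a helper along the lines of kube-state-metrics' existing allow-list filtering (the helper name and signature below are assumptions, not the actual diff):

```go
// Sketch only. filterKeysValues stands in for whatever allow-list filtering
// the draft reuses; it is assumed to return parallel key/value slices for the
// allow-listed entries of a labels or annotations map.
func injectLabelsAndAnnos(m *metric.Metric, metricConfig *MetricConfig, obj *metav1.ObjectMeta) *metric.Metric {
	// Early guard clause: with the flag at its default (false), the metric is
	// returned untouched, so the default path does essentially no extra work.
	if !metricConfig.InjectPerSeriesMetadata {
		return m
	}

	labelKeys, labelValues := filterKeysValues(obj.Labels, metricConfig.AllowLabelsList)
	m.LabelKeys = append(m.LabelKeys, labelKeys...)
	m.LabelValues = append(m.LabelValues, labelValues...)

	annoKeys, annoValues := filterKeysValues(obj.Annotations, metricConfig.AllowAnnotationsList)
	m.LabelKeys = append(m.LabelKeys, annoKeys...)
	m.LabelValues = append(m.LabelValues, annoValues...)

	return m
}
```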
Force-pushed 42b2197 to 9538070.
@@ -82,7 +87,7 @@ func podMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generator.FamilyGenerator {
 		createPodSpecVolumesPersistentVolumeClaimsInfoFamilyGenerator(),
 		createPodSpecVolumesPersistentVolumeClaimsReadonlyFamilyGenerator(),
 		createPodStartTimeFamilyGenerator(),
-		createPodStatusPhaseFamilyGenerator(),
+		createPodStatusPhaseFamilyGenerator(mc),
The full implementation would add this param to every metric family generator. This is also the reason I'm bundling up the configuration in a struct and passing it as a single unit.
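For illustration, a generator built this way might thread the config down to where each series is constructed, roughly like the simplified, hypothetical helper below (the real pod.go generators go through wrapPodFunc and the generator package, and build one series per phase):

```go
// Hypothetical, simplified shape of a generator body once mc is threaded in.
func podStatusPhaseMetrics(mc *MetricConfig, p *v1.Pod) *metric.Family {
	ms := []*metric.Metric{{
		LabelKeys:   []string{"phase"},
		LabelValues: []string{string(p.Status.Phase)},
		Value:       1,
	}}
	for _, m := range ms {
		// No-op unless --inject-per-series-metadata is enabled.
		injectLabelsAndAnnos(m, mc, &p.ObjectMeta)
	}
	return &metric.Family{Metrics: ms}
}
```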
See the discussion in kubernetes#2551. This frees users from having to join on the various label series, which is useful when infrastructure has agents that can't do recording rules.
Force-pushed 9538070 to a4c4905.
Thanks for your contribution! A solution like this has come up multiple times; for more information, please take a look at:
@mrueg if you can humor this for a moment: the intent here is to reduce the cardinality of metrics when they are forwarded and stored, by giving cluster administrators an additional lever by which they can aggregate. Yes, this can be done with recording rules, but those are (generally) only applied at backends with a TSDB.

I've got one technology, albeit a closed-source cloud offering from Grafana (adaptive metrics), where having pod labels would let me reduce 10k time series for a given pod metric down to something much more manageable by classifying workloads and aggregating them by a custom classification label. The circumstances in the wild can get somewhat complex and perhaps arbitrary, so I don't want to over-index on what I'm doing, other than to suggest that injecting labels directly removes the constraint of having to do aggregation against the kube_<resource>_label series.

I wanted to talk about cardinality a bit. There's an assertion that this will increase cardinality that I want to push back on. Here's what I get when I query the kube-state-metrics endpoint directly:
The metric already includes the pod name, so adding a custom label mapping sourced from the pod's own labels doesn't add new series: each pod still produces the same set of series, just with extra label values.‡

‡ I can contrive a scenario where, e.g., the pod updates its labels while it's running. I think that's quite atypical, and in those cases this would indeed be a questionable fit. However, I still think it's manageable in a multitude of ways, such as not enabling the capability for the subset of labels that have been made dynamic.
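To make the cardinality point concrete, here's a hypothetical before/after for one pod (the series, values, and label_ prefix are illustrative assumptions): the injected value is constant for the pod's lifetime, so the series count is unchanged; what disappears is the need to label_join against the label series.

```
# Today: aggregating by team requires joining two series.
kube_pod_status_phase{namespace="default", pod="worker-abc123", phase="Running"} 1
kube_pod_labels{namespace="default", pod="worker-abc123", label_team="foo"} 1

# With --inject-per-series-metadata and the "team" label allow-listed:
# still exactly one kube_pod_status_phase series per (pod, phase),
# but it can now be aggregated by label_team without a join.
kube_pod_status_phase{namespace="default", pod="worker-abc123", phase="Running", label_team="foo"} 1
```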
Of course, you can't simply drop the pod ID during relabelling, as you'll end up with multiple ambiguous time series. So I want some way to segment the metrics so that, when I aggregate them, I can do slightly better than aggregating all pods by namespace.

Additional memory usage also came up. I think this can be implemented in a largely zero-cost manner unless enabled. Supposing a single label is mapped, I think we're talking about an increase of 5-10% per label depending on the key/value lengths, though the prom client libraries themselves might be clever here and use shared string references rather than allocating 10k copies of "footeam". Again, we're hard pressed to do worse than the existing kube_<resource>_label series.
See the discussion in #2551. This frees users from having to join on the various label series, which is useful when infrastructure has agents that can't do recording rules.
Note: this is a draft proposal because repeating this work for all other metric families will get quite tedious. I wanted to walk this implementation out a bit to propose a neatly encapsulated way to inject labels and annotations via a wrapper func. It might not perform as well, but adding all of the slice-wrangling plus kube label and annotation filtering to the business logic of each metric would be a bit messy.
What this PR does / why we need it: See #2551. With systems like Grafana Cloud it's useful to filter / aggregate certain series by segmenting metrics according to a custom label (such as app, environment, etc.). Having to first label_join the existing kube_<resource>_label gauges requires an upstream Prometheus with a TSDB and precludes using things like Prometheus in agent mode or Alloy. By propagating the labels to individual time series we could do more effective filtering / aggregation at smart backends while having "dumb" scrapers and forwarders upstream.
One of the things this would let us do is aggregate thousands of metrics for similar pods running processing-heavy / task-heavy workloads. Many of these are categorized by pod labels. At scale, I don't need the pod dimension, but I do want to keep data for related workloads segmented.
How does this change affect the cardinality of KSM (increases, decreases or does not change cardinality): Increases, but the behavior is opt-in.
Which issue(s) this PR fixes: #2551