draft: add `inject-per-series-metadata` #2632

Draft
jacobstr wants to merge 1 commit into main from koobz/per-metric-metadata
Conversation

@jacobstr (Contributor) commented Mar 20, 2025:

See the discussion in #2551. This frees users from having to join on the various label series, which is useful when the infrastructure has agents that can't do recording rules.

Note: this is a draft proposal because repeating this work across all other metric families will get quite tedious. I wanted to walk this implementation out a bit and propose a neatly encapsulated way to inject labels and annotations via a wrapper func - it might not perform as well, but adding all of the slice-wrangling plus kube label and annotation filtering to the business logic of each metric would be a bit messy.

What this PR does / why we need it: See #2551. With systems like Grafana Cloud it's useful to filter / aggregate certain series by segmenting metrics according to a custom label (such as app, environment, etc.). Having to first label_join against the existing kube_<resource>_labels gauges requires an upstream Prometheus with a TSDB and precludes using things like Prometheus in agent mode or Alloy.

By propagating the labels to individual time series we could do more effective filtering / aggregation at smart backends while having "dumb" scrapers and forwarders upstream.

One of the things this would let us do is aggregate thousands of metrics for similar pods running processing-heavy / task-heavy workloads. Many of these are categorized by pod labels. At scale, I don't need the pod dimension, but I do want to keep data for related workloads segmented.

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality) Increases, but the behavior is opt-in.

Which issue(s) this PR fixes: #2551

@k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) label Mar 20, 2025
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jacobstr
Once this PR has been reviewed and has the lgtm label, please assign dgrisonnet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) and needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) labels Mar 20, 2025
@k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) label Mar 20, 2025
@@ -40,7 +40,12 @@ var (
podStatusReasons = []string{"Evicted", "NodeAffinity", "NodeLost", "Shutdown", "UnexpectedAdmissionError"}
)

-func podMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generator.FamilyGenerator {
+func podMetricFamilies(injectPerSeriesMetadata bool, allowAnnotationsList []string, allowLabelsList []string) []generator.FamilyGenerator {
mc := &MetricConfig{
@jacobstr (Author) commented Mar 20, 2025:

Doing this so that instead of adding 3 arguments to each generator, I add one. I suppose it makes it less "pure", as the behavior of the function is dictated by a complex configuration object.

LabelKeys: []string{"phase"},
LabelValues: []string{p.n},
@jacobstr (Author) commented:

We were shadowing the outer p (Pod).
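As an aside, here is a minimal Go illustration of the kind of shadowing being fixed; the names are illustrative stand-ins, not the actual kube-state-metrics code.

package main

import "fmt"

type pod struct{ name string }

func main() {
	p := pod{name: "app-5c674df6c4-qx65r"} // outer p: the Pod being processed

	phases := []struct{ n string }{{"Pending"}, {"Running"}}
	for _, p := range phases { // inner p shadows the outer Pod inside the loop
		fmt.Println(p.n) // refers to the loop element, not the Pod
	}

	fmt.Println(p.name) // the outer p is unaffected, but the reuse invites mistakes
}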

@@ -38,6 +39,12 @@ var (
conditionStatuses = []v1.ConditionStatus{v1.ConditionTrue, v1.ConditionFalse, v1.ConditionUnknown}
)

type MetricConfig struct {
@jacobstr (Author) commented:

Could be made private since it's built / used internally to this module.
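For concreteness, a rough sketch of what the bundled config could look like. Aside from InjectPerSeriesMetadata (used in the injectLabelsAndAnnos hunk below), the field names are assumptions based on this discussion, not the PR's exact code.

package store // sketch only; the package name mirrors internal/store, where the diff lives

// MetricConfig bundles the per-generator options so each family generator
// takes one parameter instead of three.
type MetricConfig struct {
	// InjectPerSeriesMetadata enables copying allow-listed labels and
	// annotations onto every individual series (opt-in, default false).
	InjectPerSeriesMetadata bool
	// The two existing allow-lists, carried through unchanged.
	AllowAnnotationsList []string
	AllowLabelsList      []string
}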

@@ -175,6 +182,17 @@ func isPrefixedNativeResource(name v1.ResourceName) bool {
return strings.Contains(string(name), v1.ResourceDefaultNamespacePrefix)
}

// convenience wrapper to inject allow-listed labels and annotations to a metric if per-series injection is enabled.
func injectLabelsAndAnnos(m *metric.Metric, metricConfig *MetricConfig, obj *metav1.ObjectMeta) *metric.Metric {
if !metricConfig.InjectPerSeriesMetadata {
@jacobstr (Author) commented:

I think with the pass-by-reference + early guard clause, this should be "0-cost" for those leaving --inject-per-series-metadata at its default of false.
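To make the guard-clause shape concrete, here is a sketch continuing the MetricConfig sketch above, with simplified stand-in types (the real code operates on metric.Metric and metav1.ObjectMeta); the body is an assumption about the approach, not the PR's actual diff.

// Simplified stand-ins for metric.Metric and metav1.ObjectMeta.
type Metric struct {
	LabelKeys   []string
	LabelValues []string
	Value       float64
}

type ObjectMeta struct {
	Labels      map[string]string
	Annotations map[string]string
}

// injectLabelsAndAnnos appends allow-listed labels and annotations to a single
// series. The early return keeps the default (disabled) path essentially free:
// one pointer dereference and a bool check, no extra allocations.
func injectLabelsAndAnnos(m *Metric, cfg *MetricConfig, obj *ObjectMeta) *Metric {
	if !cfg.InjectPerSeriesMetadata {
		return m
	}
	for _, k := range cfg.AllowLabelsList {
		if v, ok := obj.Labels[k]; ok {
			// The real code would sanitize k into a valid Prometheus
			// label name, as the kube_<resource>_labels gauges do.
			m.LabelKeys = append(m.LabelKeys, "label_"+k)
			m.LabelValues = append(m.LabelValues, v)
		}
	}
	for _, k := range cfg.AllowAnnotationsList {
		if v, ok := obj.Annotations[k]; ok {
			m.LabelKeys = append(m.LabelKeys, "annotation_"+k)
			m.LabelValues = append(m.LabelValues, v)
		}
	}
	return m
}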

@jacobstr force-pushed the koobz/per-metric-metadata branch from 42b2197 to 9538070 on March 21, 2025 08:13
@@ -82,7 +87,7 @@ func podMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generator.FamilyGenerator {
createPodSpecVolumesPersistentVolumeClaimsInfoFamilyGenerator(),
createPodSpecVolumesPersistentVolumeClaimsReadonlyFamilyGenerator(),
createPodStartTimeFamilyGenerator(),
-createPodStatusPhaseFamilyGenerator(),
+createPodStatusPhaseFamilyGenerator(mc),
@jacobstr (Author) commented:

The full implementation would add this param to every metric family generator. This is also the reason I'm bundling up the configuration in a struct and passing it as a single unit.
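Continuing the same sketch, this is roughly how one family-generator constructor might thread the config through; the FamilyGenerator shape here is a simplified assumption, not KSM's real generator API.

// FamilyGenerator is a simplified stand-in for KSM's generator type.
type FamilyGenerator struct {
	Name         string
	GenerateFunc func(meta *ObjectMeta) []*Metric
}

func createPodStatusPhaseFamilyGenerator(mc *MetricConfig) FamilyGenerator {
	return FamilyGenerator{
		Name: "kube_pod_status_phase",
		GenerateFunc: func(meta *ObjectMeta) []*Metric {
			ms := []*Metric{
				{LabelKeys: []string{"phase"}, LabelValues: []string{"Running"}, Value: 1},
			}
			// One extra call per series; a no-op when the flag is off.
			for i := range ms {
				ms[i] = injectLabelsAndAnnos(ms[i], mc, meta)
			}
			return ms
		},
	}
}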

See the discussion here kubernetes#2551
This frees users from having to join on the various label series, which
is useful when infrastructure has agents that can't do recording rules.
@jacobstr force-pushed the koobz/per-metric-metadata branch from 9538070 to a4c4905 on March 21, 2025 08:20
@k8s-ci-robot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label Mar 21, 2025
@jacobstr marked this pull request as draft March 21, 2025 08:21
@k8s-ci-robot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label Mar 21, 2025
@mrueg (Member) commented Mar 24, 2025:

Thanks for your contribution!
Unfortunately, we are not able to accept it, as it goes against the intended design of kube-state-metrics, and as maintainers we do not want to support this.
https://github.com/kubernetes/kube-state-metrics/blob/main/docs/design/metrics-best-practices.md#avoid-pre-computation

A solution like this has come up multiple times; for more information, please take a look at:
#2428 (comment)
#1758 (comment)
#2129 (comment)

@jacobstr (Contributor, Author) commented Mar 25, 2025:

@mrueg if you can humor this for a moment, the intent here is to reduce the cardinality of metrics when they are forwarded and stored by giving cluster administrators an additional lever by which they can aggregate.

Yes, this can be done with recording rules, but those are (generally) only applied at backends with a TSDB. I've got one technology, albeit a closed-source cloud offering from Grafana (Adaptive Metrics), where having pod labels would let me reduce 10k time series for a given pod metric down to something much more manageable by classifying workloads and aggregating them by a custom classification label.

The circumstances in the wild can get somewhat complex and perhaps arbitrary, so I don't want to over-index on what I'm doing necessarily, other than to suggest that allowing for labels directly removes a constraint on having to do aggregation against the kube_pod_labels time series. A concrete result of that is that I might be able to run Prometheus in agent mode "sooner." In general, I think this gives cluster administrators more flexibility around their metrics topology.


I wanted to talk about cardinality a bit. There's an assertion that this will increase cardinality that I want to push back on.

Here's what I get when I query the kube-state-metrics endpoint directly:

kube_pod_info{namespace="apps",pod="app-5c674df6c4-qx65r",uid="xxx",host_ip="xxx",pod_ip="xxx",node="xxx",created_by_kind="ReplicaSet",created_by_name="app-5c674df6c4",priority_class="",host_network="false"} 1

The metric already includes the pod name. Adding a custom label mapping such as owner="team-xyz" is not going to increase the overall number of kube_pod_info time series under what I would regard as "normal" circumstances‡. There is already a unique time series per pod - the cardinality generally can't get worse than a unique value for every instance of a pod.

‡ I can contrive a scenario where, e.g., the pod updates its labels while it's running. I think that's quite atypical, and in those cases this would indeed be a questionable fit. However, I still think it's manageable in a multitude of ways - such as not enabling the capability for labels that have been made dynamic.

kube_pod_info{namespace="apps",pod="app-5c674df6c4-qx65r",uid="xxx",host_ip="xxx",pod_ip="xxx",node="xxx",created_by_kind="ReplicaSet",created_by_name="app-5c674df6c4",priority_class="",host_network="false",label_owner="footeam"}

Of course, you can't simply drop the pod ID during relabelling as you'll end up with multiple ambiguous time series. So I want some way to segment the metrics so that when I aggregate them I can do slightly better than aggregating all pods by namespace.


Additional memory usage also came up. I think this can be implemented in a largely zero-cost manner unless enabled. If a single label is mapped, I think we're talking about a 5-10% increase per label depending on the key/value lengths, though the Prometheus client libraries might be clever here and use shared string references rather than allocating 10k copies of "footeam".
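As a side note on the "shared string references" point: in Go, copying a string value copies only the small header (pointer + length), not the underlying bytes, so fanning one value out across many series is cheap. Whether the values actually arrive as a single shared string or as separate per-object allocations from API decoding is a different question; the sketch below only demonstrates the header-copy behavior.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	owner := "footeam"

	// Fan the same label value out to 10k hypothetical series.
	values := make([]string, 10_000)
	for i := range values {
		values[i] = owner // copies the string header only; the bytes are shared
	}

	// Each entry costs one string header regardless of the value's length.
	fmt.Printf("per-entry overhead: %d bytes\n", unsafe.Sizeof(values[0])) // 16 on 64-bit
}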

Again, we're hard-pressed to do worse than the pod label, and we can't simply labeldrop pod - we need to aggregate on "something."
