Contributor

@moko-poi moko-poi commented Dec 3, 2025

What problem does this PR solve?

Closes #2221

Allows users to customize which NodePool labels are included in NodePool metrics (karpenter_nodepools_limit and karpenter_nodepools_usage), enabling better observability for custom attributes like capacity type, availability zone, and architecture.

How does this PR solve the problem?

Implementation approach

Added a CLI flag --additional-nodepool-metric-labels (environment variable: ADDITIONAL_NODEPOOL_METRIC_LABELS) that accepts a comma-separated list of label keys to include in NodePool metrics.

Default behavior: No additional labels (empty by default). Users must explicitly configure which labels to include.

Key design decisions

  1. CLI flag instead of annotations: Satisfies the Prometheus constraint that all time series for a metric must have identical label keys. Label keys are fixed at startup, when the metrics are registered.

  2. Deferred initialization: Follows the same pattern as the Node metrics controller: metrics are declared as package variables but initialized in initializeMetrics(), which is called from NewController(). This ensures options are available when building the label key set.

  3. Empty string for missing labels: If a NodePool doesn't have a specified label, the metric uses an empty string "" for that label value. This maintains consistent label keys across all NodePools while allowing flexibility.

  4. No defaults: By not providing default labels, this PR stays focused on providing the capability without making assumptions about which labels exist on users' NodePools.

Example usage

# No additional labels (default)
karpenter

# Add custom labels for capacity type, zone, and architecture
karpenter --additional-nodepool-metric-labels=capacity_type,zone,architecture

# Or use environment variable
export ADDITIONAL_NODEPOOL_METRIC_LABELS=environment,team,version
karpenter

With a NodePool:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: production-spot
  labels:
    capacity_type: spot
    zone: us-east-1a
    architecture: amd64
spec:
  limits:
    cpu: "1000"

When configured with --additional-nodepool-metric-labels=capacity_type,zone,architecture, this produces the metric:

karpenter_nodepools_limit{nodepool="production-spot",resource_type="cpu",capacity_type="spot",zone="us-east-1a",architecture="amd64"} 1000

Without the flag, it produces:

karpenter_nodepools_limit{nodepool="production-spot",resource_type="cpu"} 1000

Testing

  • Added comprehensive test coverage in suite_test.go:
    • Verifies additional labels are included in metrics when configured
    • Validates empty string behavior for missing labels
    • All 6 tests passing

Checklist

  • Added CLI flag with empty default
  • Implemented deferred metric initialization
  • Updated controller wiring to pass context
  • Added test helper support
  • Comprehensive test coverage (6/6 passing)
  • Follows existing Node metrics pattern for consistency

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) Dec 3, 2025
@k8s-ci-robot
Contributor

Hi @moko-poi. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels Dec 3, 2025
@moko-poi moko-poi force-pushed the feat/configurable-nodepool-metric-labels branch from 16e9f2b to f0f968e Compare December 3, 2025 12:04
@moko-poi
Contributor Author

moko-poi commented Dec 3, 2025

The test is failing, but we've addressed it in #2687 (comment).

Comment on lines 158 to 162
if value, ok := nodePool.Labels[labelKey]; ok {
	labels[labelKey] = value
} else {
	labels[labelKey] = ""
}

A map lookup already returns an empty string when the key is missing, so the branch is redundant.

Suggested change
- if value, ok := nodePool.Labels[labelKey]; ok {
-     labels[labelKey] = value
- } else {
-     labels[labelKey] = ""
- }
+ labels[labelKey] = nodePool.Labels[labelKey]

Contributor Author

Good catch! Fixed in bbe404f

Comment on lines +174 to +184
nodePool.Labels = map[string]string{
	"capacity_type": "spot",
	"zone":          "us-east-1a",
	"architecture":  "amd64",
}

Do these labels even commonly exist?
I was hoping zone would translate to topology.kubernetes.io/zone.
Maybe have that as the only default, since it is a well-known label?
(Alternatively, have no defaults, so this is a no-op for users that don't care.)

Contributor Author

@moko-poi moko-poi Dec 13, 2025

@grosser f461d94
I've removed the default values entirely. The default is now an empty string, making this a no-op unless users explicitly configure it. This keeps the PR focused on providing the capability for users to configure additional metric labels, without making assumptions about which labels exist on their NodePools.

Comment on lines 166 to 172
if o.rawAdditionalNodePoolMetricLabels != "" {
	o.AdditionalNodePoolMetricLabels = lo.Map(strings.Split(o.rawAdditionalNodePoolMetricLabels, ","), func(s string, _ int) string {
		return strings.TrimSpace(s)
	})
} else {
	o.AdditionalNodePoolMetricLabels = []string{}
}

nit: fewer code paths 🤷

Suggested change
- if o.rawAdditionalNodePoolMetricLabels != "" {
-     o.AdditionalNodePoolMetricLabels = lo.Map(strings.Split(o.rawAdditionalNodePoolMetricLabels, ","), func(s string, _ int) string {
-         return strings.TrimSpace(s)
-     })
- } else {
-     o.AdditionalNodePoolMetricLabels = []string{}
- }
+ o.AdditionalNodePoolMetricLabels = lo.Map(strings.Split(o.rawAdditionalNodePoolMetricLabels, ","), func(s string, _ int) string {
+     return strings.TrimSpace(s)
+ })

Contributor Author

@moko-poi moko-poi Dec 13, 2025

3377cac
Updated to use lo.FilterMap to reduce code paths as suggested. I also added filtering for empty strings since strings.Split("", ",") returns []string{""} rather than an empty slice, which would cause invalid empty label keys in the metrics controller.

@grosser grosser left a comment

logic makes sense
idk about the defaults

@moko-poi moko-poi requested a review from grosser December 13, 2025 11:50
@moko-poi moko-poi force-pushed the feat/configurable-nodepool-metric-labels branch from 1595f79 to f461d94 Compare December 13, 2025 11:59
@moko-poi
Contributor Author

@grosser
I've updated the implementation to remove the default labels (capacity_type, zone, architecture). The default is now an empty string, making this feature a no-op unless users explicitly configure it.

This keeps the PR scope focused on providing the capability for users to add custom labels to their metrics, rather than imposing specific defaults that may not match their label schemas.

@coveralls

coveralls commented Dec 13, 2025

Pull Request Test Coverage Report for Build 20636338260

Details

  • 57 of 58 (98.28%) changed or added relevant lines in 4 files are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 80.53%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
pkg/controllers/controllers.go | 0 | 1 | 0.0%

Files with Coverage Reduction | New Missed Lines | %
pkg/controllers/disruption/consolidation.go | 4 | 88.0%

Totals (Coverage Status)
Change from base Build 20628152380: 0.04%
Covered Lines: 12024
Relevant Lines: 14931

💛 - Coveralls

@moko-poi moko-poi force-pushed the feat/configurable-nodepool-metric-labels branch from f461d94 to 48d503e Compare December 13, 2025 14:56
@grosser grosser left a comment

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: grosser, moko-poi
Once this PR has been reviewed and has the lgtm label, please assign mwielgus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Jan 1, 2026
Replace conditional logic with lo.FilterMap to reduce code paths and
filter empty strings. This prevents invalid empty label keys from being
added to metrics when the input contains empty strings or consecutive
commas.
@moko-poi moko-poi force-pushed the feat/configurable-nodepool-metric-labels branch from 48d503e to c9213b0 Compare January 1, 2026 09:36
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Jan 1, 2026

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

Development

Successfully merging this pull request may close these issues.

Allow setting additional metric labels

4 participants