Initialize DRA request metrics series at startup#1029

Open
dims wants to merge 1 commit into kubernetes-sigs:main from dims:worktree-fix-request-metrics-init

dims (Member) commented Apr 12, 2026

This PR fixes the root cause of the metrics smoke-test failure seen in #1025.

While #1028 relaxed the test, the real issue was that the DRA request metrics were not visible on /metrics until the first prepare or unprepare call created the labeled series. As a result, a fresh kubelet-plugin scrape could show nvidia_dra_prepared_devices but not nvidia_dra_requests_total, as seen in the Prow log.

This change initializes the request metric series at startup for both kubelet plugins before the HTTP endpoint is exposed, keeps them visible at 0 on the first scrape, adds a regression test for that behavior, and updates the error-metric descriptions to match current usage.

Pre-create the DRA request metric series before exposing /metrics so the first Prometheus scrape includes zero-valued request counters, histograms, and in-flight gauges even before any request has been processed.

Add a regression test that exercises the metrics handler prior to the first DRA request and confirms the initialized series are present in the exposition output.
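
A check of this kind might look roughly like the sketch below, which scrapes a handler with net/http/httptest before any request has been recorded. The metricsHandler and missingSeries helpers are hypothetical stand-ins written for illustration, not the test added in this PR; only the two metric names come from the description above. In a real repository this logic would normally live in a _test.go file and report failures through testing.T.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// metricsHandler is a hypothetical stand-in for the plugin's /metrics handler.
// It writes one exposition-style line per series that already exists.
func metricsHandler(series map[string]float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		for name, v := range series {
			fmt.Fprintf(w, "%s %g\n", name, v)
		}
	})
}

// missingSeries scrapes h once, in-process, and returns the wanted metric
// names that do not appear anywhere in the response body.
func missingSeries(h http.Handler, want []string) []string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	body := rec.Body.String()

	var missing []string
	for _, name := range want {
		if !strings.Contains(body, name) {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	want := []string{"nvidia_dra_requests_total", "nvidia_dra_prepared_devices"}

	// Without initialization: the request counter has no series yet.
	before := metricsHandler(map[string]float64{
		"nvidia_dra_prepared_devices": 0,
	})
	fmt.Println("missing before init:", missingSeries(before, want))

	// With initialization: the request counter is present at 0.
	after := metricsHandler(map[string]float64{
		"nvidia_dra_prepared_devices": 0,
		"nvidia_dra_requests_total":   0,
	})
	fmt.Println("missing after init:", missingSeries(after, want))
}
```

Scraping through an in-process recorder keeps the check independent of ports and timing, so it can run before the first prepare or unprepare call is ever made.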

Also clarify the kubelet-plugin error metric descriptions to match their current usage.

k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved and size/M labels Apr 12, 2026

dims (Member, Author) commented Apr 13, 2026

/hold

k8s-ci-robot added the do-not-merge/hold label Apr 13, 2026

dims (Member, Author) commented Apr 13, 2026

/assign @shengnuo

k8s-ci-robot (Contributor) commented

@dims: GitHub didn't allow me to assign the following users: shengnuo.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @shengnuo

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
do-not-merge/hold - Indicates that a PR should not merge because someone has issued a /hold command.
size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.

2 participants