docs(conformance): add ai_service_metrics evidence for CNCF submission #460
Merged
yuanchen8911 merged 1 commit into NVIDIA:main on Mar 24, 2026
…location
Add dedicated evidence for the ai_service_metrics MUST requirement, showing Prometheus ServiceMonitor discovery and scraping of a vLLM inference workload's Prometheus-format metrics endpoint. Evidence includes real inference traffic: 10 requests, 500 generation tokens, TTFT and inter-token latency metrics collected from Prometheus.
Update vllm-agg.yaml to use DRA ResourceClaims instead of device-plugin GPU requests, fixing deployment on DRA-only clusters with KAI scheduler.
Add vllm-metrics-test.yaml for standalone vLLM metrics evidence collection.
Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
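For readers unfamiliar with ServiceMonitor-based discovery, the following is a minimal sketch of what such a resource looks like, built as a Python dict and printed as JSON. It is not the manifest from this PR: the resource name `vllm-metrics`, the namespace `vllm-test`, the `app: vllm` label, the `metrics` port name, and the `30s` interval are all hypothetical placeholders. Only the `monitoring.coreos.com/v1` `ServiceMonitor` kind and its `selector` / `endpoints` fields come from the Prometheus Operator API.

```python
import json


def build_service_monitor(name, namespace, match_labels,
                          port="metrics", path="/metrics", interval="30s"):
    """Return a Prometheus Operator ServiceMonitor manifest as a plain dict.

    All argument values used in this example are hypothetical; they are not
    taken from vllm-metrics-test.yaml or any other file in the PR.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Services carrying these labels are discovered as scrape targets.
            "selector": {"matchLabels": dict(match_labels)},
            # `port` refers to a named port on the selected Service.
            "endpoints": [{"port": port, "path": path, "interval": interval}],
        },
    }


if __name__ == "__main__":
    manifest = build_service_monitor("vllm-metrics", "vllm-test", {"app": "vllm"})
    # kubectl accepts JSON as well as YAML, so this output could be applied directly.
    print(json.dumps(manifest, indent=2))
```

By default a ServiceMonitor only selects Services in its own namespace, and the Prometheus instance must itself be configured to select the ServiceMonitor, so a real deployment may need additional fields beyond this sketch.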
Trivy Findings
The initial push had 8 Trivy findings on
2 remaining (justified):
cullenmcdermott approved these changes on Mar 24, 2026
Summary
Add dedicated evidence for the `ai_service_metrics` MUST requirement, splitting it from the shared `accelerator-metrics.md` file.
Motivation / Context
The CNCF AI Conformance `ai_service_metrics` requirement asks for "discovering and collecting metrics from workloads that expose them in a standard format (Prometheus exposition format)." Previously both `accelerator_metrics` and `ai_service_metrics` pointed to the same `accelerator-metrics.md` evidence file which only covered DCGM hardware metrics. A reviewer could reasonably object that infrastructure-level GPU metrics are not the same as workload-level service metrics.
Fixes: N/A
Related: CNCF AI Conformance submission
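To make the "Prometheus exposition format" wording concrete, here is a minimal sketch of parsing that text format with the Python standard library. It is an illustration, not the tooling used to collect the evidence. The metric names follow vLLM's `vllm:` naming but should be treated as assumptions here, and the `SAMPLE` payload, including the `example-model` label and every numeric value, is an illustrative placeholder rather than data from the evidence file.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
# One label pair inside the braces; escape sequences are kept as-is, not decoded.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative payload only; names and values are assumptions, not PR evidence.
SAMPLE = """\
# HELP vllm:generation_tokens_total Number of generation tokens processed.
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="example-model"} 500.0
vllm:time_to_first_token_seconds_sum{model_name="example-model"} 0.42
vllm:time_to_first_token_seconds_count{model_name="example-model"} 10.0
"""


def parse_exposition(text):
    """Parse exposition-format text into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and blank lines are skipped, as are lines
    that do not look like a sample.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    values = {name: value for name, _, value in parse_exposition(SAMPLE)}
    # A histogram's mean is its _sum divided by its _count.
    mean_ttft = (values["vllm:time_to_first_token_seconds_sum"]
                 / values["vllm:time_to_first_token_seconds_count"])
    print(f"mean TTFT (s): {mean_ttft:.3f}")
```

The distinction the PR draws shows up in the metric names: these samples describe the serving workload itself (tokens generated, time to first token), whereas DCGM metrics describe the GPU hardware underneath it.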
Type of Change
Component(s) Affected
- `cmd/aicr`, `pkg/cli`
- `cmd/aicrd`, `pkg/api`, `pkg/server`
- `pkg/recipe`
- `pkg/bundler`, `pkg/component/*`
- `pkg/collector`, `pkg/snapshotter`
- `pkg/validator`
- `pkg/errors`, `pkg/k8s`
- `docs/`, `examples/`
Implementation Notes
New `ai-service-metrics.md` evidence collected from the `aicr-cuj2` EKS cluster, showing:
- `/metrics` endpoint
- `dynamo_operator_reconcile_duration_seconds_*` and `controller_runtime_reconcile_total` per CRD controller
Updated `index.md`, `submission/README.md`, and `docs/conformance/cncf/index.md` to split the combined `accelerator_metrics`/`ai_service_metrics` row into separate entries.
Testing
# Documentation-only change, no code affected
Risk Assessment
Rollout notes: N/A
Checklist
- `make test` with `-race`
- `make lint`
- `git commit -S` (GPG signing info)