metadata:
  kubernetesVersion: v1.34
  platformName: "Kubernetes Platforms Powered by NVIDIA AI Cluster Runtime (AICR)"
  platformVersion: "0.7.7"
  vendorName: "NVIDIA"
  websiteUrl: "https://github.com/NVIDIA/aicr"
  repoUrl: "https://github.com/NVIDIA/aicr"
  documentationUrl: "https://github.com/NVIDIA/aicr/blob/main/README.md"
  productLogoUrl: "https://www.nvidia.com/favicon.ico"
  description: >-
    Kubernetes platforms powered by NVIDIA AI Cluster Runtime (AICR) are CNCF AI
    Conformant. AICR generates validated, GPU-accelerated Kubernetes
    configurations that satisfy all CNCF AI Conformance requirements.
  contactEmailAddress: "aicr-maintainers@nvidia.com"

spec:
  accelerators:
    - id: dra_support
      description: "Support Dynamic Resource Allocation (DRA) APIs to enable more flexible and fine-grained resource requests beyond simple counts."
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/dra-support.md"
      notes: >-
        DRA API (resource.k8s.io/v1) is enabled with DeviceClass, ResourceClaim,
        ResourceClaimTemplate, and ResourceSlice resources available. The NVIDIA
        DRA driver runs as controller and kubelet-plugin pods, advertising
        individual H100 GPU devices via ResourceSlices with unique UUIDs, PCI
        bus IDs, CUDA compute capability, and memory capacity. GPU allocation to
        pods is mediated through ResourceClaims.
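      # Illustrative sketch only (not part of the conformance schema): a
      # minimal ResourceClaimTemplate requesting one GPU through DRA. The
      # DeviceClass name "gpu.nvidia.com" follows the NVIDIA DRA driver's
      # conventions but is an assumption here; verify the field layout against
      # the cluster's installed resource.k8s.io API version.
      #
      #   apiVersion: resource.k8s.io/v1
      #   kind: ResourceClaimTemplate
      #   metadata:
      #     name: single-gpu
      #   spec:
      #     spec:
      #       devices:
      #         requests:
      #           - name: gpu
      #             exactly:
      #               deviceClassName: gpu.nvidia.com
      #
      # A pod then references the claim via spec.resourceClaims and
      # container resources.claims, and the scheduler allocates a matching
      # device from the advertised ResourceSlices.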
  networking:
    - id: ai_inference
      description: >-
        Support the Kubernetes Gateway API with an implementation for advanced
        traffic management for inference services, which enables capabilities
        like weighted traffic splitting, header-based routing (for OpenAI
        protocol headers), and optional integration with service meshes.
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/inference-gateway.md"
      notes: >-
        kgateway controller is deployed with full Gateway API CRD support
        (GatewayClass, Gateway, HTTPRoute, GRPCRoute, ReferenceGrant). Inference
        extension CRDs (InferencePool, InferenceModelRewrite,
        InferenceObjective) are registered. An active inference gateway is
        verified with GatewayClass Accepted=True and Gateway Programmed=True
        conditions.
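      # Illustrative sketch only: an HTTPRoute combining header-based routing
      # (e.g. on a model-selection header) with weighted traffic splitting
      # across two inference backends. Gateway, route, header, and backend
      # names are hypothetical.
      #
      #   apiVersion: gateway.networking.k8s.io/v1
      #   kind: HTTPRoute
      #   metadata:
      #     name: llm-route
      #   spec:
      #     parentRefs:
      #       - name: inference-gateway
      #     rules:
      #       - matches:
      #           - headers:
      #               - name: x-model-name
      #                 value: llama-3-70b
      #         backendRefs:
      #           - name: llm-stable
      #             port: 8000
      #             weight: 90
      #           - name: llm-canary
      #             port: 8000
      #             weight: 10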
  schedulingOrchestration:
    - id: gang_scheduling
      description: >-
        The platform must allow for the installation and successful operation of
        at least one gang scheduling solution that ensures all-or-nothing
        scheduling for distributed AI workloads (e.g., Kueue, Volcano). To be
        conformant, the vendor must demonstrate that their platform can
        successfully run at least one such solution.
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/gang-scheduling.md"
      notes: >-
        KAI Scheduler is deployed with operator, scheduler, admission
        controller, pod-grouper, and queue-controller components. The PodGroup
        CRD (scheduling.run.ai) is registered. Gang scheduling is verified by
        deploying a PodGroup with minMember=2 and two GPU pods, demonstrating
        all-or-nothing atomic scheduling.
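      # Illustrative sketch only: the all-or-nothing semantics verified above
      # hinge on a PodGroup whose minMember matches the workload size. The
      # apiVersion shown is an assumption; check the installed CRD.
      #
      #   apiVersion: scheduling.run.ai/v2alpha2
      #   kind: PodGroup
      #   metadata:
      #     name: gpu-job
      #   spec:
      #     minMember: 2   # neither pod is scheduled until both can be placed
      #
      # Each worker pod is associated with the group (via the pod-grouper or an
      # explicit pod-group label), so the scheduler admits the pair atomically.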
    - id: cluster_autoscaling
      description: >-
        If the platform provides a cluster autoscaler or an equivalent
        mechanism, it must be able to scale up/down node groups containing
        specific accelerator types based on pending pods requesting those
        accelerators.
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/cluster-autoscaling.md"
      notes: >-
        Demonstrated on EKS with a GPU Auto Scaling Group (p5.48xlarge, 8x H100
        per node). The ASG is tagged for Cluster Autoscaler discovery
        (k8s.io/cluster-autoscaler/enabled,
        k8s.io/cluster-autoscaler/<cluster>=owned) and supports scaling from
        min=1 to max=2 GPU nodes based on pending pod demand.
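      # Illustrative sketch only: a pod like the one below (name and image are
      # hypothetical) requests a GPU; when no existing node has capacity, the
      # pod stays Pending and Cluster Autoscaler matches the request against
      # the tagged ASG, scaling the group from 1 toward its max of 2 nodes.
      #
      #   apiVersion: v1
      #   kind: Pod
      #   metadata:
      #     name: gpu-burst
      #   spec:
      #     containers:
      #       - name: cuda
      #         image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
      #         resources:
      #           limits:
      #             nvidia.com/gpu: 1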
    - id: pod_autoscaling
      description: >-
        If the platform supports the HorizontalPodAutoscaler, it must function
        correctly for pods utilizing accelerators. This includes the ability to
        scale these Pods based on custom metrics relevant to AI/ML workloads.
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/pod-autoscaling.md"
      notes: >-
        Prometheus adapter exposes GPU custom metrics (gpu_utilization,
        gpu_memory_used, gpu_power_usage) via the Kubernetes custom metrics API.
        HPA is configured to target gpu_utilization at a 50% threshold. Under
        GPU stress testing (CUDA N-Body Simulation), HPA successfully scales
        replicas from 1 to 2 pods when utilization exceeds the target, and
        scales back down when GPU load is removed.
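      # Illustrative sketch only: an autoscaling/v2 HPA targeting the
      # gpu_utilization custom metric at the 50% threshold described above.
      # The Deployment name and the exact metric name exposed by the adapter
      # are assumptions.
      #
      #   apiVersion: autoscaling/v2
      #   kind: HorizontalPodAutoscaler
      #   metadata:
      #     name: gpu-workload
      #   spec:
      #     scaleTargetRef:
      #       apiVersion: apps/v1
      #       kind: Deployment
      #       name: gpu-workload
      #     minReplicas: 1
      #     maxReplicas: 2
      #     metrics:
      #       - type: Pods
      #         pods:
      #           metric:
      #             name: gpu_utilization
      #           target:
      #             type: AverageValue
      #             averageValue: "50"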
  observability:
    - id: accelerator_metrics
      description: >-
        For supported accelerator types, the platform must allow for the
        installation and successful operation of at least one accelerator
        metrics solution that exposes fine-grained performance metrics via a
        standardized, machine-readable metrics endpoint. This must include a
        core set of metrics for per-accelerator utilization and memory usage.
        Additionally, other relevant metrics such as temperature, power draw,
        and interconnect bandwidth should be exposed if the underlying hardware
        or virtualization layer makes them available. The list of metrics should
        align with emerging standards, such as OpenTelemetry metrics, to ensure
        interoperability. The platform may provide a managed solution, but this
        is not required for conformance.
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/accelerator-metrics.md"
      notes: >-
        DCGM Exporter runs on GPU nodes exposing metrics at :9400/metrics in
        Prometheus format. Per-GPU metrics include utilization, memory usage,
        temperature (26-31 °C), and power draw (66-115 W). Metrics include
        pod/namespace/container labels for per-workload attribution. Prometheus
        actively scrapes DCGM metrics via ServiceMonitor.
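      # Illustrative sketch only: a ServiceMonitor (Prometheus Operator CRD)
      # that scrapes the DCGM Exporter endpoint described above. The label
      # selector and port name are assumptions.
      #
      #   apiVersion: monitoring.coreos.com/v1
      #   kind: ServiceMonitor
      #   metadata:
      #     name: dcgm-exporter
      #   spec:
      #     selector:
      #       matchLabels:
      #         app: nvidia-dcgm-exporter
      #     endpoints:
      #       - port: metrics   # service port backing :9400/metrics
      #         interval: 15s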
    - id: ai_service_metrics
      description: >-
        Provide a monitoring system capable of discovering and collecting
        metrics from workloads that expose them in a standard format (e.g.
        Prometheus exposition format). This ensures easy integration for
        collecting key metrics from common AI frameworks and servers.
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/accelerator-metrics.md"
      notes: >-
        Prometheus and Grafana are deployed as the monitoring stack. Prometheus
        discovers and scrapes workloads exposing metrics in Prometheus
        exposition format via ServiceMonitors. The prometheus-adapter bridges
        these metrics into the Kubernetes custom metrics API for consumption by
        HPA and other controllers.
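      # Illustrative sketch only: a prometheus-adapter rule (values.yaml
      # fragment) of the kind that would surface a Prometheus series such as
      # DCGM_FI_DEV_GPU_UTIL as the gpu_utilization custom metric. Series and
      # label names are assumptions.
      #
      #   rules:
      #     custom:
      #       - seriesQuery: 'DCGM_FI_DEV_GPU_UTIL{exported_pod!=""}'
      #         resources:
      #           overrides:
      #             exported_namespace: {resource: "namespace"}
      #             exported_pod: {resource: "pod"}
      #         name:
      #           as: "gpu_utilization"
      #         metricsQuery: avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)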
  security:
    - id: secure_accelerator_access
      description: >-
        Ensure that access to accelerators from within containers is properly
        isolated and mediated by the Kubernetes resource management framework
        (device plugin or DRA) and container runtime, preventing unauthorized
        access or interference between workloads.
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/secure-accelerator-access.md"
      notes: >-
        GPU Operator manages all GPU lifecycle components (driver, device-plugin,
        DCGM, toolkit, validator, MIG manager). 8x H100 GPUs are individually
        advertised via ResourceSlices with DRA. Pod volumes contain only
        kube-api-access projected tokens; there are no hostPath mounts to
        /dev/nvidia devices. Device isolation is verified: a test pod requesting
        1 GPU sees only the single allocated device.
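      # Illustrative sketch only: a pod that requests exactly one GPU via a
      # ResourceClaim reference rather than a hostPath mount, so the runtime
      # injects only the allocated device. The claim template and image names
      # are hypothetical.
      #
      #   apiVersion: v1
      #   kind: Pod
      #   metadata:
      #     name: isolated-gpu-test
      #   spec:
      #     resourceClaims:
      #       - name: gpu
      #         resourceClaimTemplateName: single-gpu
      #     containers:
      #       - name: cuda
      #         image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
      #         command: ["nvidia-smi", "-L"]   # lists only the allocated GPU
      #         resources:
      #           claims:
      #             - name: gpu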
  operator:
    - id: robust_controller
      description: >-
        The platform must prove that at least one complex AI operator with a
        CRD (e.g., Ray, Kubeflow) can be installed and functions reliably. This
        includes verifying that the operator's pods run correctly, its webhooks
        are operational, and its custom resources can be reconciled.
      level: MUST
      status: "Implemented"
      evidence:
        - "https://github.com/NVIDIA/aicr/blob/main/docs/conformance/cncf/evidence/robust-operator.md"
      notes: >-
        NVIDIA Dynamo operator is deployed with 6 CRDs (DynamoGraphDeployment,
        DynamoComponentDeployment, DynamoGraphDeploymentRequest,
        DynamoGraphDeploymentScalingAdapter, DynamoModel, DynamoWorkerMetadata).
        Validating webhooks are active and verified via a rejection test (an
        invalid CR is correctly denied). A DynamoGraphDeployment custom resource
        is reconciled with frontend and GPU-enabled worker pods running
        successfully.