# CNCF AI Conformance Evidence

## Overview

This directory contains evidence for [CNCF Kubernetes AI Conformance](https://github.com/cncf/k8s-ai-conformance)
certification. The evidence demonstrates that a cluster configured with a specific
recipe meets the MUST requirements listed under [Feature Areas](#feature-areas) for Kubernetes v1.34.
MUST requirements that have not yet been demonstrated are tracked under [TODO](#todo).

> **Note:** It is the **cluster configured by a recipe** that is conformant, not the
> tool itself. The recipe determines which components are deployed and how they are
> configured. Different recipes may produce clusters with different conformance profiles.

- **Recipe used:** `h100-eks-ubuntu-inference-dynamo`
- **Cluster:** EKS with p5.48xlarge instances (8x NVIDIA H100 80GB HBM3 per instance)
- **Kubernetes:** v1.34

## Directory Structure

```
docs/conformance/cncf/
├── README.md
├── collect-evidence.sh
├── manifests/
│   ├── dra-gpu-test.yaml
│   └── gang-scheduling-test.yaml
└── evidence/
    ├── index.md
    ├── dra-support.md
    ├── gang-scheduling.md
    ├── secure-accelerator-access.md
    ├── accelerator-metrics.md
    ├── inference-gateway.md
    └── robust-operator.md
```

## Usage

Run the script from the repository root. Pass `all` to collect everything, or the name of a single feature.

```bash
# Collect all evidence
./docs/conformance/cncf/collect-evidence.sh all

# Collect evidence for a single feature
./docs/conformance/cncf/collect-evidence.sh dra
./docs/conformance/cncf/collect-evidence.sh gang
./docs/conformance/cncf/collect-evidence.sh secure
./docs/conformance/cncf/collect-evidence.sh metrics
./docs/conformance/cncf/collect-evidence.sh gateway
./docs/conformance/cncf/collect-evidence.sh operator
```

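After a run, it can help to check that every feature actually produced its evidence file. The sketch below is a hypothetical helper and is not part of `collect-evidence.sh`. It assumes that the script writes into `docs/conformance/cncf/evidence/` and that each feature argument maps to the evidence file of the same name in the directory structure. That mapping is inferred from the file names, not confirmed by the script. The names `EVIDENCE_DIR`, `evidence_file_for`, and `missing_evidence` are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of collect-evidence.sh.
# Assumption: evidence files are written here, relative to the repository root.
EVIDENCE_DIR="docs/conformance/cncf/evidence"

# Map a collect-evidence.sh feature argument to the evidence file it is
# assumed to produce (mapping inferred from the file names only).
evidence_file_for() {
  case "$1" in
    dra)      echo "${EVIDENCE_DIR}/dra-support.md" ;;
    gang)     echo "${EVIDENCE_DIR}/gang-scheduling.md" ;;
    secure)   echo "${EVIDENCE_DIR}/secure-accelerator-access.md" ;;
    metrics)  echo "${EVIDENCE_DIR}/accelerator-metrics.md" ;;
    gateway)  echo "${EVIDENCE_DIR}/inference-gateway.md" ;;
    operator) echo "${EVIDENCE_DIR}/robust-operator.md" ;;
    *)        echo "unknown feature: $1" >&2; return 1 ;;
  esac
}

# Print the name of each feature whose evidence file is missing or empty.
missing_evidence() {
  local feature path
  for feature in dra gang secure metrics gateway operator; do
    path="$(evidence_file_for "$feature")"
    [ -s "$path" ] || echo "$feature"
  done
}
```

Run after `collect-evidence.sh all`, an empty result from `missing_evidence` suggests that every feature wrote a non-empty file. You can re-run any feature it prints on its own by passing the matching argument.
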
## Evidence

See [evidence/index.md](evidence/index.md) for a summary of all collected evidence and results.

## Feature Areas

| # | Feature | Requirement | Evidence File |
|---|---------|-------------|---------------|
| 1 | DRA Support | `dra_support` | [evidence/dra-support.md](evidence/dra-support.md) |
| 2 | Gang Scheduling | `gang_scheduling` | [evidence/gang-scheduling.md](evidence/gang-scheduling.md) |
| 3 | Secure Accelerator Access | `secure_accelerator_access` | [evidence/secure-accelerator-access.md](evidence/secure-accelerator-access.md) |
| 4 | Accelerator & AI Service Metrics | `accelerator_metrics`, `ai_service_metrics` | [evidence/accelerator-metrics.md](evidence/accelerator-metrics.md) |
| 5 | Inference API Gateway | `ai_inference` | [evidence/inference-gateway.md](evidence/inference-gateway.md) |
| 6 | Robust AI Operator | `robust_controller` | [evidence/robust-operator.md](evidence/robust-operator.md) |

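Before collecting `dra` evidence, it may be worth confirming that the cluster serves the Dynamic Resource Allocation API group at all. The sketch below is a hypothetical pre-flight check, not something `collect-evidence.sh` is known to perform. It assumes `kubectl` points at the target cluster. It treats the appearance of the core DRA kinds in `kubectl api-resources --api-group=resource.k8s.io -o name` as the signal. The names `KUBECTL`, `dra_resource_served`, and `dra_api_ready` are illustrative. `KUBECTL` exists only so that the command can be swapped for a wrapper or test stub.

```bash
#!/usr/bin/env bash
# Hypothetical DRA pre-flight check (not part of collect-evidence.sh).
# KUBECTL can be overridden, e.g. with a wrapper or a test stub.
KUBECTL="${KUBECTL:-kubectl}"

# Succeed if the API server lists <resource>.resource.k8s.io.
dra_resource_served() {
  "$KUBECTL" api-resources --api-group=resource.k8s.io -o name 2>/dev/null \
    | grep -Fqx "$1.resource.k8s.io"
}

# Check the core DRA kinds: claims, device classes, and driver-published slices.
dra_api_ready() {
  local r
  for r in resourceclaims deviceclasses resourceslices; do
    dra_resource_served "$r" || { echo "DRA resource not served: $r" >&2; return 1; }
  done
}
```

One possible pattern is `dra_api_ready && ./docs/conformance/cncf/collect-evidence.sh dra`. This fails fast with a named missing kind instead of producing partial evidence.
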
## TODO

- [ ] **Cluster Autoscaling** (`cluster_autoscaling`, MUST): Demonstrate Karpenter or Cluster Autoscaler adding GPU node capacity in response to pending pods' resource requests
- [ ] **Pod Autoscaling** (`pod_autoscaling`, MUST): Demonstrate a HorizontalPodAutoscaler (HPA) scaling pods on custom GPU metrics (e.g., `gpu_utilization` exposed through prometheus-adapter)

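
For the Pod Autoscaling item, one possible starting point is an `autoscaling/v2` HorizontalPodAutoscaler that scales a Deployment on a per-pod custom metric. The generator below is a hypothetical sketch. It assumes that prometheus-adapter already exposes the metric (for example `gpu_utilization`) through the custom metrics API for the target pods. The function name `hpa_manifest`, the `-hpa` name suffix, the Deployment name, the target value, and the default of 4 maximum replicas are all placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical generator for an autoscaling/v2 HPA on a per-pod custom metric.
# Assumption: prometheus-adapter serves the metric via the custom metrics API.
# Args: deployment name, metric name, average target per pod, max replicas (default 4).
hpa_manifest() {
  local deploy="$1" metric="$2" target="$3" max="${4:-4}"
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${deploy}-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${deploy}
  minReplicas: 1
  maxReplicas: ${max}
  metrics:
    - type: Pods
      pods:
        metric:
          name: ${metric}
        target:
          type: AverageValue
          averageValue: "${target}"
EOF
}
```

A demonstration could pipe the output into `kubectl apply -f -` (for example `hpa_manifest my-inference-worker gpu_utilization 60 | kubectl apply -f -`, where `my-inference-worker` is a placeholder). It could then drive load against the workload while watching `kubectl get hpa --watch` for the replica count to change.
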