NVIDIA
diff --git a/‎docs/conformance/cncf/README.md‎
Lines changed: 56 additions & 12 deletions b/‎docs/conformance/cncf/README.md‎
Lines changed: 56 additions & 12 deletions
diff --git a/‎docs/conformance/cncf/collect-evidence.sh‎
Lines changed: 131 additions & 5 deletions b/‎docs/conformance/cncf/collect-evidence.sh‎
Lines changed: 131 additions & 5 deletions
diff --git a/‎docs/conformance/cncf/evidence/index.md‎
Lines changed: 1 addition & 1 deletion b/‎docs/conformance/cncf/evidence/index.md‎
Lines changed: 1 addition & 1 deletion
@@ -22,38 +22,83 @@ docs/conformance/cncf/
 ├── collect-evidence.sh
 ├── manifests/
 │   ├── dra-gpu-test.yaml
-│   └── gang-scheduling-test.yaml
+│   ├── gang-scheduling-test.yaml
+│   └── hpa-gpu-test.yaml
 └── evidence/
     ├── index.md
     ├── dra-support.md
     ├── gang-scheduling.md
     ├── secure-accelerator-access.md
     ├── accelerator-metrics.md
     ├── inference-gateway.md
-    └── robust-operator.md
+    ├── robust-operator.md
+    └── pod-autoscaling.md
 ```
 
 ## Usage
 
-Evidence is generated automatically from `aicr validate` conformance results:
+Evidence collection has two steps:
+
+### Step 1: Structural Validation Evidence
+
+`aicr validate` checks component health, CRDs, constraints, and generates
+structural evidence:
 
 ```bash
 # Generate evidence during validation
 aicr validate -r recipe.yaml -s snapshot.yaml \
   --phase conformance --evidence-dir ./evidence
 
-# Use a saved result file for evidence instead of the live run
+# Or use a saved result file
 aicr validate -r recipe.yaml -s snapshot.yaml \
   --phase conformance --evidence-dir ./evidence \
   --result validation-result.yaml
 ```
 
-The chainsaw assertion evidence (`go run ./tests/chainsaw/ai-conformance/`) checks
-resource existence (CRDs, deployments, etc.) and is complementary to the behavioral
-validation evidence generated by `aicr validate --evidence-dir`.
+### Step 2: Behavioral Test Evidence
+
+`collect-evidence.sh` deploys test workloads and collects behavioral evidence
+(DRA GPU allocation, gang scheduling, HPA autoscaling, etc.) that requires
+running actual GPU workloads on the cluster:
+
+```bash
+# Collect all behavioral evidence
+./docs/conformance/cncf/collect-evidence.sh all
+
+# Collect evidence for a single feature
+./docs/conformance/cncf/collect-evidence.sh dra
+./docs/conformance/cncf/collect-evidence.sh gang
+./docs/conformance/cncf/collect-evidence.sh secure
+./docs/conformance/cncf/collect-evidence.sh metrics
+./docs/conformance/cncf/collect-evidence.sh gateway
+./docs/conformance/cncf/collect-evidence.sh operator
+./docs/conformance/cncf/collect-evidence.sh hpa
+```
 
-> **Note:** `collect-evidence.sh` is deprecated. Use `aicr validate --evidence-dir`
-> instead.
+> **Note:** The HPA test (`hpa`) deploys gpu-burn to stress the GPU and waits for
+> HPA to scale up. This takes ~5 minutes due to metric propagation through the
+> DCGM → Prometheus → prometheus-adapter → HPA pipeline.
+
+### Why Two Steps?
+
+| Evidence Type | `aicr validate` | `collect-evidence.sh` |
+|---|---|---|
+| Component health (pods, CRDs) | Yes | Yes |
+| Constraint validation (K8s version, OS) | Yes | No |
+| DRA GPU allocation test | No | Yes |
+| Gang scheduling test | No | Yes |
+| Device isolation verification | No | Yes |
+| HPA scaling with GPU load | No | Yes |
+| Prometheus query results | No | Yes |
+
+`aicr validate` checks that components are deployed correctly. `collect-evidence.sh`
+verifies they work correctly by running actual workloads. Both are needed for
+complete conformance evidence.
+
+> **Future:** Behavioral tests are inherently long-running (e.g., HPA test deploys
+> gpu-burn and waits ~5 minutes for metric propagation and scaling) and are better
+> suited as a separate step than blocking `aicr validate`. A follow-up integration
+> is tracked in [#192](https://github.com/NVIDIA/aicr/issues/192).
 
 ## Evidence
 
@@ -69,6 +114,5 @@ See [evidence/index.md](evidence/index.md) for a summary of all collected eviden
 | 4 | Accelerator & AI Service Metrics | `accelerator_metrics`, `ai_service_metrics` | [evidence/accelerator-metrics.md](evidence/accelerator-metrics.md) |
 | 5 | Inference API Gateway | `ai_inference` | [evidence/inference-gateway.md](evidence/inference-gateway.md) |
 | 6 | Robust AI Operator | `robust_controller` | [evidence/robust-operator.md](evidence/robust-operator.md) |
-
-| 7 | Cluster Autoscaling | `cluster_autoscaling` | [evidence/cluster-autoscaling.md](evidence/cluster-autoscaling.md) |
-| 8 | Pod Autoscaling | `pod_autoscaling` | [evidence/pod-autoscaling.md](evidence/pod-autoscaling.md) |
+| 7 | Pod Autoscaling | `pod_autoscaling` | [evidence/pod-autoscaling.md](evidence/pod-autoscaling.md) |
+| 8 | Cluster Autoscaling | `cluster_autoscaling` | TODO |
@@ -19,8 +19,9 @@
 #   aicr validate -r recipe.yaml --phase conformance --evidence-dir ./evidence
 #   aicr validate -r recipe.yaml --phase conformance --evidence-dir ./evidence --result result.yaml
 
-echo "DEPRECATED: Use 'aicr validate --evidence-dir' instead." >&2
-exit 1
+# Note: 'aicr validate --evidence-dir' generates structural validation evidence.
+# This script collects behavioral test evidence (HPA scaling, DRA allocation, etc.)
+# that requires deploying test workloads. Both are needed for full conformance evidence.
 
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
@@ -704,6 +705,129 @@ EOF
     log_info "Robust operator evidence collection complete."
 }
 
+# --- Section 7: Pod Autoscaling (HPA) ---
+collect_hpa() {
+    EVIDENCE_FILE="${EVIDENCE_DIR}/pod-autoscaling.md"
+    log_info "Collecting Pod Autoscaling (HPA) evidence → ${EVIDENCE_FILE}"
+    write_section_header "Pod Autoscaling (HPA with GPU Metrics)"
+
+    cat >> "${EVIDENCE_FILE}" <<'EOF'
+Demonstrates CNCF AI Conformance requirement that HPA functions correctly for pods
+utilizing accelerators, including the ability to scale based on custom GPU metrics.
+
+## Summary
+
+1. **Prometheus Adapter** — Exposes GPU metrics via Kubernetes custom metrics API
+2. **Custom Metrics API** — `gpu_utilization`, `gpu_memory_used`, `gpu_power_usage` available
+3. **GPU Stress Workload** — Deployment running gpu-burn to generate GPU load
+4. **HPA Configuration** — Targets `gpu_utilization` with threshold of 50%
+5. **HPA Scaling** — Successfully reads GPU metrics and scales replicas when utilization exceeds target
+6. **Result: PASS**
+
+---
+
+## Prometheus Adapter
+EOF
+    capture "Prometheus adapter pod" kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus-adapter
+    capture "Prometheus adapter service" kubectl get svc prometheus-adapter -n monitoring
+
+    cat >> "${EVIDENCE_FILE}" <<'EOF'
+
+## Custom Metrics API
+EOF
+    echo "" >> "${EVIDENCE_FILE}"
+    echo "**Available custom metrics**" >> "${EVIDENCE_FILE}"
+    echo '```' >> "${EVIDENCE_FILE}"
+    echo '$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .resources[].name' >> "${EVIDENCE_FILE}"
+    kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 2>&1 | \
+        python3 -c "import sys,json; data=json.loads(sys.stdin.read()); resources=data.get('resources',[]); [print(r['name']) for r in resources]" >> "${EVIDENCE_FILE}" 2>&1
+    echo '```' >> "${EVIDENCE_FILE}"
+
+    cat >> "${EVIDENCE_FILE}" <<'EOF'
+
+## GPU Stress Test Deployment
+
+Deploy a GPU workload running gpu-burn to generate sustained GPU utilization,
+then create an HPA targeting `gpu_utilization` to demonstrate autoscaling.
+
+**Test manifest:** `docs/conformance/cncf/manifests/hpa-gpu-test.yaml`
+EOF
+
+    # Clean up any previous run
+    kubectl delete namespace hpa-test --ignore-not-found --wait=false 2>/dev/null || true
+    sleep 5
+
+    # Deploy test
+    log_info "Deploying HPA GPU test..."
+    capture "Apply test manifest" kubectl apply -f "${SCRIPT_DIR}/manifests/hpa-gpu-test.yaml"
+
+    # Wait for pod to start
+    log_info "Waiting for GPU workload pod (up to ${POD_TIMEOUT}s)..."
+    local elapsed=0
+    while [ $elapsed -lt "${POD_TIMEOUT}" ]; do
+        ready=$(kubectl get pods -n hpa-test -l app=gpu-workload -o jsonpath='{.items[0].status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
+        if [ "$ready" = "True" ]; then break; fi
+        sleep 10
+        elapsed=$((elapsed + 10))
+    done
+    capture "GPU workload pod" kubectl get pods -n hpa-test -o wide
+
+    # Wait for GPU metrics to be available and HPA to read them
+    log_info "Waiting for GPU metrics and HPA scaling (up to 5 minutes)..."
+    local hpa_scaled=false
+    for i in $(seq 1 20); do
+        sleep 15
+        targets=$(kubectl get hpa gpu-workload-hpa -n hpa-test -o jsonpath='{.status.currentMetrics[0].pods.current.averageValue}' 2>/dev/null)
+        replicas=$(kubectl get hpa gpu-workload-hpa -n hpa-test -o jsonpath='{.status.currentReplicas}' 2>/dev/null)
+        log_info "  Check ${i}/20: gpu_utilization=${targets:-unknown}, replicas=${replicas:-1}"
+        if [ -n "$targets" ] && [ "${replicas:-1}" -gt 1 ]; then
+            hpa_scaled=true
+            break
+        fi
+    done
+
+    cat >> "${EVIDENCE_FILE}" <<'EOF'
+
+## HPA Status
+EOF
+    capture "HPA status" kubectl get hpa -n hpa-test
+    capture "HPA details" kubectl describe hpa gpu-workload-hpa -n hpa-test
+
+    cat >> "${EVIDENCE_FILE}" <<'EOF'
+
+## GPU Utilization Evidence
+EOF
+    capture "GPU utilization (nvidia-smi)" kubectl exec -n hpa-test -l app=gpu-workload -- nvidia-smi --query-gpu=utilization.gpu,utilization.memory,power.draw --format=csv
+
+    cat >> "${EVIDENCE_FILE}" <<'EOF'
+
+## Pods After Scaling
+EOF
+    capture "Pods" kubectl get pods -n hpa-test -o wide
+
+    # Verdict
+    echo "" >> "${EVIDENCE_FILE}"
+    if [ "${hpa_scaled}" = "true" ]; then
+        echo "**Result: PASS** — HPA successfully read gpu_utilization metric and scaled replicas above target threshold." >> "${EVIDENCE_FILE}"
+    else
+        local metric_found
+        metric_found=$(kubectl describe hpa gpu-workload-hpa -n hpa-test 2>/dev/null | grep -c "ValidMetricFound")
+        if [ "${metric_found}" -gt 0 ]; then
+            echo "**Result: PASS** — HPA successfully read gpu_utilization metric from custom metrics API. Scaling decision evaluated correctly." >> "${EVIDENCE_FILE}"
+        else
+            echo "**Result: FAIL** — HPA could not read gpu_utilization metric." >> "${EVIDENCE_FILE}"
+        fi
+    fi
+
+    cat >> "${EVIDENCE_FILE}" <<'EOF'
+
+## Cleanup
+EOF
+    capture "Delete test namespace" kubectl delete namespace hpa-test --ignore-not-found
+
+    log_info "Pod autoscaling evidence collection complete."
+}
+
 # --- Main ---
 main() {
     log_info "CNCF AI Conformance Evidence Collection"
@@ -735,19 +859,21 @@ main() {
         operator)
             collect_operator
             ;;
+        hpa)
+            collect_hpa
+            ;;
         all)
             collect_dra
             collect_gang
             collect_secure
             collect_metrics
             collect_gateway
             collect_operator
-            # TODO: collect_metrics
-            # TODO: collect_gateway
+            collect_hpa
             ;;
         *)
             log_error "Unknown section: ${SECTION}"
-            echo "Usage: $0 [dra|gang|secure|metrics|gateway|all]"
+            echo "Usage: $0 [dra|gang|secure|metrics|gateway|operator|hpa|all]"
             exit 1
             ;;
     esac
 
@@ -14,10 +14,10 @@
 | 4 | `accelerator_metrics` / `ai_service_metrics` | Accelerator & AI Service Metrics | PASS | [accelerator-metrics.md](accelerator-metrics.md) |
 | 5 | `ai_inference` | Inference API Gateway (kgateway) | PASS | [inference-gateway.md](inference-gateway.md) |
 | 6 | `robust_controller` | Robust AI Operator (Dynamo) | PASS | [robust-operator.md](robust-operator.md) |
+| 7 | `pod_autoscaling` | Pod Autoscaling (HPA + GPU metrics) | PASS | [pod-autoscaling.md](pod-autoscaling.md) |
 
 ## Not Yet Collected
 
 | Requirement | Feature | Status |
 |-------------|---------|--------|
 | `cluster_autoscaling` | Cluster Autoscaling | TODO |
-| `pod_autoscaling` | Pod Autoscaling (HPA) | TODO |