diff --git a/docs/conformance/cncf/README.md b/docs/conformance/cncf/README.md
Lines changed: 44 additions & 54 deletions
diff --git a/docs/conformance/cncf/evidence/dra-support.md b/docs/conformance/cncf/evidence/dra-support.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/conformance/cncf/evidence/gang-scheduling.md b/docs/conformance/cncf/evidence/gang-scheduling.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/conformance/cncf/evidence/pod-autoscaling.md b/docs/conformance/cncf/evidence/pod-autoscaling.md
Lines changed: 2 additions & 2 deletions
diff --git a/pkg/cli/validate.go b/pkg/cli/validate.go
Lines changed: 43 additions & 0 deletions
@@ -19,11 +19,9 @@ recipe meets the Must-have requirements for Kubernetes v1.34.
 ```
 docs/conformance/cncf/
 ├── README.md
-├── collect-evidence.sh
-├── manifests/
-│   ├── dra-gpu-test.yaml
-│   ├── gang-scheduling-test.yaml
-│   └── hpa-gpu-test.yaml
+├── submission/
+│   ├── PRODUCT.yaml
+│   └── README.md
 └── evidence/
     ├── index.md
     ├── dra-support.md
@@ -34,76 +32,68 @@ docs/conformance/cncf/
     ├── robust-operator.md
     ├── pod-autoscaling.md
     └── cluster-autoscaling.md
+
+pkg/evidence/scripts/             # Evidence collection script + test manifests
+├── collect-evidence.sh
+└── manifests/
+    ├── dra-gpu-test.yaml
+    ├── gang-scheduling-test.yaml
+    └── hpa-gpu-test.yaml
 ```
 
 ## Usage
 
 Evidence collection has two steps:
 
-### Step 1: Structural Validation Evidence
+### Structural Validation (CI)
 
-`aicr validate` checks component health, CRDs, constraints, and generates
-structural evidence:
+`aicr validate` checks component health, CRDs, and constraints for CI:
 
 ```bash
-# Generate evidence during validation
-aicr validate -r recipe.yaml -s snapshot.yaml \
+# Structural validation + evidence rendering
+aicr validate -r recipe.yaml \
   --phase conformance --evidence-dir ./evidence
-
-# Or use a saved result file
-aicr validate -r recipe.yaml -s snapshot.yaml \
-  --phase conformance --evidence-dir ./evidence \
-  --result validation-result.yaml
 ```
 
-### Step 2: Behavioral Test Evidence
+### CNCF Submission Evidence
 
-`collect-evidence.sh` deploys test workloads and collects behavioral evidence
-(DRA GPU allocation, gang scheduling, HPA autoscaling, etc.) that requires
-running actual GPU workloads on the cluster:
+Add `--cncf-submission` to collect detailed behavioral evidence for CNCF AI
+Conformance submission. This deploys GPU workloads and captures command outputs,
+workload logs, nvidia-smi output, and Prometheus query results:
 
 ```bash
 # Collect all behavioral evidence
-./docs/conformance/cncf/collect-evidence.sh all
-
-# Collect evidence for a single feature
-./docs/conformance/cncf/collect-evidence.sh dra
-./docs/conformance/cncf/collect-evidence.sh gang
-./docs/conformance/cncf/collect-evidence.sh secure
-./docs/conformance/cncf/collect-evidence.sh metrics
-./docs/conformance/cncf/collect-evidence.sh gateway
-./docs/conformance/cncf/collect-evidence.sh operator
-./docs/conformance/cncf/collect-evidence.sh hpa
-./docs/conformance/cncf/collect-evidence.sh cluster-autoscaling
+aicr validate --phase conformance \
+  --evidence-dir ./evidence --cncf-submission
+
+# Collect specific features
+aicr validate --phase conformance \
+  --evidence-dir ./evidence --cncf-submission -f dra -f hpa
+```
+
+Alternatively, run the evidence collection script directly:
+```bash
+./pkg/evidence/scripts/collect-evidence.sh all
+./pkg/evidence/scripts/collect-evidence.sh dra
 ```
 
-> **Note:** The HPA test (`hpa`) deploys a GPU stress workload (nbody) and waits
-> for HPA to scale up, then verifies scale-down. This takes ~5 minutes due to
-> metric propagation through the DCGM → Prometheus → prometheus-adapter → HPA pipeline.
+> **Note:** The `--cncf-submission` flag deploys GPU workloads and takes ~15
+> minutes. The HPA test uses CUDA N-Body Simulation to stress GPUs and verifies
+> both scale-up and scale-down.
 
-### Why Two Steps?
+### Two Modes
 
-| Evidence Type | `aicr validate` | `collect-evidence.sh` |
+| | `aicr validate --phase conformance` | `--cncf-submission` |
 |---|---|---|
-| Component health (pods, CRDs) | Yes | Yes |
-| Constraint validation (K8s version, OS) | Yes | No |
-| DRA GPU allocation test | No | Yes |
-| Gang scheduling test | No | Yes |
-| Device isolation verification | No | Yes |
-| Gateway condition checks (Accepted, Programmed) | No | Yes |
-| Webhook rejection test | No | Yes |
-| HPA scale-up and scale-down with GPU load | No | Yes |
-| Prometheus query results | No | Yes |
-| Cluster autoscaling (ASG config) | No | Yes |
-
-`aicr validate` checks that components are deployed correctly. `collect-evidence.sh`
-verifies they work correctly by running actual workloads. Both are needed for
-complete conformance evidence.
-
-> **Future:** Behavioral tests are inherently long-running (e.g., HPA test deploys
-> CUDA N-Body Simulation and waits ~5 minutes for metric propagation and scaling) and are better
-> suited as a separate step than blocking `aicr validate`. A follow-up integration
-> is tracked in [#192](https://github.com/NVIDIA/aicr/issues/192).
+| **Purpose** | CI pass/fail | CNCF submission evidence |
+| **Speed** | ~3 minutes | ~15 minutes |
+| **Deploys workloads** | No | Yes |
+| **Output** | Structural evidence (pass/fail + artifacts) | Behavioral evidence (command outputs, logs, queries) |
+| **DRA GPU allocation test** | Status check only | Deploys pod, verifies GPU access |
+| **Gang scheduling test** | Component check only | Deploys PodGroup, verifies co-scheduling |
+| **HPA autoscaling** | Metrics API check | Scale-up + scale-down with GPU load |
+| **Gateway** | Status check | Condition verification (Accepted, Programmed) |
+| **Webhook test** | No | Rejection test with invalid CR |
 
 ## Evidence
 
 
@@ -47,7 +47,7 @@ ip-100-64-171-120.ec2.internal-gpu.nvidia.com-75xvv              ip-100-64-171-1
 
 Deploy a test pod that requests 1 GPU via ResourceClaim and verifies device access.
 
-**Test manifest:** `docs/conformance/cncf/manifests/dra-gpu-test.yaml`
+**Test manifest:** `pkg/evidence/scripts/manifests/dra-gpu-test.yaml`
 
 ```yaml
 ---
@@ -99,7 +99,7 @@ spec:
 
 **Apply test manifest**
 ```
-$ kubectl apply -f docs/conformance/cncf/manifests/dra-gpu-test.yaml
+$ kubectl apply -f pkg/evidence/scripts/manifests/dra-gpu-test.yaml
 namespace/dra-test created
 resourceclaim.resource.k8s.io/gpu-claim created
 pod/dra-gpu-test created
 
@@ -52,7 +52,7 @@ podgroups.scheduling.run.ai   2026-02-12T20:42:05Z
 Deploy a PodGroup with minMember=2 and two GPU pods. KAI scheduler ensures both
 pods are scheduled atomically.
 
-**Test manifest:** `docs/conformance/cncf/manifests/gang-scheduling-test.yaml`
+**Test manifest:** `pkg/evidence/scripts/manifests/gang-scheduling-test.yaml`
 
 ```yaml
 ---
@@ -149,7 +149,7 @@ spec:
 
 **Apply test manifest**
 ```
-$ kubectl apply -f docs/conformance/cncf/manifests/gang-scheduling-test.yaml
+$ kubectl apply -f pkg/evidence/scripts/manifests/gang-scheduling-test.yaml
 namespace/gang-scheduling-test created
 podgroup.scheduling.run.ai/gang-test-group created
 pod/gang-worker-0 created
 
@@ -56,7 +56,7 @@ pods/gpu_utilization
 Deploy a GPU workload running CUDA N-Body Simulation to generate sustained GPU utilization,
 then create an HPA targeting `gpu_utilization` to demonstrate autoscaling.
 
-**Test manifest:** `docs/conformance/cncf/manifests/hpa-gpu-test.yaml`
+**Test manifest:** `pkg/evidence/scripts/manifests/hpa-gpu-test.yaml`
 
 ```yaml
 ---
@@ -123,7 +123,7 @@ spec:
 
 **Apply test manifest**
 ```
-$ kubectl apply -f docs/conformance/cncf/manifests/hpa-gpu-test.yaml
+$ kubectl apply -f pkg/evidence/scripts/manifests/hpa-gpu-test.yaml
 namespace/hpa-test created
 deployment.apps/gpu-workload created
 horizontalpodautoscaler.autoscaling/gpu-workload-hpa created
 
@@ -363,6 +363,15 @@ func validateCmdFlags() []cli.Flag {
 			Name:  "evidence-dir",
 			Usage: "Write CNCF conformance evidence markdown to this directory. Requires --phase conformance.",
 		},
+		&cli.BoolFlag{
+			Name:  "cncf-submission",
+			Usage: "Collect detailed behavioral evidence for CNCF AI Conformance submission. Deploys GPU workloads, captures nvidia-smi output, Prometheus queries, and HPA scaling tests. Requires --evidence-dir. Takes ~15 minutes.",
+		},
+		&cli.StringSliceFlag{
+			Name:    "feature",
+			Aliases: []string{"f"},
+			Usage:   "Evidence feature to collect (repeatable, default: all). Use -f all to run all features (cannot be combined with other features). Only used with --cncf-submission.",
+		},
 		&cli.StringFlag{
 			Name:  "result",
 			Usage: "Use a saved validation result file as the source for evidence rendering (live validation still runs). Note: saved results do not include diagnostic artifacts captured during live runs. Requires --phase conformance and --evidence-dir.",
@@ -462,6 +471,40 @@ Use a saved result file for evidence instead of the live run:
 				return errors.New(errors.ErrCodeInvalidRequest, "--result requires --evidence-dir")
 			}
 
+			cncfSubmission := cmd.Bool("cncf-submission")
+			if cncfSubmission && evidenceDir == "" {
+				return errors.New(errors.ErrCodeInvalidRequest, "--cncf-submission requires --evidence-dir")
+			}
+			features := cmd.StringSlice("feature")
+			if len(features) > 0 && !cncfSubmission {
+				return errors.New(errors.ErrCodeInvalidRequest, "--feature requires --cncf-submission")
+			}
+
+			// When --cncf-submission is set, run behavioral evidence collection
+			// instead of structural Go checks. This deploys GPU workloads and
+			// captures detailed outputs for CNCF submission.
+			if cncfSubmission {
+				slog.Info("collecting behavioral conformance evidence",
+					"dir", evidenceDir, "features", features)
+
+				// Use a longer timeout for behavioral evidence (default 5m is too short).
+				evidenceTimeout := cmd.Duration("timeout")
+				if evidenceTimeout <= 5*time.Minute {
+					evidenceTimeout = 20 * time.Minute
+				}
+				evidenceCtx, evidenceCancel := context.WithTimeout(ctx, evidenceTimeout)
+				defer evidenceCancel()
+
+				collector := evidence.NewCollector(evidenceDir,
+					evidence.WithFeatures(features),
+				)
+				if err := collector.Run(evidenceCtx); err != nil {
+					return errors.Wrap(errors.ErrCodeInternal, "evidence collection failed", err)
+				}
+				slog.Info("conformance evidence written", "dir", evidenceDir)
+				return nil
+			}
+
 			recipeFilePath := cmd.String("recipe")
 			snapshotFilePath := cmd.String("snapshot")
 			kubeconfig := cmd.String("kubeconfig")