docs(conformance): refresh evidence from EKS v1.35 cluster#323

Merged
mchmarny merged 1 commit into NVIDIA:main from yuanchen8911:docs/refresh-conformance-evidence
Mar 10, 2026

Conversation

@yuanchen8911
Contributor

Summary

Refresh all 8 CNCF AI Conformance evidence files from a fresh deployment on EKS v1.35 with 2x p5.48xlarge GPU nodes, generated using --cncf-submission.

Motivation / Context

Evidence was last refreshed in PR #302 against K8s v1.34 with 1 GPU node. This updates to v1.35 with 2 GPU nodes and incorporates evidence script improvements from PR #322.

Fixes: N/A
Related: #322, #302

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: ____________

Implementation Notes

Evidence improvements over previous version:

  • DCGM metrics section populated (was empty due to flaky curl pod)
  • ASG details resolved (was empty due to missing nodegroup tag)
  • ELB hostnames auto-redacted
  • Robust operator evidence shows PodCliques and filtered workload pods (no stale curl-test pods)
  • DRA evidence includes note explaining post-completion pending state
  • Cluster autoscaling clarifies configuration-level evidence scope
  • Inference gateway shows CRDs via name-grep (no empty label query results)
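
As an illustration of the ELB hostname redaction listed above, here is a minimal sketch. It is not the evidence script from this repository: the function name `redact_elb_hostnames`, the placeholder string, and the sample line are all hypothetical, and the regular expression only assumes the two common AWS load balancer DNS shapes (`<name>.<region>.elb.amazonaws.com` and `<name>.elb.<region>.amazonaws.com`).

```python
import re

# Assumed DNS shapes for AWS load balancers (not taken from the PR):
#   <name>.<region>.elb.amazonaws.com   (Classic / Application)
#   <name>.elb.<region>.amazonaws.com   (Network)
ELB_HOSTNAME = re.compile(
    r"[A-Za-z0-9-]+\.(?:[a-z0-9-]+\.elb|elb\.[a-z0-9-]+)\.amazonaws\.com"
)


def redact_elb_hostnames(text: str, placeholder: str = "<elb-redacted>") -> str:
    """Replace every ELB-style hostname in text with a placeholder."""
    return ELB_HOSTNAME.sub(placeholder, text)


# Made-up sample line; real evidence output will look different.
sample = "ADDRESS: a1b2c3-123456789.us-east-1.elb.amazonaws.com PROGRAMMED=True"
print(redact_elb_hostnames(sample))
```

Redacting at the text level like this keeps the surrounding evidence output intact while removing the account-specific hostname. A real script might additionally need to cover other endpoint forms, which this sketch does not attempt.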

Testing

Documentation-only change. Evidence generated and verified on live EKS cluster.

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert

Rollout notes: N/A

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@yuanchen8911 requested a review from a team as a code owner on March 10, 2026 03:58
@yuanchen8911 added the documentation and area/docs labels on Mar 10, 2026
…00 nodes

Regenerated all 8 CNCF AI Conformance evidence files from a fresh
deployment on EKS v1.35 with 2x p5.48xlarge GPU nodes using the
--cncf-submission behavioral evidence collection.

Changes:
- Kubernetes v1.34 → v1.35
- 1 GPU node → 2 GPU nodes (16 GPUs total)
- DCGM metrics section now populated (was empty)
- ASG details resolved via instance ID fallback
- ELB hostnames auto-redacted
- Robust operator shows PodCliques and filtered workload pods
- DRA evidence includes pending state explanation note
- Cluster autoscaling clarifies configuration-level evidence
- Gateway CRDs shown via name-grep (no empty label queries)
- Updated README and index to reflect v1.35 and 2-node cluster

Signed-off-by: yuanchen97@gmail.com
@yuanchen8911 force-pushed the docs/refresh-conformance-evidence branch from 4d7840a to 44b8fde on March 10, 2026 04:00
@yuanchen8911 requested review from dims and mchmarny on March 10, 2026 04:02
@mchmarny merged commit 156a357 into NVIDIA:main on Mar 10, 2026
3 checks passed

Labels

area/docs, documentation, size/XL
