feat(ci): collect AI conformance evidence in H100 smoke test #147

Merged
mchmarny merged 5 commits into NVIDIA:main from dims:ci/add-conformance-evidence-collection
Feb 19, 2026

dims (Collaborator) commented Feb 19, 2026

Summary

  • Adds a new step to the H100 GPU smoke test workflow that runs tests/chainsaw/ai-conformance/main.go after inference validation
  • Checks that all expected Kubernetes resources (from kind/assert-*.yaml) exist in the cluster and prints a structured PASS/FAIL summary
  • Runs with if: always() so evidence is collected even when earlier steps fail
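
As a rough illustration of the "structured PASS/FAIL summary" described above, here is a minimal Go sketch. It is not the code in tests/chainsaw/ai-conformance/main.go: the `Status`, `Result`, and `FormatSummary` names, the line layout, and the example resource names are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a hypothetical per-resource outcome; the real tool's types may differ.
type Status string

const (
	Pass  Status = "PASS"
	Fail  Status = "FAIL"
	Error Status = "ERROR"
)

// Result pairs one expected resource with the outcome of its existence check.
type Result struct {
	Resource string // illustrative identifier, e.g. "apps/v1 Deployment example/app"
	Status   Status
	Detail   string // optional reason, e.g. "not found"
}

// FormatSummary renders one line per result followed by a totals line,
// and reports whether every check passed.
func FormatSummary(results []Result) (string, bool) {
	var b strings.Builder
	counts := map[Status]int{}
	for _, r := range results {
		counts[r.Status]++
		fmt.Fprintf(&b, "[%s] %s", r.Status, r.Resource)
		if r.Detail != "" {
			fmt.Fprintf(&b, " (%s)", r.Detail)
		}
		b.WriteByte('\n')
	}
	fmt.Fprintf(&b, "total=%d pass=%d fail=%d error=%d\n",
		len(results), counts[Pass], counts[Fail], counts[Error])
	return b.String(), counts[Fail] == 0 && counts[Error] == 0
}

func main() {
	// Made-up results, only to show the output shape.
	report, ok := FormatSummary([]Result{
		{Resource: "v1 Namespace example", Status: Pass},
		{Resource: "apps/v1 Deployment example/app", Status: Fail, Detail: "not found"},
	})
	fmt.Print(report)
	fmt.Println("all passed:", ok)
}
```

Returning the report as a string keeps the formatting separate from the decision about the exit code, which matters for a step that runs with if: always() and may be collecting evidence after an earlier failure.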

Test plan

  • Trigger H100 smoke test workflow manually and verify the new "Collect AI conformance evidence" step runs
  • Confirm the evidence summary appears in CI logs with per-resource PASS/FAIL/ERROR output

dims requested a review from a team as a code owner February 19, 2026 02:05
dims force-pushed the ci/add-conformance-evidence-collection branch from 25c46b0 to 4efaf7c February 19, 2026 02:15
dims requested a review from a team as a code owner February 19, 2026 02:57
dims added 4 commits February 19, 2026 07:02
Run the ai-conformance checker after inference validation to verify
that all expected Kubernetes resources exist in the Kind cluster,
providing a structured PASS/FAIL evidence summary in CI logs.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>

Report container images for Deployments/DaemonSets/StatefulSets,
app.kubernetes.io/version or helm.sh/chart labels for Namespaces,
and served API versions for CRDs alongside the PASS/FAIL output.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
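
The version details this commit reports live at well-known paths in the Kubernetes objects: `spec.template.spec.containers[].image` for workloads, `metadata.labels` for Namespaces, and `spec.versions[]` (entries with `served: true`) for CRDs. The sketch below reads them from objects decoded into `map[string]any`. The helper names, the preference for app.kubernetes.io/version over helm.sh/chart, and the sample values are assumptions, not the PR's actual implementation.

```go
package main

import "fmt"

// nested walks a decoded YAML/JSON object along keys and returns the value found.
func nested(obj map[string]any, keys ...string) (any, bool) {
	var cur any = obj
	for _, k := range keys {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[k]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// containerImages reads spec.template.spec.containers[].image, the pod
// template path shared by Deployments, DaemonSets, and StatefulSets.
func containerImages(obj map[string]any) []string {
	raw, _ := nested(obj, "spec", "template", "spec", "containers")
	list, _ := raw.([]any)
	var images []string
	for _, c := range list {
		m, ok := c.(map[string]any)
		if !ok {
			continue
		}
		if img, ok := m["image"].(string); ok && img != "" {
			images = append(images, img)
		}
	}
	return images
}

// versionLabel returns the first non-empty version-like label.
// Checking app.kubernetes.io/version before helm.sh/chart is an assumption.
func versionLabel(obj map[string]any) string {
	for _, key := range []string{"app.kubernetes.io/version", "helm.sh/chart"} {
		if v, ok := nested(obj, "metadata", "labels", key); ok {
			if s, ok := v.(string); ok && s != "" {
				return key + "=" + s
			}
		}
	}
	return ""
}

// servedVersions lists the names of CRD spec.versions entries with served: true.
func servedVersions(crd map[string]any) []string {
	raw, _ := nested(crd, "spec", "versions")
	list, _ := raw.([]any)
	var out []string
	for _, v := range list {
		m, ok := v.(map[string]any)
		if !ok {
			continue
		}
		name, _ := m["name"].(string)
		if served, _ := m["served"].(bool); served && name != "" {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Made-up objects, only to exercise the helpers.
	deploy := map[string]any{"spec": map[string]any{"template": map[string]any{
		"spec": map[string]any{"containers": []any{
			map[string]any{"name": "app", "image": "registry.example/app:1.2.3"},
		}},
	}}}
	ns := map[string]any{"metadata": map[string]any{
		"labels": map[string]any{"helm.sh/chart": "example-0.1.0"},
	}}
	crd := map[string]any{"spec": map[string]any{"versions": []any{
		map[string]any{"name": "v1", "served": true},
		map[string]any{"name": "v1alpha1", "served": false},
	}}}
	fmt.Println(containerImages(deploy), versionLabel(ns), servedVersions(crd))
}
```
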

Change --dir from a single-value to a repeatable flag so the tool
can scan both kind/ (reduced, kind-specific asserts) and cluster/
(shared component asserts like cert-manager, monitoring, dynamo).
Duplicates are deduplicated by apiVersion/kind/namespace/name,
with earlier directories taking priority.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
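
A repeatable flag and first-wins deduplication, as this commit describes, can be sketched in Go with the standard `flag.Value` interface. This is an illustration under assumptions, not the PR's code: the `stringList`, `Resource`, and `dedupe` names and the sample resources are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// stringList implements flag.Value so the same flag can be passed several times.
type stringList []string

func (s *stringList) String() string     { return strings.Join(*s, ",") }
func (s *stringList) Set(v string) error { *s = append(*s, v); return nil }

// Resource is a hypothetical record for one expected object from an assert file.
type Resource struct {
	APIVersion, Kind, Namespace, Name string
	Source                            string // directory the assert came from
}

// identity is the dedup key: apiVersion/kind/namespace/name.
type identity struct{ apiVersion, kind, namespace, name string }

// dedupe flattens per-directory groups in order and keeps the first
// occurrence of each identity, so earlier directories take priority.
func dedupe(groups [][]Resource) []Resource {
	seen := map[identity]bool{}
	var out []Resource
	for _, group := range groups {
		for _, r := range group {
			k := identity{r.APIVersion, r.Kind, r.Namespace, r.Name}
			if seen[k] {
				continue
			}
			seen[k] = true
			out = append(out, r)
		}
	}
	return out
}

func main() {
	fs := flag.NewFlagSet("ai-conformance", flag.ContinueOnError)
	var dirs stringList
	fs.Var(&dirs, "dir", "directory of assert files (repeatable)")
	// Fixed arguments keep the sketch runnable without a command line.
	if err := fs.Parse([]string{"--dir", "kind", "--dir", "cluster"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("dirs:", []string(dirs))

	// Made-up resources: the same Deployment appears in both directories.
	merged := dedupe([][]Resource{
		{{APIVersion: "apps/v1", Kind: "Deployment", Namespace: "example", Name: "app", Source: "kind"}},
		{
			{APIVersion: "apps/v1", Kind: "Deployment", Namespace: "example", Name: "app", Source: "cluster"},
			{APIVersion: "v1", Kind: "Namespace", Name: "example", Source: "cluster"},
		},
	})
	for _, r := range merged {
		fmt.Printf("%s %s %s/%s (from %s)\n", r.APIVersion, r.Kind, r.Namespace, r.Name, r.Source)
	}
}
```

A struct key is used rather than a joined string because apiVersion values such as apps/v1 already contain a slash, which would make a slash-joined key ambiguous.
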
… checks

The cluster/ directory includes resources not present in Kind (driver
daemonset, MIG manager, standalone DCGM, container toolkit daemonset).
Instead of scanning the full cluster/ directory, explicitly include
only the 8 shared assert files that the chainsaw kind test references.

Adds --file flag for specifying individual assert YAML files alongside
--dir for directory scanning.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
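
Combining directory scanning with explicitly listed files might look like the Go sketch below. It assumes directory scanning means globbing for assert-*.yaml (the pattern named in the PR summary) and that explicit files are appended after directory matches; the `collectAssertFiles` name and the example path are hypothetical, and the real tool may resolve files differently.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// collectAssertFiles expands each directory to its assert-*.yaml files,
// then appends explicitly listed files, skipping paths already seen.
func collectAssertFiles(dirs, files []string) ([]string, error) {
	seen := map[string]bool{}
	var out []string
	add := func(p string) {
		p = filepath.Clean(p)
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	for _, d := range dirs {
		matches, err := filepath.Glob(filepath.Join(d, "assert-*.yaml"))
		if err != nil {
			return nil, fmt.Errorf("scan %s: %w", d, err)
		}
		for _, m := range matches {
			add(m)
		}
	}
	for _, f := range files {
		add(f)
	}
	return out, nil
}

func main() {
	// "cluster/assert-example.yaml" is a placeholder, not one of the 8 real files.
	paths, err := collectAssertFiles(
		[]string{"kind"},
		[]string{"cluster/assert-example.yaml"},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Listing individual files keeps asserts for components absent from Kind out of the run without needing an exclusion mechanism.
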
dims force-pushed the ci/add-conformance-evidence-collection branch from 1df0d8e to 7184b2e February 19, 2026 12:02
mchmarny (Member) left a comment

/lgtm

mchmarny merged commit 1f39bce into NVIDIA:main Feb 19, 2026
10 of 11 checks passed
mchmarny deleted the ci/add-conformance-evidence-collection branch February 19, 2026 12:39