Commit 0972e98

feat(validator): add Kubeflow Trainer support to robust-controller check
The robust-controller conformance check previously only validated the Dynamo operator, causing it to skip on all training clusters. This adds Kubeflow Trainer as an alternative target, selected based on recipe component presence:

- dynamo-platform in recipe → validate Dynamo operator
- kubeflow-trainer in recipe → validate Kubeflow Trainer
- neither → skip

Kubeflow Trainer validation checks:

1. Controller deployment running (kubeflow-trainer-controller-manager)
2. Validating webhook operational with reachable endpoint
3. TrainJob CRD exists (trainjobs.trainer.kubeflow.org)
4. Webhook rejects invalid TrainJob (behavioral test)

Refactored the original Dynamo validation into checkRobustDynamo() and renamed validateWebhookRejects to validateDynamoWebhookRejects for clarity.
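The target selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `Target` type, its constants, and `SelectTarget` are hypothetical names, and the precedence when both components appear in a recipe (Dynamo first here) is an assumption, since the commit message does not cover that case.

```go
package main

// Target names the operator the robust-controller check would validate.
// Hypothetical type, introduced only for this sketch.
type Target string

const (
	TargetDynamo   Target = "dynamo"
	TargetKubeflow Target = "kubeflow-trainer"
	TargetSkip     Target = "skip"
)

// SelectTarget picks a validation target from the recipe's component names.
// Assumption: if both components are present, Dynamo takes precedence.
func SelectTarget(components []string) Target {
	// Build a set so each lookup is a constant-time map access.
	present := make(map[string]bool, len(components))
	for _, c := range components {
		present[c] = true
	}
	switch {
	case present["dynamo-platform"]:
		return TargetDynamo
	case present["kubeflow-trainer"]:
		return TargetKubeflow
	default:
		// Neither operator is part of the recipe, so the check is skipped.
		return TargetSkip
	}
}
```

Under these assumptions, a recipe listing only `kubeflow-trainer` would route to the Kubeflow Trainer checks, and a recipe with neither component would skip the check rather than fail it.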
1 parent f1c915b commit 0972e98

3 files changed, +447 -40 lines changed

recipes/validators/catalog.yaml

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@ validators:
     env: []
   - name: robust-controller
     phase: conformance
-    description: "Verify Dynamo operator controller and webhooks"
+    description: "Verify AI operator controller and webhooks (Dynamo or Kubeflow Trainer)"
     image: ghcr.io/nvidia/aicr-validators/conformance:latest
     timeout: 5m
     args: ["robust-controller"]
