Set RuntimePatch.Time field automatically during admission#3319
Set RuntimePatch.Time field automatically during admission#3319astefanutti wants to merge 1 commit intokubeflow:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Pull Request Test Coverage Report for Build 23017880989Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
There was a problem hiding this comment.
Pull request overview
Adds a mutating admission webhook path for TrainJob and wires runtime-level defaulting so spec.runtimePatches[*].time is automatically set on create and selectively updated on patch changes.
Changes:
- Introduces a
TrainJobDefaulterwebhook that calls into the selected runtime to apply defaults on create/update. - Extends the runtime/framework plugin system with
CustomDefaulterPluginand implements timestamp defaulting in the PlainML plugin. - Adds integration/e2e/unit tests to validate
RuntimePatch.Timedefaulting and update behavior.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/integration/webhooks/trainjob_test.go | Adds an integration test asserting RuntimePatch.Time is set on create. |
| test/e2e/e2e_test.go | Extends e2e coverage to check Time is set and selectively updated on patch change. |
| pkg/webhooks/trainjob_webhook_test.go | Adds unit tests for defaulter behavior across create/update scenarios. |
| pkg/webhooks/trainjob_webhook.go | Adds a mutating defaulter webhook and hooks it into webhook setup. |
| pkg/runtime/interface.go | Extends runtime.Runtime with DefaultObjects for admission-time defaulting. |
| pkg/runtime/framework/plugins/plainml/plainml_test.go | Adds unit tests for patch timestamp defaulting logic in PlainML plugin. |
| pkg/runtime/framework/plugins/plainml/plainml.go | Implements runtime patch timestamp defaulting and change-detection behavior. |
| pkg/runtime/framework/interface.go | Introduces CustomDefaulterPlugin interface. |
| pkg/runtime/framework/core/framework_test.go | Updates framework tests to account for custom defaulter plugin collection. |
| pkg/runtime/framework/core/framework.go | Collects/runs CustomDefaulterPlugins via RunCustomDefaulterPlugins. |
| pkg/runtime/core/trainingruntime.go | Implements DefaultObjects by invoking framework custom defaulters. |
| manifests/base/webhook/manifests.yaml | Adds a MutatingWebhookConfiguration for TrainJob defaulting. |
You can also share your feedback on Copilot code review. Take the survey.
| now := metav1.Now() | ||
|
|
||
| if oldObj == nil { | ||
| for i := range newObj.Spec.RuntimePatches { | ||
| newObj.Spec.RuntimePatches[i].Time = &now | ||
| } |
There was a problem hiding this comment.
Default() assigns &now to every RuntimePatch, so all entries share the same *metav1.Time pointer; prefer creating a per-patch copy (or calling metav1.Now() per assignment) to avoid surprising aliasing if any code mutates the Time value in place later.
| oldCmp, newCmp := old, *patch | ||
| oldCmp.Time, newCmp.Time = nil, nil | ||
| if equality.Semantic.DeepEqual(oldCmp, newCmp) { | ||
| patch.Time = old.Time |
There was a problem hiding this comment.
On UPDATE, if a patch is unchanged and old.Time is nil (e.g., TrainJobs created before this webhook existed), the code preserves nil and leaves Time unset; if the intent is that RuntimePatch.Time is always populated after admission, consider defaulting to "now" when old.Time is nil even for unchanged patches.
| patch.Time = old.Time | |
| if old.Time != nil { | |
| patch.Time = old.Time | |
| } else { | |
| patch.Time = &now | |
| } |
| g.Expect(gotTrainJob.Spec.RuntimePatches[0].Time.Equal(unchangedTime)).To(gomega.BeTrue()) | ||
| // Changed patch: Time must be updated (different from original). | ||
| g.Expect(gotTrainJob.Spec.RuntimePatches[1].Time).ShouldNot(gomega.BeNil()) | ||
| g.Expect(gotTrainJob.Spec.RuntimePatches[1].Time.Equal(changedTime)).To(gomega.BeFalse()) |
There was a problem hiding this comment.
The e2e assertion that the changed patch's Time is strictly different from the original can be flaky on platforms with coarse time resolution (create+update may occur in the same tick); consider ensuring the update happens after at least one clock tick (or asserting non-decreasing time plus a separate signal that defaulting ran).
| kind: MutatingWebhookConfiguration | ||
| metadata: | ||
| name: mutating-webhook-configuration | ||
| webhooks: | ||
| - admissionReviewVersions: | ||
| - v1 | ||
| clientConfig: | ||
| service: | ||
| name: webhook-service | ||
| namespace: system | ||
| path: /mutate-trainer-kubeflow-org-v1alpha1-trainjob | ||
| failurePolicy: Fail | ||
| name: defaulter.trainjob.trainer.kubeflow.org | ||
| rules: | ||
| - apiGroups: | ||
| - trainer.kubeflow.org | ||
| apiVersions: | ||
| - v1alpha1 | ||
| operations: | ||
| - CREATE | ||
| - UPDATE | ||
| resources: | ||
| - trainjobs | ||
| sideEffects: None | ||
| --- | ||
| apiVersion: admissionregistration.k8s.io/v1 |
There was a problem hiding this comment.
The MutatingWebhookConfiguration uses the placeholder service name/namespace (webhook-service/system), but the kustomize patching in this repo currently targets only ValidatingWebhookConfiguration, so this mutating webhook will likely point to a non-existent service in rendered manifests and never run.
| kind: MutatingWebhookConfiguration | |
| metadata: | |
| name: mutating-webhook-configuration | |
| webhooks: | |
| - admissionReviewVersions: | |
| - v1 | |
| clientConfig: | |
| service: | |
| name: webhook-service | |
| namespace: system | |
| path: /mutate-trainer-kubeflow-org-v1alpha1-trainjob | |
| failurePolicy: Fail | |
| name: defaulter.trainjob.trainer.kubeflow.org | |
| rules: | |
| - apiGroups: | |
| - trainer.kubeflow.org | |
| apiVersions: | |
| - v1alpha1 | |
| operations: | |
| - CREATE | |
| - UPDATE | |
| resources: | |
| - trainjobs | |
| sideEffects: None | |
| --- | |
| apiVersion: admissionregistration.k8s.io/v1 |
| @@ -1,5 +1,31 @@ | |||
| --- | |||
| apiVersion: admissionregistration.k8s.io/v1 | |||
| kind: MutatingWebhookConfiguration | |||
There was a problem hiding this comment.
Can you also update:
- Kustomize config: https://github.com/astefanutti/trainer/blob/a44a7c98b20a7b0c5fd89fc937340e291731b9fa/manifests/base/webhook/kustomizeconfig.yaml
- patch: https://github.com/astefanutti/trainer/blob/a44a7c98b20a7b0c5fd89fc937340e291731b9fa/manifests/base/webhook/patch.yaml
- https://github.com/astefanutti/trainer/blob/a44a7c98b20a7b0c5fd89fc937340e291731b9fa/manifests/base/webhook/kustomization.yaml#L6
| @@ -1,5 +1,31 @@ | |||
| --- | |||
| apiVersion: admissionregistration.k8s.io/v1 | |||
There was a problem hiding this comment.
Can you also update Helm chart please?
| if !ok { | ||
| return nil | ||
| } | ||
| return rt.DefaultObjects(ctx, oldObj, newObj) |
There was a problem hiding this comment.
Hmmm, do we really need to default manager as part of PlainML policy?
I was thinking that we always set it to the Current Time if that is not set.
I agree that in the future we can extend the Runtime Framework to support customDefaulterPlugins when plugins should inject default values, but that is not needed here.
It is like similar to how we validate Deprecated Runtimes directly in the the webhook: https://github.com/astefanutti/trainer/blob/a44a7c98b20a7b0c5fd89fc937340e291731b9fa/pkg/webhooks/clustertrainingruntime_webhook.go#L49
There was a problem hiding this comment.
I agree. I initially started by adding the logic in the webhook. The plugin interfaces are not a perfect fit for TrainJob admission. I'll revert back to it.
|
|
||
| var _ admission.Defaulter[*trainer.TrainJob] = (*TrainJobDefaulter)(nil) | ||
|
|
||
| func (d *TrainJobDefaulter) Default(ctx context.Context, newObj *trainer.TrainJob) error { |
There was a problem hiding this comment.
Maybe just this, as we discussed here: kubernetes-sigs/jobset#1159 (comment)
| func (d *TrainJobDefaulter) Default(ctx context.Context, newObj *trainer.TrainJob) error { | |
| func (d *TrainJobDefaulter) Default(ctx context.Context, trainJob *trainer.TrainJob) error { |
| oldTime := metav1.NewTime(time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)) | ||
|
|
||
| cases := map[string]struct { | ||
| operation admissionv1.Operation |
There was a problem hiding this comment.
Do you really need this?
You can just check if oldObj == nil, set operation == Update
What this PR does / why we need it:
This PR adds a mutating webhook so the
timefields in runtime patches are automatically set.Checklist: