Create PodTemplate before ProvisioningRequest #4086

Open: mbobrovskyi wants to merge 1 commit into main from fix/create-pod-template-before-provision-request

mbobrovskyi (Contributor)

What type of PR is this?

/kind bug

What this PR does / why we need it:

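Create the PodTemplates before the ProvisioningRequest that references them, so the Cluster Autoscaler cannot pick up a ProvisioningRequest whose PodTemplates do not exist yet.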

Which issue(s) this PR fixes:

Fixes #3957

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a bug where the Cluster Autoscaler could process a ProvisioningRequest before its PodTemplate had been created and mark the ProvisioningRequest as failed.

k8s-ci-robot (Contributor)

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Jan 29, 2025
k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mbobrovskyi
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 29, 2025

netlify bot commented Jan 29, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 08ad38a
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/679a312a9598f40008782a5a

mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from f66cc05 to 3aa0a49 January 29, 2025 13:32
k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 29, 2025
mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from 3aa0a49 to 0c0efc0 January 29, 2025 13:44
mbobrovskyi marked this pull request as ready for review January 29, 2025 13:44
k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 29, 2025
mbobrovskyi (Contributor, Author)

/cc @PBundyra

k8s-ci-robot requested a review from PBundyra January 29, 2025 13:45
mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from 0c0efc0 to 08ad38a January 29, 2025 13:46
Comment on lines +320 to +352
newPt := &corev1.PodTemplate{
	ObjectMeta: metav1.ObjectMeta{
		Name:      ptKey.Name,
		Namespace: ptKey.Namespace,
		Labels: map[string]string{
			constants.ManagedByKueueLabel: "true",
		},
	},
	Template: ps.Template,
}

// apply the admission node selectors to the Template
psi, err := podset.FromAssignment(ctx, c.client, psaMap[psName], podSet.Count)
if err != nil {
	return nil, err
}

err = podset.Merge(&newPt.Template.ObjectMeta, &newPt.Template.Spec, psi)
if err != nil {
	return nil, err
}

// copy limits to requests if needed
workload.UseLimitsAsMissingRequestsInPod(&newPt.Template.Spec)

if err = c.client.Create(ctx, newPt); err != nil {
	msg := fmt.Sprintf("Error creating PodTemplate %q: %v", newPt.Name, err)
	ac.Message = api.TruncateConditionMessage(msg)
	workload.SetAdmissionCheckState(&wl.Status.AdmissionChecks, *ac, c.clock)

	c.record.Eventf(wl, corev1.EventTypeWarning, "FailedCreate", api.TruncateEventMessage(msg))
	return nil, err
}

Contributor

Could we make a separate function for PodTemplate creation?
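A rough sketch of what such an extracted helper could look like, assuming heavily simplified types. Container, PodTemplate, managedByLabel, and newManagedPodTemplate are hypothetical names, resources are modeled as plain strings, and the node-selector merge done by podset.FromAssignment and podset.Merge in the snippet above is left out. It keeps the two self-contained steps: adding the managed-by label and filling missing requests from limits, which is roughly what the "copy limits to requests if needed" comment describes for workload.UseLimitsAsMissingRequestsInPod.

```go
package main

// Hypothetical, simplified stand-ins for the corev1 types; resources are
// modeled as resource name -> quantity string.
type Container struct {
	Name     string
	Requests map[string]string
	Limits   map[string]string
}

type PodTemplate struct {
	Name       string
	Namespace  string
	Labels     map[string]string
	Containers []Container
}

// managedByLabel is a hypothetical placeholder key standing in for
// constants.ManagedByKueueLabel; it is not the real label key.
const managedByLabel = "managed-by-kueue"

// cloneMap returns a shallow copy so the caller's input is never mutated.
func cloneMap(m map[string]string) map[string]string {
	if m == nil {
		return nil
	}
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// useLimitsAsMissingRequests copies a container's limit into its requests
// for every resource that has a limit but no request.
func useLimitsAsMissingRequests(cs []Container) {
	for i := range cs {
		c := &cs[i]
		for res, lim := range c.Limits {
			if _, ok := c.Requests[res]; ok {
				continue
			}
			if c.Requests == nil {
				c.Requests = map[string]string{}
			}
			c.Requests[res] = lim
		}
	}
}

// newManagedPodTemplate builds the template object only; creating it via
// the client (and reporting errors) is left to the caller.
func newManagedPodTemplate(name, namespace string, containers []Container) *PodTemplate {
	cs := make([]Container, len(containers))
	for i, c := range containers {
		cs[i] = Container{Name: c.Name, Requests: cloneMap(c.Requests), Limits: cloneMap(c.Limits)}
	}
	useLimitsAsMissingRequests(cs)
	return &PodTemplate{
		Name:       name,
		Namespace:  namespace,
		Labels:     map[string]string{managedByLabel: "true"},
		Containers: cs,
	}
}
```

One possible benefit of keeping construction separate from c.client.Create is that the error handling (condition message and FailedCreate event) can stay in a single place in the caller.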

Comment on lines -393 to -418
if err != nil {
	// it's a not found, so create it
	newPt := &corev1.PodTemplate{
		ObjectMeta: metav1.ObjectMeta{
			Name:      ptKey.Name,
			Namespace: ptKey.Namespace,
			Labels: map[string]string{
				constants.ManagedByKueueLabel: "true",
			},
		},
		Template: ps.Template,
	}

	// apply the admission node selectors to the Template
	psi, err := podset.FromAssignment(ctx, c.client, psaMap[psName], reqPS.Count)
	if err != nil {
		return err
	}

	err = podset.Merge(&newPt.Template.ObjectMeta, &newPt.Template.Spec, psi)
	if err != nil {
		return err
	}

	// copy limits to requests if needed
	workload.UseLimitsAsMissingRequestsInPod(&newPt.Template.Spec)

Contributor

Since this part moved to a different place, can we rename the function accordingly?

@@ -371,7 +421,7 @@ func (c *Controller) syncProvisionRequestsPodTemplates(ctx context.Context, wl *
for i := range request.Spec.PodSets {
	reqPS := &request.Spec.PodSets[i]
	psName, refFound := podsetRefsMap[reqPS.PodTemplateRef.Name]

Contributor

Do we need these checks? Is there any risk of a race condition? We already check this in syncOwnedProvisionRequest.

Contributor

I also wonder whether we need to check that the PodSets are the same in the Workload and the ProvReq:
https://github.com/kubernetes-sigs/kueue/pull/4086/files#diff-d5b88e9a8af6b97ce61788f1307ec9ba1f4e3581a9c0634a151ce310c9ca3d91R412-R416
since we set them a few lines above. Could you evaluate whether there are any scenarios where such a check is actually needed?
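For evaluating that question, here is a minimal sketch of what such a consistency check could look like. It rests on assumptions: WorkloadPodSet, RequestPodSet, and podSetsMatch are hypothetical names rather than Kueue code, templateToPodSet plays the role of podsetRefsMap (PodTemplate name to Workload PodSet name), and treating "the same" as "same set of PodSets with equal counts" is itself an assumption.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Workload and ProvReq PodSets.
type WorkloadPodSet struct {
	Name  string
	Count int32
}

type RequestPodSet struct {
	PodTemplateRef string
	Count          int32
}

// podSetsMatch returns nil when every ProvReq PodSet maps (through
// templateToPodSet) to a distinct Workload PodSet with the same count,
// and both sides contain the same number of PodSets.
func podSetsMatch(wl []WorkloadPodSet, req []RequestPodSet, templateToPodSet map[string]string) error {
	if len(wl) != len(req) {
		return fmt.Errorf("workload has %d PodSets, request has %d", len(wl), len(req))
	}
	counts := make(map[string]int32, len(wl))
	for _, ps := range wl {
		counts[ps.Name] = ps.Count
	}
	seen := make(map[string]bool, len(req))
	for _, rps := range req {
		name, ok := templateToPodSet[rps.PodTemplateRef]
		if !ok {
			return fmt.Errorf("PodTemplate %q is not mapped to a Workload PodSet", rps.PodTemplateRef)
		}
		if seen[name] {
			return fmt.Errorf("PodSet %q is referenced more than once", name)
		}
		seen[name] = true
		want, ok := counts[name]
		if !ok {
			return fmt.Errorf("PodSet %q not found in Workload", name)
		}
		if want != rps.Count {
			return fmt.Errorf("PodSet %q: workload count %d, request count %d", name, want, rps.Count)
		}
	}
	return nil
}
```

If the ProvReq PodSets are always built from the Workload in the same reconcile, a check like this would only fire when the request was created by an earlier reconcile with a different Workload spec; whether that can happen is the open question above.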

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

ProvisioningRequest is created before its PodTemplates, what may cause Cluster Autoscaler to mark it as failed
3 participants