
KEP-3322: add a new field maxRestartTimesOnFailure to podSpec #3339

Open
wants to merge 6 commits into base: master
Conversation

kerthcet
Member

@kerthcet kerthcet commented Jun 6, 2022

Signed-off-by: kerthcet [email protected]

  • One-line PR description: Add a new field maxRestartTimes to podSpec, applied when the RestartPolicy is OnFailure
  • Other comments:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 6, 2022
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 6, 2022
@wojtek-t wojtek-t self-assigned this Jun 6, 2022
@kerthcet kerthcet force-pushed the feat/add-maxRetries-to-podSpec branch 2 times, most recently from af1ce55 to d2e8b5d Compare June 6, 2022 11:07
@kerthcet kerthcet force-pushed the feat/add-maxRetries-to-podSpec branch from d2e8b5d to affd244 Compare June 6, 2022 14:57
@kerthcet kerthcet changed the title [WIP] Feat: add a new field maxRestartTimes to podSpec [WIP] KEP-3322: add a new field maxRestartTimes to podSpec Jun 6, 2022
@kerthcet kerthcet force-pushed the feat/add-maxRetries-to-podSpec branch 2 times, most recently from 525ea53 to c52de33 Compare June 7, 2022 05:34
@kerthcet kerthcet changed the title [WIP] KEP-3322: add a new field maxRestartTimes to podSpec KEP-3322: add a new field maxRestartTimes to podSpec Jun 7, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2022
@kerthcet
Member Author

kerthcet commented Jun 9, 2022

cc @wojtek-t PTAL, thanks a lot.

@wojtek-t
Member

wojtek-t commented Jun 9, 2022

cc @wojtek-t PTAL, thanks a lot.

We're generally doing PRR once you already have SIG approval.

@kerthcet
Member Author

cc @dchen1107 for sig-node side review, also cc @hex108

@scbizu

scbizu commented Jun 16, 2022

This KEP is helpful, especially for pods that hold a large resource set, such as JVM-based pods. We give these kinds of pods a high limit threshold to speed up their startup; a restart-always policy makes this worse and can even crash the node. In the old days, daemon control tools like supervisorctl had a startretries mechanism to limit the maximum number of startup retries, but for k8s deployments there is no replacement for it.

Member

@wojtek-t wojtek-t left a comment

It seems there is no SIG-level agreement on it - I made a quick pass and added some comments, but please ping me only once you have SIG-level approval.


Pros:
* BackoffLimitPerIndex can reuse this functionality and no longer needs to track the restart times per index.
Specifically, it can avoid using the annotation and work at a higher-level control by watching the pod status.
Member

I actually agree with Aldo.
It's not an implementation detail - it's a fundamental thing of "pod recreations". If we want to track something across pod recreations (which is the case for jobs), maxRestarts won't solve it on its own - but actually may help with it.

nitty-gritty.
-->

Add a new field `maxRestartTimesOnFailure` to the `pod.spec`. `maxRestartTimesOnFailure` only applies
Member

+1 to Tim - I think that generalizing it to "Always" is natural and instead of making the API narrow, let's make it more generic.
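
For concreteness, here is a hedged sketch of what the proposed API change could look like as a Go type excerpt. The field name comes from the KEP text; the doc comments, the placement, and the OnFailure-only semantics are assumptions that this thread may still change (e.g. by generalizing to Always):

```go
// Illustrative excerpt only; not the merged API.
package core // hypothetical placement for the sketch

type RestartPolicy string

const (
	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
)

// PodSpec shows only the fields relevant to this KEP.
type PodSpec struct {
	// RestartPolicy for all containers within the pod.
	RestartPolicy RestartPolicy `json:"restartPolicy,omitempty"`

	// MaxRestartTimesOnFailure caps how many times the pod's containers are
	// restarted when RestartPolicy is OnFailure. nil means no cap (current
	// behavior); the field is ignored for Always and Never.
	// +optional
	MaxRestartTimesOnFailure *int32 `json:"maxRestartTimesOnFailure,omitempty"`
}
```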

will rollout across nodes.
-->

Because the kubelet will not be upgraded/downgraded until the api-server is ready, this will not impact
Member

I don't understand it.

FWIW - this section isn't necessary for Alpha, so given the time bounds you may want to delete your answers from the rollout and monitoring sections, as their answers are controversial...

Member Author

What I mean here is that when upgrading api-servers, we'll wait until all of them are ready and then upgrade the kubelet. So if feature gates are enabled on only some apiservers, we'll do nothing. Is this reasonable? Or do we want to cover all the possibilities here rather than just the best practices, since the question says to be as paranoid as possible? cc @wojtek-t

Member Author

Removed for alpha

Member

I would just comment out this section.

reviewers:
- TBD
approvers:
- TBD
Member

We need to find an approver. Without an approver defined, we are unlikely to be able to take it.

@mrunalp @derekwaynecarr @dchen1107 any of you want to take it?

```

- If Pod's `RestartPolicy` is `Always` or `Never`, `maxRestartTimesOnFailure` defaults to nil and will not apply.
- If Pod's `RestartPolicy` is `OnFailure`, `maxRestartTimesOnFailure` also defaults to nil, which means infinite
Member

Two questions:

  1. If the Pod's RestartPolicy is OnFailure and maxRestartTimesOnFailure is 0, is that invalid, or does it mean Never?
  2. Is maxRestartTimesOnFailure editable for the pod?

Member

IMO:

  1. If restartPolicy is "OnFailure" and maxRestarts is 0, it is effectively "Never". I don't think we need to special-case 0 to be a failure, but I don't feel strongly and it could be argued either way.
  2. Let's start with "no" and see if there's really a need?

Member

Yeah, it feels like never to me too.
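
To make the semantics settled in this thread concrete, here is a minimal runnable sketch (not actual kubelet code) assuming nil means unlimited restarts and 0 effectively behaves like Never:

```go
package main

import "fmt"

// shouldRestartOnFailure captures the semantics discussed above for a pod whose
// restartPolicy is OnFailure: a nil maxRestartTimesOnFailure means unlimited
// restarts (backwards compatible), and 0 effectively behaves like Never.
func shouldRestartOnFailure(maxRestartTimesOnFailure *int32, restartCount int32) bool {
	if maxRestartTimesOnFailure == nil {
		return true // no cap configured: keep restarting on failure
	}
	return restartCount < *maxRestartTimesOnFailure // 0 => never restart
}

func main() {
	zero := int32(0)
	three := int32(3)
	fmt.Println(shouldRestartOnFailure(nil, 100))  // true: unlimited
	fmt.Println(shouldRestartOnFailure(&zero, 0))  // false: behaves like Never
	fmt.Println(shouldRestartOnFailure(&three, 2)) // true: 2 < 3
	fmt.Println(shouldRestartOnFailure(&three, 3)) // false: cap reached
}
```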

@thockin
Member

thockin commented Jun 15, 2023

Deadline is in ~8 hours -- Is this still hoping to land?

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: drinktee, kerthcet
Once this PR has been reviewed and has the lgtm label, please ask for approval from wojtek-t and additionally assign derekwaynecarr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Pros:
* BackoffLimitPerIndex can reuse this functionality and no longer needs to track the restart times per index.
Specifically, it can avoid using the annotation and work at a higher-level control by watching the pod status.
Member

Otherwise we could end up in another scenario where features work independently, but when paired together are mostly useless or have very strange semantics that are hard for users to understand, and hard to maintain.

I like this analysis.

Pros:
* Reduce the maintenance cost of Job API

Cons:
Member

But it's not the same pod. Job is a higher-level construct.

Yes, it feels like a conflation of a pod-level maxRestarts and a job-level maxRecreates or something.

```

- If Pod's `RestartPolicy` is `Always` or `Never`, `maxRestartTimesOnFailure` defaults to nil and will not apply.
- If Pod's `RestartPolicy` is `OnFailure`, `maxRestartTimesOnFailure` also defaults to nil, which means infinite
Member

Yeah, it feels like never to me too.

restart times for backwards compatibility.

At runtime, we'll check the sum of `RestartCount` across
all containers' [`Status`](https://github.com/kubernetes/kubernetes/blob/451e1fa8bcff38b87504eebd414948e505526d01/pkg/kubelet/container/runtime.go#L306-L335)
Member

Pod.spec.containers[].maxRestarts reads well to me.

But it's a little weird because Containers also have a RestartPolicy and the only allowed value is Always. It would stop being intuitive how the attribute interactions actually work because now we're intermingling pod level attributes with container level attributes.
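
For illustration of the check described in the quoted hunk ("the sum of `RestartCount` of all containers"), here is a small sketch written against the public v1.ContainerStatus type; this is not the kubelet's actual implementation:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// totalRestartCount sums RestartCount across all container statuses of a pod,
// the quantity the quoted KEP text proposes to compare against
// maxRestartTimesOnFailure.
func totalRestartCount(statuses []v1.ContainerStatus) int32 {
	var total int32
	for _, s := range statuses {
		total += s.RestartCount
	}
	return total
}

func main() {
	statuses := []v1.ContainerStatus{
		{Name: "app", RestartCount: 2},
		{Name: "sidecar", RestartCount: 1},
	}
	fmt.Println(totalRestartCount(statuses)) // 3
}
```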

CRI or CNI may require updating that component before the kubelet.
-->

The kubelet version should be consistent with the api-server version, or this feature
Member

Does this enhancement involve coordinating behavior in the control plane and in the kubelet? How does an n-2 kubelet without this feature available behave when this feature is used?

In other words, if we have a kubelet at 1.26 and a kube-apiserver at 1.29 with the feature enabled, what is the expected behavior?

Will any other components on the node change? For example, changes to CSI, CRI or CNI may require updating that component before the kubelet.

I believe the answer to this should be no.


When we set restartPolicy=OnFailure with a specific maxRestartTimesOnFailure number,
but the Pod's restart count does not respect that number.
Alternatively, we can refer to the metric `pod_exceed_restart_times_size` for comparison.
Member

Is this a new or existing metric?
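
Assuming `pod_exceed_restart_times_size` would be a new metric (the KEP text does not say), here is a minimal sketch of registering such a counter with the Prometheus Go client; the labels and help text are assumptions, and Kubernetes components would normally go through component-base/metrics instead:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// podExceedRestartTimes is a hypothetical counter matching the metric name
// mentioned in the KEP text; labels and help text are assumptions.
var podExceedRestartTimes = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "pod_exceed_restart_times_size",
		Help: "Number of pods that exceeded maxRestartTimesOnFailure.",
	},
	[]string{"namespace"},
)

func main() {
	prometheus.MustRegister(podExceedRestartTimes)
	podExceedRestartTimes.WithLabelValues("default").Inc()
	fmt.Println("metric registered and incremented")
}
```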

will rollout across nodes.
-->

Because the kubelet will not be upgraded/downgraded until the api-server is ready, this will not impact
Member

I would just comment out this section.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2024
@kerthcet
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 19, 2024
@adampl

adampl commented May 24, 2024

hey, it's 2024, any update?

/lifecycle frozen

@k8s-ci-robot
Contributor

@adampl: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

hey, it's 2024, any update?

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 23, 2024
@adampl

adampl commented Jun 24, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 24, 2024
@kannon92
Contributor

@kerthcet Are you going to work on this?

@alculquicondor
Member

FYI on a somewhat related KEP #4603

@kannon92
Contributor

@alculquicondor I'm trying to see if this is a KEP that sig-node should help review for 1.32.

@kannon92
Contributor

@lauralorenz presented the plan for CrashLoopBackOff.

I think that any kind of capping max restart time is out of scope for #4603.

@kerthcet
Member Author

Not yet. If you're interested, you can take it over; I have other work occupying me right now.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 26, 2024
@GEownt

GEownt commented Dec 5, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2024
@tallclair
Member

I'm not convinced that this should be a kubelet & pod-level concern, but whether or not we eventually want it on the pod, it seems like a good candidate for prototyping out-of-tree. The high-level idea would be a controller that watches containers for restarts and deletes the pod when a policy deems it necessary (is deletion sufficient to mark a job as failed?). The API could use annotations and/or a CRD.
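
As a rough illustration of the out-of-tree prototype suggested above, here is a hedged client-go sketch that watches pods carrying a hypothetical max-restarts annotation and deletes a pod once its summed container restart count exceeds the threshold (the annotation key, structure, and error handling are all assumptions):

```go
package main

import (
	"context"
	"strconv"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

const maxRestartsAnnotation = "example.com/max-restarts" // hypothetical key

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			pod := newObj.(*v1.Pod)
			raw, ok := pod.Annotations[maxRestartsAnnotation]
			if !ok {
				return
			}
			max, err := strconv.Atoi(raw)
			if err != nil {
				return
			}
			// Sum restart counts across all containers in the pod.
			total := 0
			for _, s := range pod.Status.ContainerStatuses {
				total += int(s.RestartCount)
			}
			if total > max {
				// Deleting the pod is the policy action suggested above; whether
				// that is enough to mark an owning Job as failed is an open question.
				_ = client.CoreV1().Pods(pod.Namespace).Delete(
					context.TODO(), pod.Name, metav1.DeleteOptions{})
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // run until killed
}
```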

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: Needs Triage