KEP-3953: Node Resource Hot Plug #3955
base: master
Conversation
Karthik-K-N commented Apr 17, 2023 (edited)
- One-line PR description: Node Resource Hot Plug
- Issue link: Node Resource Hot Plug #3953
- Other comments:
a7bc843 to 03e927f
/assign @mrunalp @SergeyKanzhelev @klueska
/cc
/cc
/cc
Thanks @ffromani, @bart0sh for the inputs. I have updated the KEP with more details; please take a look when time permits.
As you might be aware, there is an ongoing effort to fetch the machine info via CRI, and in the future I think it should not take much effort to adopt it.
Co-authored-by: kishen-v <[email protected]>
92d631d to a7b59f4
Sure. I can't recall which hardware representation model (if we agreed on one already) is going to be used in that case.
Thank you @Karthik-K-N!
Let me know if you have any questions regarding the NodeSwap KEP.
The kubelet will periodically fetch this information, subsequently entrusting the node status updater to disseminate these updates at the node level across the cluster.
Moreover, this KEP aims to refine the initialization and reinitialization processes of resource managers, including the memory manager and CPU manager, to ensure their adaptability to changes in node configurations.

### User Stories
Currently, if a node is hotplugged with additional memory, do swap limits get updated? IIUC the answer is no, and this can perhaps serve as a justification for why this approach is needed.
More generally, IIUC restarting kubelet is a workaround in the sense that kubelet doesn't expect a situation where it would spawn on a node with already running containers that need to be modified based on node's resources being changed. Please keep me honest here as I'm not 100% sure that's the case.
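For reference, a minimal sketch of the proportional calculation the NodeSwap KEP's LimitedSwap behaviour describes (the names and rounding here are illustrative, not the kubelet's actual code); it shows why a memory hot plug invalidates previously computed swap limits:

```go
package main

import "fmt"

// containerSwapLimit approximates the LimitedSwap idea: each container gets a
// share of node swap proportional to its memory request relative to node
// memory capacity. Exact kubelet behaviour may differ; this is illustrative.
func containerSwapLimit(memoryRequest, nodeMemoryCapacity, nodeSwapCapacity int64) int64 {
	if nodeMemoryCapacity == 0 {
		return 0
	}
	return int64(float64(memoryRequest) / float64(nodeMemoryCapacity) * float64(nodeSwapCapacity))
}

func main() {
	const gi = int64(1) << 30
	// Before hot plug: 8Gi node, 4Gi swap, container requests 2Gi -> 1Gi swap limit.
	fmt.Println(containerSwapLimit(2*gi, 8*gi, 4*gi))
	// After hot plugging to 16Gi the proportional share drops to 512Mi, but the
	// cgroup still holds the old 1Gi limit unless the kubelet recomputes and re-applies it.
	fmt.Println(containerSwapLimit(2*gi, 16*gi, 4*gi))
}
```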
Another thought: is it dangerous to hotplug the node without restarting kubelet?
If it is, then ensuring the kubelet would restart can be considered as a stability enhancement.
Co-authored-by: kishen-v <[email protected]>
Thanks for the PRR update. PRR lgtm for alpha. Separate from PRR, some kind of reaction when resources are removed seems far better than doing nothing, even if it's only to indicate a problem in node status. Approving PRR. /approve
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: deads2k, Karthik-K-N
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
* Enable the re-initialization of resource managers (CPU manager, memory manager) to accommodate alterations in the node's resource allocation.
* Recalculating the OOMScoreAdj and swap memory limit for existing pods.
Would those two result in new CRI API calls? Which ones? How will we order these calls when we have many Pods, and how will we react to failing calls?
We decided to use the UpdateContainerResources CRI method for updating both OOMScoreAdj and swap. The initial plan is to serially update the containers across Pods.
In case of errors we will have a retry mechanism similar to SyncPod.
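A rough sketch of that serial walk, assuming a thin wrapper around the CRI UpdateContainerResources call (the interface and helper names here are hypothetical, not actual kubelet symbols):

```go
package hotplug

import (
	"context"
	"fmt"
	"time"
)

// ContainerUpdate carries the two LinuxContainerResources fields this KEP
// cares about when node capacity changes.
type ContainerUpdate struct {
	OomScoreAdj            int64
	MemorySwapLimitInBytes int64
}

// runtimeService is a hypothetical stand-in for the kubelet's CRI client,
// exposing only the UpdateContainerResources call discussed above.
type runtimeService interface {
	UpdateContainerResources(ctx context.Context, containerID string, update ContainerUpdate) error
}

// updateAllContainers walks containers serially (as proposed), retrying each a
// few times with backoff, roughly in the spirit of SyncPod's retry behaviour.
// Failures are collected rather than aborting the whole pass.
func updateAllContainers(ctx context.Context, rs runtimeService, updates map[string]ContainerUpdate) []error {
	var failures []error
	for containerID, update := range updates {
		var err error
		for attempt := 0; attempt < 3; attempt++ {
			if err = rs.UpdateContainerResources(ctx, containerID, update); err == nil {
				break
			}
			time.Sleep(time.Second << attempt) // simple exponential backoff
		}
		if err != nil {
			failures = append(failures, fmt.Errorf("container %s: %w", containerID, err))
		}
	}
	return failures
}
```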
I would appreciate a bit more detail here. If a resize failed on a pod, does it mean we need to fail it? Or notify the user somehow? Do we report the size change back to the scheduler BEFORE all Pods confirmed the resize or AFTER? What would be the implications if the scheduler wants to use extra resources BEFORE all Pods were actually resized?
Sure, definitely. We will update the KEP with more information.
Co-authored-by: kishen-v <[email protected]>
* Dynamically adjust system reserved and kube reserved values.
* Hot unplug of node resources.
* Update the autoscaler to utilize resource hot plugging.
Components should not depend on hotplugging being the implementation of Node resize, see my comment in Motivation. In particular IMO Cluster Autoscaler should watch Node changes and observe allocatable changing, that's all it needs to know to make its decisions AFAIK. Same goes for Scheduler?
- Handling resource demand with a limited set of nodes by increasing the capacity of existing nodes instead of creating new nodes.
- Creating new nodes takes more time compared to increasing the capacity of existing nodes.

### Goals
Components should not depend on hotplugging being the implementation of Node resize, see my comment in Motivation.
For now, we will introduce an error mode in the kubelet to inform users about the shrink in the available resources in case of hotunplug.

Few of the concerns surrounding hotunplug are listed below
* Pod re-admission:
To avoid terminating Pods we could perform Node scale-down in 3 phases:
- Block admission on the Node
- Perform scale-down (hotunplug)
- Unblock admission on the Node
These phases could be performed either manually by an operator or automatically with calls from some control plane (which is out of scope for KEP).
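A hedged sketch of how an operator-driven version of that three-phase flow could look with client-go (the node name and performHotUnplug are placeholders; the actual unplug is provider-specific and outside the kubelet):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()
	nodeName := "worker-1" // example node

	setUnschedulable(ctx, client, nodeName, true)  // phase 1: block admission (cordon)
	performHotUnplug(nodeName)                     // phase 2: provider-specific scale-down
	setUnschedulable(ctx, client, nodeName, false) // phase 3: unblock admission (uncordon)
}

func setUnschedulable(ctx context.Context, client kubernetes.Interface, name string, v bool) {
	node, err := client.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	node.Spec.Unschedulable = v
	if _, err := client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
	fmt.Printf("node %s unschedulable=%v\n", name, v)
}

func performHotUnplug(node string) {
	// hypothetical placeholder: e.g. an ACPI/virtio-mem operation or a cloud provider API call
	fmt.Println("hot unplug performed on", node)
}
```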
Noted, but we may need to evaluate resource availability for already running pods, hence running podReAdmission is necessary. Ref: https://docs.google.com/document/d/1KfjPRmCc8Xk0xxa4S8ZRle6VMzc1C6MQg4ivOaoB150/edit?disco=AAAAt7P2DTA
This implies that the scale-down is something kubernetes knows about before it happens, right? How does it come to learn about an imminent scale event?
And even then, we still need to define what happens when you get a surprise.
FYI, eviction is a topic in itself, especially being able to assess what the impact would be on affected workloads and how to migrate pods with as little impact as possible. But all of it is well described in KEP-4563 (#4565).
### Non-Goals

* Dynamically adjust system reserved and kube reserved values.
Why can we afford not to adjust reserved resources? If we increase a Node's CPU/memory significantly, we allow way more Pods to run and we reinitialize existing components to detect more resources.
Some background on why this is a non-goal: https://docs.google.com/document/d/1KfjPRmCc8Xk0xxa4S8ZRle6VMzc1C6MQg4ivOaoB150/edit?disco=AAABVzcLYu4
I think it's in-scope to be very clear that these values CAN CHANGE (so consumers know), but maybe out-of-scope on exactly if/how kubelet allows changing them on the fly?
In our exploration we identified that the ratio can vary across different providers or flavors of Kubernetes offerings. For example, in GKE it is calculated as described in https://cloud.google.com/kubernetes-engine/docs/concepts/plan-node-sizes
Are you suggesting we standardize this formula and maybe let downstream controllers override it later?
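For illustration only, a rough sketch of the kind of step function that GKE page describes for memory reservations (percentages paraphrased from that doc, so treat the doc as authoritative); the point is just that each provider bakes in its own curve, which is why automatic adjustment is hard to standardize:

```go
package main

import (
	"fmt"
	"math"
)

// gkeStyleReservedMemoryGiB approximates the tiered memory reservation the
// linked GKE doc describes: a percentage of each successive memory band.
// The exact percentages and the small-machine special case are defined by the
// provider; treat these numbers as illustrative.
func gkeStyleReservedMemoryGiB(machineGiB float64) float64 {
	bands := []struct{ upTo, fraction float64 }{
		{4, 0.25}, {8, 0.20}, {16, 0.10}, {128, 0.06}, {math.Inf(1), 0.02},
	}
	reserved, prev := 0.0, 0.0
	for _, b := range bands {
		if machineGiB <= prev {
			break
		}
		reserved += (math.Min(machineGiB, b.upTo) - prev) * b.fraction
		prev = b.upTo
	}
	return reserved
}

func main() {
	// A node hot plugged from 8 GiB to 64 GiB would also need its reservation revisited.
	for _, g := range []float64{8, 16, 64} {
		fmt.Printf("%3.0f GiB machine -> ~%.2f GiB reserved\n", g, gkeStyleReservedMemoryGiB(g))
	}
}
```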
Kind of agree that these values are set and calculated by individual vendors/clouds today. It's not so meaningful to include some standard formula and adjust them automatically.
But we can discuss how someone can change this on-the-fly if needed, and maybe provide some knobs to do so.
But we can discuss how someone can change this on-the-fly if needed to, and maybe provide some knobs to do so.
Sure, I think this can be considered an extension to this KEP and will add it to Future Work.
In a conventional Kubernetes environment, the cluster resources might necessitate modification because of inaccurate resource allocation during cluster initialization or escalating workload over time,
necessitating supplementary resources within the cluster.

Contemporarily, kernel capabilities enable the dynamic addition of CPUs and memory to a node (for example: https://docs.kernel.org/core-api/cpu_hotplug.html and https://docs.kernel.org/core-api/memory-hotplug.html).
We tried Node resizing with hot plug/unplug in GKE and noticed some components don't handle onlining of previously offlined CPUs well, e.g. Cilium not listening to perf events on the newly onlined CPUs. We also found other libraries that could be affected. We later switched to cgroups limiting on the guest + VMM-level throttling as a more reliable alternative.
I treat Node resizing as still a research space so I want to make sure there's an API for telling Kubelet what should be the current resources, as opposed to it discovering it on its own. It's what we discussed in the past with a new API for NRI plugins. This way hotplugging could be the default implementation, but replaceable with other solutions.
Agreed. Maybe once we have some concrete resource API for the kubelet, we can consider swapping in alternatives in the future if required, without disrupting the overall system.
e.g. Cilium not listening to perf events on the newly onlined CPUs
Not blaming Cilium here, it's easy to assume something never changes when it has never actually changed. Even though we all KNOW that cpu hotplug is a thing. I do think we should apply pressure to these components to do proper fixes.
The interaction sequence is as follows
1. Kubelet will be polling in interval to fetch the machine resource information from cAdvisor's cache, Which is currently updated every 5 minutes.
Note that with an external signal about the current VM size coming from NRI Kubelet could maybe react within ~1s as opposed to the current 5 minutes of cadvisor cache refresh (see my comment in Motivation).
True, but cAdvisor currently uses a default poll interval of 5 min, which can be customized through update_machine_info_interval for aggressive polling. On similar lines, if needed we can customize the polling interval in the kubelet through a flag.
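To make the shape concrete, a minimal sketch of such a kubelet-side polling loop over cAdvisor's cached MachineInfo (the interface and function names are illustrative, assuming cAdvisor's v1 MachineInfo type):

```go
package hotplug

import (
	"context"
	"time"

	cadvisorapi "github.com/google/cadvisor/info/v1"
)

// machineInfoProvider mirrors the slice of the kubelet's cAdvisor interface
// needed here: just the cached machine info.
type machineInfoProvider interface {
	MachineInfo() (*cadvisorapi.MachineInfo, error)
}

// pollMachineInfo re-reads machine info on a configurable interval and invokes
// onChange when CPU count or memory capacity differs from the last observation.
// cAdvisor's own cache refresh still bounds how fresh this data can be.
func pollMachineInfo(ctx context.Context, provider machineInfoProvider, interval time.Duration, onChange func(*cadvisorapi.MachineInfo)) {
	var lastCores int
	var lastMemory uint64
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			info, err := provider.MachineInfo()
			if err != nil {
				continue // transient error; try again on the next tick
			}
			if info.NumCores != lastCores || info.MemoryCapacity != lastMemory {
				lastCores, lastMemory = info.NumCores, info.MemoryCapacity
				onChange(info)
			}
		}
	}
}
```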
We should define an interface and assert a desired SLO. If that means polling more frequently or watching a different kernel mechanism, that goes BEHIND the interface
Sure, we will explore and update as per your suggestion #3955 (comment)
Co-authored-by: kishen-v <[email protected]>
I looked at this from a high level, but given the timeframe it seems unlikely to make this KEP freeze -- how bad is that?
### Non-Goals

* Dynamically adjust system reserved and kube reserved values.
* Hot unplug of node resources.
Should it be removed from non-goals, then?
* Dynamically adjust system reserved and kube reserved values.
* Hot unplug of node resources.
* Update the autoscaler to utilize resource hot plugging.
"Depend on" and "take advantage of" are different.
Should VPA know whether a node is vertically scalable? If it knew that, might it make better decisions? What SLOs would be needed to make it useful?
Or maybe that's just too complicated? I'm assuming that some smart, multi-dimensional scaling HAS to emerge, so it can decide between adding replicas or scaling-up pods (or some mix thereof). With IPPR it can really only look at the node capacity to make that decision. If a node could express "I have 12 cores, but I could have up to 32 with an 86% chance of success", would that be useful?
## Proposal

This KEP strives to enable node resource hot plugging by incorporating a polling mechanism within the kubelet to retrieve machine-information from cAdvisor's cache, which is already updated periodically.
From a design POV, why is polling the way to go? With the luxury of distance, I would think something like this would be simpler:
- Define an interface which calls a callback function when resources change.
- Implement said interface in terms of cadvisor (which might poll underneath).
- Consume those callbacks in kubelet (possibly through a queue and a periodic handler).
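To sketch what that could look like (all names here are hypothetical, not an existing kubelet API):

```go
package hotplug

import cadvisorapi "github.com/google/cadvisor/info/v1"

// MachineResourceObserver is a hypothetical callback-style interface along the
// lines suggested above. The implementation (cAdvisor-backed today, an NRI or
// provider signal later) owns how changes are detected; consumers only see the
// callback.
type MachineResourceObserver interface {
	// Register adds a callback invoked whenever node capacity changes.
	Register(onChange func(old, updated *cadvisorapi.MachineInfo))
	// Start begins watching (or polling underneath) until stop is closed.
	Start(stop <-chan struct{})
}
```

In the kubelet the callback would most likely just enqueue work, so the node status updater and resource managers stay driven by a single periodic, rate-limited handler.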
Sure, thank you for the idea. We were inclined to polling in the kubelet as it was the existing design in cAdvisor.
We will surely explore this path, as it can be managed better for rate limiting if needed.
As a Cluster administrator, I want to resize a Kubernetes node dynamically, so that I can quickly hot plug resources without waiting for new nodes to join the cluster.
I like this topic.
Why NOT restart kubelet? It's certainly going to be less complicated, technically.
Is it too slow? Is that something we should fix anyway? Having kubelet restarts be reliably fast and safe is a win.
Would that only be a partial solution (e.g. kubelet plugins have the same problem)?
etc.
I am not saying "do it with a restart" but "convince me that a restart is bad or insufficient"
- Lack of coordination about change in resource availability across kubelet/runtime/plugins.
- The plugins/runtime should be updated to react to change in resource information on the node.

- Kubelet missing hotplug event or too many hotplug events
We should always be level triggered. Events are great, but frequent re-validation is where it's at.
Agreed. The intention was to mention the kubelet failing to react to, or ignoring, any hot plug instances. We will rephrase accordingly to mention it is level triggered.
### Notes/Constraints/Caveats (Optional)

### Risks and Mitigations
I would add the risk that bringing new NAMED resources (as opposed to aggregate resources) on-line can trip up naive consumers.
- Anything that spawns per-CPU things (threads, workpools, etc)
- Anything that is aware of relationships (e.g NUMA)
- Hot-plugging other devices (e.g. GPUs) might trip up applications or libraries (e.g. is CUDA/NCCL ready for this?)
/cc
### Goals

* Achieve seamless node capacity expansion through hot plugging resources.
* Enable the re-initialization of resource managers (CPU manager, memory manager) and kube runtime manager to accommodate alterations in the node's resource allocation.
If some existing pods have been pinned to the old remaining CPU set, would the re-init update them after new CPU cores appear?
No, hotplug only supplements additional CPUs rather than re-assigning a new set of required CPUs.
- https://github.com/kubernetes/kubernetes/issues/125579
- https://github.com/kubernetes/kubernetes/issues/127793

Hence, it is necessary to handle the updates in the compute capacity in a graceful fashion across the cluster, than adopting to reset the cluster components to achieve the same.
Suggested change:
Hence, it is necessary to handle capacity updates gracefully across the cluster, rather than resetting the cluster components to achieve the same outcome.
This KEP strives to enable node resource hot plugging by incorporating a polling mechanism within the kubelet to retrieve machine-information from cAdvisor's cache, which is already updated periodically.
The kubelet will periodically fetch this information, subsequently entrusting the node status updater to disseminate these updates at the node level across the cluster.
Moreover, this KEP aims to refine the initialization and reinitialization processes of resource managers, including the memory manager and CPU manager, to ensure their adaptability to changes in node configurations.
With this proposal its also necessary to recalculate and update OOMScoreAdj and swap limit for the pods that had been existing before resize. But this carries small overhead due to recalculation of swap and OOMScoreAdj.
Suggested change:
With this proposal its also necessary to recalculate and update OOMScoreAdj and swap limit for the pods that had been existing before resize. But this carries a small overhead due to recalculation of swap and OOMScoreAdj.
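As a worked illustration of why that recalculation matters (the formula below only approximates the kubelet's burstable-pod OOM score calculation; the real code has additional clamping and QoS special cases):

```go
package hotplug

// burstableOOMScoreAdj approximates how the kubelet derives oom_score_adj for
// burstable containers from the memory request relative to node capacity:
// higher values are killed first. The real kubelet clamps slightly differently.
func burstableOOMScoreAdj(memoryRequestBytes, memoryCapacityBytes int64) int64 {
	adj := 1000 - (1000*memoryRequestBytes)/memoryCapacityBytes
	if adj < 2 {
		adj = 2
	}
	if adj > 999 {
		adj = 999
	}
	return adj
}

// Example: a container requesting 2Gi on an 8Gi node gets adj=750; after the
// node is hot plugged to 16Gi the same request maps to adj=875, so containers
// started before the resize keep a stale relative kill priority until updated.
```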
* Node resource information before and after resource hot plug for the following scenarios.
* upsize -> downsize
* upsize -> downsize -> upsize
* downsize- > upsize
Suggested change:
* downsize -> upsize
* Since the total capacity of the node has changed, values associated with the nodes memory capacity must be recomputed.
* Handling unplug of reserved CPUs.

we intend to propose a separate KEP dedicated to hotunplug of resources to address the same.
Suggested change:
We intend to propose a separate KEP dedicated to hotunplug of resources to address that.
In case of rollout failures, running workloads are not affected, If the pods are on pending state they remain
in the pending state only.
Suggested change:
In case of rollout failures, running workloads are not affected. If pods are in the pending state, they remain pending.
In a conventional Kubernetes environment, the cluster resources might necessitate modification because of inaccurate resource allocation during cluster initialization or escalating workload over time,
necessitating supplementary resources within the cluster.
Suggested change:
In a conventional Kubernetes environment, cluster resources might need modification because of inaccurate resource allocation or due to escalating workloads over time, requiring supplementary resources within the cluster.
<!--
Even if applying deprecation policies, they may still surprise some users.
-->
No
Would we add a flag to control the poll interval for the kubelet reading new machine info from cAdvisor?
Flag, possibly not. Did you mean "field within the kubelet configuration file?"
(using command line arguments to kubelet is mostly deprecated in favor of using configuration fields)
As suggested in the comment, we have decided to move away from the poll mechanism to a level-triggered mechanism, where we continuously watch for resize changes from cAdvisor and handle them accordingly.
Coming from the ping on the SIG-Scheduling Slack, I took a glance at it.
Overall I don't see any mention of what changes are needed in which part of the scheduler for this proposal. Do you mean we don't need any change in the scheduler because the node's status is updated and the scheduler watches it? (If yes, can you update the KEP to clarify that point?)
Also, it only mentions the "With increase in cluster resources" scenario. Would it be impossible to decrease the resources?
EDIT: looks like that's out of scope of this KEP.
https://kubernetes.slack.com/archives/C09TP78DV/p1739505791133219?thread_ts=1739425555.326269&cid=C09TP78DV
When we graduate to beta, and on Linux, we could consider a test along the lines of:
Co-authored-by: kishen-v <[email protected]>