KEP-5972: Dynamic Containers by tallclair · Pull Request #6169 · kubernetes/enhancements

tallclair · 2026-06-08T06:16:20Z

One-line PR description: Introducing Dynamic Containers: making containers mutable.
Issue link: Dynamic Pod Mutation: Optimistic Execution & Hierarchical Resource Delegation #5972

Adapted to KEP format from https://docs.google.com/document/d/1LUttwty4xIU9gcLGSvnu1qVqAztxT3kl53ZdpM8q7o0/edit?tab=t.0#heading=h.px3kmst2742n

/sig node
/assign @dchen1107 @haircommander
/label api-review

haircommander · 2026-06-08T17:39:50Z

+
+**Mitigation:** This change corrects a race condition/atomicity violation, ensuring Pod updates are
+treated as proper transactions (we shouldn't run a new image if we can't secure the resources for
+its accompanying spec). We will explicitly document this behavioral shift.


we should be able to catch this situation with metrics too, right? may be worth exploring to make clearer to admins when this has happened

We could, I'm not really sure it would be worth it though. Resize via pod update is a new capability so the only case we're really changing is the following:

Pod is resized (via /resize), and the resize is deferred, and THEN the container image is updated.

If the image is updated before the resize, we would expect that to go through in both cases (unless the Kubelet missed the first update and received both at the same time).

Updating container images is already a pretty niche use.

A dedicated metric feels like overkill. A log line feels more appropriate. WDYT?

haircommander · 2026-06-08T19:45:34Z

+ups) significantly increases the API surface of the mutable / allocatable portion of the Pod spec,
+and simply mirroring these fields into the pod status will drastically inflate the size of the pod
+object. Similarly, duplicating every pod into an `AllocatedPod` object would heavily inflate the API
+server object count and data size.


is there then follow-up work with sig-cli and other clients to reflect pod status differently, now that we're not writing back updates? historically users used pod spec to find the state, but it sounds like there won't be spec reflection ?

I'm not sure I understand the question.

... now that we're not writing back updates?

Are you referring to the proposal to have a local API that bypasses the KCP? If so, that is out-of-scope for this KEP.

The only way pod spec and allocated spec drift is very briefly after the pod is updated but before the Kubelet has admitted the update, or if a resize (including adding a new container with resources) is deferred (or infeasible).

haircommander · 2026-06-08T20:36:38Z

+
+**Admission Configuration:** If custom admission webhooks are configured to intercept only `CREATE`
+operations for Pods, they can be bypassed by adding a violating container via update. This risk will
+be managed with user communications, but there won't be technical safeguards preventing it.


maybe a bit meta, but we could create an admission configuration that mandates pod webhooks also run on UPDATE? (I think, I actually am not positive that's possible but it seems like it could be)

I can't think of any way to make this work. The issue is that webhooks can have side effects, so this would be a breaking change.

Or did you mean just provide something that users could optionally run to audit their clusters?

kannon92 · 2026-06-08T20:58:24Z

+authors:
+  - "@tallclair"
+owning-sig: sig-node
+participating-sigs: []


I was chating with @elmiko about this KEP offline.

Have you floated this idea with sig-scheduling or sig-autoscaling?

It seems a pretty large change to go without involving other SIGs.

I don't think this impacts much sig-scheduling since from day one, sig-scheduling treats pods as the smallest scheduling units, and they are doing the same with this proposal. Same / similar with today's autoscaling. But I do agree after the feature introduced to K8s, we can do better for autoscaling.

From other hand, I agreed that this movement has concerns on the potential security implications and API exhaustion. We did talk to both teams.

I don't think this impacts much sig-scheduling since from day one, sig-scheduling treats pods as the smallest scheduling units, and they are doing the same with this proposal.

my main scenario I want an answer on:

Say you have a node with 8 cores and someone requests a pod with 1 core.

User resizes by adding containers and adds a container that uses 8 cores (pretend we can use entire node).

Will the scheduler be notified that the pod requests don't match the actual admitted value?

And then future pod requests will correctly detect that the pod is fully occupied?

Or will we get cases where pods see 1 core and then try to place more pods on that node. Only to be rejected at kubelet admission time.

Either way I think a scenario like this is worth exploring / testing on.

Say you have a node with 8 cores and someone requests a pod with 1 core.

User resizes by adding containers and adds a container that uses 8 cores (pretend we can use entire node).

Will the scheduler be notified that the pod requests don't match the actual admitted value?

And then future pod requests will correctly detect that the pod is fully occupied?

Yes, the scheduler gets the update immediately, potentially even before the Kubelet has actuated the resize. We call the resources described by the pod spec the "desired" resources, and the pod status also includes the allocated (what the Kubelet has admitted or "agreed to"), and actual. The scheduler uses the maximum of those three different resource values for each resource. So, as soon as the pod is resized (by adding the container), it's "desired" resources change. The scheduler gets the update through it's pod informer, and updates it's cached node info to reflect the new resource total for the node. New pods that go to be scheduled will do so based on the updated total. This is exactly how it works for pod resizes today, and this KEP doesn't really change anything here.

Or will we get cases where pods see 1 core and then try to place more pods on that node. Only to be rejected at kubelet admission time.

There is a race condition, where a pod is resized at the same time as a new pod is scheduled to the node, which can result in the scheduled pod being rejected by the Kubelet. We've discussed this at length and have a few potential solutions, but empirically found that the race window is so small that we decided it isn't a priority right now. See https://docs.google.com/document/d/1VdE-yCre69q1hEFt-yxL4PBKt9qOjVtasOmN-XK12XU/edit?resourcekey=0-KJc-YvU5zheMz92uUOWm4w&tab=t.0 for the latest discussion.

/cc @natasha41575

Either way I think a scenario like this is worth exploring / testing on.

Yes, but we've already done a lot of that. We can definitely continue to do so, but I think that should all be considered outside the scope of Dynamic Containers.

We should think of Dynamic Containers as a use case for pod resize, rather than something else that impacts scheduling and autoscaling.

this KEP

This is a discussion for resize, not this KEP. This KEP doesn't propose any changes to the resize functionality.

Will there be a doc/KEP update with these findings?

There are many docs, but @tallclair linked the most relevant one above: https://docs.google.com/document/d/1VdE-yCre69q1hEFt-yxL4PBKt9qOjVtasOmN-XK12XU/edit?resourcekey=0-KJc-YvU5zheMz92uUOWm4w&tab=t.0#heading=h.clxvs733rwyx

FWIW, we discussed with the sig-scheduling leads, the sig-autoscaling leads, and the kueue maintainers prior to IPPR GA, and had broad agreement that these issues don't need to be resolved now.

Yes, the proposed changes does not impact scheduling directly, so we treat is as FYI.

There are still a few points worth to bring up

The race between kubelet and scheduler becomes more prominent, so we should prioritize addressing it as part of WAS, since IPPR is already GA and wasn't considering WAS integration as a blocker.

Obviously the change allows to bypass the scheduling, so the scheduling constraints we develop as part of WAS won't be useful when pods are resized.

We don't have that yet, but if it happens that WAS introduces runtime constraints (constraints which have to be met not only during initial scheduling, but also runtime), then the problem of making arbitrary decisions outside of scheduler may become much more problematic.

This is a discussion for resize, not this KEP. This KEP doesn't propose any changes to the resize functionality.

I guess to me the main thing I wonder is will this feature make the race condition worse?

We are accepting this race condition it seems and accepting this a problem that will be addressed in separate work?

The only reason this would make the race condition worse is that it's potentially adding more use cases for resize. This doesn't change anything to make the race more likely, it just potentially adds more resize QPS.

k8s-ci-robot · 2026-06-11T00:16:26Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tallclair
Once this PR has been reviewed and has the lgtm label, please ask for approval from dchen1107 and additionally assign wojtek-t for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

keps/prod-readiness/OWNERS
keps/sig-node/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

HirazawaUi · 2026-06-11T15:12:37Z

+ecosystem that pods are fundamentally immutable execution envelopes. This will require updates to
+third-party controllers, service meshes, and logging tools.
+
+## Alternatives


The vision for this KEP seems to share some common ground with the new project @thockin is working on: https://github.com/agent-substrate/substrate. Could using a third-party project like substrate be considered as an alternative?

Substrate is a potential client of this proposal. I think the alternative here is scheduling tasks at the application layer, totally outside of Kubernetes. That's the approach that substrate is currently taking, and the same for Ray. Part of the idea of Dynamic Containers is offering a way to pull those tasks down to the Kubernetes layer (as containers) so that they can gain all the benefits that come with that.

The use cases seem very strongly in the application layer to me ...

I had read the majority of the use cases in the KEP as being critical to support projects like substrate, but I was trying to see if we could make the capability an opt-in behavior at the Pod level so not all other workload controllers need to reason around it similar to how we had to handle performance sensitive workloads and/or telco scenarios.

In addition, it would be ideal in my opinion if expression of this new behavior could be restricted via RBAC via a dedicated handler so the workload controllers that depend on this capability can acquire this power rather than all other workload controllers/actors that can edit a Pod today do not inherit it by surprise.

I do worry there may be unintended side effects not fully reasoned through otherwise. Both ephemeral containers and abuse of existing debug handlers (exec/attach) have caught us by surprise, so being able to limit that potential surprise is my primary concern.

dims · 2026-06-16T16:13:05Z

+  - Device attachment & initialization
+  - Generic application-level initialization (via InitContainers)
+
+Decoupling these elements from workload startup will be critical in achieving the sub-100ms latency required for interactive agentic workloads.


achieving the sub-100ms latency - for startup? all operations?

Request to starting code execution is the metric we care about. This is not yet achievable without changes at the runtime layer, but with v1.36 changes we're close to this target for container resize operations (assuming available resources).

dims · 2026-06-16T16:19:54Z

+
+#### The Agentic Sandbox (AI-Native)
+
+An "Agent" container spawns ephemeral tool-execution sandboxes on-the-fly within preallocated pods.


should be:
An "Agent" orchestrator

instead of:
An "Agent" container

Right?

I think it's a bit fuzzy. There are a few different architectural models for this, and I think what "agent" refers to is a bit fuzzy... I just added an or here.

dims · 2026-06-16T17:36:40Z

+    Kubelet will not make any further allocations once the pod has entered a terminated (or
+    terminating) phase.
+  - The pod must be in an initialized state (i.e. all init containers have completed) before any dynamic container changes can be made.
+  - This proposal does not allow for general container mutation. A corollary is that a container


can new containers with new unique names be added? if so what is our upper limit as to how many?

what happens when there aren't enough cpu/mem, do we keep the static stability of what was there before and only the new call to /dynamic fails (and pod stays Running)?

can new containers with new unique names be added?

Yes. That is part of the core proposal: add new containers with unique names, or remove existing containers, but it excludes arbitrary updates to existing containers.

if so what is our upper limit as to how many?

I didn't include one since we don't have an upper limit no create. We could use this as an opportunity to add a limit, but I don't know what a reasonable upper limit would be without benchmarking.

what happens when there aren't enough cpu/mem, do we keep the static stability of what was there before and only the new call to /dynamic fails (and pod stays Running)?

Are you asking about requested resources? If so the answer is that we treat this as an in-place resize. If the new requests can't be allocated, the changes are put into a deferred state. See the section titled `### Allocation.

If you're asking about runtime resource contention, then there aren't any changes to the existing resource management mechanisms, including OOM killing, CPU scheduling & throttling, PSI and eviction.

if so what is our upper limit as to how many?

We discussed this frequently on several topics (overcommitted node, pod density, etc. ), not limited to this one, and agreed to define a max containers per running pod at a given time. @tallclair Maybe we can initially start with 10 containers for mutable pod? No change to existing inmutable ones?

dims · 2026-06-16T17:58:01Z

+
+#### Alpha
+
+- Feature implemented behind a feature flag (`DynamicContainers`).


We expect this flag on apiserver + kubelet right? (anything else?)

Yeah, just those 2. This is covered in the KEP metadata.

dims · 2026-06-16T18:00:03Z

+
+This proposal does not introduce any new fields. The API changes are limited to two new subresources
+
+1. `/dynamic`: Enables the new update functionality, in addition to updates allowed on the main


any alternate names were considered? (for /dynamic)

Yes, lots. I don't like dynamic, but I didn't like anything else better. If you have any suggestions I'm all ears...

/active: Dangling adjective; too generic

/additions: Restrictive; ignores container removals.

/adjustments: Candidate for consideration.

/allocate: Verb (bad for GET); typo-prone next to /allocated.

/allocated: Updates are on the desired spec, not the allocated spec.

/allocatedPod: See allocated.

/allocation: See allocate

/attachments: Works for volumes, but weird for modifying existing containers.

/augment: Weird for removels.

/augmentations: See augment.

/blueprint: Abstract; lacks K8s precedent.

/budget: Implies resource bounds only; ignores structural components.

/components: Too vague (pods have immutable components too).

/composition: Abstract.

/config: Collides with ConfigMaps and component config.

/configuration: See config.

/desired: Candidate for consideration, but lacks precedent.

/dynamicAllocation

/dynamicAllocations

/dynamicConfiguration

/dynamicEnvelope: Introduces non-standard vocabulary to core K8s.

/dynamicManifest: Overloaded term.

/dynamicPod: Path redundancy (stuttering); falsely implies an API Kind.

/dynamicSandbox: Doesn't capture non-sandbox changes.

/dynamicSpec: Inaccurate; metadata is also mutable, breaking the spec boundary.

/dynamicState: Too similar to .status.

/envelope: Introduces new vocabulary to K8s.

/execution: Colides with exec

/extensions: Implies third-party plugins

/flex: Colides with flex volumes.

/fluid: Unprecedented term, sort of weird.

/fluidAllocation: see fluid.

/intent: Abstract control-theory jargon.

/live: Colides with liveness and other unrelated concepts.

/livespec: See live. Splits spec across sub-paths; inaccurate due to metadata changes.

/managed: Overloaded term.

/manifest: Overloaded term.

/modifications: Generic; lacks standard K8s API precedent.

/mutable: Candidate for consideration, but doesn't capture the difference from other mutable endpoints

/mutate: Verb; semantically broken for GET requests.

/mutations: Implies history of mutations.

/objective

/overrides: Similar issues to mutations, and implies layering.

/pending: Collides with the Pending pod phase

/proposal

/proposed

/reallocate: Verb; semantically broken for GET requests.

/reallocation

/reallocations

/reconfigure: Verb

/request(s): Collides with pod resource requests.

/revisions: Generic, similar issues to mutations

/runtime: Collides with Container Runtime Interface (CRI).

/runtimeAllocation: Collides with DRA and CRI.

/sandbox: Non-standard vocabulary for core API paths, implies pod sandbox changes.

/staged: Considered.

/target: Abstract.

/workload: Misses the storage/metadata aspect.

what about hotswap ? or hotplug or something with the hot * concept or just hot ?

my $0.02, something like /spec seems clearer than /dynamic to me as an indicator that it's allowing updating spec, even if it includes mutable metadata

Maybe /runningSpec? That distinguishes it from updates to the base resource, although it's still confusing that you can modify container images through that... I'd also consider /proposedSpec or /desiredSpec, or maybe even /dynamicSpec.

From the sig-auth discussion, there was an emphasis on wanting to solve this only once, and declaring that ALL mutations are possible in the future through this new endpoint. From that perspective, leaning into the mutability aspect on the name seems better. Maybe something like /mutable or even /fullymutable.

I'm not sure we need to go overboard in making the subresource sound scary. Since mutations would still be constrained to spec and metadata, I'd still try to include spec somewhere in the name.

/spec, /dynamicspec, and /mutablespec all seem plausible to me (we don't have any mixed case subresources today, and lowercase ephemeralcontainers and portforward)

Hoisting from another thread, if there's a field in the pod that toggles whether this is allowed, e.g. mutable: true, coordinating the subresource name with the name of that field could make sense.

For example, spec.mutable: true paired with /mutablespec, and calling update on /mutablespec on any pod without that field set to true errors.

(that leads to a meta question about whether you can mutate that field to false via an update on /mutablespec, which is fun 🤯 )

dims · 2026-06-16T18:05:25Z

+
+The API server will resume accepting additions and removals to `.spec.containers`, and the Kubelet will resume reconciling the changes.
+
+There are no storage changes in the API.


Note that @derekwaynecarr is proposing a field addition to designate pods as dynamic/mutable at creation time.

@tallclair i defer on this point to others as well, but i think as we facilitate broader and broader mutability, i think some users would still like to preserve an intent for immutability on some of these items. similar to how we have resizePolicy we may want to explore mutationPolicy and let users say yes/no to particular types of runtime mutation (containers, volumes, etc.).

It would certainly be easier to reason about a cluster if the pods which are subject to mutation this way are flagged via a field.

kannon92 · 2026-06-16T18:54:31Z

+Future extensions that enable additional pod dynamism and mutability (such as dynamic volume
+provisioning) will be added to this same subresource.
+
+#### Limitations


How are we enforcing this limitations?

I'd also expect some kind of description for this in our test plan.

These are all enforced through API validation, and covered by the pkg/apis/core/validation unit tests (already listed)

tallclair · 2026-06-16T18:57:24Z

@enj

Given my recent experience with the writable cgroups KEP, I believe a change of this magnitude requires significant open discussion in the community.

I don't really see the parallel to writable cgroups. Is the connection that these are both sig-node projects with implications for admission control?

While I am sympathetic to the rapid pace of AI innovation and I am sure that some of my own customers could benefit from the proposed changes here, I strongly believe that we should spend more time to better understand the requirements. Not every problem needs to be directly solved by core Kubernetes - some are simply better left to the workloads.

This is a much bigger discussion, but many of us are worried about Kubernetes being left behind and becoming irrelevant as the tech stack shifts if Kubernetes doesn't evolve to meet the new demands.

I am deeply involved in various aspects of the Kube project, and I only recently heard about this proposal, and thus I suspect many others have not had time to review it. Therefore, I propose that we use the v1.37 release to solicit feedback.

There seems to be some fundamental disagreement about the scope of SIG-Node and SIG-Architecture. Independent of what direction this KEP goes in, I think that's a bigger conversation that we need to have.

deads2k · 2026-06-16T19:54:27Z

There seems to be some fundamental disagreement about the scope of SIG-Node and SIG-Architecture. Independent of what direction this KEP goes in, I think that's a bigger conversation that we need to have.

The thread started in sig-architecture as a rally point we all pay attention to as a courtesy to you. The concerns of this KEP extend significantly beyond sig-node and should have been recognized as such and left open to community comment for a reasonable period.

tallclair · 2026-06-16T20:45:40Z

/assign @derekwaynecarr

liggitt · 2026-06-16T21:43:46Z

+
+This proposal does not introduce any new fields. The API changes are limited to two new subresources
+
+1. `/dynamic`: Enables the new update functionality, in addition to updates allowed on the main


my $0.02, something like /spec seems clearer than /dynamic to me as an indicator that it's allowing updating spec, even if it includes mutable metadata

liggitt · 2026-06-16T21:48:27Z

+authors:
+  - "@tallclair"
+owning-sig: sig-node
+participating-sigs: []


The impact of the direction this is proposing is really significant both within sig-node (in kubelet) and outside sig-node (potentially ~every pod-admitting policy, ~every pod-creating controller, ~every pod-observing controller). I'm surprised to not see any other sigs here.

I think folks outside sig-node (like myself) are trying to wrap their heads around:

what changes are required for non-kubelet actors, are those changes feasible, and can we avoid doing that work until we're sure sig-node is committed to this path?

do we have to ripple changes for this out to all responsibly-maintained pod-involved components, or are there ways to bound this (ideally by default) so that we only have to ripple out where it makes sense?

what happens for all the pod-involved things that never update to consider this?

For sig-node components, I suspect the effort to make kubelet behave well when ~anything about the pod can change is far bigger than currently described. We have a long tail of kubelet issues around pod lifecycle handling in weird edges (kubernetes/kubernetes#124701, kubernetes/kubernetes#116481, kubernetes/kubernetes#122161 are just a few off the top of my head), and this multiples the weird states it's possible to get into.

Is there a way to front-load the work with kubelet improvements that both make the current state better and more reliable (ideally tied to resolving long-lived lifecycle bugs) and would happen to make the kubelet handle scenarios like mutable pods better, but are not specifically tied to dynamic containers? I suspect there are probably some ~rewrites of kubelet subloops needed (for example, I'm not sure how many of it items identified in Kubelet state of the state are still outstanding)

If we merge an alpha where the pod can mutate, I expect to immediately start seeing merges of fixes targeted to compensating for weird scenarios we see in the kubelet specifically around pod mutation. Those will be difficult to back out if we decide not to proceed with the mutable pod feature.

Improving kubelet in master to correctly handle the transitions this introduces, so the ~only changes needed would be the more permissive spec-mutating subresource on the API side, would build confidence that things added under alpha weren't going to be difficult to back out if they didn't work out.

@liggitt thank you for the comments, very constructive.

The impact of the direction this is proposing is really significant both within sig-node (in kubelet) and outside sig-node (potentially ~every pod-admitting policy, ~every pod-creating controller, ~every pod-observing controller). I'm surprised to not see any other sigs here.

Agreed with you on this and Derek also made the similar suggestion on bounding the blast radius for non-kubelet actors. Tim and I completely agree and to avoid a massive ripple effect across the ecosystem, as we noted on the SIG Arch mailing list, KEP was updated completely isolate this capability behind a dedicated API subresource rather than expanding the standard Pod UPDATE validation.

I will leave @tallclair on the design details. But I believe with that change, existing controllers, webhooks, and observing tools would experience zero behavior change.

Is there a way to front-load the work with kubelet improvements that both make the current state better and more reliable (ideally tied to resolving long-lived lifecycle bugs) and would happen to make the kubelet handle scenarios like mutable pods better, but are not specifically tied to dynamic containers? I suspect there are probably some ~rewrites of kubelet subloops needed (for example, I'm not sure how many of it items identified in Kubelet state of the state are still outstanding)

I will leave @yujuhong and @SergeyKanzhelev to answer your overall concerns on pod lifecycle fragility and reliability. But just want to point out that during developing in-place pod resizing and pod level resource, some those old bugs in pod lifecycle were identified and addressed.

On another hand, I completely agree that if we were introducing dynamic volumes or mutable DRA resource claims today, the Kubelet lifecycle complexity would be overwhelming. That is exactly why we explicitly made those non-goals for this initial alpha.

In fact, compared to recent features like Sidecar Containers or in-place pod restart or EvictionRequest API, this alpha proposal is intentionally minimized. By leveraging the exact same atomic checkpointing framework established by In-Place Pod Resizing, adding or removing a container at the Kubelet layer translates directly to existing CreateContainer and KillContainer primitives inside computePodActions. @tallclair please correct me if I misunderstood your proposal.

We are intentionally starting with this tightly bounded scope precisely so we can isolate and test the foundational Kubelet state transitions cleanly, without the added complexity of dynamic storage or shifting QoS boundaries.

Thanks for your input. There's a lot to respond to here, so I'll take it one piece at a time.

The impact of the direction this is proposing is really significant both within sig-node (in kubelet) and outside sig-node (potentially ~every pod-admitting policy, ~every pod-creating controller, ~every pod-observing controller). I'm surprised to not see any other sigs here.

Yeah, after this discussion it's clear that I need to involve sig-auth. I'm just not sure what the order of operations are at this point. Does sig-auth input need to be alpha-blocking?

I think folks outside sig-node (like myself) are trying to wrap their heads around:

what changes are required for non-kubelet actors, are those changes feasible, and can we avoid doing that work until we're sure sig-node is committed to this path?

How many of those actors are weighing in on KEPs though? Isn't beginning alpha development a stronger signal to start soliciting feedback?

do we have to ripple changes for this out to all responsibly-maintained pod-involved components, or are there ways to bound this (ideally by default) so that we only have to ripple out where it makes sense?

I don't see any way around it. Either the component cares about things at the container level, in which case it will need to do work no matter where we put the dynamic containers, or how we control their availability, or the component doesn't care about containers, in which case there shouldn't be any issues with the proposal as is? Maybe I'm missing something.

what happens for all the pod-involved things that never update to consider this?

How is this different form any other pod change from this perspective? In some ways, this seems like it would be less impcting since it isn't introducing any new API fields.

For sig-node components, I suspect the effort to make kubelet behave well when ~anything about the pod can change is far bigger than currently described. We have a long tail of kubelet issues around pod lifecycle handling in weird edges (kubernetes/kubernetes#124701, kubernetes/kubernetes#116481, kubernetes/kubernetes#122161 are just a few off the top of my head), and this multiples the weird states it's possible to get into.

Notably, those are all issues around pod startup and termination, neither of which are really affected by this KEP. We should take care around termination handling, but I don't anticipate issues there. As I said elsewhere, things like new pod phases or dynamic volumes changes would be a more complex change. The container interface is already largely driven as by an imperative state reconciliation process that already handles most container changes.

Is there a way to front-load the work with kubelet improvements that both make the current state better and more reliable (ideally tied to resolving long-lived lifecycle bugs) and would happen to make the kubelet handle scenarios like mutable pods better, but are not specifically tied to dynamic containers? I suspect there are probably some ~rewrites of kubelet subloops needed (for example, I'm not sure how many of it items identified in Kubelet state of the state are still outstanding)

We did a lot of that work with in-place resize, and this proposal is building on the machinery that was already put in place for that. There is ongoing work to move more components to drawing from the AllocationManager, and it will make more sense to prioritize that work with this KEP.

If we merge an alpha where the pod can mutate, I expect to immediately start seeing merges of fixes targeted to compensating for weird scenarios we see in the kubelet specifically around pod mutation. Those will be difficult to back out if we decide not to proceed with the mutable pod feature.

Improving kubelet in master to correctly handle the transitions this introduces, so the ~only changes needed would be the more permissive spec-mutating subresource on the API side, would build confidence that things added under alpha weren't going to be difficult to back out if they didn't work out.

This sounds like a good backup plan if we can't merge this KEP, but I'm still not convinced that we can't do both in parallel.

I will leave @yujuhong and @SergeyKanzhelev to answer your overall concerns on pod lifecycle fragility and reliability.

While Kubelet still has some edge cases during state transitions, the issues mentioned above mostly relate to pod setup and teardown. Dynamic containers should have minimal impact on these phases, as container mutations are mostly bounded within the pod's running state. As Dawn pointed out, while this KEP adds complexity to the container lifecycle, previous features have already laid some groundwork and also fixed related bugs. Therefore, the overall risk to state transitions is not as extensive.

liggitt · 2026-06-16T21:58:59Z

+
+1. `/dynamic`: Enables the new update functionality, in addition to updates allowed on the main
+resource, and `/resize`.
+2. `/allocated`: Returns the allocated version of the pod, served directly from the Kubelet.


Will controllers need to maintain this across all pods? Since this is a subresource, and I guess is pulled from the kubelet, it won't be able to be listed or watched.

No. This is explicitly intended as a debug endpoint. If we find that controllers are needing to rely on it, we should build it in as a proper (sub)resource. See the section titled #### Allocated Subresource.

liggitt · 2026-06-16T22:00:03Z

+
+Dynamic container mutation is allowed only through the new `/dynamic` subresource. Update
+permission on this subresource is NOT granted to the default `edit` Role, and must be explicitly
+granted by a `cluster-admin`.


Will any pod-managing controllers need to use this (the In-Place Upgrades (Sidecars, Daemons) user story makes it seem like they will)? If so, the permission will have to be granted to the controller (e.g. the replicaset or job or daemonset controller), effectively making it available to any user who has control over one of the higher-order API types.

That's out-of-scope. I could imagine adding it as an update strategy on some pod-managing controllers (maybe DaemonSets?), but I think that would be its own KEP. The use case I was considering was a sidecar-injecting admission controller. If you want to upgrade those injected sidecars today, you need to recreate the workload (although maybe you could make something work with in-place image updates). This only applies to 3rd-party use cases though.

kannon92 · 2026-06-16T23:56:04Z

+
+### Risks and Mitigations
+
+#### Third-Party Controllers that assume containers are static


This should be updated to reflect the latest design right?

I don't think anything changed here, but I added an explicit note for admission controllers.

#6169 (comment)

I think @derekwaynecarr, @liggitt, @deads2k concerns are that we need a way to roll this out without basically tell the ecosystem to deal.

I am a hard no on this being solved with "extensive documentation and release notes". Very few people read these things and we can't expect the entire ecosystem to be aware that we are now saying pods are fully mutable.

I like the idea of a subresource but I think we need something stronger.

I see that this is a beta criteria so maybe update this section to reflect.

micahhausler · 2026-06-17T18:17:24Z

+
+## Motivation
+
+Dynamic Container mutation is the next step in an ongoing effort to transform Kubernetes into a more dynamic execution environment that is capable of meeting the evolving demands of modern AI workloads, including high-performance batch workloads leveraging special-purpose frameworks, and low-latency execution environments for agentic workloads.


Any more here on motivations? I'd be interested in seeing more on on why this is necessary

Are the user stories what you're looking for? I don't really understand why the KEP template doesn't put user stories with motivation. Also note that the hierarchical scheduling and elimination of initialization overhead are the main drivers for this (those are meant to be part of the motivation section). Can you elaborate more on what additional detail you're looking for?

You might also want to check out @dchen1107's doc: https://docs.google.com/document/d/1J8Aq0XzN8BiNdWHXSEGA1Xw2nXcZRSKTMoi-tNh7FTc/edit?pli=1&tab=t.0#heading=h.apbhfrpbxmw0

tallclair · 2026-06-17T23:15:59Z

My takeaways from the SIG-Auth discussion:

Concerns were raised over the motivation, and whether the use cases should be pushed to the application layer instead. Hopefully we can continue this discussion in SIG-Architecture.
- Questions were raised around "dynamic volumes", and whether our latency goals were realistic once volume attachment comes into play.
@deads2k: Ask to have a static off-switch/opt-out as an API server flag.
If we're going to break assumptions about pod mutability, let's only do it once:
- Declare that mutation of ANY pod field is in-scope for the new subresource. This doesn't mean that we'll allow all mutations, but it means that we might allow any mutation in the future. Policy controllers must reevaluate the full pod object on updates through the new subresource.
@enj had a suggestion I especially liked for handling legacy admission webhooks & policy: when admitting a request to the new subresource, check for all admission webhooks & policies that could be invoked for CREATE or UPDATE requests on the base pod resource, then those webhooks/policies MUST also intercept /dynamic, otherwise the request will be rejected. I'll add this to the KEP, and provide more detail there.

See SIG-Auth Meeting Agenda for detailed notes.

@deads2k @liggitt @enj @micahhausler anything you want to add?

KEP-5972: Dynamic Containers

dbd390d

k8s-ci-robot assigned dchen1107 and haircommander Jun 8, 2026

k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. api-review Categorizes an issue or PR as actively needing an API review. labels Jun 8, 2026

github-project-automation Bot added this to API Reviews Jun 8, 2026

k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jun 8, 2026

k8s-ci-robot requested review from derekwaynecarr and johnbelamaric June 8, 2026 06:16

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 8, 2026

lasomethingsomething mentioned this pull request Jun 8, 2026

Dynamic Pod Mutation: Optimistic Execution & Hierarchical Resource Delegation #5972

Open

4 tasks

kannon92 reviewed Jun 8, 2026

View reviewed changes

Comment thread keps/prod-readiness/sig-node/5972.yaml Outdated

haircommander reviewed Jun 8, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md

haircommander reviewed Jun 8, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md

kannon92 reviewed Jun 8, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md

kannon92 reviewed Jun 8, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md

haircommander reviewed Jun 8, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md

haircommander reviewed Jun 8, 2026

View reviewed changes

kannon92 reviewed Jun 8, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md Outdated

kannon92 reviewed Jun 9, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md

dchen1107 reviewed Jun 9, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md Outdated

k8s-ci-robot requested a review from natasha41575 June 10, 2026 23:48

Address feedback

ce5dce1

Address feedback

7e192c7

HirazawaUi reviewed Jun 11, 2026

View reviewed changes

dims reviewed Jun 16, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md Outdated

dims reviewed Jun 16, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md Outdated

tallclair added 3 commits June 16, 2026 16:23

Scrub refernece to control plane bypass

2683c8a

Add explict PodSecurityAdmission coverage

0925040

Tighten use cases, goals, and beta criteria

6bdc1ea

dims reviewed Jun 16, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md

dims reviewed Jun 16, 2026

View reviewed changes

Clarify node drain requirement for downgrade

b57b6c3

kannon92 reviewed Jun 16, 2026

View reviewed changes

Comment thread keps/sig-node/5972-dynamic-containers/README.md

tallclair added 3 commits June 16, 2026 19:08

Expand test plan

07d3fb3

Move Beta release to TBD

6fcf8d9

Regenerate TOC

476b63b

kannon92 reviewed Jun 16, 2026

View reviewed changes

Comment thread keps/prod-readiness/sig-node/5972.yaml Outdated

Fix formatting of PRR file

0fa1eb0

k8s-ci-robot assigned derekwaynecarr Jun 16, 2026

liggitt reviewed Jun 16, 2026

View reviewed changes

kannon92 reviewed Jun 16, 2026

View reviewed changes

micahhausler reviewed Jun 17, 2026

View reviewed changes

This was referenced Jun 18, 2026

[codex] docs: track dynamic containers kep pacoxu/AI-Infra#373

Merged

Track KEP-5972 Dynamic Containers follow-up updates pacoxu/AI-Infra#374

Open


		#### The Agentic Sandbox (AI-Native)

		An "Agent" container spawns ephemeral tool-execution sandboxes on-the-fly within preallocated pods.


		#### Alpha

		- Feature implemented behind a feature flag (`DynamicContainers`).


		This proposal does not introduce any new fields. The API changes are limited to two new subresources

		1. `/dynamic`: Enables the new update functionality, in addition to updates allowed on the main


		The API server will resume accepting additions and removals to `.spec.containers`, and the Kubelet will resume reconciling the changes.

		There are no storage changes in the API.


		### Risks and Mitigations

		#### Third-Party Controllers that assume containers are static


		## Motivation

		Dynamic Container mutation is the next step in an ongoing effort to transform Kubernetes into a more dynamic execution environment that is capable of meeting the evolving demands of modern AI workloads, including high-performance batch workloads leveraging special-purpose frameworks, and low-latency execution environments for agentic workloads.

Conversation

tallclair commented Jun 8, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

k8s-ci-robot commented Jun 11, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dims Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tallclair Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

dims Jun 16, 2026 •

edited

Loading

tallclair Jun 16, 2026 •

edited

Loading

derekwaynecarr Jun 16, 2026 •

edited

Loading

liggitt Jun 17, 2026 •

edited

Loading

liggitt Jun 17, 2026 •

edited

Loading