Skip to content

Wait for pod resize#1922

Open
hugoponthieu wants to merge 16 commits into
kube-rs:mainfrom
hugoponthieu:wait-for-pod-resize
Open

Wait for pod resize#1922
hugoponthieu wants to merge 16 commits into
kube-rs:mainfrom
hugoponthieu:wait-for-pod-resize

Conversation

@hugoponthieu
Copy link
Copy Markdown
Contributor

Motivation

Since 1.33 the resize of pods resources is available. The goal of this PRs is to provide a convenient way to wait for the operation to complete.
This is linked to #1854

Solution

I implemented:

  • is_pod_resized

More context

The documentation provide some information to get the state of the resize state. It is not mandatory to use that to know the state of the resize request.
The field provided by the kube api is: Pod.status.resize. When this string is empty it means that the resize is done as said the kube rust open api:

    /// Status of resources resize desired for pod's containers. It is empty if no resources resize is pending. Any changes to container resources will automatically set this to "Proposed" Deprecated: Resize status is moved to two pod conditions PodResizePending and PodResizeInProgress. PodResizePending will track states where the spec has been resized, but the Kubelet has not yet allocated the resources. PodResizeInProgress will track in-progress resizes, and should be present whenever allocated resources != acknowledged resources.
    pub resize: Option<std::string::String>

Signed-off-by: hugoPonthieu <hugo.ponthieu@etu.umontpellier.fr>
Signed-off-by: hugoPonthieu <hugo.ponthieu@etu.umontpellier.fr>
Signed-off-by: hugoPonthieu <hugo.ponthieu@etu.umontpellier.fr>
@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.4%. Comparing base (bc318bc) to head (f34eab3).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1922     +/-   ##
=======================================
+ Coverage   76.3%   76.4%   +0.2%     
=======================================
  Files         89      89             
  Lines       8506    8555     +49     
=======================================
+ Hits        6486    6534     +48     
- Misses      2020    2021      +1     
Files with missing lines Coverage Δ
kube-runtime/src/wait.rs 96.0% <100.0%> (+1.3%) ⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: hugoPonthieu <hugo.ponthieu@etu.umontpellier.fr>
Comment thread kube-runtime/src/wait.rs Outdated

/// An await condition for `Pod` that returns `true` once an in-place resize operation has completed
///
/// A resize is considered complete when the `status.resize` field is either absent or empty.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaikt this would also rely on the restart having fully started before doing the await_condition? Otherwise we might return early if we start waiting at the same time as starting the resize (if we are unlucky).

Also, the docs for the status.resize field suggests that this field is deprecated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end i used only condition ! Are you ok with the implementation ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely better, but I do agree that doxx's suggestion from the KEP below would be safer.
you should have all the data still since the Condition takes a Pod.

@clux clux added the changelog-add changelog added category for prs label Jan 29, 2026
@doxxx93
Copy link
Copy Markdown
Member

doxxx93 commented Feb 2, 2026

Thanks for working on this! I tested the implementation on a kind cluster (K8s v1.35) and found some issues.

Race Condition (as @clux mentioned)

When calling await_condition immediately after patch_resize, the condition returns true right away because PodResizePending/PodResizeInProgress conditions haven't been set yet by the kubelet.

patch_resize() → [no conditions yet] → is_pod_resized() = true (incorrect)
                 ↓
              [kubelet sets PodResizeInProgress]
                 ↓
              [resize completes, condition removed]

I verified this by running the example - it reports "resize completed" instantly without actually waiting.

Suggested Approach

Per KEP-1287, a more robust check would compare:

  • spec.containers[].resources (desired)
  • status.containerStatuses[].resources (actual)

When these match, the resize is truly complete - regardless of condition timing.

Minor Issues

  • pod_resized_complete and pod_resized_no_field tests use identical YAML
  • Example checks container_status.resources but should verify spec.containers[].resources changed

Copy link
Copy Markdown
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit more of a review. the most important point is still the condition itself, but have put some extra pointers on the example.

Comment thread examples/pod_resize_wait.rs Outdated
"spec": {
"containers": [{
"name": "app",
"image": "nginx:1.14.2",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7y old docker image. try stable-alpine

Comment thread kube-runtime/src/wait.rs Outdated

/// An await condition for `Pod` that returns `true` once an in-place resize operation has completed
///
/// A resize is considered complete when the `status.resize` field is either absent or empty.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely better, but I do agree that doxx's suggestion from the KEP below would be safer.
you should have all the data still since the Condition takes a Pod.

Comment on lines +50 to +60
let pp = PostParams::default();
match pods.create(&pp, &pod).await {
Ok(created) => info!("Created pod: {}", created.name_any()),
Err(kube::Error::Api(ae)) if ae.code == 409 => {
info!("Pod already exists, deleting and recreating...");
pods.delete("resize-wait-demo", &DeleteParams::default()).await?;
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
pods.create(&pp, &pod).await?;
}
Err(e) => return Err(e.into()),
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be simplified with server side apply; pseudo code;
let p = pods.patch(pod, serverside).await?;

if the pod exists, no need to delete it, we are overwriting it to have the parameters in the json above

Comment thread examples/pod_resize_wait.rs Outdated

info!("Waiting for resize to complete...");
let resized = await_condition(pods.clone(), "resize-wait-demo", conditions::is_pod_resized());
match tokio::time::timeout(std::time::Duration::from_secs(30), resized).await {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just ? the error herer

Comment thread examples/pod_resize_wait.rs Outdated
Comment on lines +105 to +114
if let Some(status) = &pod.status {
info!("Resize status: {:?}", status.resize);
if let Some(container_status) = status.container_statuses.as_ref().and_then(|cs| cs.first()) {
info!(
"Container resources after CPU resize: {:?}",
container_status.resources
);
info!("Container restart count: {}", container_status.restart_count);
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to factor out some kind of inspector fn that dumps info of a &Pod rather than inlining it in all the match arms

@hugoponthieu hugoponthieu requested a review from clux February 7, 2026 15:47
Copy link
Copy Markdown
Member

@doxxx93 doxxx93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The spec vs status comparison approach is clean and properly handles the race condition.

One note — I tested on kind (K8s 1.33) and found that is_pod_resized() blocks indefinitely when a resize is Infeasible (e.g. exceeds node capacity), since spec and status never converge. The kubelet immediately sets PodResizePending(reason: Infeasible) and drops it from the retry queue (allocation_manager.go#L261).

client-go/kubectl don't have a dedicated resize wait helper either, so this is consistent. But from an API design perspective, I'm not sure what the ideal approach would be — if we add something like is_pod_resize_infeasible(), users would almost always need to write is_pod_resized().or(is_pod_resize_infeasible()), which feels like it should just be the default behavior.

Worth noting that kubernetes#135341 is tracking moving feasibility checks to admission, which would make patch_resize() itself return an error for infeasible cases — potentially making this a non-issue in the future. So the current timeout-based approach in the example might be good enough for now.

Comment thread kube-runtime/src/wait.rs
/// `PodResizePending`/`PodResizeInProgress` conditions could return `true` before the
/// kubelet has even started processing the resize.
///
/// See: <https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/#pod-resize-status>
Copy link
Copy Markdown
Member

@doxxx93 doxxx93 Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about adding a caveat about the Infeasible case here? If the resize is rejected by the kubelet, spec and status will never converge, so is_pod_resized() blocks indefinitely.

Suggested change
/// See: <https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/#pod-resize-status>
/// See: <https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/#pod-resize-status>
///
/// Note: if the resize is rejected as `Infeasible` (e.g. exceeds node capacity),
/// spec and status will never converge and this condition will never return `true`.
/// Always wrap with [`tokio::time::timeout`].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-add changelog added category for prs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants