In-place Pod resizing support #12613

Merged: scholzj merged 3 commits into strimzi:main from scholzj:in-place-pod-resizing on Apr 14, 2026

Conversation

scholzj (Member) commented Apr 8, 2026

Type of change

  • Enhancement / new feature

Description

This PR implements the proposal SEP-131 and adds support for in-place pod resizing. It is based on StrimziPodSets, so it is supported for Kafka, Connect, and MM2 nodes. It is not supported for the Bridge or any of the supporting deployments (User Operator, Topic Operator, Cruise Control, ...). As described in the proposal, this is an opt-in feature that users have to enable.

Checklist

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md

@scholzj scholzj added this to the 1.0.0 milestone Apr 8, 2026

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 86.36364% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.06%. Comparing base (5b74ce8) to head (d918fa1).
⚠️ Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
...perator/cluster/model/InPlacePodResizingUtils.java | 87.87% | 0 Missing and 8 partials ⚠️
...io/strimzi/operator/cluster/model/PodRevision.java | 75.86% | 3 Missing and 4 partials ⚠️
...ter/operator/assembly/StrimziPodSetController.java | 84.21% | 0 Missing and 3 partials ⚠️
...io/strimzi/operator/cluster/model/PodSetUtils.java | 83.33% | 1 Missing and 1 partial ⚠️
...perator/cluster/operator/resource/KafkaRoller.java | 0.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##               main   #12613    +/-   ##
==========================================
  Coverage     75.05%   75.06%            
- Complexity     6449     6500    +51     
==========================================
  Files           375      376     +1     
  Lines         24937    25057   +120     
  Branches       3214     3261    +47     
==========================================
+ Hits          18716    18808    +92     
- Misses         4896     4906    +10     
- Partials       1325     1343    +18     
Files with missing lines | Coverage Δ
...o/strimzi/operator/cluster/model/KafkaCluster.java | 93.10% <100.00%> (+0.05%) ⬆️
...zi/operator/cluster/model/KafkaConnectCluster.java | 94.46% <100.00%> (+0.12%) ⬆️
.../strimzi/operator/cluster/model/RestartReason.java | 100.00% <100.00%> (ø)
.../strimzi/operator/cluster/model/WorkloadUtils.java | 100.00% <100.00%> (ø)
...ter/operator/assembly/AbstractConnectOperator.java | 87.30% <100.00%> (ø)
.../cluster/operator/assembly/KafkaConnectRoller.java | 100.00% <100.00%> (ø)
...tor/cluster/operator/assembly/ReconcilerUtils.java | 80.66% <100.00%> (-0.45%) ⬇️
...n/java/io/strimzi/operator/common/Annotations.java | 28.57% <ø> (ø)
...perator/cluster/operator/resource/KafkaRoller.java | 76.86% <0.00%> (ø)
...io/strimzi/operator/cluster/model/PodSetUtils.java | 71.42% <83.33%> (+8.92%) ⬆️
... and 3 more

... and 2 files with indirect coverage changes

scholzj (Member, Author) commented Apr 8, 2026

/gha run pipeline=regression

github-actions bot commented Apr 8, 2026

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

github-actions bot commented Apr 9, 2026

🎉 System test verification passed: link

scholzj (Member, Author) commented Apr 9, 2026

/gha run pipeline=regression

github-actions bot commented Apr 9, 2026

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

github-actions bot commented Apr 9, 2026

🎉 System test verification passed: link

scholzj (Member, Author) commented Apr 9, 2026

/gha run pipeline=regression kubeVersion=1.30

github-actions bot commented Apr 9, 2026

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

scholzj (Member, Author) commented Apr 9, 2026

/gha run pipeline=regression kubeVersion=1.30.14

github-actions bot commented Apr 9, 2026

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

scholzj (Member, Author) commented Apr 9, 2026

/gha run pipeline=regression kubeVersion=oldest

github-actions bot commented Apr 9, 2026

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

github-actions bot commented Apr 9, 2026

🎉 System test verification passed: link

github-actions bot commented Apr 9, 2026

❌ System test verification failed: link

github-actions bot commented Apr 9, 2026

🎉 System test verification passed: link

Signed-off-by: Jakub Scholz <www@scholzj.com>
@scholzj scholzj force-pushed the in-place-pod-resizing branch from edddc2c to bcbbf77 Compare April 9, 2026 19:03
@scholzj scholzj marked this pull request as ready for review April 9, 2026 19:04
@scholzj scholzj added this to Roadmap Apr 9, 2026
@scholzj scholzj moved this to 1.0.0 (Planned April 2026) in Roadmap Apr 9, 2026
scholzj (Member, Author) commented Apr 9, 2026

/gha run pipeline=regression kubeVersion=oldest

github-actions bot commented Apr 9, 2026

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

github-actions bot commented

❌ System test verification failed: link

scholzj (Member, Author) commented Apr 10, 2026

Note: The ST failures are not related to this PR (see https://cloud-native.slack.com/archives/C018247K8T0/p1775859930375999). In any case, the ST added for this feature was correctly skipped.

github-actions bot commented

❌ System test verification failed: link

PaulRMellor (Contributor) left a comment

Thanks. Doc updates look good.
A few minor suggestions:

Comment threads (all outdated):
  • documentation/modules/con-common-configuration-properties.adoc (9 threads)
  • documentation/shared/attributes.adoc (1 thread)

im-konge (Member) left a comment

LGTM, thanks. I have just one nit and one question.


// Create dedicated controller and broker KafkaNodePools and Kafka CR
KubeResourceManager.get().createResourceWithWait(
KafkaNodePoolTemplates.mixedPoolPersistentStorage(testStorage.getNamespaceName(), testStorage.getMixedPoolName(), testStorage.getClusterName(), 3)

Member commented:

Just a question: would it matter whether we try it in separate mode (separate roles) or mixed mode, or does it not matter? Should we then add some tests for KafkaConnect etc.?

And thanks for writing the ST :)

Author (Member) commented:

In general, the logic is the same everywhere, and we have unit tests to check that it is plugged in. So I do not think we really care that much about the different roles or Connect. A Connect test might be marginally useful. I will open an issue to track it and get to it later (I guess nobody ever picked up help-wanted issues for STs). But I do not think we need to have this in this PR.

Author (Member) commented:

Another useful test might be the deferred waiting ... but TBH, that is much harder to emulate in a stable way. So I am not sure it is worth it, not because of what it does but because of the complexity and stability issues. You would need to:

  • Deploy some dummy blocker application whose resource consumption blocks the required amount of your Kubernetes resources
  • Resize the Kafka nodes to use more resources than are available but less than the total capacity
  • Check it waits
  • Delete the blocker application
  • Check the resizing completes

But with the Kubernetes nodes having different sizes etc., this is IMHO hard to automate in a way that works everywhere.
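
The steps above can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyNode`, its methods, and the millicore numbers are hypothetical and are not part of Strimzi or the Kubernetes API. A real system test would have to create actual workloads on a cluster and check the Pods themselves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the deferred-resize scenario; not Strimzi or Kubernetes API code. */
class DeferredResizeSketch {

    /** Hypothetical node that tracks CPU requests (in millicores) per workload. */
    static final class ToyNode {
        private final long capacity;
        private final Map<String, Long> requests = new LinkedHashMap<>();
        private final Map<String, Long> pendingResizes = new LinkedHashMap<>();

        ToyNode(long capacity) {
            this.capacity = capacity;
        }

        /** Capacity not yet claimed by any workload. */
        long free() {
            long used = requests.values().stream().mapToLong(Long::longValue).sum();
            return capacity - used;
        }

        void deploy(String name, long millis) {
            if (millis > free()) {
                throw new IllegalStateException("Does not fit: " + name);
            }
            requests.put(name, millis);
        }

        /** Requests a resize; returns true if applied now, false if deferred. */
        boolean resize(String name, long newMillis) {
            if (!requests.containsKey(name)) {
                throw new IllegalArgumentException("Unknown workload: " + name);
            }
            if (newMillis > capacity) {
                throw new IllegalArgumentException("Exceeds total capacity: " + name);
            }
            long delta = newMillis - requests.get(name);
            if (delta <= free()) {
                requests.put(name, newMillis);
                pendingResizes.remove(name);
                return true;
            }
            pendingResizes.put(name, newMillis); // fits in total capacity, but not right now
            return false;
        }

        /** Deletes a workload and retries every deferred resize. */
        void delete(String name) {
            requests.remove(name);
            pendingResizes.remove(name);
            // Iterate over a copy because resize() may modify pendingResizes.
            for (Map.Entry<String, Long> e : new LinkedHashMap<>(pendingResizes).entrySet()) {
                resize(e.getKey(), e.getValue());
            }
        }

        boolean isPending(String name) {
            return pendingResizes.containsKey(name);
        }

        long requestOf(String name) {
            return requests.get(name);
        }
    }

    public static void main(String[] args) {
        ToyNode node = new ToyNode(8000);
        node.deploy("kafka-0", 2000);
        node.deploy("blocker", 5000);                       // deploy the blocker
        boolean applied = node.resize("kafka-0", 4000);     // needs +2000, only 1000 free
        System.out.println("applied immediately: " + applied);
        System.out.println("waiting: " + node.isPending("kafka-0"));
        node.delete("blocker");                             // free the blocked capacity
        System.out.println("request after delete: " + node.requestOf("kafka-0"));
    }
}
```

The model has a single node of fixed size, which is exactly the simplification that does not hold on real clusters with nodes of different sizes.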

Member commented:

> But I do not think we need to have this in this PR.

Yeah, I agree it's not crucial for this PR. This ST is enough, thanks for that. It was more of an idea about what else we could write to have the feature covered.

> But with the Kubernetes nodes having different sizes etc., this is IMHO hard to automate in a way that works everywhere.

Yes, we had the same issue with the Drain Cleaner tests when provisioning the Pods in different AZs on AWS and other platforms (it worked differently on each platform). So I guess this is a good manual test. Thanks.

scholzj and others added 2 commits April 13, 2026 16:35
Co-authored-by: PaulRMellor <47596553+PaulRMellor@users.noreply.github.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
Signed-off-by: Jakub Scholz <www@scholzj.com>
scholzj (Member, Author) commented Apr 13, 2026

/gha run pipeline=regression

github-actions bot commented Apr 13, 2026

⏳ System test verification started: link

The following 6 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)

Tests will start after successful build completion.

github-actions bot commented

❌ System test verification failed: link

github-actions bot commented

🎉 System test verification passed: link

@scholzj scholzj merged commit cc00feb into strimzi:main Apr 14, 2026
32 checks passed
@scholzj scholzj deleted the in-place-pod-resizing branch April 14, 2026 08:05

Labels

None yet

Projects

Status: 1.0.0 (Planned April 2026)

5 participants