Skip to content

Conversation

@maxcao13
Copy link
Member

@maxcao13 maxcao13 commented Apr 29, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds a lot of e2e tests to validate the InPlaceOrRecreate feature. I've tried to separate feature gated tests from regular tests.

I've added this check checkInPlaceOrRecreateTestsEnabled(f, true, true) to make sure that

  • InPlaceOrRecreate feature gate is enabled for the necessary components, otherwise skip
  • InPlacePodVerticalScaling feature gate is enabled on cluster
    • This can be dropped after we do a release that bumps to k8s 1.33, i'm keeping it for now since we are still on 1.32

Still note quite sure how we want to handle VPA feature gated tests here. As of now, e2e tests will always run with InPlaceOrRecreate=true and all tests will run, but maybe we want to have a separate prow config which just exercises feature gated tests using special ginkgo test tags? We can ignore that for now in order to get this feature out the door, but maybe it's something to think about.

Which issue(s) this PR fixes:

Part of AEP-4016 [InPlaceOrRecreate]
This PR is part of the larger feature PR in #7673

Depends on: #7962
After it's merged I'll rebase.

Note that this PR is merging into the in-place-updates feature branch, which will be merged when this feature is all reviewed and ready.

Special notes for your reviewer:

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[AEP-4016]: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support

@k8s-ci-robot
Copy link
Contributor

@maxcao13: The label(s) kind/test cannot be applied, because the repository doesn't have them.

In response to this:

What type of PR is this?

/kind test

What this PR does / why we need it:

Adds a lot of e2e tests to validate the InPlaceOrRecreate feature. I've tried to separate feature gated tests from regular tests.

I've added this check checkInPlaceOrRecreateTestsEnabled(f, true, true) to make sure that

  • InPlaceOrRecreate feature gate is enabled for the necessary components, otherwise skip
  • InPlacePodVerticalScaling feature gate is enabled on cluster
    • This can be dropped after we do a release that bumps to k8s 1.33, i'm keeping it for now since we are still on 1.32

Still note quite sure how we want to handle VPA feature gated tests here. As of now, e2e tests will always run with InPlaceOrRecreate=true and all tests will run, but maybe we want to have a separate prow config which just exercises feature gated tests using special ginkgo test tags? We can ignore that for now in order to get this feature out the door, but maybe it's something to think about.

Which issue(s) this PR fixes:

Part of AEP-4016 [InPlaceOrRecreate]
This PR is part of the larger feature PR in #7673

Depends on: #7962
After it's merged I'll rebase.

Note that this PR is merging into the in-place-updates feature branch, which will be merged when this feature is all reviewed and ready.

Special notes for your reviewer:

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[AEP-4016]: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 29, 2025
@k8s-ci-robot k8s-ci-robot requested a review from omerap12 April 29, 2025 19:18
@k8s-ci-robot k8s-ci-robot requested a review from voelzmo April 29, 2025 19:18
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 29, 2025
@maxcao13
Copy link
Member Author

/hold depends on #7962

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 29, 2025
@omerap12
Copy link
Member

/assign @raywainman for another pair of eyes

@k8s-ci-robot
Copy link
Contributor

@omerap12: GitHub didn't allow me to assign the following users: another, pair, of, eyes, for.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @raywainman for another pair of eyes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@raywainman
Copy link
Member

@maxcao13 do you mind rebasing once the other PR is merged?

@maxcao13 maxcao13 force-pushed the in-place-updates-e2e branch from baa02c9 to 1680325 Compare May 5, 2025 21:04
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 5, 2025
@maxcao13
Copy link
Member Author

maxcao13 commented May 5, 2025

/unhold
unholding since the updater PR merged and I've rebased.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2025

// WaitForPodsUpdatedWithoutEviction waits for pods to be updated without any evictions taking place over the polling
// interval.
func WaitForPodsUpdatedWithoutEviction(f *framework.Framework, initialPods *apiv1.PodList) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is kinda weird but I guess it's working... wondering if we can verify that a container has been updated in-place rather than just iterating on the names.

Copy link
Member Author

@maxcao13 maxcao13 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, I've thought about trying to track status.ContainerRestarts but doesn't hold up since we can't guarantee the container won't restart.

Maybe it's also possible to track the resize condition to go from nil -> PodResizeInProgress -> nil and not confirm it doesn't to Deferred or Infeasible or something, but that's probably not the best idea if the conditions change super fast.

Maybe someone else has better ideas, or maybe even someone from sig-node knows.

Copy link
Member

@omerap12 omerap12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Great work, Max - really appreciated! I've added a few small comments.

@omerap12
Copy link
Member

omerap12 commented May 6, 2025

I guess the e2e testing is taking longer now, right? I'm not sure how we could implement separate Prow job configurations specifically for feature-gated tests. We should definitely think about that, but it's not urgent right now.

@maxcao13
Copy link
Member Author

maxcao13 commented May 7, 2025

I guess the e2e testing is taking longer now, right? I'm not sure how we could implement separate Prow job configurations specifically for feature-gated tests. We should definitely think about that, but it's not urgent right now.

Yeah, it's taking a bit longer now, which probably isn't great since it'll slow all PRs down after this entire thing is merged to main.

I think it might be good to at least prepare for that here. I think we can use label filters to mark suites with Label("FG:InPlaceOrRecreate) using ginkgo.label-filter, so we can filter only feature-gated tests. My next commit will be this, I'd love to know what people think.

I've tried it locally and it works. You just have to specify --ginkgo.label-filter="FG:InPlaceOrRecreate" in your go test arguments.

EDIT: Although this doesn't account for the fact that after the feature gate is GA'd, we wouldn't be running the tests separately anymore. So either we are okay with the full_vpa tests being at most double in length, or we somehow find a way to make the tests more time efficient, or we have to start parallelizing some e2e tests somehow.

@maxcao13 maxcao13 force-pushed the in-place-updates-e2e branch 2 times, most recently from 843639e to 1a19a4d Compare May 7, 2025 03:29
@omerap12
Copy link
Member

omerap12 commented May 7, 2025

I guess the e2e testing is taking longer now, right? I'm not sure how we could implement separate Prow job configurations specifically for feature-gated tests. We should definitely think about that, but it's not urgent right now.

Yeah, it's taking a bit longer now, which probably isn't great since it'll slow all PRs down after this entire thing is merged to main.

I think it might be good to at least prepare for that here. I think we can use label filters to mark suites with Label("FG:InPlaceOrRecreate) using ginkgo.label-filter, so we can filter only feature-gated tests. My next commit will be this, I'd love to know what people think.

I've tried it locally and it works. You just have to specify --ginkgo.label-filter="FG:InPlaceOrRecreate" in your go test arguments.

I'm thinking about how we can integrate this into the CI process.
@adrianmoisey DevOps.

@maxcao13
Copy link
Member Author

maxcao13 commented May 8, 2025

Note to reviewers: The e2e tests I added aren't actually running right now. They require the FEATURE_GATES env var to be enabled in the tests. Not sure how we want to go about this.

maxcao13 added 2 commits May 8, 2025 13:33
This commit refactors the VPA e2e test ginkgo wrappers so that they we can easily supply ginkgo decorators.
This allows us to add ginkgo v2 labels to suites so that later we can run tests that only run FG tests.
For now, this would only be useful for FG:InPlaceOrRecreate

Signed-off-by: Max Cao <[email protected]>
@maxcao13 maxcao13 force-pushed the in-place-updates-e2e branch from 1a19a4d to c4b561d Compare May 8, 2025 20:34
@raywainman
Copy link
Member

/lgtm

/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 8, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maxcao13, raywainman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 8, 2025
@k8s-ci-robot k8s-ci-robot merged commit 97a67b7 into kubernetes:in-place-updates May 8, 2025
6 of 7 checks passed
@maxcao13 maxcao13 deleted the in-place-updates-e2e branch May 8, 2025 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants