Add e2e tests for manageJobsWithoutQueueName #4112
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.

/assign @mimowo PTAL! Thanks! 😊
Force-pushed from 49bd6c4 to 7b42d2f.
Thank you for the PR, the new testing suite will be great to make sure the configuration works, which is really tricky and has required lots of manual testing until now.
My main hesitation is about the way we run the new suite, and as pointed out in the comment I would be leaning towards a dedicated CI job. However, let me collect more feedback. I'm OK with the (2.) or (3.) approaches (from the comment) as an interim.
OTOH we may want to test more e2e configurations in the future, so I have an idea to mix the strategies and add a new suite, customconfigs, where we could test custom configurations and restart Kueue between them. I believe in practice each configuration will only have a handful of tests specific to it, and the main body of tests will still live in the singlecluster suite. This way we would avoid creating a CI job per config, while also offloading the main run. WDYT? For now I would suggest putting the e2e tests in this PR under e2e/customconfig/managedwithoutqueuename. WDYT?
	util.ExpectObjectToBeDeleted(ctx, k8sClient, defaultRf, true)
})

ginkgo.It("should suspend it", func() {
Can we join it with "should unsuspend it"? Looks like the logic is the same.
do you mean merging both test cases (`ginkgo.It`) so that the `should unsuspend it` test logic becomes part of the `should suspend it` test case?
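For illustration only, a merged spec along the lines the reviewer suggests could look roughly like the sketch below. This is not the code from the PR: the Job spec is a generic batch Job, and `ctx`, `k8sClient`, `ns` and `localQueue` are assumed to come from the enclosing suite (with the usual `batchv1`/`corev1`/`metav1`/`client`/`gomega`/`ptr` imports).

```go
ginkgo.It("should suspend a job without a queue name and unsuspend it once a queue is set", func() {
	// A plain Job created without the kueue.x-k8s.io/queue-name label.
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "no-queue-", Namespace: ns.Name},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "c",
						Image:   "busybox:1.36", // illustrative image
						Command: []string{"sleep", "1"},
					}},
				},
			},
		},
	}
	gomega.Expect(k8sClient.Create(ctx, job)).To(gomega.Succeed())

	// With manageJobsWithoutQueueName: true, Kueue should suspend the job.
	gomega.Eventually(func(g gomega.Gomega) {
		createdJob := &batchv1.Job{}
		g.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(job), createdJob)).To(gomega.Succeed())
		g.Expect(ptr.Deref(createdJob.Spec.Suspend, false)).To(gomega.BeTrue())
	}, time.Minute, time.Second).Should(gomega.Succeed())

	// Point the suspended job at a local queue with available quota; Kueue
	// should admit the workload and unsuspend the job.
	createdJob := &batchv1.Job{}
	gomega.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(job), createdJob)).To(gomega.Succeed())
	if createdJob.Labels == nil {
		createdJob.Labels = map[string]string{}
	}
	createdJob.Labels["kueue.x-k8s.io/queue-name"] = localQueue.Name
	gomega.Expect(k8sClient.Update(ctx, createdJob)).To(gomega.Succeed())

	gomega.Eventually(func(g gomega.Gomega) {
		j := &batchv1.Job{}
		g.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(job), j)).To(gomega.Succeed())
		g.Expect(ptr.Deref(j.Spec.Suspend, true)).To(gomega.BeFalse())
	}, time.Minute, time.Second).Should(gomega.Succeed())
})
```

Whether merging is worth it then mostly comes down to readability versus keeping one behavior per spec.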
If we need to create custom configurations with the ability to run different setups, I suggest moving the creation and deletion logic of Kueue to the Go part instead of running it in the bash script. We can start Kueue in BeforeSuite and delete it in AfterSuite, allowing each test suite to run with the required configuration.
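A minimal sketch of that idea, assuming a hypothetical kustomize overlay path (`config/custom`) and shelling out to kubectl; the real suite would plug in its own manifests and helpers:

```go
// e2e_suite_test.go (illustrative): install and tear down Kueue once per suite
// instead of in the bash wrapper.
package customconfigs

import (
	"os/exec"

	"github.com/onsi/ginkgo/v2"
	"github.com/onsi/gomega"
)

var _ = ginkgo.BeforeSuite(func() {
	// Hypothetical overlay path; each suite would point at the kustomize
	// overlay carrying the configuration under test.
	out, err := exec.Command("kubectl", "apply", "--server-side", "-k", "config/custom").CombinedOutput()
	gomega.Expect(err).NotTo(gomega.HaveOccurred(), string(out))
})

var _ = ginkgo.AfterSuite(func() {
	out, err := exec.Command("kubectl", "delete", "-k", "config/custom", "--ignore-not-found").CombinedOutput()
	gomega.Expect(err).NotTo(gomega.HaveOccurred(), string(out))
})
```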
Can you elaborate a bit on why? It might be a good idea but I'm not sure how this relates to the PR. Also, would it affect the singlecluster and multicluster suites? Also, is this an alternative to the 1-3 ideas or within one of them? EDIT: IIUC moving the kubectl apply to BeforeSuite introduces overhead, so I would prefer to avoid it for the main singlecluster suite, which is often used against an already running cluster with Kueue.
+1
Force-pushed from 413bf14 to d6c294b.
Yes, this is similar to what I had in mind (although better put and explained 😆 ): use this suite and technique to test custom configs.
SGTM! What do you think about adding the word job to the test file name so that the reader can also see what kind is being tested? Or maybe name the test file after the config under test? e.g.:
@mimowo @mbobrovskyi @PBundyra Thanks so much for your reviews! I answered most of your comments. PTAL!
I'm good with including the job word, though for a slightly different reason: to correlate the config with the config field name.
An alternative name which would also be good for me:
Force-pushed from 47cbea6 to 8647a49.
/retest
Force-pushed from 8647a49 to 6c84b56.
Yes, I got the same error happening intermittently on my machine. After some research I noticed that the cluster queue does eventually get deleted, but sometimes only after the test timeout. I wanted to make sure that this also happens in CI (and it does 😅), so I've taken your suggestion and used it.
/test all

/test all

/test all

I've rerun all the tests three times in a row to confirm that the flaky tests are gone 👍🏻
cc @mbobrovskyi @mszadkow would you like to give it another pass?
ginkgo.AfterEach(func() {
	gomega.Expect(util.DeleteAllJobsInNamespace(ctx, k8sClient, ns)).Should(gomega.Succeed())
	// Force remove workloads to be sure that cluster queue can be removed.
	gomega.Expect(util.DeleteWorkloadsInNamespace(ctx, k8sClient, ns)).Should(gomega.Succeed())
I think we also need to remove Pods, similar to what we do in `util.DeleteNamespace`, to avoid waiting for `LongTimeout` on `ClusterQueue` and `ResourceFlavor`. I'm wondering if we can merge BeforeEach and AfterEach into the Describe block.
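For example, the cleanup could delete Pods explicitly before waiting on the ClusterQueue and ResourceFlavor. This sketch goes through controller-runtime's `DeleteAllOf` rather than an existing util helper, and assumes `ctx`, `k8sClient`, `ns`, `clusterQueue` and `defaultRf` from the surrounding suite:

```go
ginkgo.AfterEach(func() {
	gomega.Expect(util.DeleteAllJobsInNamespace(ctx, k8sClient, ns)).Should(gomega.Succeed())
	// Remove leftover Pods explicitly so the ClusterQueue/ResourceFlavor
	// deletion below does not have to wait for the LongTimeout.
	gomega.Expect(k8sClient.DeleteAllOf(ctx, &corev1.Pod{},
		client.InNamespace(ns.Name),
		client.GracePeriodSeconds(0),
	)).Should(gomega.Succeed())
	// Force remove workloads to be sure that the cluster queue can be removed.
	gomega.Expect(util.DeleteWorkloadsInNamespace(ctx, k8sClient, ns)).Should(gomega.Succeed())
	util.ExpectObjectToBeDeleted(ctx, k8sClient, clusterQueue, true)
	util.ExpectObjectToBeDeleted(ctx, k8sClient, defaultRf, true)
})
```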
I just tried that (merging the `BeforeEach` and `AfterEach` into the `Describe` block and reducing the waiting) but unfortunately the timeout error appears again :(
Force-pushed from 6c84b56 to 64df89c.
Force-pushed from 64df89c to dfd1536.
Force-pushed from dfd1536 to beef1f7.
/lgtm
/approve
This looks great IMO. Please follow up with extracting the tests to a separate CI job.
There should be 2 PRs:
- first, in test-infra, to introduce the new CI job running `test-e2e-customconfigs`
- second, in Kueue, to stop running `test-e2e-customconfigs` as part of `run-test-e2e`
Then, I would still like to see test cases for Deployment, StatefulSet and LWS showing how Kueue handles the config.
LGTM label has been added. Git tree hash: 59e446220ad6b6516bce84584ac5021ac7d78353
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kaisoz, mimowo. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR adds e2e tests for the `manageJobsWithoutQueueName: true` configuration. This is one of the tasks required to close #3767

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The test suite sets `manageJobsWithoutQueueName: true` during the setup phase of the Describe block (just before running the test specs). It then initiates a rollout of the controller deployment to ensure that the new configuration is picked up by the pods. To confirm that the deployment rollout has started, the setup function waits for the Deployment to report the condition `DeploymentCondition=false`. To ensure that this condition is met, the rollout strategy is set to `Recreate`, which guarantees that the pods are terminated before being recreated, resulting in the condition not being satisfied (see the DeploymentConditionTypes docs). A rough sketch of this restart-and-wait flow is shown below.
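The following is an illustrative sketch of that restart-and-wait step, not the PR's actual code. It assumes the default install names (`kueue-system`/`kueue-controller-manager`), uses the Deployment's `Available` condition as the one being watched, and takes a controller-runtime client from the suite:

```go
package customconfigs

import (
	"context"
	"time"

	"github.com/onsi/gomega"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// restartKueue restarts the Kueue controller so it picks up an updated
// configuration, then waits for the rollout to start and finish.
func restartKueue(ctx context.Context, k8sClient client.Client) {
	key := client.ObjectKey{Namespace: "kueue-system", Name: "kueue-controller-manager"}

	deployment := &appsv1.Deployment{}
	gomega.Expect(k8sClient.Get(ctx, key, deployment)).To(gomega.Succeed())

	// Recreate terminates the old pods before starting new ones, so the
	// Available condition observably flips to False while the rollout runs.
	deployment.Spec.Strategy = appsv1.DeploymentStrategy{Type: appsv1.RecreateDeploymentStrategyType}
	if deployment.Spec.Template.Annotations == nil {
		deployment.Spec.Template.Annotations = map[string]string{}
	}
	// Bumping a pod-template annotation triggers a new rollout, like `kubectl rollout restart`.
	deployment.Spec.Template.Annotations["kubectl.kubernetes.io/restartedAt"] = time.Now().Format(time.RFC3339)
	gomega.Expect(k8sClient.Update(ctx, deployment)).To(gomega.Succeed())

	available := func(d *appsv1.Deployment) corev1.ConditionStatus {
		for _, cond := range d.Status.Conditions {
			if cond.Type == appsv1.DeploymentAvailable {
				return cond.Status
			}
		}
		return corev1.ConditionUnknown
	}
	waitForAvailable := func(expected corev1.ConditionStatus) {
		gomega.Eventually(func(g gomega.Gomega) {
			d := &appsv1.Deployment{}
			g.Expect(k8sClient.Get(ctx, key, d)).To(gomega.Succeed())
			g.Expect(available(d)).To(gomega.Equal(expected))
		}, 2*time.Minute, time.Second).Should(gomega.Succeed())
	}

	// First confirm the rollout has started, then wait for it to complete.
	waitForAvailable(corev1.ConditionFalse)
	waitForAvailable(corev1.ConditionTrue)
}
```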
Does this PR introduce a user-facing change?