
Conversation

@sbulage (Collaborator) commented Nov 25, 2025

In recent days, we have seen test Fleet-51 failing multiple times on CI.

Problem:

What should happen:

  • The Fleet-51 test case does the following:
    • create a new workspace
    • move one of the clusters to the newly created workspace
    • create a GitRepo in the newly created workspace and verify the resources
    • move the cluster back to its default workspace
    • delete the newly created workspace and the GitRepo

What is happening:

  • Currently the Fleet-51 test:

    • creates the new workspace
    • tries to move one of the clusters to the newly created workspace, but this takes longer than expected and the test fails with a timeout
    • leaves the moved cluster in the newly created workspace
  • Fleet-156 is currently also failing because:

    • the test does not wait long enough for the page content to load

Solution:

  • Increase the timeout.
  • Move the test case to the special tests specs.
  • Run these tests at the very end of all tests so they do not cause other test failures.
  • For test Fleet-156, add a wait so the page loads properly before further testing.

@sbulage sbulage self-assigned this Nov 25, 2025
@sbulage sbulage added the fleet-e2e-ci and automation labels Nov 25, 2025
@sbulage sbulage linked an issue Nov 25, 2025 that may be closed by this pull request

@sbulage (Collaborator Author) commented Nov 25, 2025

Second try with all tests (using all tags).

@sbulage sbulage requested a review from mmartin24 November 25, 2025 14:01
// Verify gitrepoJobsCleanup is enabled by default.
cy.accesMenuSelection('local', 'Workloads', 'CronJobs');
// Adding wait for CronJobs page to load correctly.
cy.wait(5000);
@mmartin24 (Collaborator)

Is 5 seconds really needed? Doesn't it display with only 2? Same for the examples below.

@sbulage (Collaborator Author)

I tried lowering the wait as much as I could, but 5 seconds turned out to be the minimum for the page to load correctly. There is still room for better logic, though, as cy.wait() is not a final solution.
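
One possible shape for that better logic, instead of a fixed wait, is to let Cypress retry an assertion on the page content until it renders. A minimal sketch, where the table selector is an assumption rather than the page's real markup:

// Assumed selector: retry until the CronJobs list has rendered at least one
// row, instead of pausing unconditionally for 5 seconds.
cy.accesMenuSelection('local', 'Workloads', 'CronJobs');
cy.get('table tbody tr', { timeout: 30000 }).should('be.visible');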

@mmartin24 (Collaborator)

I get that test 51 is being moved, but is it fixed?

@sbulage (Collaborator Author) commented Nov 25, 2025

Yes, you can see that the two retries of the 2.12-head CI run prove it. I have also described the whole test execution above and where it was failing.

@mmartin24 (Collaborator)

The test passes when tried solely on the special_tests set, but we have had this test pass in the past when p1_2 was run alone, so that is not proof in itself. Full runs are more accurate, as they carry more processes that are not yet settled while the full run executes, but this still does not explain what was wrong.

Feel free to merge and we will see, but I am not sure I understand what caused it. It is okay if we don't, and we simply try not to have other tests affected by encapsulating this one somewhere else. But my question remains: is it fixed? And if so, what caused it?

@sbulage (Collaborator Author) commented Nov 25, 2025

  • Yes, I totally get that a full run gives more confidence that the failed tests now pass. Link to the full test run: Updated timeouts and moved test Fleet-51 to special #422 (comment)

  • Regarding the cause of the Fleet-51 failure:

    • Moving the cluster (imported-0) to the newly created workspace takes around 30 seconds (before the timeout increase).

    • Once the 30-second timeout is over, the cluster is still not fully moved to the newly created workspace.

    • The next test step, GitRepo creation, is executed anyway; the UI shows the resource count (6/6), but the bundle count is not loaded.

    • The test then fails (see the screenshot below).

    • In the screenshot we can clearly see that the bundle count is not shown, while the resource count is.

      Screenshot showing resource count but not bundle count

      Test move cluster to newly created workspace and deploy application to it -- Fleet-51 (Qase ID 51) (failed)

  • What is fixed for test Fleet-51:

    • Increasing the timeout (from 30000 to 40000, and from 60000 to 70000) gives the cluster extra time to settle properly; a sketch of the kind of check this affects follows below.
    • When the next step, i.e. the creation of the GitRepo, is executed, it shows the proper bundle count and resource count on the screen.
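
For illustration only, the kind of assertion such a timeout feeds into might look roughly like the following. This is not the actual test code: the row selector and the Active-state check are assumptions; only the 70000 value comes from the change described above.

// Sketch (assumed selectors): wait for the moved cluster to settle in the
// new workspace before the GitRepo step, letting Cypress retry the assertion
// for up to 70 seconds instead of 30/60.
cy.contains('tr.main-row', 'imported-0', { timeout: 70000 })
  .should('contain', 'Active');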

Please let me know if you require any other details. Happy to explain 😄

@mmartin24 (Collaborator)

Feel free to merge it.
The 2.11 lane works well with 30000, and we are increasing it to 40000. In 2.12, 60000 was already a high value; increasing it to 70000 just keeps raising it, and I am not sure I get why this is.

@sbulage (Collaborator Author) commented Nov 25, 2025

  • Oops, I will revert the 40000 value to 30000; that change was not intentional.
  • Also, here is proof that the cluster is still not ready within the 60000 timeout + 1000 wait.
    • Screenshot taken from the CI's video:

      image

  • The screenshot above clearly shows that the cluster is not yet properly ready while the next step, GitRepo creation, is already executing.
  • Likely causes of the cluster not being fully ready:
    • Not all default resources in the cluster become ready in time.
    • The fleet-agent might be taking time to start or to sync with the fleet-controller.

These are the causes I can think of; they can be mitigated by increasing timeouts, and sometimes the cluster does come up within the timeout as well.
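
As a possible longer-term hardening on top of the timeouts, the test could also gate on the Fleet Cluster resource itself before creating the GitRepo, so a slow fleet-agent sync surfaces in a targeted wait rather than in the next UI step. This is only a sketch under assumptions: that kubectl is available to the test runner, that the Fleet Cluster CR exposes a Ready condition, and that new-workspace stands in for the actual workspace name.

// Hypothetical guard: poll the Fleet Cluster resource in the new workspace
// until it reports Ready. cy.exec fails the test if the command exits
// non-zero, e.g. when the kubectl wait times out.
cy.exec(
  'kubectl wait clusters.fleet.cattle.io --all -n new-workspace ' +
  '--for=condition=Ready --timeout=120s',
  { timeout: 130000 }
);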

@sbulage sbulage merged commit e85022d into main Nov 25, 2025
11 of 12 checks passed
@sbulage sbulage deleted the fix-test-51 branch November 25, 2025 19:58

Labels

automation (Add or update automation), fleet-e2e-ci (Improvements or additions to the CI framework)

Development

Successfully merging this pull request may close these issues.

Frequent Test Fleet-51 failure on CI

3 participants