test(e2e): decouple ScyllaDB Manager task property update verification from backup, restore, and repair tests #3470

Merged
scylla-operator-bot[bot] merged 1 commit into scylladb:master from rzetelskik:manager-task-e2e-decoupling
Jun 9, 2026

@rzetelskik rzetelskik commented Jun 8, 2026

Member

Description of your changes:
The e2e-gke-parallel check in #3461 failed on a Manager-related test. Ultimately, the failure is caused by scylladb/scylla-manager#4564: if a task fails and is scheduled for retry, a subsequent PutTask call - even one that doesn't change the schedule - cancels the pending retry. The task never reruns and the test times out waiting for completion. I couldn't identify the root cause of the initial task failure in a reasonable timeframe, but the investigation was sufficient to say it wasn't a regression from the integration perspective; the upstream bug is tracked in CLOUD-2276.

From an integration testing perspective, verifying that an update mid-run triggers a retry brings no value - it exercises a scylla-manager scheduler edge case rather than operator behaviour. I changed the workflow to: create task -> wait for completion -> update task -> wait for completion. That sequence sidesteps the race entirely.
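
The difference between the two orderings can be illustrated with a toy model. This is only a sketch under stated assumptions: `fakeScheduler`, `PutTask`, `Tick`, and `waitForCompletion` are hypothetical names invented for illustration and do not mirror scylla-manager's or the operator's real types. The model assumes that the first run fails once, that a `PutTask` call cancels a pending retry (as described in scylladb/scylla-manager#4564), and that updating a completed task starts a new run.

```go
package main

import "fmt"

// state is a toy task lifecycle; all names here are hypothetical.
type state int

const (
	idle state = iota
	running
	retryPending
	done
	stuck // the retry was cancelled, so the task never reruns
)

// fakeScheduler is an illustrative stand-in for the scheduler, not its real API.
type fakeScheduler struct {
	st          state
	failNextRun bool // assumption: the next run fails once, then succeeds
}

// PutTask models a create or update call. Assumption taken from the PR
// description: any PutTask cancels a pending retry.
func (s *fakeScheduler) PutTask() {
	switch s.st {
	case retryPending:
		s.st = stuck
	case idle, done:
		s.st = running
	}
}

// Tick advances the toy scheduler by one step.
func (s *fakeScheduler) Tick() {
	switch s.st {
	case running:
		if s.failNextRun {
			s.failNextRun = false
			s.st = retryPending
		} else {
			s.st = done
		}
	case retryPending:
		s.st = running
	}
}

// waitForCompletion polls up to maxTicks times and reports whether the task finished.
func waitForCompletion(s *fakeScheduler, maxTicks int) bool {
	for i := 0; i < maxTicks && s.st != done; i++ {
		s.Tick()
	}
	return s.st == done
}

// updateMidRun models the old ordering: the update lands while a retry is pending.
func updateMidRun() bool {
	s := &fakeScheduler{failNextRun: true}
	s.PutTask() // create
	s.Tick()    // first run fails, retry is now pending
	s.PutTask() // update cancels the pending retry
	return waitForCompletion(s, 10)
}

// updateAfterCompletion models: create -> wait -> update -> wait.
func updateAfterCompletion() bool {
	s := &fakeScheduler{failNextRun: true}
	s.PutTask() // create
	if !waitForCompletion(s, 10) {
		return false
	}
	s.PutTask() // update only after the first run has completed
	return waitForCompletion(s, 10)
}

func main() {
	fmt.Println("update mid-run completes:", updateMidRun())
	fmt.Println("update after completion completes:", updateAfterCompletion())
}
```

In this model the mid-run update never completes within the polling budget, while the sequential ordering does, because no `PutTask` call can coincide with a pending retry.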

As part of the same change I decoupled task update verification from task deletion. The tests for "delete repair task" and "disable manager integration" now live in a separate DescribeTable, independent of the update test. The same restructuring is applied to the ScyllaDBManagerTask single-DC and multi-DC suites, and the backup task update verification is removed from the object storage suite (the update path is covered by the repair task suite and the ScyllaDBManagerTask suites). The shared cluster setup (create, rollout, CQL data insertion, manager registration) is moved into JustBeforeEach so all sub-tests reuse it without duplication.
These changes should make future flakes easier to isolate.

Which issue is resolved by this Pull Request:
Resolves https://scylladb.atlassian.net/browse/OPERATOR-140

/kind flake
/priority important-soon
/cc

@scylla-operator-bot

Contributor

@rzetelskik: GitHub didn't allow me to request PR reviews from the following users: rzetelskik.

Note that only scylladb members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

Description of your changes: wip

Which issue is resolved by this Pull Request:
Resolves #

/kind flake
/priority important-soon
/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@scylla-operator-bot added the do-not-merge/work-in-progress, kind/flake, priority/important-soon, size/L, and approved labels Jun 8, 2026
@rzetelskik force-pushed the manager-task-e2e-decoupling branch from 963e4c1 to 2d69432 June 8, 2026 14:21
@scylla-operator-bot added the size/XL label and removed the size/L label Jun 8, 2026
@rzetelskik changed the title from "[WIP] test(e2e): decouple ScyllaDB Manager task property update verification from backup, restore, and repair tests" to "test(e2e): decouple ScyllaDB Manager task property update verification from backup, restore, and repair tests" Jun 9, 2026
@scylla-operator-bot removed the do-not-merge/work-in-progress label Jun 9, 2026
@rzetelskik

Member Author

/auto-cc

@czeslavo czeslavo left a comment

Contributor

/lgtm

@scylla-operator-bot added the lgtm label Jun 9, 2026
@scylla-operator-bot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czeslavo, rzetelskik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [czeslavo,rzetelskik]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rzetelskik

Member Author

/test images
/retest

@rzetelskik

Member Author

@rzetelskik: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-serial | 2d69432 | link | unknown | /test e2e-gke-serial |
| ci/prow/e2e-gke-parallel-clusterip | 2d69432 | link | unknown | /test e2e-gke-parallel-clusterip |
| ci/prow/e2e-gke-parallel | 2d69432 | link | unknown | /test e2e-gke-parallel |

Full PR test history. Your PR dashboard.

Cluster provisioning failed.
/test all

@rzetelskik

Member Author

/retest

@scylla-operator-bot scylla-operator-bot Bot merged commit 4c3c945 into scylladb:master Jun 9, 2026
27 checks passed
@rzetelskik rzetelskik deleted the manager-task-e2e-decoupling branch June 9, 2026 17:19
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/Operator
  • kind/flake: Categorizes issue or PR as related to a flaky test.
  • lgtm: Indicates that a PR is ready to be merged.
  • P2
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.
  • symptom/ci_stability

2 participants