
fix(nemesis): make nemesis more safely by adding repair on all nodes #10496

Merged
merged 1 commit into from
Jun 8, 2025

Conversation

timtimb0t
Contributor

@timtimb0t timtimb0t commented Mar 25, 2025

This change adds a repair at the beginning of disrupt_repair_streaming_err to make this nemesis safer and avoid c-s data validation errors. Ref: scylladb/scylladb#21428

fixes: scylladb/scylladb#21428
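The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `repair_all_nodes`, `disrupt_with_pre_repair`, `RepairReport`, and the injected `run_repair` callable are hypothetical names, and the decision to record repair failures instead of raising them is an assumption about why the review asks for repair errors to be skipped.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class RepairReport:
    """Outcome of a best-effort repair pass (hypothetical helper type)."""
    repaired: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def repair_all_nodes(nodes: Iterable[str],
                     run_repair: Callable[[str], None]) -> RepairReport:
    """Run a repair on every node, recording failures instead of raising.

    Assumption: a failed pre-repair should not abort the nemesis itself,
    so errors are collected for reporting and the loop continues.
    """
    report = RepairReport()
    for node in nodes:
        try:
            run_repair(node)
        except Exception:  # deliberately broad: the pre-repair is best-effort
            report.failed.append(node)
        else:
            report.repaired.append(node)
    return report


def disrupt_with_pre_repair(nodes: Iterable[str],
                            run_repair: Callable[[str], None],
                            disrupt: Callable[[], None]) -> RepairReport:
    """Repair all nodes first, then run the disruptive step."""
    report = repair_all_nodes(nodes, run_repair)
    disrupt()
    return report
```

In a real test, `run_repair` would wrap whatever mechanism the framework uses to trigger a repair on a node. Passing it in as a callable keeps the sketch independent of any particular cluster API.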

Testing

https://jenkins.scylladb.com/job/scylla-staging/job/eugene_test_folder/job/repair_repair/6/

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@timtimb0t timtimb0t added the backport/none Backport is not required label Mar 25, 2025
@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from f28bb16 to ada17ec Compare March 25, 2025 11:02
Contributor

@aleksbykov aleksbykov left a comment

Need to make the repair method a bit safer

@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from ada17ec to 728a8b1 Compare March 26, 2025 11:47
@timtimb0t timtimb0t marked this pull request as ready for review March 26, 2025 11:48
@timtimb0t timtimb0t requested a review from aleksbykov March 26, 2025 11:50
@scylladbbot

@timtimb0t new branch manager-3.5 was added, please add backport label if needed

@timtimb0t
Contributor Author

@aleksbykov , could you please take a look again?

@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from 728a8b1 to 75c38aa Compare April 24, 2025 08:10
@timtimb0t timtimb0t requested a review from aleksbykov April 24, 2025 08:53
@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from 75c38aa to b56f6ba Compare April 25, 2025 09:40
Contributor

@aleksbykov aleksbykov left a comment

This needs to be fixed, because we broke the behavior of this method

@timtimb0t timtimb0t force-pushed the validation_after_destroying branch 3 times, most recently from 2ca2f1d to 205c7cc Compare April 30, 2025 08:44
@timtimb0t timtimb0t requested a review from aleksbykov April 30, 2025 08:51
@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from 205c7cc to 619cfb0 Compare May 5, 2025 08:32
@aleksbykov
Contributor

The same code is used here: https://github.com/timtimb0t/scylla-cluster-tests/blob/619cfb05b0c5bc8126401ef95945f7b7a65cddac/sdcm/nemesis.py#L3629. You can replace it with your method. Also, please add a docstring describing why we skip the repair error.

@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from 619cfb0 to d8b03a0 Compare May 6, 2025 10:03
aleksbykov
aleksbykov previously approved these changes May 7, 2025
Contributor

@aleksbykov aleksbykov left a comment

Add documentation

@scylladbbot

@timtimb0t new branch branch-2025.2 was added, please add backport label if needed

@timtimb0t
Contributor Author

@scylladb/qa-maintainers , could you please merge?

@aleksbykov
Contributor

@timtimb0t , please add a link to a job run with the fix, where the code was executed

@timtimb0t timtimb0t dismissed stale reviews from aleksbykov and temichus via 96acfa2 June 3, 2025 07:34
@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from 4962ede to 96acfa2 Compare June 3, 2025 07:34
@timtimb0t
Contributor Author

@fruch , @soyacz could you please take a look?

@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from 96acfa2 to 2c2f1a2 Compare June 4, 2025 09:01
@timtimb0t timtimb0t requested a review from soyacz June 4, 2025 09:02
@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from 2c2f1a2 to 264d957 Compare June 4, 2025 09:55
@timtimb0t
Contributor Author

@fruch , @roydahan

@fruch
Contributor

fruch commented Jun 4, 2025

@timtimb0t

the original referenced bug talks about disrupt_destroy_data_then_rebuild
scylladb/scylladb#21428 (comment)

but this PR doesn't do anything with that nemesis, so I'm a bit confused

@timtimb0t
Contributor Author

@timtimb0t

the original referenced bug talks about disrupt_destroy_data_then_rebuild scylladb/scylladb#21428 (comment)

but this PR doesn't do anything with that nemesis, so I'm a bit confused

It was decided to place the fix before the nemesis that failed (i.e. disrupt_repair_streaming_err), because it failed a few times with the same error.

@roydahan
Contributor

roydahan commented Jun 5, 2025

@timtimb0t
the original referenced bug talks about disrupt_destroy_data_then_rebuild scylladb/scylladb#21428 (comment)
but this PR doesn't do anything with that nemesis, so I'm a bit confused

It was decided to place the fix before the nemesis that failed (i.e. disrupt_repair_streaming_err), because it failed a few times with the same error.

Just add the -pr for now, it's a path we probably won't exercise but better to be on the safe side.

This change adds a repair at the beginning of disrupt_repair_streaming_err
to make this nemesis safer and avoid c-s data validation errors.
ref: scylladb/scylladb#21428
@timtimb0t timtimb0t force-pushed the validation_after_destroying branch from 264d957 to 37825f9 Compare June 5, 2025 15:59
@timtimb0t timtimb0t requested a review from roydahan June 5, 2025 16:06
Contributor

@soyacz soyacz left a comment

LGTM

@enaydanov enaydanov merged commit de6cdbd into scylladb:master Jun 8, 2025
7 checks passed
Labels
backport/none Backport is not required promoted-to-master
Development

Successfully merging this pull request may close these issues.

[c-s] failed validation after destroying files and rebuild
9 participants