-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(nemesis): make nemesis more safely by adding repair on all nodes #10496
base: master
Are you sure you want to change the base?
Conversation
f28bb16
to
ada17ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need make the repair method a bit safely
sdcm/nemesis.py
Outdated
node.run_nodetool(sub_cmd="repair", publish_event=publish_event) | ||
def repair_nodetool_repair(self, node=None, publish_event=True, all_nodes=False): | ||
if all_nodes: | ||
for node in self.cluster.nodes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use only data nodes self.clsuter.data_nodes
. Repair operation could be skipped on zero-token nodes
node = node if node else self.target_node | ||
with adaptive_timeout(Operations.REPAIR, node, timeout=HOUR_IN_SEC * 48): | ||
node.run_nodetool(sub_cmd="repair", publish_event=publish_event) | ||
def repair_nodetool_repair(self, node=None, publish_event=True, all_nodes=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think, you need go over the method usage, and find out whether this method used in Non-disruptive nemesis(methods), so operation is not failed if node is under Disruptive nemesis. Otherwise, you need to make this changes a bit safely by adding ignore status of nodetool command or add try except
@@ -4273,6 +4278,7 @@ def disrupt_repair_streaming_err(self): | |||
""" | |||
Stop repair in middle to trigger some streaming fails, then rebuild the data on the node. | |||
""" | |||
self.repair_nodetool_repair(all_nodes=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a short comment, why we need it here
this change add repair at the beggining of disrupt_repair_streaming_err to make this nemesis more safe and avoid c-s data validation errors ref: scylladb/scylladb#21428
ada17ec
to
728a8b1
Compare
@timtimb0t new branch |
this change add repair at the beggining of disrupt_repair_streaming_err to make this nemesis more safe and avoid c-s data validation errors ref: scylladb/scylladb#21428
Testing
PR pre-checks (self review)
backport
labelsReminders
sdcm/sct_config.py
)unit-test/
folder)