fix(nodetool rebuild): use repair instead of rebuild if no tablets support #9073
@bhalevy, can you please advise? Following scylladb/scylladb#20084 (comment), IIUC, if scylladb/scylladb#17852 is still open, all DC nodes should be repaired manually. Secondly, I'm not sure whether it is right to backport this fix to 2024/6.x (it may have an extensive impact on longevities and testing for this PR).
5340448 to f24debe
sdcm/nemesis.py
Outdated
with self.cluster.cql_connection_patient(self.target_node) as session:
    if is_tablets_feature_enabled(session=session) and not is_rebuild_supported:
        for node in [n for n in self.cluster.nodes if n.dc_idx == self.target_node.dc_idx]:
            node.run_nodetool(sub_cmd="repair")
I would recommend using long_running=True, retry=0.
Also, maybe consider a hard timeout.
Also, I'm not sure you have a guarantee that all the nodes in this DC are up and running...
Fixed like:
for node in [n for n in self.cluster.nodes if n.dc_idx == self.target_node.dc_idx and n.db_up()]:
    node.run_nodetool(sub_cmd="repair", long_running=True, retry=0)
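To make the selection logic in the fix easier to check in isolation, here is a minimal sketch. `FakeNode` and `repair_dc_nodes` are hypothetical stand-ins written for this illustration and are not the SCT classes. Only the `dc_idx` attribute, `db_up()`, and the `run_nodetool(sub_cmd=..., long_running=..., retry=...)` call shape are taken from the snippet above. The fake node records calls instead of running nodetool.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for a cluster node (not the real SCT class)."""
    name: str
    dc_idx: int
    up: bool = True
    commands: list = field(default_factory=list)

    def db_up(self) -> bool:
        return self.up

    def run_nodetool(self, sub_cmd, long_running=False, retry=0):
        # Record the call instead of actually running nodetool.
        self.commands.append((sub_cmd, long_running, retry))


def repair_dc_nodes(nodes, target_node):
    """Repair every node that shares the target's DC and reports the DB as up."""
    repaired = []
    for node in nodes:
        if node.dc_idx == target_node.dc_idx and node.db_up():
            node.run_nodetool(sub_cmd="repair", long_running=True, retry=0)
            repaired.append(node.name)
    return repaired


nodes = [
    FakeNode("node-1", dc_idx=0),
    FakeNode("node-2", dc_idx=0, up=False),  # same DC, but down: skipped
    FakeNode("node-3", dc_idx=1),            # different DC: skipped
]
print(repair_dc_nodes(nodes, target_node=nodes[0]))
```

In this sketch a node that is down is silently skipped, so it is never repaired. Whether that is acceptable for the nemesis, or whether a skipped node should be reported, is a choice for the PR author.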
@fruch, since long_running support was already mentioned, how about using the Scylla task manager to monitor the progress and results of such commands?
There's already a dtest covering the task manager in https://github.com/scylladb/scylla-dtest/pull/4957
@bhalevy, @pehala, please advise -
3635f1a to 98b7009
98b7009 to 9ab928c
If tablets do not support nodetool rebuild, the test should use repair as an alternative action. It should then disable load balancing and repair all nodes in this datacenter.
refs: scylladb/scylladb#17575
refs: scylladb/scylladb#20084 (comment)
Testing
PR pre-checks (self review)
backport labels
Reminders