fix(nemesis): add support ipv6 for refuse connection for banned node #10594
base: master
Isn't it just a Scylla issue? Even under high load, shouldn't other nodes know that one is down within 10 minutes?
912fb11 to a06b7e1
I found the problem. It was not the timeout; it was related to IPv6.
An additional staging job is running.
    target_node.log.debug("Send signal SIGSTOP to scylla process on node %s", target_node.name)
    target_node.remoter.sudo("pkill --signal SIGSTOP -e scylla", timeout=60)
    yield
    target_node.log.debug("Send signal SIGCONT to scylla process on node %s", target_node.name)
    target_node.remoter.sudo(cmd="pkill --signal SIGCONT -e scylla", timeout=60)


@contextlib.contextmanager
def block_loaders_payload_for_scylla_node(scylla_node: BaseNode, loader_nodes: list[BaseNode]):
Please add a docstring explaining why this is needed.
added
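For readers following this thread, here is a minimal sketch of how a context manager of this kind could work. It is not the code from this PR: the name `block_loader_traffic`, the injected `run_cmd` callable, the plain-string `loader_ips`, and the default port 9042 are all assumptions made for illustration, whereas the real function takes BaseNode objects.

```python
import contextlib
import ipaddress
from typing import Callable, Iterable, Iterator


def _firewall_binary(address: str) -> str:
    """Pick ip6tables for IPv6 source addresses, iptables otherwise."""
    return "ip6tables" if ipaddress.ip_address(address).version == 6 else "iptables"


@contextlib.contextmanager
def block_loader_traffic(
    run_cmd: Callable[[str], None],   # hypothetical: runs a command as root on the Scylla node
    loader_ips: Iterable[str],        # hypothetical: loader addresses as plain strings
    port: int = 9042,                 # assumption: CQL native transport port
) -> Iterator[None]:
    """Reject loader connections to `port` for the duration of the `with` block.

    A banned node that is still alive can accept client connections and report
    its peers as down, which makes stress tools fail. Rejecting loader traffic
    while the node is banned avoids that.
    """
    rules = [
        (_firewall_binary(ip), f"INPUT -s {ip} -p tcp --dport {port} -j REJECT")
        for ip in loader_ips
    ]
    applied = []
    try:
        for binary, rule in rules:
            run_cmd(f"{binary} -A {rule}")
            applied.append((binary, rule))
        yield
    finally:
        # Remove only the rules that were actually added, in reverse order.
        for binary, rule in reversed(applied):
            run_cmd(f"{binary} -D {rule}")
```

Because the cleanup sits in `finally`, the rules are removed even if the body of the `with` block raises, so a failed nemesis does not leave loaders blocked.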
@@ -48,4 +76,6 @@ def is_node_removed_from_cluster(removed_node: BaseNode, verification_node: Base


 def is_node_seen_as_down(down_node: BaseNode, verification_node: BaseNode) -> bool:
+    LOGGER.debug("Verification node %s", verification_node.name)
-    return down_node not in verification_node.parent_cluster.get_nodes_up_and_normal(verification_node)
+    nodes_status = verification_node.parent_cluster.get_nodetool_status(verification_node, dc_aware=False)
+    down_node_status = nodes_status.get(down_node.ip_address)
Shouldn't this use down_node.listen_address?
It will be the same as ip_address, and we use ip_address everywhere with nodetool_status.
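To make the lookup discussed above concrete, here is a hedged sketch of a status check keyed by address. The shape of the status mapping (`{address: {"state": "DN"}}`), the helper name `is_seen_as_down`, and the choice to treat a missing entry as "not down" are assumptions for illustration, not taken from the PR. Comparing parsed addresses instead of raw strings is an extra precaution added here so that differently formatted IPv6 addresses still match.

```python
import ipaddress


def is_seen_as_down(nodes_status: dict[str, dict], node_ip: str) -> bool:
    """Return True if the status mapping reports `node_ip` in state DN.

    `nodes_status` is assumed to map an address string to a dict that
    contains a "state" key such as "UN" (up/normal) or "DN" (down/normal).
    """
    target = ipaddress.ip_address(node_ip)
    for address, info in nodes_status.items():
        # Parsed addresses compare equal regardless of IPv6 text form.
        if ipaddress.ip_address(address) == target:
            return info.get("state") == "DN"
    # Assumption: a node missing from the status output is not reported as down.
    return False
```

With this approach, `fd00::5` and `fd00:0:0:0:0:0:0:5` resolve to the same entry, which a plain `dict.get` on the string would miss.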
disrupt_refuse_connection_with_* nemeses don't support IPv6.
- Added a command for blocking ports on the IPv6 stack.
When a node is banned but still alive, c-s/s-b could connect to it and fail with a critical error, because the banned node reports the other cluster nodes as down.
- Added a new node_operation: block_loader_workload_for_scyllanode. This blocks connections from loaders to the Scylla node and avoids a critical c-s/s-b error when they connect to the banned node and fail to run.
Fixes: scylladb#10434
a06b7e1 to 2c4e417
disrupt_refuse_connection_with_* nemeses don't support IPv6.
When a node is banned but still alive, c-s/s-b could connect to it
and fail with a critical error, because the banned node reports
the other cluster nodes as down.
This change allows blocking connections from loaders to the Scylla node,
which avoids the critical c-s/s-b error that occurs when they connect to
the banned node and fail to run.
Fixes: #10434
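As an illustration of what IPv6 support means for a port-blocking nemesis, the sketch below builds the same rule for both firewall stacks, since an `iptables` rule alone does not affect IPv6 traffic. The helper name `port_block_commands`, the `DROP` target, and port 7000 in the usage note are assumptions for illustration; the exact commands used by the nemesis are not shown in this description.

```python
def port_block_commands(port: int, action: str = "add") -> list[str]:
    """Build matching iptables and ip6tables commands for one TCP port.

    `action` is "add" to append the rule or "remove" to delete it.
    """
    flag = {"add": "-A", "remove": "-D"}[action]
    rule = f"INPUT -p tcp --dport {port} -j DROP"
    # The rule syntax is identical for both tools; only the binary differs.
    return [f"{binary} {flag} {rule}" for binary in ("iptables", "ip6tables")]
```

For example, `port_block_commands(7000)` yields one `iptables` and one `ip6tables` command that drop inbound TCP traffic to port 7000, and `port_block_commands(7000, "remove")` yields the commands that undo them.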