
fix(nemesis): add support ipv6 for refuse connection for banned node #10594


Contributor

@aleksbykov aleksbykov commented Apr 6, 2025

The disrupt_refuse_connection_with_* nemeses don't support IPv6.

  • Added a command that blocks ports for the IPv6 stack.
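
As a rough illustration of what blocking ports on both IP stacks can look like, here is a minimal sketch. It is not the code from this PR: the helper name `build_block_port_commands`, the `DROP` target, and the example port are assumptions. It relies only on the fact that `ip6tables` accepts the same rule syntax as `iptables`.

```python
from typing import Iterable, List


def build_block_port_commands(ports: Iterable[int], target: str = "DROP") -> List[str]:
    """Return firewall commands that block inbound TCP traffic on each port.

    Hypothetical helper (not taken from the PR): one rule is generated for
    the IPv4 stack (iptables) and one for the IPv6 stack (ip6tables).
    """
    commands = []
    for port in ports:
        for binary in ("iptables", "ip6tables"):
            commands.append(
                f"sudo {binary} -A INPUT -p tcp --dport {port} -j {target}"
            )
    return commands


if __name__ == "__main__":
    # Port 7000 is used here only as an example value.
    for command in build_block_port_commands([7000]):
        print(command)
```

Generating both variants from one loop keeps the IPv4 and IPv6 rules in sync, so a dual-stack node cannot be reached through the stack that was left open.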

When a node is banned but still alive, c-s/s-b could connect to it
and fail with a critical error, because the banned node reports
that the other nodes in the cluster are down.

  • Added a new node_operation: block_loader_workload_for_scyllanode.

This allows blocking connections from the loaders to a Scylla node
and avoids the critical c-s/s-b error that occurs when they connect
to a banned node and fail to run.
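
The idea of blocking loader traffic to a single Scylla node can be sketched as follows. This is an assumption-laden illustration, not the implementation of block_loader_workload_for_scyllanode: the function name `build_loader_block_commands`, the `DROP` target, and the default port 9042 (the usual CQL native transport port) are chosen for the example. The address family of each loader decides whether `iptables` or `ip6tables` is used.

```python
import ipaddress
from typing import Iterable, List


def build_loader_block_commands(loader_addresses: Iterable[str], port: int = 9042) -> List[str]:
    """Return commands that drop TCP traffic from each loader to the given port.

    Hypothetical helper (not taken from the PR). The commands are meant to be
    run on the Scylla node that should stop accepting loader connections.
    """
    commands = []
    for raw_address in loader_addresses:
        address = ipaddress.ip_address(raw_address)
        # IPv6 sources need ip6tables; iptables only handles IPv4.
        binary = "ip6tables" if address.version == 6 else "iptables"
        commands.append(
            f"sudo {binary} -A INPUT -s {address} -p tcp --dport {port} -j DROP"
        )
    return commands


if __name__ == "__main__":
    # Example loader addresses, invented for illustration.
    for command in build_loader_block_commands(["10.0.0.5", "fd00::5"]):
        print(command)
```

Filtering by source address leaves traffic between Scylla nodes untouched, so only the loaders lose access to the banned node.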

Fixes: #10434

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@soyacz
Contributor

soyacz commented Apr 7, 2025

Isn't it just a Scylla issue? Even under high load, shouldn't other nodes know that one is down within 10 minutes?

@aleksbykov aleksbykov force-pushed the fix-10434-increase-timeout-to-wait-down branch 8 times, most recently from 912fb11 to a06b7e1 Compare April 11, 2025 12:24
@aleksbykov aleksbykov changed the title from "fix(nemesis): increase timeout waiting node down" to "fix(nemesis): add support ipv6 for refuse connection for banned node" Apr 11, 2025
@aleksbykov
Contributor Author

> Isn't it just a Scylla issue? Even under high load, shouldn't other nodes know that one is down within 10 minutes?

I found the problem. It was not the timeout; it was related to IPv6.

@aleksbykov aleksbykov marked this pull request as ready for review April 11, 2025 12:32
@aleksbykov
Contributor Author

An additional staging job is running.

The disrupt_refuse_connection_with_* nemeses don't support IPv6.
 - Added a command that blocks ports for the IPv6 stack.

When a node is banned but still alive, c-s/s-b could connect to it
and fail with a critical error, because the banned node reports
that the other nodes in the cluster are down.
 - Added a new node_operation: block_loader_workload_for_scyllanode.
 This allows blocking connections from the loaders to a Scylla node
 and avoids the critical c-s/s-b error that occurs when they connect
 to a banned node and fail to run.

Fixes: scylladb#10434
@aleksbykov aleksbykov force-pushed the fix-10434-increase-timeout-to-wait-down branch from a06b7e1 to 2c4e417 Compare April 11, 2025 19:31
Contributor

@soyacz soyacz left a comment


LGTM

@temichus
Contributor

@scylladb/qa-maintainers, please merge, backport

@vponomaryov vponomaryov merged commit 8691403 into scylladb:master Apr 15, 2025
7 checks passed
Development

Successfully merging this pull request may close these issues.

Need to increase wait timeout for disrupt_refuse_connection_with_block_scylla_ports_on_banned_node nemesis
5 participants