fix(seeds): change default seeds_selector from 'all' to 'random' 3 nodes by roydahan · Pull Request #15180 · scylladb/scylla-cluster-tests

roydahan · 2026-06-23T14:46:16Z

Summary

Changes the default seed selection from all nodes to up to 3 random seed nodes
Caps seeds_num by the current cluster size so 1-node/2-node tests remain valid with the default seeds_num: 3
Makes set_seeds() replace the seed set instead of accumulating stale seed flags across reselection
Adds unit coverage for capped random seed selection and seed flag replacement

Motivation

According to the ScyllaDB official docs on seed nodes:

Since ScyllaDB OSS 4.3 / Enterprise 2021.1, a seed node is only used as the initial contact point for a new node joining the cluster to discover ring topology. Once the nodes have joined the cluster, the seed node has no function.

The previous default of seeds_selector: "all" marks every node in the cluster as a seed. This is unnecessary for modern ScyllaDB and was explicitly forbidden in older versions, where at least one non-seed node was required.

Using only one seed is minimal, but it is fragile in SCT: several nemesis operations can terminate, decommission, remove, or replace the seed node. If that happens before a new node is added, SCT may no longer have a valid seed node configured for the new node bootstrap.

Change

	Before	After
`seeds_selector` default	`"all"`	`"random"`
`seeds_num` default	`1` (ignored when selector is `"all"`)	`3`

With seeds_selector: "random" and seeds_num: 3, SCT keeps a bounded set of seed nodes that is more resilient to nemesis operations than a single seed, while avoiding the unnecessary all-nodes-as-seeds configuration.

Small clusters

seeds_num now behaves as a maximum for first/random selection. If a cluster has fewer than 3 nodes, SCT selects all available nodes instead of failing config validation or random.sample().

Testing

uv run sct.py unit-tests -t unit/test_seed_selector.py

According to ScyllaDB official docs (https://opensource.docs.scylladb.com/stable/kb/seed-nodes.html), since ScyllaDB OSS 4.3 / Enterprise 2021.1 seeds are only used as the initial contact point for nodes joining the cluster. Once a node has joined, the seed designation has no ongoing function. Marking every node as a seed ('all') is therefore unnecessary, and was explicitly forbidden in older ScyllaDB versions where at least one non-seed node was required. Changing the default to 'first' (with seeds_num=1) aligns with the minimal-seed recommendation from the official docs while remaining safe for all supported ScyllaDB versions.

coderabbitai · 2026-06-23T14:50:45Z

📝 Walkthrough

Walkthrough

Updates the default test configuration to use seeds_selector: "first" instead of "all". Expands SCTConfiguration documentation for seeds_selector and seeds_num to clarify seed-node semantics. Adds decommissioning logic that prevents invalid seed states when the target node is the last seed by selecting an alternate seed candidate and adjusting the replacement node's seed flag accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Bug

Suggested reviewers

fruch
cezarmoise

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	⚠️ Warning	The title states 'change default seeds_selector from 'all' to 'random' 3 nodes' but the actual change is to 'first', not 'random'. This is misleading and does not match the implementation.	Update the title to accurately reflect the change: 'fix(seeds): change default seeds_selector from 'all' to 'first' with seeds_num: 1'

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	PR description is complete with summary, motivation, changes, and testing details.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

scylladb-promoter · 2026-06-23T14:54:36Z

❌ Test Summary: FAILED

❌ Precommit: FAILED

Total	Passed	Failed	Skipped
50	16	1	33

Failed Hooks

update-conf-docs

Output

- hook id: update-conf-docs
- files were modified by this hook

/usr/local/lib/python3.14/site-packages/azure/mgmt/compute/models/_models.py:12585: SyntaxWarning: "\W" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\W"? A raw string is also an option.
  <br>Has upper characters <br> Has a digit <br> Has a special character (Regex match [\W_])
/usr/local/lib/python3.14/site-packages/azure/mgmt/compute/models/_models.py:12695: SyntaxWarning: "\W" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\W"? A raw string is also an option.
  <br>Has upper characters <br> Has a digit <br> Has a special character (Regex match [\W_])
/usr/local/lib/python3.14/site-packages/azure/mgmt/compute/models/_models.py:23850: SyntaxWarning: "\W" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\W"? A raw string is also an option.
  <br>Has upper characters <br> Has a digit <br> Has a special character (Regex match [\W_])
docs written into docs/configuration_options.md

✅ Tests: PASSED

Total	Passed	Failed	Errors	Skipped
3656	3625	0	0	31

Full build log

roydahan · 2026-06-23T15:58:11Z

Still need to verify that in case the seed node was terminated/removed/decommissioned by a nemesis, we replace the seed.

In the past we had special handling for seed nodes, I don't know if it's still the case.

With a single-seed default, generic decommission nemesis can target the only seed node. Transfer seed responsibility to another live node before decommissioning the last seed so adding a replacement node still has a valid seed_provider to use during bootstrap.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@sdcm/nemesis/__init__.py`:
- Around line 1295-1305: The seed node update logic in this block (calling
set_seed_flag() and update_seed_provider()) does not work effectively on
Kubernetes where seed_nodes is empty and these operations are no-ops, which
could leave the cluster without a seed node. Add a check to prevent this branch
from executing when running on Kubernetes or when seed providers are not
available, ensuring that the decommission/terminate operation does not proceed
if it would eliminate the last seed node and the seed updates would not actually
take effect due to the cluster backend limitations.
- Around line 1296-1306: The seed candidate selection in the block starting with
the seed_candidates list comprehension uses raw cluster.nodes without validating
the node's availability or allocation status, which risks selecting a node that
is down or already under another nemesis operation. Replace the direct
cluster.nodes filtering with NemesisNodeAllocator to properly reserve the new
seed node, ensuring you filter for up-normal nodes only, and skip the seed
promotion operation if no suitable candidates are available through the
allocator. This prevents the decommissioning logic from accidentally eliminating
the last seed by promoting an unavailable node.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a69696f8-b25d-4523-b0f3-91ad97c267d9

📥 Commits

Reviewing files that changed from the base of the PR and between b45985c and 1e1390d.

📒 Files selected for processing (1)

sdcm/nemesis/__init__.py

Use up to three randomly selected seed nodes by default so nemesis operations that remove or replace one seed still leave other seed contact points available. Cap the requested seed count by the current cluster size so small clusters remain valid, and replace stale seed flags when reselection runs.

fruch · 2026-06-23T19:48:25Z

Still need to verify that in case the seed node was terminated/removed/decommissioned by a nemesis, we replace the seed.

In the past we had special handling for seed nodes, I don't know if it's still the case.

We don't, but this first needs to be reviewed and tested before enabling it by default.

We should add one more option to select 2 or 3, since we can't greentree the one selected will be up (parallel nemesis case, even that we don't do topology changes in parallel), since the safe recommended option would be to have couple of nodes as contact points for fallback if needed (even if it's a short while)

roydahan requested a review from fruch as a code owner June 23, 2026 14:46

github-actions Bot assigned roydahan Jun 23, 2026

roydahan marked this pull request as draft June 23, 2026 16:38

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread sdcm/nemesis/__init__.py Outdated

Comment thread sdcm/nemesis/__init__.py Outdated

roydahan changed the title ~~fix(seeds): change default seeds_selector from 'all' to 'first'~~ fix(seeds): change default seeds_selector from 'all' to 'random' 3 nodes Jun 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(seeds): change default seeds_selector from 'all' to 'random' 3 nodes#15180

fix(seeds): change default seeds_selector from 'all' to 'random' 3 nodes#15180
roydahan wants to merge 3 commits into
masterfrom
fix-seeds-selector-default

roydahan commented Jun 23, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading

Walkthrough

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

scylladb-promoter commented Jun 23, 2026 •

edited

Loading

Uh oh!

roydahan commented Jun 23, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

fruch commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

roydahan commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Change

Small clusters

Testing

Uh oh!

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

scylladb-promoter commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

❌ Test Summary: FAILED

❌ Precommit: FAILED

Failed Hooks

✅ Tests: PASSED

Uh oh!

roydahan commented Jun 23, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

fruch commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

roydahan commented Jun 23, 2026 •

edited

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading

scylladb-promoter commented Jun 23, 2026 •

edited

Loading