Skip to content

fix(limited_voters): validate non-voters as failed for scylla <=2025.1 #10636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

aleksbykov
Copy link
Contributor

@aleksbykov aleksbykov commented Apr 14, 2025

New feature Limited raft voters was merged. Starting from 2025.2 not all nodes are raft voters. Update raft check consistency functionality to not report errors if alive non-voters nodes

for latest sct jobs health validator reported next errors:

2025-04-15 03:07:17.267: (ClusterHealthValidatorEvent Severity.ERROR) period_type=one-time event_id=f817d89c-ef15-48cb-8cbc-6c6af699c123: type=Group0TokenRingInconsistency node=longevity-mv-si-4d-master-db-node-15b46631-7 error=Node longevity-mv-si-4d-master-db-node-15b46631-7 has group0 member with host_id 8176f51c-65e6-49f2-9a4a-06d7a5c4a682 with can_vote False and presents in token ring True. Inconsistency between group0: [{'host_id': '13ad363b-506d-4196-ac5b-a83d82955edf', 'voter': True}, {'host_id': '2afaa902-cb64-40fc-8fe5-d123cdf6154d', 'voter': True}, {'host_id': '3958770c-0b8c-4bef-a794-3d42f9e0beb9', 'voter': True}, {'host_id': '46da3db7-ea01-44a7-8af0-d405f33d14a2', 'voter': True}, {'host_id': '70e6a2d4-e6b7-44e9-827f-0168db14ba04', 'voter': True}, {'host_id': '8176f51c-65e6-49f2-9a4a-06d7a5c4a682', 'voter': False}] and tokenring: [{'host_id': '13ad363b-506d-4196-ac5b-a83d82955edf', 'ip_address': '10.4.1.233'}, {'host_id': '2afaa902-cb64-40fc-8fe5-d123cdf6154d', 'ip_address': '10.4.2.2'}, {'host_id': '3958770c-0b8c-4bef-a794-3d42f9e0beb9', 'ip_address': '10.4.0.158'}, {'host_id': '46da3db7-ea01-44a7-8af0-d405f33d14a2', 'ip_address': '10.4.2.38'}, {'host_id': '8176f51c-65e6-49f2-9a4a-06d7a5c4a682', 'ip_address': '10.4.3.181'}, {'host_id': '70e6a2d4-e6b7-44e9-827f-0168db14ba04', 'ip_address': '10.4.1.238'}]

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@aleksbykov aleksbykov force-pushed the fix-add-support-non-limited-voters branch from 57a934f to 8b91539 Compare April 14, 2025 18:18
@aleksbykov aleksbykov marked this pull request as ready for review April 15, 2025 09:17
@aleksbykov
Copy link
Contributor Author

@scylladb/qa-maintainers , could you please take a look

@soyacz
Copy link
Contributor

soyacz commented Apr 15, 2025

Who decides which nodes are voters and which aren't? Is this constant, or changes with some action (e.g. restart)/time based?
Is there information stored in some system table/other place which are not voters?

@aleksbykov
Copy link
Contributor Author

Who decides which nodes are voters and which aren't? Is this constant, or changes with some action (e.g. restart)/time based? Is there information stored in some system table/other place which are not voters?

@soyacz , Topology coordinator automatically decide to which node will be voters and automatically choose on any topology/restart action new voters.
we can find system.raft_state which nodes are voters and which are not
More info could be found here
https://docs.google.com/document/d/1Q5p4-og8OEwyGyOHzGpFVAlnYMFqyWQUhyJfIKPdZ9M/edit?tab=t.0#heading=h.8nif5rm38vk6

@soyacz
Copy link
Contributor

soyacz commented Apr 16, 2025

Who decides which nodes are voters and which aren't? Is this constant, or changes with some action (e.g. restart)/time based? Is there information stored in some system table/other place which are not voters?

@soyacz , Topology coordinator automatically decide to which node will be voters and automatically choose on any topology/restart action new voters. we can find system.raft_state which nodes are voters and which are not More info could be found here https://docs.google.com/document/d/1Q5p4-og8OEwyGyOHzGpFVAlnYMFqyWQUhyJfIKPdZ9M/edit?tab=t.0#heading=h.8nif5rm38vk6

Looks, there's a minimal number of voters, can we validate at least this?

@soyacz
Copy link
Contributor

soyacz commented Apr 16, 2025

@scylladb/qa-maintainers , could you please take a look

@temichus can you verify the testing logic behind the feature? I saw long document describing it and I don't feel competent enough to fully verify it.

@aleksbykov
Copy link
Contributor Author

Who decides which nodes are voters and which aren't? Is this constant, or changes with some action (e.g. restart)/time based? Is there information stored in some system table/other place which are not voters?

@soyacz , Topology coordinator automatically decide to which node will be voters and automatically choose on any topology/restart action new voters. we can find system.raft_state which nodes are voters and which are not More info could be found here https://docs.google.com/document/d/1Q5p4-og8OEwyGyOHzGpFVAlnYMFqyWQUhyJfIKPdZ9M/edit?tab=t.0#heading=h.8nif5rm38vk6

Looks, there's a minimal number of voters, can we validate at least this?

i am not sure we need it. because it will depend on cluster size and topology(dcs, racks, number of nodes). Raft topology will choose, monitor and reconfigure voters only for raft quorum needs. and it will be difficult to predict voters will be changed or not. Max number of voters is 5 (and it could change in future). if cluster has 3 nodes, all node will be voters, if cluster has 6 nodes, only five nodes going to be a voter and which one exactly, raft will be decide on the fly. also voters will be spread between racks and dcs. So this feature is mostly hidden, and if something goes wrong, we will detect by other mechanisms of health validator and schema/topology operations.

@aleksbykov aleksbykov force-pushed the fix-add-support-non-limited-voters branch from 8b91539 to d286871 Compare April 16, 2025 13:39
@aleksbykov aleksbykov added the backport/none Backport is not required label Apr 16, 2025
@aleksbykov aleksbykov force-pushed the fix-add-support-non-limited-voters branch 2 times, most recently from 7738629 to 60ea0b9 Compare April 16, 2025 14:20
temichus
temichus previously approved these changes Apr 16, 2025
Copy link
Contributor

@temichus temichus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now looks good. please retest and ask some Dev to check the logic.

@aleksbykov aleksbykov force-pushed the fix-add-support-non-limited-voters branch from 60ea0b9 to ba31adc Compare April 16, 2025 14:49
@aleksbykov aleksbykov requested a review from emdroid April 17, 2025 05:34
@aleksbykov
Copy link
Contributor Author

@emdroid , can you take a look please

temichus
temichus previously approved these changes Apr 17, 2025
@aleksbykov aleksbykov force-pushed the fix-add-support-non-limited-voters branch from ba31adc to 54f7ad5 Compare April 17, 2025 11:15
New feature Limited raft voters was merged. Starting from 2025.2
not all nodes are raft voters. Update raft check consistency functionality
to reported errors if alive non-voters nodes
@aleksbykov aleksbykov force-pushed the fix-add-support-non-limited-voters branch from 54f7ad5 to 5bd919e Compare April 17, 2025 11:44
Copy link
Contributor

@vponomaryov vponomaryov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@soyacz soyacz merged commit 67c34d2 into scylladb:master Apr 17, 2025
7 checks passed
@vponomaryov vponomaryov added backport/2025.1 and removed backport/none Backport is not required labels Apr 17, 2025
aleksbykov added a commit to aleksbykov/scylla-cluster-tests that referenced this pull request Apr 17, 2025
In PR: scylladb#10636
after switch to use feature name instead of version
was used wrong logic in if statement.
Check non voters should be done only if feature GROUP0_LIMITED_VOTERS
is not enabled
soyacz pushed a commit that referenced this pull request Apr 18, 2025
In PR: #10636
after switch to use feature name instead of version
was used wrong logic in if statement.
Check non voters should be done only if feature GROUP0_LIMITED_VOTERS
is not enabled
scylladbbot pushed a commit to scylladbbot/scylla-cluster-tests that referenced this pull request Apr 18, 2025
In PR: scylladb#10636
after switch to use feature name instead of version
was used wrong logic in if statement.
Check non voters should be done only if feature GROUP0_LIMITED_VOTERS
is not enabled

(cherry picked from commit d16d124)
@juliayakovlev
Copy link
Contributor

@aleksbykov we need to backport this fix to branch-perf-v15 and branch-perf-v16

@aleksbykov
Copy link
Contributor Author

if performance tests use the health validator , then it is also should be backported to that branches and this one also #10651

@juliayakovlev
Copy link
Contributor

juliayakovlev commented Apr 21, 2025

if performance tests use the health validator , then it is also should be backported to that branches and this one also #10651

@aleksbykov
yes, health validator runs there

scylladbbot pushed a commit to scylladbbot/scylla-cluster-tests that referenced this pull request Apr 23, 2025
In PR: scylladb#10636
after switch to use feature name instead of version
was used wrong logic in if statement.
Check non voters should be done only if feature GROUP0_LIMITED_VOTERS
is not enabled

(cherry picked from commit d16d124)
timtimb0t pushed a commit to timtimb0t/scylla-cluster-tests that referenced this pull request Apr 25, 2025
In PR: scylladb#10636
after switch to use feature name instead of version
was used wrong logic in if statement.
Check non voters should be done only if feature GROUP0_LIMITED_VOTERS
is not enabled
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants