fix(longevity_test): Set ignore 'raft topology connection close' error globally #10386

Merged
soyacz merged 1 commit into scylladb:master from aleksbykov:fix-20950-error-message
Apr 10, 2025

Conversation

aleksbykov commented Mar 12, 2025

The error 'raft_topology - topology change coordinator fiber got error (connection is closed))' can appear at various moments while a node or nodes are being restarted, as described in issue scylladb/scylladb#20950, in comment scylladb/scylladb#20950 (comment).

The error message can appear when there is a race between raft and gossip. We can ignore this error until the issue is fixed on the Scylla side (gossip mode is going to be removed from Scylla).
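
As a rough illustration of what "ignoring" this log line involves, the sketch below matches the message with a regular expression. The pattern and the `is_ignorable` helper are hypothetical and are not taken from the SCT code base; the sample line is the shortened message quoted in this description.

```python
import re

# Assumed pattern: the real filter in SCT may use a different expression.
IGNORED_RAFT_ERROR = re.compile(
    r"raft_topology - topology change coordinator fiber got error.*"
    r"\(connection is closed\)"
)


def is_ignorable(line: str) -> bool:
    """Return True if the log line matches the expected raft topology error."""
    return IGNORED_RAFT_ERROR.search(line) is not None


sample = (
    "raft_topology - topology change coordinator fiber got error "
    "(connection is closed))"
)
other = "raft_topology - topology change coordinator fiber got error (timeout)"

print(is_ignorable(sample), is_ignorable(other))
```

Anchoring on both the message prefix and the "connection is closed" suffix keeps other coordinator errors visible.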

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

fruch commented Mar 12, 2025

@aleksbykov you added this inside the `if self.validate_large_collections:` clause,

so it's not exactly global as the title says; it would only take effect when a test case has large-collection validation enabled.

@aleksbykov aleksbykov force-pushed the fix-20950-error-message branch from 344c659 to b751266 Compare March 25, 2025 16:02
@aleksbykov aleksbykov requested review from fruch, soyacz and temichus March 26, 2025 16:11
@aleksbykov aleksbykov marked this pull request as ready for review March 26, 2025 16:11
temichus previously approved these changes Apr 1, 2025

temichus left a comment

Please update the description to explain the second commit with nemesis_during_prepare

temichus previously approved these changes Apr 6, 2025
@aleksbykov aleksbykov force-pushed the fix-20950-error-message branch 2 times, most recently from 1686dc1 to dedc9ae Compare April 10, 2025 11:49
…globally

The error 'raft_topology - topology change coordinator fiber got error (connection is closed))'
can appear at various moments while a node or nodes are being restarted,
as described in issue scylladb/scylladb#20950, in comment
scylladb/scylladb#20950 (comment).

The error message can appear when there is a race between raft and gossip.
We can ignore this error until the issue is fixed on the Scylla side
(gossip mode is going to be removed from Scylla).
@aleksbykov aleksbykov force-pushed the fix-20950-error-message branch from dedc9ae to 59c86ea Compare April 10, 2025 12:31
soyacz left a comment

LGTM

aleksbykov added a commit to aleksbykov/scylla-cluster-tests that referenced this pull request Apr 23, 2025
…rameter

PR scylladb/scylla-cluster-tests#10386 filters some expected
raft error messages globally, but this change broke an integration
unit test (issue scylladb#10676).

This fixes two things:
 - changes the function signature so that it runs with a passed SCT config in
unit tests, or otherwise gets the running SCT config
 - correctly filters error events.

`ignore_topology_change_coordinator_errors` is a context manager
created as a generator wrapped with `contextlib.contextmanager`,
and the DBEventsFilter objects are created as context managers using ExitStack.
To invoke this context manager, the call
`ignore_topology_change_coordinator_errors().__enter__()` was used.

But once this call finished, all the DBEventsFilter context managers
executed their `__exit__()` methods, so all event filters were marked
as expired and the appropriate events were not filtered out. To avoid that,
the ExitStack object is recreated and all `__exit__` methods are triggered
after `yield`. This allows `ignore_topology_change_coordinator_errors` to be run
as a context manager wrapping some code, functions, or methods,
or to be executed globally without the event filters expiring.

Fixes scylladb#10676
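
The pitfall described in this commit message can be reproduced with a minimal sketch. All names here (`FakeFilter`, `active_filters`, `ignore_errors`) are hypothetical stand-ins, not SCT classes, and the first case relies on CPython finalizing the discarded generator as soon as its last reference is dropped (a `gc.collect()` call is added in case another interpreter delays that).

```python
import gc
from contextlib import ExitStack, contextmanager

# Names of filters that are currently in effect (hypothetical registry).
active_filters = set()


class FakeFilter:
    """Hypothetical stand-in for an event filter; not the SCT class."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        active_filters.add(self.name)
        return self

    def __exit__(self, *exc_info):
        active_filters.discard(self.name)
        return False


@contextmanager
def ignore_errors():
    # The ExitStack lives inside the generator, so it is unwound whenever
    # the generator is closed, including when it is garbage collected.
    with ExitStack() as stack:
        stack.enter_context(FakeFilter("raft_topology"))
        yield


# Problematic call: nothing keeps the context manager object alive, so the
# suspended generator is finalized and the filter's __exit__ runs at once.
ignore_errors().__enter__()
gc.collect()
after_bare_enter = set(active_filters)

# Scoped use: the filter is active only inside the with-block.
with ignore_errors():
    inside_block = set(active_filters)
after_block = set(active_filters)

# Long-lived use: an outer ExitStack holds a reference until close().
global_stack = ExitStack()
global_stack.enter_context(ignore_errors())
while_held = set(active_filters)
global_stack.close()
after_close = set(active_filters)
```

Holding the context manager in an object that outlives the call, as the outer `ExitStack` does here, is one way to keep filters active for a whole test run; it is shown only as an illustration and is not necessarily how the fix was implemented.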
aleksbykov added a commit to aleksbykov/scylla-cluster-tests that referenced this pull request Apr 28, 2025
…rameter

PR scylladb/scylla-cluster-tests#10386 filters some expected
raft error messages globally, but this change broke an integration
unit test (issue scylladb#10676).

Move the DB event filter from the `ignore_topology_change_coordinator_errors`
context manager to `enable_default_filters`. If the event is
filtered globally, there is no need to filter it with the context manager.

Fixes scylladb#10676
aleksbykov added a commit to aleksbykov/scylla-cluster-tests that referenced this pull request Apr 28, 2025
…efault filters

PR scylladb/scylla-cluster-tests#10386 filters some expected
raft error messages globally, but this change broke an integration
unit test (issue scylladb#10676).

Move the DB event filter from the `ignore_topology_change_coordinator_errors`
context manager to `enable_default_filters`. If the event is
filtered globally, there is no need to filter it with the context manager.

Fixes scylladb#10676
fruch pushed a commit that referenced this pull request Apr 29, 2025
…efault filters

PR scylladb/scylla-cluster-tests#10386 filters some expected
raft error messages globally, but this change broke an integration
unit test (issue #10676).

Move the DB event filter from the `ignore_topology_change_coordinator_errors`
context manager to `enable_default_filters`. If the event is
filtered globally, there is no need to filter it with the context manager.

Fixes #10676
scylladbbot pushed a commit to scylladbbot/scylla-cluster-tests that referenced this pull request Apr 29, 2025
…efault filters

PR scylladb/scylla-cluster-tests#10386 filters some expected
raft error messages globally, but this change broke an integration
unit test (issue scylladb#10676).

Move the DB event filter from the `ignore_topology_change_coordinator_errors`
context manager to `enable_default_filters`. If the event is
filtered globally, there is no need to filter it with the context manager.

Fixes scylladb#10676

(cherry picked from commit 987afc1)
vponomaryov pushed a commit that referenced this pull request Apr 29, 2025
…efault filters

PR scylladb/scylla-cluster-tests#10386 filters some expected
raft error messages globally, but this change broke an integration
unit test (issue #10676).

Move the DB event filter from the `ignore_topology_change_coordinator_errors`
context manager to `enable_default_filters`. If the event is
filtered globally, there is no need to filter it with the context manager.

Fixes #10676

(cherry picked from commit 987afc1)