
disrupt_add_drop_column was trying to execute on a node that was being removed #10647

Open
@timtimb0t

Description


Packages

Scylla version: 2025.2.0~dev-20250328.0ee06969595a with build-id 77d59c041af69597b81a5b40bf40eba1bc162d39

Kernel Version: 6.8.0-1025-aws

disrupt_refuse_connection_with_block_scylla_ports_on_banned_node started executing at 16:43:22, targeting node 34 (longevity-parallel-topology-schema--db-node-da4d8f11-34, private IP 10.4.8.15).
disrupt_add_drop_column started executing at 16:43:46 against the same target node, which led to the following error:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 417, in _init_socket
    self.sock.connect((host, port))
OSError: [Errno 113] No route to host

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 520, in connect
    self._connect()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 535, in _connect
    self._init_socket(self.host, self.port)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 422, in _init_socket
    raise ConnectError("Error connecting to host '%s:%s' - %s" % (host, port, str(error_type))) from ex
sdcm.remote.libssh2_client.exceptions.ConnectError: Error connecting to host '10.4.8.15:22' - No route to host

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 593, in run
    self.connect()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 529, in connect
    raise ConnectTimeout(ex_msg) from exc
sdcm.remote.libssh2_client.exceptions.ConnectTimeout: Failed to connect in 60 seconds, last error: (ConnectError)Error connecting to host '10.4.8.15:22' - No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 644, in _run
    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 577, in _run_execute
    result = connection.run(**command_kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 596, in run
    return self._complete_run(
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 655, in _complete_run
    raise exception
sdcm.remote.libssh2_client.exceptions.FailedToRunCommand: Failed to run a command due to exception!

Command: '/usr/bin/cqlsh --no-color   --request-timeout=120 --connect-timeout=60  -e "describe keyspace1.standard1" 10.4.22.47'

Stdout:



Stderr:



Exception:  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 593, in run
    self.connect()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 529, in connect
    raise ConnectTimeout(ex_msg) from exc

Failed to connect in 60 seconds, last error: (ConnectError)Error connecting to host '10.4.8.15:22' - No route to host


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5688, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2646, in disrupt_add_drop_column
    self._add_drop_column_run_in_cycle()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2375, in _add_drop_column_run_in_cycle
    self._add_drop_column()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2341, in _add_drop_column
    self._add_drop_column_target_table = self._add_drop_column_get_target_table(
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2266, in _add_drop_column_get_target_table
    current_tables = self._get_all_tables_with_no_compact_storage(self._add_drop_column_tables_to_ignore)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2254, in _get_all_tables_with_no_compact_storage
    tables = get_db_tables(session=session,
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 1617, in get_db_tables
    create_table_statement = node.run_cqlsh(f"describe {keyspace_name}.{row.table_name}").stdout.upper()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2899, in run_cqlsh
    cqlsh_out = self.remoter.run(cmd, timeout=timeout + 120,  # we give 30 seconds to cqlsh timeout mechanism to work
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 653, in run
    result = _run()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 646, in _run
    if self._run_on_retryable_exception(exc, new_session):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_libssh_cmd_runner.py", line 78, in _run_on_retryable_exception
    raise RetryableNetworkException(str(exc), original=exc)
sdcm.remote.base.RetryableNetworkException: Failed to run a command due to exception!

Command: '/usr/bin/cqlsh --no-color   --request-timeout=120 --connect-timeout=60  -e "describe keyspace1.standard1" 10.4.22.47'

Stdout:



Stderr:



Exception:  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 593, in run
    self.connect()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 529, in connect
    raise ConnectTimeout(ex_msg) from exc

Failed to connect in 60 seconds, last error: (ConnectError)Error connecting to host '10.4.8.15:22' - No route to host

This happened because the target node had already been removed.
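One way to avoid this race is to make target selection exclusive across parallel nemeses, so a node held by one disruption (here node 34, which was banned and then removed) cannot also be picked as the target of another. The sketch below is hypothetical: `NodeAllocator`, `allocate`, `remove` and the node names are illustrative only and are not existing SCT APIs. It assumes the parallel nemeses run as threads in a single process.

```python
import threading
from contextlib import contextmanager


class NodeAllocator:
    """Hypothetical per-node exclusive allocation shared by parallel nemeses."""

    def __init__(self, nodes):
        self._lock = threading.Lock()
        self._nodes = list(nodes)
        self._busy = {}  # node name -> name of the nemesis currently holding it

    @contextmanager
    def allocate(self, nemesis_name):
        """Reserve one node not held by another nemesis; release it on exit."""
        with self._lock:
            free = [n for n in self._nodes if n not in self._busy]
            if not free:
                raise RuntimeError(f"{nemesis_name}: no free node available")
            node = free[0]
            self._busy[node] = nemesis_name
        try:
            yield node
        finally:
            # Always release the reservation, even if the disruption failed.
            with self._lock:
                self._busy.pop(node, None)

    def remove(self, node):
        """Forget a removed/decommissioned node so it is never handed out again."""
        with self._lock:
            if node in self._nodes:
                self._nodes.remove(node)


# Usage: the second nemesis cannot receive the node held by the first one.
allocator = NodeAllocator(["db-node-33", "db-node-34", "db-node-35"])
with allocator.allocate("disrupt_refuse_connection_with_block_scylla_ports_on_banned_node") as banned:
    with allocator.allocate("disrupt_add_drop_column") as target:
        print(banned, target)
```

Holding the reservation for the whole disruption (including the removal step) is what closes the window seen here; calling `remove()` once the node leaves the cluster keeps later nemeses from selecting it at all.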

Impact

SCT issue

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 5 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • longevity-parallel-topology-schema--db-node-da4d8f11-9 (63.34.125.60 | 10.4.8.107) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-8 (18.203.32.78 | 10.4.10.154) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-7 (52.213.161.58 | 10.4.8.164) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-63 (46.137.75.28 | 10.4.10.10) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-62 (54.194.173.20 | 10.4.9.172) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-61 (34.249.99.76 | 10.4.11.120) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-60 (52.30.6.181 | 10.4.9.87) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-6 (52.213.171.150 | 10.4.11.35) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-59 (52.49.149.49 | 10.4.9.68) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-58 (52.48.68.130 | 10.4.11.169) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-57 (54.78.82.159 | 10.4.11.169) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-56 (54.155.196.171 | 10.4.8.112) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-55 (54.217.212.110 | 10.4.10.203) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-54 (52.208.233.228 | 10.4.10.8) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-53 (34.246.46.128 | 10.4.8.234) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-52 (54.195.1.250 | 10.4.11.140) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-51 (54.155.149.223 | 10.4.11.170) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-50 (54.155.82.31 | 10.4.10.52) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-5 (52.213.49.108 | 10.4.9.187) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-49 (63.33.92.41 | 10.4.9.142) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-48 (54.171.41.78 | 10.4.11.189) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-47 (52.48.108.189 | 10.4.8.39) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-46 (52.31.56.64 | 10.4.10.55) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-45 (34.249.11.166 | 10.4.8.13) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-44 (34.241.65.96 | 10.4.11.71) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-43 (52.48.136.23 | 10.4.11.174) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-42 (52.215.50.4 | 10.4.8.101) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-41 (46.137.187.91 | 10.4.10.49) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-40 (54.220.241.98 | 10.4.10.214) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-4 (34.254.58.235 | 10.4.9.255) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-39 (34.247.185.134 | 10.4.11.158) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-38 (79.125.121.90 | 10.4.10.79) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-37 (54.194.175.44 | 10.4.8.255) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-36 (18.203.58.89 | 10.4.11.131) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-35 (52.209.181.164 | 10.4.8.32) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-34 (34.248.27.178 | 10.4.8.15) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-33 (54.72.132.207 | 10.4.8.64) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-32 (52.209.122.0 | 10.4.11.96) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-31 (34.255.60.26 | 10.4.9.243) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-30 (52.31.22.227 | 10.4.8.164) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-3 (54.220.235.236 | 10.4.11.59) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-29 (63.35.15.69 | 10.4.11.232) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-28 (34.243.83.57 | 10.4.8.106) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-27 (52.215.232.135 | 10.4.10.74) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-26 (34.255.149.122 | 10.4.9.88) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-25 (54.170.168.75 | 10.4.11.33) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-24 (54.76.189.100 | 10.4.8.124) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-23 (52.211.194.203 | 10.4.9.177) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-22 (54.216.8.63 | 10.4.11.61) (shards: -1)
  • longevity-parallel-topology-schema--db-node-da4d8f11-21 (34.252.94.107 | 10.4.11.12) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-20 (54.78.240.235 | 10.4.10.99) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-2 (18.203.207.129 | 10.4.9.118) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-19 (52.208.146.231 | 10.4.8.65) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-18 (52.49.11.151 | 10.4.10.217) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-17 (18.202.124.241 | 10.4.8.81) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-16 (63.34.33.12 | 10.4.10.71) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-15 (54.194.201.121 | 10.4.11.114) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-14 (54.228.27.58 | 10.4.9.130) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-13 (54.247.115.211 | 10.4.11.107) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-12 (52.18.93.139 | 10.4.10.159) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-11 (52.215.42.186 | 10.4.9.187) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-10 (34.251.242.115 | 10.4.9.82) (shards: 7)
  • longevity-parallel-topology-schema--db-node-da4d8f11-1 (52.213.196.126 | 10.4.11.151) (shards: 7)

OS / Image: ami-026d3178ff18b6e49 (aws: undefined_region)

Test: longevity-schema-topology-changes-12h-test
Test id: da4d8f11-d5d1-469a-a85d-dff5ada24fb6
Test name: scylla-master/tier1/longevity-schema-topology-changes-12h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor da4d8f11-d5d1-469a-a85d-dff5ada24fb6
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs da4d8f11-d5d1-469a-a85d-dff5ada24fb6

Logs:

Jenkins job URL
Argus
