connection was lost to loader node #10705

Open
@timtimb0t

Description

Argus:
https://argus.scylladb.com/tests/scylla-cluster-tests/f5da493d-5cb0-4f15-9d0f-687385d4b4b3

SCT was unable to connect to one of the loaders; the following error was logged:

2025-04-19T04:54:43.162+00:00 longevity-large-partitions-200k-pks-loader-node-f5da493d-0-4 !INFO | sshd[9602]: ssh_dispatch_run_fatal: Connection from 10.142.0.95 port 57000: Broken pipe [preauth]

That led to a stress thread failure during disrupt_load_and_stream, at the _prepare_test_table step:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5689, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 1781, in disrupt_load_and_stream
    self._prepare_test_table(ks='keyspace1', table='standard1')
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2127, in _prepare_test_table
    cs_thread = self.tester.run_stress_thread(
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 2024, in run_stress_thread
    return self.run_stress_cassandra_thread(**params)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 2070, in run_stress_cassandra_thread
    cs_thread = CassandraStressThread(loader_set=self.loaders,
  File "/home/ubuntu/scylla-cluster-tests/sdcm/stress_thread.py", line 78, in __init__
    super().__init__(loader_set=loader_set, stress_cmd=stress_cmd, timeout=timeout,
  File "/home/ubuntu/scylla-cluster-tests/sdcm/stress/base.py", line 59, in __init__
    RemoteDocker.pull_image(loader, self.docker_image_name)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/docker_remote.py", line 160, in pull_image
    docker_hub_login(remoter=node.remoter, use_sudo=node.is_docker)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/docker_utils.py", line 529, in docker_hub_login
    docker_info = remote_cmd("docker info", ignore_status=True)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/base.py", line 123, in sudo
    return self.run(cmd=cmd,
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 653, in run
    result = _run()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 646, in _run
    if self._run_on_retryable_exception(exc, new_session):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_libssh_cmd_runner.py", line 78, in _run_on_retryable_exception
    raise RetryableNetworkException(str(exc), original=exc)
sdcm.remote.base.RetryableNetworkException: Failed to run a command due to exception!

Command: 'sudo docker info'

Stdout:



Stderr:



Exception:  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 593, in run
    self.connect()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 529, in connect
    raise ConnectTimeout(ex_msg) from exc

Failed to connect in 60 seconds, last error: (Timeout)

I'm not sure about the root cause of this failure.
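
One thing that might help narrow this down is checking whether the loader's SSH port accepts plain TCP connections at the time of the failure. If TCP connects but the SSH session still times out, the problem is more likely in the SSH handshake than in basic network reachability. Below is a minimal sketch using only the Python standard library. It is not part of SCT: the function name wait_for_ssh_port, its parameters, and the "loader-address" placeholder are hypothetical and chosen for illustration only.

```python
import socket
import time


def wait_for_ssh_port(host, port=22, attempts=3, timeout=5.0, delay=1.0,
                      connect=socket.create_connection, sleep=time.sleep):
    """Try to open a TCP connection to host:port up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `connect` and `sleep` are injectable so the helper can
    be exercised without a real network (hypothetical design choice).
    """
    for attempt in range(1, attempts + 1):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers timeouts, refused connections, and unreachable hosts.
            if attempt < attempts:
                sleep(delay)
            continue
        conn.close()
        return attempt
    return None


if __name__ == "__main__":
    # "loader-address" is a placeholder, not a value taken from this run.
    result = wait_for_ssh_port("loader-address", attempts=2, timeout=2.0)
    if result is None:
        print("SSH port not reachable over TCP")
    else:
        print(f"TCP connection succeeded on attempt {result}")
```

Running a check like this from the SCT runner while the failure is happening would only show whether TCP to port 22 works; it says nothing about why sshd dropped the connection before authentication.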
