Packages
Scylla version: 2025.1.0~rc4-20250323.bc983017832c
with build-id 088ceb686f4b2d57120be368a4b86d6ac0a04cd5
Kernel Version: 6.8.0-1024-aws
Issue description
scylla-bench (SB) generated the following critical event:
2025-03-25 00:46:27.003: (ScyllaBenchEvent Severity.CRITICAL) period_type=end event_id=fbbe6055-b1f7-485e-8bc2-94ec977b651f duration=12h10m0s: node=Node longevity-twcs-48h-2025-1-loader-node-4b3cfc69-3 [18.207.92.193 | 10.12.9.109]
stress_cmd=scylla-bench -workload=timeseries -mode=read -partition-count=4000 -concurrency=150 -replication-factor=3 -clustering-row-count=10000 -clustering-row-size=200 -rows-per-request=1 -start-timestamp=1742819763459321842 -write-rate 200 -distribution uniform --connection-count 100 -duration=720m -timeout=30s -retry-number=30 -retry-interval=80ms,1s -nodes 10.12.8.142,10.12.11.100,10.12.11.39,10.12.8.35
errors:
Stress command execution failed with: Command did not complete within 43800 seconds!
Command: "sudo docker exec dc379d85867fcaf9adb599284c98705c92aca0b7dde4ddef6255b6ab67841c27 /bin/sh -c 'scylla-bench -workload=timeseries -mode=read -partition-count=4000 -concurrency=150 -replication-factor=3 -clustering-row-count=10000 -clustering-row-size=200 -rows-per-request=1 -start-timestamp=1742819763459321842 -write-rate 200 -distribution uniform --connection-count 100 -duration=720m -timeout=30s -retry-number=30 -retry-interval=80ms,1s -nodes 10.12.8.142,10.12.11.100,10.12.11.39,10.12.8.35'"
Stdout:
3h19m52s 74920 0 0 19ms 11ms 5.9ms 4.4ms 3.7ms 1.7ms 2ms
3h19m53s 70237 0 0 22ms 11ms 5.9ms 4.4ms 3.9ms 1.9ms 2.1ms
3h19m54s 73280 0 0 24ms 9.2ms 5.8ms 4.3ms 3.6ms 1.8ms 2ms
3h19m55s 83978 0 0 17ms 9.7ms 5.7ms 3.9ms 3.2ms 1.4ms 1.8ms
3h19m56s 84773 0 0 20ms 10ms 5.9ms 4ms 3.1ms 1.4ms 1.8ms
3h19m57s 84805 0 0 24ms 11ms 5.7ms 3.8ms 3.1ms 1.4ms 1.8ms
3h19m58s 83197 0 0 24ms 12ms 6.1ms 4ms 3.2ms 1.4ms 1.8ms
3h19m59s 84793 0 0 23ms 11ms 6ms 3.9ms 3.1ms 1.4ms 1.8ms
3h20m0s 83973 0 0 21ms 10ms 6ms 3.9ms 3.2ms 1.4ms 1.8ms
3h20m1s 87626 0 0 14ms 8.3ms 4.5ms 3.3ms 2.9ms 1.4ms 1.7ms
Stderr:
2025/03/24 13:55:34 gocql: unable to dial control conn 10.12.11.39:9042: dial tcp 10.12.11.39:9042: connect: connection refused
2025/03/24 15:56:29 gocql: unable to dial control conn 10.12.11.100:9042: dial tcp 10.12.11.100:9042: connect: connection refused
2025/03/24 17:55:02 error: failed to connect to "[HostInfo hostname=\"10.12.10.15\" connectAddress=\"10.12.10.15\" peer=\"10.12.10.15\" rpc_address=\"10.12.10.15\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.12.10.15\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-east\" rack=\"1c\" host_id=\"95ead47a-461b-4a5b-9e42-d730eb16b0b8\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response received from cassandra within timeout period (potentially executed: true)
2025/03/24 17:56:05 error: failed to connect to "[HostInfo hostname=\"10.12.10.15\" connectAddress=\"10.12.10.15\" peer=\"10.12.10.15\" rpc_address=\"10.12.10.15\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.12.10.15\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-east\" rack=\"1c\" host_id=\"95ead47a-461b-4a5b-9e42-d730eb16b0b8\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response received from cassandra within timeout period (potentially executed: true)
2025/03/24 17:57:05 error: failed to connect to "[HostInfo hostname=\"10.12.10.15\" connectAddress=\"10.12.10.15\" peer=\"10.12.10.15\" rpc_address=\"10.12.10.15\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.12.10.15\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-east\" rack=\"1c\" host_id=\"95ead47a-461b-4a5b-9e42-d730eb16b0b8\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response received from cassandra within timeout period (potentially executed: true)
2025/03/24 17:58:02 error: failed to connect to "[HostInfo hostname=\"10.12.10.15\" connectAddress=\"10.12.10.15\" peer=\"10.12.10.15\" rpc_address=\"10.12.10.15\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.12.10.15\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-east\" rack=\"1c\" host_id=\"95ead47a-461b-4a5b-9e42-d730eb16b0b8\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response received from cassandra within timeout period (potentially executed: true)
2025/03/24 17:59:05 error: failed to connect to "[HostInfo hostname=\"10.12.10.15\" connectAddress=\"10.12.10.15\" peer=\"10.12.10.15\" rpc_address=\"10.12.10.15\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.12.10.15\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-east\" rack=\"1c\" host_id=\"95ead47a-461b-4a5b-9e42-d730eb16b0b8\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response received from cassandra within timeout period (potentially executed: true)
2025/03/24 18:00:05 error: failed to connect to "[HostInfo hostname=\"10.12.10.15\" connectAddress=\"10.12.10.15\" peer=\"10.12.10.15\" rpc_address=\"10.12.10.15\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.12.10.15\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-east\" rack=\"1c\" host_id=\"95ead47a-461b-4a5b-9e42-d730eb16b0b8\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response received from cassandra within timeout period (potentially executed: true)
2025/03/24 18:01:05 error: failed to connect to "[HostInfo hostname=\"10.12.10.15\" connectAddress=\"10.12.10.15\" peer=\"10.12.10.15\" rpc_address=\"10.12.10.15\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.12.10.15\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-east\" rack=\"1c\" host_id=\"95ead47a-461b-4a5b-9e42-d730eb16b0b8\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response received from cassandra within timeout period (potentially executed: true)
2025/03/24 18:02:05 error: failed to connect to "[HostInfo hostname=\"10.12.10.15\" connectAddress=\"10.12.10.15\" peer=\"10.12.10.15\" rpc_address=\"10.12.10.15\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.12.10.15\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-east\" rack=\"1c\" host_id=\"95ead47a-461b-4a5b-9e42-d730eb16b0b8\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response received from cassandra within timeout period (potentially executed: true)
Node 10.12.10.15 was the target node of one of the previously executed nemeses and was removed from the cluster, but SB kept trying to execute CQL commands on it.
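For triage, a minimal sketch of how the stderr above could be summarized per host. It is not part of SCT or scylla-bench: the helper names `failing_hosts` and `unknown_hosts` are hypothetical, the sample lines are abbreviated copies of the stderr in this report, and the regular expressions assume the gocql message layout shown here.

```python
import re
from collections import Counter

# Seed nodes taken from the -nodes flag of the stress command in this report.
SEED_NODES = {"10.12.8.142", "10.12.11.100", "10.12.11.39", "10.12.8.35"}

# connect_addr="..." in HostInfo dumps; the quotes may be backslash-escaped.
CONNECT_ADDR = re.compile(r'connect_addr=\\?"([0-9.]+)\\?"')
# Address in "unable to dial control conn <ip>:<port>" messages.
DIAL_ADDR = re.compile(r"unable to dial control conn ([0-9.]+):\d+")


def failing_hosts(stderr_lines):
    """Count connection errors per host address found in stderr lines."""
    counts = Counter()
    for line in stderr_lines:
        match = CONNECT_ADDR.search(line) or DIAL_ADDR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def unknown_hosts(counts, seeds):
    """Return hosts that appear in errors but were not passed via -nodes."""
    return sorted(host for host in counts if host not in seeds)


# Abbreviated sample lines (fields dropped for brevity, values unchanged).
SAMPLE = [
    r'2025/03/24 13:55:34 gocql: unable to dial control conn 10.12.11.39:9042: '
    r'dial tcp 10.12.11.39:9042: connect: connection refused',
    r'2025/03/24 17:55:02 error: failed to connect to "[HostInfo '
    r'hostname=\"10.12.10.15\" connect_addr=\"10.12.10.15\" port=9042 '
    r'state=DOWN]" due to error: gocql: no response received',
    r'2025/03/24 17:56:05 error: failed to connect to "[HostInfo '
    r'hostname=\"10.12.10.15\" connect_addr=\"10.12.10.15\" port=9042 '
    r'state=DOWN]" due to error: gocql: no response received',
]

if __name__ == "__main__":
    counts = failing_hosts(SAMPLE)
    print(dict(counts))
    print(unknown_hosts(counts, SEED_NODES))
```

On the stderr in this report, 10.12.10.15 is the only erroring host that is not in the `-nodes` list, which would be consistent with the driver having learned of it through topology discovery rather than from the command line. That reading is an assumption and should be checked against the loader and db-cluster logs.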
Impact
No explicit impact
Installation details
Cluster size: 4 nodes (i3en.2xlarge)
Scylla Nodes used in this run:
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-9 (54.237.128.252 | 10.12.8.252) (shards: 7)
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-8 (18.234.45.247 | 10.12.11.84) (shards: 7)
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-7 (34.227.86.176 | 10.12.11.59) (shards: 7)
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-6 (18.212.213.116 | 10.12.10.15) (shards: 7)
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-5 (54.92.197.116 | 10.12.9.151) (shards: 7)
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-4 (3.89.162.251 | 10.12.8.35) (shards: 7)
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-3 (54.144.251.188 | 10.12.11.39) (shards: 7)
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-2 (54.160.168.112 | 10.12.11.100) (shards: 7)
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-1 (98.84.173.76 | 10.12.8.142) (shards: 7)
OS / Image: ami-0cb84a63946021a33
(aws: undefined_region)
Test: longevity-twcs-48h-test
Test id: 4b3cfc69-57fc-4295-83f6-2411eb2b33bd
Test name: scylla-2025.1/tier1/longevity-twcs-48h-test
Test method: longevity_twcs_test.TWCSLongevityTest.test_custom_time
Test config file(s):
Logs and commands
- Restore Monitor Stack command:
$ hydra investigate show-monitor 4b3cfc69-57fc-4295-83f6-2411eb2b33bd
- Restore monitor on AWS instance using Jenkins job
- Show all stored logs command:
$ hydra investigate show-logs 4b3cfc69-57fc-4295-83f6-2411eb2b33bd
Logs:
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-5 - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250324_122953/longevity-twcs-48h-2025-1-db-node-4b3cfc69-5-4b3cfc69.tar.zst
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-2 - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250324_122953/longevity-twcs-48h-2025-1-db-node-4b3cfc69-2-4b3cfc69.tar.zst
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-6 - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250324_122953/longevity-twcs-48h-2025-1-db-node-4b3cfc69-6-4b3cfc69.tar.zst
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-4 - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250324_122953/longevity-twcs-48h-2025-1-db-node-4b3cfc69-4-4b3cfc69.tar.zst
- longevity-twcs-48h-2025-1-db-node-4b3cfc69-7 - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250324_122953/longevity-twcs-48h-2025-1-db-node-4b3cfc69-7-4b3cfc69.tar.zst
- db-cluster-4b3cfc69.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250325_004852/db-cluster-4b3cfc69.tar.zst
- sct-runner-events-4b3cfc69.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250325_004852/sct-runner-events-4b3cfc69.tar.zst
- sct-4b3cfc69.log.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250325_004852/sct-4b3cfc69.log.tar.zst
- loader-set-4b3cfc69.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250325_004852/loader-set-4b3cfc69.tar.zst
- monitor-set-4b3cfc69.tar.zst - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/20250325_004852/monitor-set-4b3cfc69.tar.zst
- builder-4b3cfc69.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/4b3cfc69-57fc-4295-83f6-2411eb2b33bd/upload_20250325_005016/builder-4b3cfc69.log.tar.gz