scylla-bench fails to reconnect after altering table #114

Open
@soyacz


Installation details

Kernel Version: 5.15.0-1026-aws
Scylla version (or git commit hash): 5.2.0~dev-20221209.6075e01312a5 with build-id 0e5d044b8f9e5bdf7f53cc3c1e959fab95bf027c

Cluster size: 9 nodes (i3.2xlarge)

Scylla Nodes used in this run:

  • longevity-counters-multidc-master-db-node-7785df01-9 (54.157.115.162 | 10.12.2.62) (shards: 7)
  • longevity-counters-multidc-master-db-node-7785df01-8 (3.238.92.3 | 10.12.2.95) (shards: 7)
  • longevity-counters-multidc-master-db-node-7785df01-7 (3.236.190.51 | 10.12.0.119) (shards: 7)
  • longevity-counters-multidc-master-db-node-7785df01-6 (54.212.64.38 | 10.15.0.77) (shards: 7)
  • longevity-counters-multidc-master-db-node-7785df01-5 (35.92.94.31 | 10.15.3.207) (shards: 7)
  • longevity-counters-multidc-master-db-node-7785df01-4 (34.219.193.110 | 10.15.3.94) (shards: 7)
  • longevity-counters-multidc-master-db-node-7785df01-3 (52.213.121.166 | 10.4.0.42) (shards: 7)
  • longevity-counters-multidc-master-db-node-7785df01-2 (54.229.18.181 | 10.4.2.143) (shards: 7)
  • longevity-counters-multidc-master-db-node-7785df01-1 (34.245.75.18 | 10.4.0.195) (shards: 7)

OS / Image: ami-0b85d6f35bddaff65 ami-0a1ff01b931943772 ami-08e5c2ae0089cade3 (aws: eu-west-1)

Test: longevity-counters-6h-multidc-test
Test id: 7785df01-a1fe-483a-beb7-2f63b9044b87
Test name: scylla-master/raft/longevity-counters-6h-multidc-test
Test config file(s):

Issue description

The counters test in the multi-DC scenario fails persistently after altering a table, for example after running any of the following:

  • ALTER TABLE scylla_bench.test_counters WITH bloom_filter_fp_chance = 0.45374057709882093;
  • ALTER TABLE scylla_bench.test_counters WITH read_repair_chance = 0.9;
  • ALTER TABLE scylla_bench.test_counters WITH comment = 'IHQS6RAYS5VQ6CQZYBYEX1GP';

After such a change, scylla-bench fails with the following error:

2022/12/09 15:26:29 error: failed to connect to "[HostInfo hostname=\"10.12.0.119\" connectAddress=\"10.12.0.119\" peer=\"<nil>\" rpc_address=\"10.12.0.119\" broadcast_address=\"10.12.0.119\" preferred_ip=\"<nil>\" connect_addr=\"10.12.0.119\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-eastscylla_node_east\" rack=\"1a\" host_id=\"ec773dfb-ef87-4ab8-abbf-190e3e082e4c\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response to connection startup within timeout
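
For triage, it can help to pull the node address, data centre and reported state out of lines like the one above, so they can be matched against the node list. The following is a minimal sketch, not part of scylla-bench or SCT: the function name `parse_host_info` and the regular expressions are assumptions made for illustration, and the parser assumes the `key=value` layout seen in this one log line.

```python
import re

# Error line copied verbatim from this issue (quotes inside HostInfo are backslash-escaped).
LINE = r'''2022/12/09 15:26:29 error: failed to connect to "[HostInfo hostname=\"10.12.0.119\" connectAddress=\"10.12.0.119\" peer=\"<nil>\" rpc_address=\"10.12.0.119\" broadcast_address=\"10.12.0.119\" preferred_ip=\"<nil>\" connect_addr=\"10.12.0.119\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"us-eastscylla_node_east\" rack=\"1a\" host_id=\"ec773dfb-ef87-4ab8-abbf-190e3e082e4c\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response to connection startup within timeout'''

# Hypothetical pattern: key=\"value\" (escaped or plain quotes) or key=bare_value.
FIELD_RE = re.compile(r'(\w+)=(?:\\?"(.*?)\\?"|(\S+?))(?=[\s\]])')


def parse_host_info(line):
    """Return the key/value pairs inside the [HostInfo ...] part of a log line.

    Returns an empty dict when the line has no HostInfo section.
    All values are returned as strings.
    """
    match = re.search(r'\[HostInfo (.*?)\]', line)
    if match is None:
        return {}
    fields = {}
    # A trailing space lets the lookahead match after the last field.
    for key, quoted, bare in FIELD_RE.findall(match.group(1) + ' '):
        fields[key] = quoted if quoted else bare
    return fields


if __name__ == "__main__":
    info = parse_host_info(LINE)
    print(info["hostname"], info["data_centre"], info["state"])
```

Here the extracted `hostname` corresponds to the private address of node `-7` in the list above, and `state` shows what the driver believed about the host when the connection attempt was made.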

Later the connection appears to recover, so the connection issues are not permanent. However, the failure is enough to raise a critical error that ends the test.

  • Restore Monitor Stack command: $ hydra investigate show-monitor 7785df01-a1fe-483a-beb7-2f63b9044b87
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 7785df01-a1fe-483a-beb7-2f63b9044b87

Logs:

| 20221209_161654 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_161654/grafana-screenshot-longevity-counters-6h-multidc-test-scylla-per-server-metrics-nemesis-20221209_161803-longevity-counters-multidc-master-monitor-node-7785df01-1.png |
| 20221209_161654 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_161654/grafana-screenshot-overview-20221209_161654-longevity-counters-multidc-master-monitor-node-7785df01-1.png |
| 20221209_162553 | db-cluster | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_162553/db-cluster-7785df01.tar.gz |
| 20221209_162553 | loader-set | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_162553/loader-set-7785df01.tar.gz |
| 20221209_162553 | monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_162553/monitor-set-7785df01.tar.gz |
| 20221209_162553 | sct | https://cloudius-jenkins-test.s3.amazonaws.com/7785df01-a1fe-483a-beb7-2f63b9044b87/20221209_162553/sct-runner-7785df01.tar.gz |

Jenkins job URL
