
Panic: invalid number of shards during connection pooling #605

@yarongilor

Description

Packages

Scylla version: 2026.1.0~dev-20251104.fc37518affc8 with build-id d0aed830ec418cca7e757a143b3b85b120c9e396

Kernel Version: 6.14.0-1016-aws

Issue description

versions:

| Package | Version | Date | Commit |
| --- | --- | --- | --- |
| gemini-gocql-driver | v1.15.3 | 2025-09-06T16:49:42Z | e35803084ebafd200e3f7fd74a5be5dfdb409b2d |
| gemini | 2.1.5 | 2025-10-14T18:23:43Z | 1bad12f14f6832dbdc2211627079d91d8c610bf3 |

The gemini stress tool ran on loader-1.
At some point it failed with an "invalid number of shards" panic while trying to connect to db-node-2 (10.4.0.86):

```
2025-11-05 20:20:56.012: (GeminiStressEvent Severity.ERROR) period_type=end event_id=05d2fbed-cff6-42d9-b164-6c59729889d1 during_nemesis=NoCorruptRepair duration=12h23m40s: node=Node gemini-tombstones-sequence-gemini-t-loader-node-95c071d7-1 [108.130.4.141 | 10.4.2.15] (Type: m6i.2xlarge) (rack: RACK0)
gemini_cmd=gemini --test-cluster="10.4.0.102,10.4.0.86,10.4.0.43,10.4.3.77,10.4.2.13,10.4.2.150" --seed=64 --schema-seed=64 --profiling-port=6060 --bind=0.0.0.0:2112 --outfile=/gemini_result_e3f7489c-5f71-481d-a5f5-126647e610b9.log --replication-strategy="{'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}" --oracle-replication-strategy="{'class': 'NetworkTopologyStrategy', 'replication_factor': '1'}" --oracle-cluster="10.4.3.80" --test-statement-log-file=/gemini_test_statements_e3f7489c-5f71-481d-a5f5-126647e610b9.log --oracle-statement-log-file=/gemini_oracle_statements_e3f7489c-5f71-481d-a5f5-126647e610b9.log --level=info --request-timeout=3s --connect-timeout=60s --consistency=QUORUM --async-objects-stabilization-backoff=10ms --async-objects-stabilization-attempts=10 --dataset-size=large --oracle-host-selection-policy=token-aware --test-host-selection-policy=token-aware --drop-schema=true --cql-features=normal --materialized-views=false --use-server-timestamps=true --use-lwt=false --use-counters=false --max-tables=1 --max-columns=16 --min-columns=8 --max-partition-keys=6 --min-partition-keys=2 --max-clustering-keys=4 --min-clustering-keys=2 --partition-key-distribution=uniform --partition-key-buffer-reuse-size=128 --statement-log-file-compression=zstd --duration 24h --warmup 10m --concurrency 200 --mode mixed --max-mutation-retries-backoff 10s --max-mutation-retries 30 --token-range-slices 10000 --max-errors-to-store 1 --statement-ratios '{"mutation":{"insert":0.6,"update":0.2,"delete":0.2}}'
result=Exit code: 2
```

Command output:

```
{"level":"info","ts":"2025-11-05T13:02:33.039352708Z","logger":"store.test_store.gocql","msg":"gocql: unable to dial control conn 10.4.2.84:9042: dial tcp 10.4.2.84:9042: connect: connection refused","cluster":"test"}
{"level":"info","ts":"2025-11-05T13:13:40.148581036Z","logger":"store.test_store.gocql","msg":"gocql: unable to dial control conn 10.4.0.86:9042: dial tcp 10.4.0.86:9042: connect: connection refused","cluster":"test"}
```

Errors:

```
panic: scylla: 10.4.0.86:9042 invalid number of shards

goroutine 22495328 [running]:
github.com/gocql/gocql.(*scyllaConnPicker).Put(0xc071554000, 0xc0685a4000)
	/home/runner/go/pkg/mod/github.com/scylladb/gocql@v1.15.3/scylla.go:466 +0x409
github.com/gocql/gocql.(*hostConnPool).connect(0xc05a8c8150)
	/home/runner/go/pkg/mod/github.com/scylladb/gocql@v1.15.3/connectionpool.go:550 +0x287
github.com/gocql/gocql.(*hostConnPool).fill(0xc05a8c8150)
	/home/runner/go/pkg/mod/github.com/scylladb/gocql@v1.15.3/connectionpool.go:389 +0x14f
github.com/gocql/gocql/debounce.(*SimpleDebouncer).Debounce(0xc06a84dcd0, 0xe3bc80?)
	/home/runner/go/pkg/mod/github.com/scylladb/gocql@v1.15.3/debounce/simple_debouncer.go:30 +0x5f
github.com/gocql/gocql.(*hostConnPool).fill_debounce(...)
	/home/runner/go/pkg/mod/github.com/scylladb/gocql@v1.15.3/connectionpool.go:421
created by github.com/gocql/gocql.(*hostConnPool).Pick in goroutine 118
	/home/runner/go/pkg/mod/github.com/scylladb/gocql@v1.15.3/connectionpool.go:309 +0x109
```
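For reference, the panic originates in the driver's shard-aware connection picker: pooled connections are slotted by the shard they landed on, and a connection whose advertised shard count no longer matches the count the pool was sized for is treated as a broken invariant. The Go sketch below is a minimal, hypothetical illustration of that kind of check, not the actual scylladb/gocql code; the `shardPicker`/`nrShards` names are illustrative only.

```go
// Hypothetical, simplified sketch (not the driver's actual code): a shard-aware
// connection pool that remembers the shard count it first saw for a host and
// panics if a later connection reports a different count, mirroring the
// "invalid number of shards" message in the stack trace above.
package main

import "fmt"

// conn stands in for a per-connection handle; nrShards is the shard count the
// node advertised during the handshake (hypothetical field name).
type conn struct {
	addr     string
	shard    int
	nrShards int
}

// shardPicker keeps one connection slot per shard for a single host.
type shardPicker struct {
	nrShards int
	conns    []*conn
}

func newShardPicker(c *conn) *shardPicker {
	return &shardPicker{nrShards: c.nrShards, conns: make([]*conn, c.nrShards)}
}

// put mirrors the role of scyllaConnPicker.Put: a connection whose advertised
// shard count no longer matches the pool's view trips the invariant check.
func (p *shardPicker) put(c *conn) {
	if c.nrShards != p.nrShards {
		panic(fmt.Sprintf("scylla: %s invalid number of shards", c.addr))
	}
	p.conns[c.shard] = c
}

func main() {
	first := &conn{addr: "10.4.0.86:9042", shard: 0, nrShards: 2}
	p := newShardPicker(first)
	p.put(first)

	// A later connection to the same host that reports a different shard
	// count would trip the same panic seen in the gemini run.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	p.put(&conn{addr: "10.4.0.86:9042", shard: 1, nrShards: 3})
}
```

Note that in the node list below, the two hosts reported with shards: -1 are 10.4.2.150 and 10.4.0.86, the latter being the host named in the panic.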
  • This issue is a regression.
  • It is unknown if this issue is a regression.


Installation details

Cluster size: 6 nodes (i4i.large)

Scylla Nodes used in this run:

- gemini-tombstones-sequence-gemini-t-oracle-db-node-95c071d7-1 (34.244.234.197 | 10.4.3.80) (shards: 30)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-9 (108.130.105.61 | 10.4.2.227) (shards: 2)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-8 (34.252.52.130 | 10.4.2.84) (shards: 2)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-7 (52.31.125.232 | 10.4.2.115) (shards: 2)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-6 (34.244.118.147 | 10.4.2.150) (shards: -1)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-5 (176.34.151.34 | 10.4.2.13) (shards: 2)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-4 (34.242.12.193 | 10.4.3.77) (shards: 2)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-3 (34.249.21.149 | 10.4.0.43) (shards: 2)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-2 (34.250.73.158 | 10.4.0.86) (shards: -1)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-11 (108.130.127.120 | 10.4.0.169) (shards: 2)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-10 (176.34.75.209 | 10.4.1.111) (shards: 2)
- gemini-tombstones-sequence-gemini-t-db-node-95c071d7-1 (3.250.188.132 | 10.4.0.102) (shards: 2)
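As context for the shards: -1 entries above (10.4.2.150 and 10.4.0.86): the shard count a driver works with is the one each node advertises during the native-protocol handshake. The sketch below is a standalone, hypothetical diagnostic (not part of gemini or SCT) that sends a bare OPTIONS request to one node and prints the SUPPORTED options, which on Scylla include shard-related keys such as SCYLLA_NR_SHARDS; the target address is taken from the node list above purely as an example.

```go
// Hypothetical diagnostic: dump the SUPPORTED options a node advertises over
// the CQL native protocol (v4). Useful for checking what shard count a node
// is currently reporting. The address below is an example from this run.
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"net"
	"time"
)

func main() {
	conn, err := net.DialTimeout("tcp", "10.4.0.86:9042", 5*time.Second)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()

	// OPTIONS request: version 0x04, flags 0, stream 0, opcode 0x05, empty body.
	req := []byte{0x04, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00}
	if _, err := conn.Write(req); err != nil {
		fmt.Println("write failed:", err)
		return
	}

	// Read the 9-byte response header, then the body (a string multimap).
	header := make([]byte, 9)
	if _, err := io.ReadFull(conn, header); err != nil {
		fmt.Println("read header failed:", err)
		return
	}
	if header[4] != 0x06 { // 0x06 = SUPPORTED
		fmt.Printf("unexpected opcode 0x%02x\n", header[4])
		return
	}
	body := make([]byte, binary.BigEndian.Uint32(header[5:9]))
	if _, err := io.ReadFull(conn, body); err != nil {
		fmt.Println("read body failed:", err)
		return
	}

	// Parse the string multimap: <n> entries of <string key><string list>.
	readString := func(b []byte) (string, []byte) {
		n := int(binary.BigEndian.Uint16(b[:2]))
		return string(b[2 : 2+n]), b[2+n:]
	}
	n := int(binary.BigEndian.Uint16(body[:2]))
	rest := body[2:]
	for i := 0; i < n; i++ {
		var key string
		key, rest = readString(rest)
		count := int(binary.BigEndian.Uint16(rest[:2]))
		rest = rest[2:]
		values := make([]string, count)
		for j := 0; j < count; j++ {
			values[j], rest = readString(rest)
		}
		fmt.Println(key, values) // e.g. SCYLLA_NR_SHARDS [2]
	}
}
```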

OS / Image: ami-0037eef98022b60df (aws: N/A)

Test: gemini-sequence-nemesis
Test id: 95c071d7-8cb1-424a-b328-7b7922bc6c25
Test name: scylla-staging/yarongilor/gemini-sequence-nemesis

Test method: `gemini_test.GeminiTest.test_load_random_with_nemesis`


Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 95c071d7-8cb1-424a-b328-7b7922bc6c25
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 95c071d7-8cb1-424a-b328-7b7922bc6c25

Logs:

Jenkins job URL
Argus
