
Scaling cluster down and up before it's finished gets the cluster stuck #2436

Open

Description

@mflendrich

Issue originally authored by tnozicka as #1189

Describe the bug
Scaling the cluster down and up before the scale-down has finished gets the cluster stuck on a node that's decommissioned but never removed.

To Reproduce
Steps to reproduce the behavior:

$ # Create a cluster with 2 nodes
$ yq e 'del(.metadata.generateName) | .metadata.name="example" | .spec.version = "5.2.0-rc2" | .spec.datacenter.racks[0].members = 2' ./test/e2e/fixture/scylla/basic.scyllacluster.yaml| kubectl apply --server-side --force-conflicts -f -

$ # Wait for the cluster to roll out (the wait steps are sketched right after these commands)

$ # Scale down to 1 node but don't wait
$ kubectl patch scyllacluster.scylla.scylladb.com/example --type='json' -p='[{"op": "replace", "path": "/spec/datacenter/racks/0/members", "value": 1}]'

$ # Wait for the second node to start decommissioning / go unready

$ # Scale up to 3 nodes while the second node is still decommissioning
$ kubectl patch scyllacluster.scylla.scylladb.com/example --type='json' -p='[{"op": "replace", "path": "/spec/datacenter/racks/0/members", "value": 3}]'

$ # Observe how it gets stuck on the second node
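
For the two "wait" steps above, something like the following works (a sketch; the StatefulSet and Pod names come from the outputs further below, and readiness is the standard Pod readiness reported by the sidecar probe):

$ # Wait for the initial 2-node rollout
$ kubectl rollout status statefulset/example-us-east-1-us-east-1a

$ # Watch the second node go unready while it decommissions
$ kubectl get pod/example-us-east-1-us-east-1a-1 -w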

I've been able to reproduce this every time.

I initially reproduced it with ScyllaDB 5.0.5, but to make sure this isn't scylladb/scylladb#11302 I bumped to 5.2.0-rc2.

Expected behavior
The cluster eventually scales up to 3 nodes.
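
One way to check for that end state (assuming the StatefulSet name shown in the outputs below) is to wait until all 3 replicas report ready:

$ kubectl get statefulset/example-us-east-1-us-east-1a -o jsonpath='{.status.readyReplicas}'
$ # or block until the rollout completes:
$ kubectl rollout status statefulset/example-us-east-1-us-east-1a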

Additional context

$ kubectl get scyllacluster,sts,pods,svc,pvc
NAME                                        AGE
scyllacluster.scylla.scylladb.com/example   31m

NAME                                            READY   AGE
statefulset.apps/example-us-east-1-us-east-1a   1/3     27m

NAME                                 READY   STATUS    RESTARTS   AGE
pod/example-us-east-1-us-east-1a-0   2/2     Running   0          12m
pod/example-us-east-1-us-east-1a-1   1/2     Running   0          11m

NAME                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                          AGE
service/example-client                   ClusterIP   10.101.69.218   <none>        7000/TCP,7001/TCP,7199/TCP,10001/TCP,9180/TCP,5090/TCP,9100/TCP,9042/TCP,9142/TCP,19042/TCP,19142/TCP,9160/TCP   12m
service/example-us-east-1-us-east-1a-0   ClusterIP   10.109.78.80    <none>        7000/TCP,7001/TCP,7199/TCP,10001/TCP,9180/TCP,5090/TCP,9100/TCP,9042/TCP,9142/TCP,19042/TCP,19142/TCP,9160/TCP   12m
service/example-us-east-1-us-east-1a-1   ClusterIP   10.104.226.80   <none>        7000/TCP,7001/TCP,7199/TCP,10001/TCP,9180/TCP,5090/TCP,9100/TCP,9042/TCP,9142/TCP,19042/TCP,19142/TCP,9160/TCP   12m
service/example-us-east-1-us-east-1a-2   ClusterIP   10.109.59.242   <none>        7000/TCP,7001/TCP,7199/TCP,10001/TCP,9180/TCP,5090/TCP,9100/TCP,9042/TCP,9142/TCP,19042/TCP,19142/TCP,9160/TCP   7m54s

NAME                                                        STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS    AGE
persistentvolumeclaim/data-example-us-east-1-us-east-1a-0   Bound    local-pv-19c669cf   23Gi       RWO            local-storage   12m
persistentvolumeclaim/data-example-us-east-1-us-east-1a-1   Bound    local-pv-bb0d6ca7   23Gi       RWO            local-storage   11m
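
The Scylla log below comes from the stuck, decommissioned node; something like the following should retrieve it (the `scylla` container name is an assumption about the pod layout):

$ kubectl logs pod/example-us-east-1-us-east-1a-1 -c scylla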
INFO  2023-03-17 16:32:35,322 [shard 0] storage_service - decommission[96dc4cae-0a54-4f95-a00b-ad5c6c96ad76]: Stopped heartbeat_updater
INFO  2023-03-17 16:32:35,323 [shard 0] storage_service - decommission[96dc4cae-0a54-4f95-a00b-ad5c6c96ad76]: leaving Raft group 0
INFO  2023-03-17 16:32:35,323 [shard 0] raft_group0 - leaving group 0 (my id = 5fce5ebb-5461-42ab-a4f3-6b52e23fa87d)...
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - decommission[96dc4cae-0a54-4f95-a00b-ad5c6c96ad76]: left Raft group 0
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Stop transport: starts
INFO  2023-03-17 16:32:35,365 [shard 0] migration_manager - stopping migration service
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Shutting down native transport server
INFO  2023-03-17 16:32:35,365 [shard 0] cql_server_controller - CQL server stopped
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Shutting down native transport server was successful
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Shutting down rpc server
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Shutting down rpc server was successful
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Shutting down alternator server
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Shutting down alternator server was successful
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Shutting down redis server
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Shutting down redis server was successful
INFO  2023-03-17 16:32:35,365 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
INFO  2023-03-17 16:32:35,365 [shard 0] gossip - My status = LEFT
WARN  2023-03-17 16:32:35,365 [shard 0] gossip - No local state or state is in silent shutdown, not announcing shutdown
INFO  2023-03-17 16:32:35,365 [shard 0] gossip - Disable and wait for gossip loop started
INFO  2023-03-17 16:32:35,542 [shard 0] gossip - failure_detector_loop: Finished main loop
INFO  2023-03-17 16:32:35,542 [shard 0] gossip - Gossip is now stopped
INFO  2023-03-17 16:32:35,542 [shard 0] storage_service - Stop transport: stop_gossiping done
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping nontls server
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping tls server
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping tls server - Done
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping client for address: 10.104.226.80:0
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0
INFO  2023-03-17 16:32:35,542 [shard 0] messaging_service - Stopping client for address: 10.104.226.80:0
INFO  2023-03-17 16:32:35,543 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0 - Done
INFO  2023-03-17 16:32:35,543 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0 - Done
INFO  2023-03-17 16:32:35,543 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0 - Done
INFO  2023-03-17 16:32:35,543 [shard 0] messaging_service - Stopping client for address: 10.104.226.80:0 - Done
INFO  2023-03-17 16:32:35,544 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0 - Done
INFO  2023-03-17 16:32:35,544 [shard 0] messaging_service - Stopping client for address: 10.104.226.80:0 - Done
INFO  2023-03-17 16:32:35,544 [shard 0] messaging_service - Stopping client for address: 10.109.78.80:0 - Done
INFO  2023-03-17 16:32:35,544 [shard 0] messaging_service - Stopping nontls server - Done
INFO  2023-03-17 16:32:35,544 [shard 0] storage_service - messaging_service stopped
INFO  2023-03-17 16:32:35,544 [shard 0] storage_service - Stop transport: shutdown messaging_service done
INFO  2023-03-17 16:32:35,544 [shard 0] storage_service - Stop transport: shutdown stream_manager done
INFO  2023-03-17 16:32:35,544 [shard 0] storage_service - Stop transport: done
INFO  2023-03-17 16:32:35,544 [shard 0] storage_service - DECOMMISSIONING: stopped transport
INFO  2023-03-17 16:32:35,544 [shard 0] batchlog_manager - Asked to drain
INFO  2023-03-17 16:32:35,544 [shard 0] batchlog_manager - Drained
INFO  2023-03-17 16:32:35,544 [shard 0] storage_service - DECOMMISSIONING: stop batchlog_manager done
INFO  2023-03-17 16:32:35,907 [shard 0] compaction - [Compact system.local 53173930-c4e1-11ed-a898-c81a46a262e5] Compacting [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/me-11-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/me-10-big-Data.db:level=0:origin=compaction]
INFO  2023-03-17 16:32:35,908 [shard 0] storage_service - DECOMMISSIONING: set_bootstrap_state done
INFO  2023-03-17 16:32:35,908 [shard 0] storage_service - entering DECOMMISSIONED mode
INFO  2023-03-17 16:32:35,908 [shard 0] storage_service - DECOMMISSIONING: done
INFO  2023-03-17 16:32:36,307 [shard 0] compaction - [Compact system.local 53173930-c4e1-11ed-a898-c81a46a262e5] Compacted 2 sstables to [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/me-12-big-Data.db:level=0]. 86kB to 45kB (~52% of original) in 311ms = 144kB/s. ~256 total partitions merged to 1.
INFO  2023-03-17 16:32:36,353 [shard 0] raft_group_registry - marking Raft server 5fce5ebb-5461-42ab-a4f3-6b52e23fa87d as dead for raft groups
INFO  2023-03-17 16:32:36,353 [shard 0] raft_group_registry - marking Raft server afdb17f4-86ef-44ea-bdb4-01ddb1dc2902 as dead for raft groups
I0317 16:32:37.658371       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:32:39.310344       1 sidecar/sync.go:92] "The node is already decommissioned"
I0317 16:32:47.659056       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:32:57.657179       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:33:07.658469       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:33:17.660454       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:33:27.434626       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:33:27.657882       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:33:37.655646       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:33:47.657373       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:33:57.659529       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:34:07.656316       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:34:17.659727       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:34:27.658937       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:34:37.660100       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:34:47.658364       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:34:53.434242       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:34:57.657199       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:35:07.657590       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:35:17.658096       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:35:27.655923       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:35:37.658450       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:35:47.661271       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:35:57.659960       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:36:07.660792       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:36:08.437477       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:36:17.659351       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:36:27.657682       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:36:37.656668       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
I0317 16:36:47.658896       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
...
I0317 16:46:57.659029       1 sidecar/probes.go:122] "readyz probe: node is not ready" Service="test/example-us-east-1-us-east-1a-1"
...
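
The node finishes decommissioning ("entering DECOMMISSIONED mode") but is never removed, so the sidecar keeps reporting the readyz probe as not ready and the StatefulSet stays stuck at 1/3. A hedged way to confirm the state from Scylla's side is to run nodetool from the node that is still healthy (container name again assumed):

$ kubectl exec pod/example-us-east-1-us-east-1a-0 -c scylla -- nodetool status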

Labels

kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/from-migration: Indicates that this issue is a copy of a corresponding issue mentioned in the description.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
