AddRemoveDC nemesis fails when trying to decommission the only node in DC #10052

When the node is decommissioned, three keyspaces are present: 'keyspace1', 'keyspace_new_dc', and 'scylla_bench'. The decommission targets keyspace_new_dc in the us-east_nemesis_dc DC. However, us-east_nemesis_dc contains only one node, and the keyspace in this DC has a replication factor (RF) of 1. Decommissioning has to drain the node's tablets to another node in the same DC, so the operation cannot succeed: there are no candidate nodes to receive the data.
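The failure condition can be expressed as a small pre-check. This is a simplified sketch, assuming that every tablet keyspace with RF > 0 in a DC needs at least RF nodes left in that DC after the decommission. The function name, the main DC name `us-east`, and all node counts and RF values other than the single node with RF 1 in us-east_nemesis_dc are hypothetical, not taken from SCT or Scylla.

```python
from __future__ import annotations

from typing import Mapping


def decommission_blockers(
    dc: str,
    nodes_per_dc: Mapping[str, int],
    keyspace_rf: Mapping[str, Mapping[str, int]],
) -> list[str]:
    """Return the keyspaces that would block decommissioning one node in `dc`.

    Simplified model (an assumption): a keyspace with RF > 0 in `dc` needs at
    least RF nodes remaining in `dc` to receive the drained tablets.
    """
    remaining = nodes_per_dc.get(dc, 0) - 1  # nodes left after removing one
    blockers = []
    for keyspace, rf_by_dc in keyspace_rf.items():
        rf = rf_by_dc.get(dc, 0)
        if rf > 0 and remaining < rf:
            blockers.append(keyspace)
    return blockers


# Hypothetical topology: only the 1-node nemesis DC with RF 1 comes from the issue.
nodes = {"us-east": 6, "us-east_nemesis_dc": 1}
rf = {
    "keyspace1": {"us-east": 3},
    "scylla_bench": {"us-east": 3},
    "keyspace_new_dc": {"us-east_nemesis_dc": 1},
}
print(decommission_blockers("us-east_nemesis_dc", nodes, rf))  # ['keyspace_new_dc']
```

Under this model, adding a second node to us-east_nemesis_dc, or lowering the keyspace's RF there to 0, would leave the list empty, which matches the two remedies the error message suggests.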

Logs:
```
2025-02-04 22:06:46.905: (DisruptionEvent Severity.ERROR) period_type=end event_id=8e34dc1b-0b98-44d9-b28e-2e4702c12970 duration=54m16s: nemesis_name=AddRemoveDc target_node=Node longevity-200gb-48h-verify-limited--db-node-775d283e-2 [98.81.100.146 | 10.12.3.79] errors=Encountered a bad command exit code!
Command: "/usr/bin/nodetool -u cassandra -pw 'cassandra'  decommission "
Exit code: 4
Stdout:
Stderr:
error executing POST request to http://localhost:10000/storage_service/decommission with parameters {}: remote replied with status code 500 Internal Server Error:
std::runtime_error (Decommission failed. See earlier errors (Rolled back: Failed to drain tablets: std::runtime_error (There are nodes with tablets to drain but no candidate nodes in DC us-east_nemesis_dc. Consider adding new nodes or reducing replication factor.)). Request ID: e966e93a-e343-11ef-09a0-b72664f26659)
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5501, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4863, in disrupt_add_remove_dc
    self.cluster.decommission(new_node)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 5078, in decommission
    node.run_nodetool("decommission", timeout=timeout, long_running=True, retry=0)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2664, in run_nodetool
    runner(cmd, timeout=timeout, ignore_status=ignore_status, verbose=verbose, retry=retry)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_long_running.py", line 67, in run_long_running_cmd
    raise UnexpectedExit(result=result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: "/usr/bin/nodetool -u cassandra -pw 'cassandra'  decommission "
Exit code: 4
Stdout:
Stderr:
error executing POST request to http://localhost:10000/storage_service/decommission with parameters {}: remote replied with status code 500 Internal Server Error:
std::runtime_error (Decommission failed. See earlier errors (Rolled back: Failed to drain tablets: std::runtime_error (There are nodes with tablets to drain but no candidate nodes in DC us-east_nemesis_dc. Consider adding new nodes or reducing replication factor.)). Request ID: e966e93a-e343-11ef-09a0-b72664f26659)
```
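The error message suggests reducing the replication factor. One possible mitigation is for the nemesis to drop the keyspace's replicas in us-east_nemesis_dc before running `nodetool decommission`. The sketch below only builds the CQL text. The helper name is hypothetical, the main DC name `us-east` and its RF are made up for illustration, and it assumes the keyspace uses NetworkTopologyStrategy and that the Scylla version under test accepts an RF change from 1 to 0 on a tablet keyspace.

```python
from __future__ import annotations


def drop_dc_replication_cql(keyspace: str, rf_by_dc: dict[str, int], dc: str) -> str:
    """Build an ALTER KEYSPACE statement that sets `dc`'s RF to 0.

    Hypothetical helper; assumes NetworkTopologyStrategy replication.
    The input mapping is copied, not modified.
    """
    new_rf = dict(rf_by_dc)
    new_rf[dc] = 0
    # Sort DC names so the generated statement is deterministic.
    opts = ", ".join(f"'{name}': {count}" for name, count in sorted(new_rf.items()))
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {opts}}}"
    )


current = {"us-east": 3, "us-east_nemesis_dc": 1}  # hypothetical current RF map
print(drop_dc_replication_cql("keyspace_new_dc", current, "us-east_nemesis_dc"))
```

Keeping the helper pure (it returns a string) makes it testable without a cluster. Executing the statement is left to whatever CQL session the test already holds.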
