Description
Issue description
- [ ] This issue is a regression.
- [ ] It is unknown if this issue is a regression.
I am not 100% certain this problem is caused by the manager, but in all of the occurrences it happened while one of the manager's tasks was running (maybe that is just a coincidence, though).
If you think it has nothing to do with the manager, please move this issue to a different repo (I guess to scylla-enterprise).
In the problematic longevity build we created a new Cloud cluster in Staging and triggered the following nemeses on it:
| Start | End | Nemesis |
| --- | --- | --- |
| 2023-08-04 08:17:20 | 2023-08-04 08:52:20 | disrupt_rolling_restart_cluster |
| 2023-08-04 09:23:01 | 2023-08-04 23:34:46 | disrupt_mgmt_backup |
In parallel, the load with the following config was running:
```yaml
prepare_write_cmd:
- "cassandra-stress write cl=QUORUM n=10485760 -schema 'replication(factor=3) compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native -rate threads=30 -pop seq=1..10485760 -col 'n=FIXED(10) size=FIXED(512)' -log interval=5"
stress_cmd:
- "cassandra-stress write cl=QUORUM duration=1400m -schema 'replication(factor=3) compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10485760 -col 'n=FIXED(10) size=FIXED(512)' -log interval=5"
- "cassandra-stress read cl=QUORUM duration=1400m -schema 'replication(factor=3) compaction(strategy=SizeTieredCompactionStrategy)' -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10485760 -col 'n=FIXED(10) size=FIXED(512)' -log interval=5"
```
During the disrupt_mgmt_backup nemesis, the backup task was started via sctool:
```
< t:2023-08-04 09:23:37,080 f:remote_base.py l:520 c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/opt/scylla/scylla-cloud -c /opt/scylla/scylla-dbaas.yml cluster manager sctool --cluster-id 11232 -- backup --location AWS_EU_WEST_1:s3:scylla-cloud-backup-11232-11016-v80c7h"...
```
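While investigating, the backup task status can be followed with sctool. Below is a minimal sketch, assuming the scylla-cloud wrapper forwards arbitrary sctool arguments after `--` the same way as the backup command above, and using a placeholder `<task-uuid>`:

```bash
# List manager tasks for the cluster (cluster-id taken from the command above)
/opt/scylla/scylla-cloud -c /opt/scylla/scylla-dbaas.yml cluster manager sctool --cluster-id 11232 -- tasks

# Follow the backup task's progress; replace <task-uuid> with the ID reported by the previous command
/opt/scylla/scylla-cloud -c /opt/scylla/scylla-dbaas.yml cluster manager sctool --cluster-id 11232 -- progress backup/<task-uuid>
```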
While the task was running, the storage utilization on the node scylla-cloud-operations-24h-devel-db-node-f04b5124-1 (172.18.122.20) spiked and eventually hit 100%. The node threw a `No space left on device` error:
```
2023-08-04 17:44:31.526 <2023-08-04 17:44:31.162>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=c1b5b322-9df8-41e9-ab26-fbb18cd9d7f1: type=NO_SPACE_ERROR regex=No space left on device line_number=7273 node=scylla-cloud-operations-24h-devel-db-node-f04b5124-1
2023-08-04T17:44:31.162 ip-172-18-122-20 !ERROR | scylla[24388]: [shard 0] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:28, No space left on device)
```
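One way a manager backup can add disk pressure (not confirmed as the root cause here) is through the snapshot it takes before uploading: the snapshot hard-links keep old SSTables on disk until the snapshot is cleaned up. A minimal diagnostic sketch for the affected node, assuming the default `/var/lib/scylla` data directory layout:

```bash
# Overall usage of the data mount (default Scylla data directory assumed)
df -h /var/lib/scylla

# Space held by snapshots vs. live SSTables, as reported by the node itself
nodetool listsnapshots

# Per-table snapshot directories, largest first (default data layout; adjust if data_file_directories differs)
sudo du -sh /var/lib/scylla/data/*/*/snapshots 2>/dev/null | sort -rh | head
```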
Here are the storage utilization graphs from the monitor during the whole build (the red line is the problematic node):
Some other graphs from the monitor
Impact
One of the cluster nodes runs out of storage space and stops responding.
How frequently does it reproduce?
It reproduced a few times in a row; the other reproducers will be added in the following comments.
Installation details
Kernel Version: 5.15.0-1039-aws
Scylla version (or git commit hash): `2022.2.11-20230705.27d29485de90` with build-id `f467a0ad8869d61384d8bbc8f20e4fb8fd281f4b`
Cloud manager version: 3.1.0
Cluster size: 6 nodes (i4i.large)
Scylla Nodes used in this run:
- scylla-cloud-operations-24h-devel-db-node-f04b5124-6 (null | 172.18.120.40) (shards: 2)
- scylla-cloud-operations-24h-devel-db-node-f04b5124-5 (null | 172.18.120.12) (shards: 2)
- scylla-cloud-operations-24h-devel-db-node-f04b5124-4 (null | 172.18.121.241) (shards: 2)
- scylla-cloud-operations-24h-devel-db-node-f04b5124-3 (null | 172.18.122.54) (shards: 2)
- scylla-cloud-operations-24h-devel-db-node-f04b5124-2 (null | 172.18.121.23) (shards: 2)
- scylla-cloud-operations-24h-devel-db-node-f04b5124-1 (null | 172.18.122.20) (shards: 2)
OS / Image: `` (aws: undefined_region)
Test: scylla-cloud-longevity-terraform-operations-24h-aws
Test id: f04b5124-5beb-4019-b3c1-3559c4726f7d
Test name: siren-tests/longevity-tests/staging/scylla-cloud-longevity-terraform-operations-24h-aws
Test config file(s):
Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor f04b5124-5beb-4019-b3c1-3559c4726f7d`
- Restore monitor on AWS instance using Jenkins job
- Show all stored logs command: `$ hydra investigate show-logs f04b5124-5beb-4019-b3c1-3559c4726f7d`
Logs:
- db-cluster-f04b5124.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/f04b5124-5beb-4019-b3c1-3559c4726f7d/20230805_001510/db-cluster-f04b5124.tar.gz
- sct-runner-events-f04b5124.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/f04b5124-5beb-4019-b3c1-3559c4726f7d/20230805_001510/sct-runner-events-f04b5124.tar.gz
- sct-f04b5124.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/f04b5124-5beb-4019-b3c1-3559c4726f7d/20230805_001510/sct-f04b5124.log.tar.gz
- loader-set-f04b5124.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/f04b5124-5beb-4019-b3c1-3559c4726f7d/20230805_001510/loader-set-f04b5124.tar.gz
- monitor-set-f04b5124.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/f04b5124-5beb-4019-b3c1-3559c4726f7d/20230805_001510/monitor-set-f04b5124.tar.gz
- siren-manager-set-f04b5124.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/f04b5124-5beb-4019-b3c1-3559c4726f7d/20230805_001510/siren-manager-set-f04b5124.tar.gz