### Is your feature request related to a problem? Please describe.
If the Ra data directory is on a disk that is completely full, multiple processes in the Ra supervision tree can hit max restart intensity and terminate permanently, leaving the Raft subsystem unavailable. This has a few consequences for RabbitMQ:
- Since `ra_log_ets` is terminated, the `ra_directory` ETS and DETS tables are closed, so `ra:force_delete_server/2` always fails and QQs cannot be deleted to reclaim space. (`ra_directory:where_is_parent/2` crashes in `ra_server_sup_sup:prepare_server_stop/2`.)
- When using Khepri, the metadata store server process stops, so not even stale data can be served by local queries or projections.
- Since user metadata is stored in the metadata store, you can't log into the server.
- This also causes deletion of all vhosts, since `rabbit_vhost_process`'s regular check notices that the vhost is not returned by the metadata store.
### Describe the solution you'd like
Ideally, we would stop accepting writes once we see `enospc` but keep existing servers running and serving reads. One option is to shut down the WAL process, or to enter a 'read-only' mode that rejects writes rather than crashing repeatedly.
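As a rough illustration of the 'read-only' mode idea — a sketch in Python rather than Ra's actual Erlang code, with `WalWriter` and all of its methods invented for this example — the writer could trap `ENOSPC` on append, flip a flag, and keep serving reads from existing state:

```python
import errno


class WalWriter:
    """Hypothetical sketch of a WAL that degrades to read-only on a
    full disk instead of crashing and being restarted repeatedly."""

    def __init__(self, path):
        self.path = path
        self.read_only = False
        self.entries = []  # in-memory copy used to serve reads

    def append(self, entry: bytes):
        if self.read_only:
            raise RuntimeError("WAL is read-only: disk full")
        try:
            with open(self.path, "ab") as f:
                f.write(entry)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                # Stop accepting writes, but do not crash: existing
                # state stays available for reads.
                self.read_only = True
                raise RuntimeError("WAL is read-only: disk full") from e
            raise
        self.entries.append(entry)

    def read_all(self):
        # Reads keep working whether or not we are read-only.
        return list(self.entries)
```

The key property is that a full disk changes the answer to "can I write?" without taking down the processes that answer "what is the current state?".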
### Describe alternatives you've considered
Alternatively, for RabbitMQ, we could crash the whole node when `ra_sup` exits. Today RabbitMQ continues to run despite Ra being offline, which leads to the consequences listed above; having `enospc` crash the server entirely would avoid them.
### Additional context
Reproduction steps for local `enospc` on a single node...
This only works on Linux, as macOS doesn't have tmpfs. For this test I'm running off of the tip of `v4.2.x` (`05eee5deb9ecd40beed549c31e4349781fd004ff` at time of writing).
```shell
mkdir /tmp/raft-data
sudo mount -t tmpfs -o size=500M data /tmp/raft-data
```
```ini
# raft.conf
raft.data_dir = /tmp/raft-data
# Tuning down the WAL size makes ra_log_wal start and crash faster
# during recovery, increasing the chances of hitting max restart intensity.
raft.wal_max_size_bytes = 67108864
```
```shell
make run-broker RABBITMQ_CONFIG_FILE=raft.conf
perf-test -qq -u qq -qpf 1 -qpt 10 -qp qq-%d -x 5 -y 0 -c 100 --rate 1000
```
Eventually perf-test will start throwing `com.rabbitmq.client.AuthenticationFailureException: ACCESS_REFUSED`, as the metadata store is returning no users.
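As a side note, `enospc` can be reached sooner than waiting for perf-test to fill the 500M tmpfs with WAL data by pre-filling most of the mount first. This is a hypothetical helper, not part of the repro above; `fill_disk`, the filler file name, and the free-space margin are all made up for illustration:

```python
import os


def fill_disk(path, leave_mb=5, chunk_mb=1):
    """Write zeros into a filler file under `path` until roughly
    `leave_mb` MB of free space remain, so the next WAL write
    hits ENOSPC almost immediately."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    filler = os.path.join(path, "filler.bin")
    with open(filler, "wb") as f:
        while True:
            st = os.statvfs(path)
            free = st.f_bavail * st.f_frsize
            if free <= leave_mb * 1024 * 1024:
                break
            f.write(chunk)
            f.flush()
    return filler


# e.g. fill_disk("/tmp/raft-data")  # leave ~5 MB free on the tmpfs
```

Deleting the filler file afterwards is also a convenient way to confirm whether the broker recovers once space is available again.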