redpanda-data · Feediver1 · Apr 4, 2025 · Mar 18, 2025 · Mar 18, 2025 · Mar 19, 2025
@@ -1133,6 +1133,76 @@ rpk topic alter-config <topic_name> --set redpanda.remote.read=true
 
 See also: xref:{topic-recovery-link}[Topic Recovery], xref:manage:kubernetes/k-remote-read-replicas.adoc[Remote Read Replicas]
 
+== Pause and resume uploads
+
+IMPORTANT: Redpanda strongly recommends using pause and resume only under the guidance of https://support.redpanda.com/hc/en-us/requests/new[Redpanda Support^] or a member of your account team. 
+
+Starting in version 25.1, you can pause and resume uploads to troubleshoot problems between your cluster and object storage. Pausing and resuming this way carries no risk to data consistency and no risk of data loss. To pause or resume segment uploads to object storage, set the xref:reference:properties/object-storage-properties.adoc#cloud_storage_enable_segment_uploads[`cloud_storage_enable_segment_uploads`] configuration property (default: `true`). Setting it to `false` pauses uploads. Setting it back to `true` resumes them, and segment uploads proceed from where they stopped.
+
+While uploads are paused, data accumulates locally, which can fill the disks if the pause is prolonged. If the disks fill, Redpanda throttles produce requests and rejects new Kafka produce requests so that no more data is written. Pausing uploads also pauses object storage housekeeping, so segments are neither uploaded to nor removed from object storage. You can still consume data from object storage while uploads are paused.
+
+When you set `cloud_storage_enable_segment_uploads` to `false`, all in-flight segment uploads complete, but no new segment uploads begin until the value is set back to `true`. During this pause, Tiered Storage enforces consistency by ensuring that no segment in local storage is deleted until it successfully uploads to object storage. This means that when uploads are resumed, no user intervention is needed, and no data gaps are created.
+
+Use the `redpanda_cloud_storage_paused_archivers` metric to monitor the status of paused uploads. It displays a non-zero value whenever uploads are paused.
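
To check this metric from a script, something like the following sketch may help. It sums every `redpanda_cloud_storage_paused_archivers` sample found in Prometheus text-format input on standard input. The function name `paused_archivers` is hypothetical, and the endpoint in the usage comment (`/public_metrics` on port 9644) is an assumption, so adjust both for your deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum all samples of the paused-archivers metric
# read from stdin in Prometheus text format. Prints 0 if none are found.
# Assumes sample lines end with the value (no trailing timestamp).
paused_archivers() {
  awk '
    # Match the bare metric name or name{labels}; HELP and TYPE lines start with "#" and are skipped.
    /^redpanda_cloud_storage_paused_archivers([{ ]|$)/ { sum += $NF }
    END { printf "%d\n", sum }
  '
}

# Usage (assumed endpoint; adjust host and port for your cluster):
#   curl -s http://localhost:9644/public_metrics | paused_archivers
```

A non-zero result indicates that uploads are paused, and `0` indicates that they are not.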
+
+[WARNING]
+====
+Do not use `redpanda.remote.read` or `redpanda.remote.write` to pause and resume segment uploads. Doing so can lead to a gap between local data and data in object storage. In such cases, the oldest local segment might not align with the last uploaded segment. Because these settings are unsafe, you receive a warning if you set either `redpanda.remote.write` or the cluster configuration setting `cloud_storage_enable_remote_write` to `false`:
+
+[source,bash]
+----
+Warning: disabling Tiered Storage may lead to data loss. If you only want to pause Tiered Storage temporarily, use the `cloud_storage_enable_segment_uploads` option. Abort?
+# The default is Yes.
+----
+====
+
+
+The following example shows a simple pause and resume with no gaps allowed:
+
+```bash
+rpk cluster config set cloud_storage_enable_segment_uploads false
+# Segments are not uploaded to cloud storage, and cloud storage housekeeping is not running.
+# The new data added to the topics with Tiered Storage is not deleted from disk
+# because it can't be uploaded. The disks may fill up eventually.
+# If the disks fill up, produce requests will be rejected.
+...
+
+rpk cluster config set cloud_storage_enable_segment_uploads true
+# At this point the uploads should resume seamlessly and
+# there should not be any data loss.
+```
+
+For some applications, the newest data is more valuable than historical data, so data accumulation can be worse than data loss. If you cannot afford to lose the most recently produced data, which happens when Redpanda rejects produce requests after producers fill the local disks during a pause, you can use a less safe pause and resume mechanism. This mechanism prioritizes accepting new data over retaining data that cannot be uploaded when disks are full. To enable it, do one of the following:
+
+- Set the xref:reference:properties/object-storage-properties.adoc#cloud_storage_enable_remote_allow_gaps[`cloud_storage_enable_remote_allow_gaps`] cluster configuration property to `true`. This allows for gaps in the logs of all Tiered Storage topics in the cluster.
+- Set the `redpanda.remote.allowgaps` topic configuration property to `true`. This allows gaps for one specific topic. This topic-level configuration option overrides the cluster-level default.
+
+When you pause uploads and set one of these properties to `true`, there may be gaps in the range of offsets stored in object storage. You can seamlessly resume uploads when gaps are allowed at either the cluster or topic level. If gaps are not allowed, disk space could be depleted during the pause and produce requests would be throttled.
+
+The following example shows how to pause and resume Tiered Storage uploads while allowing for gaps:
+
+```bash
+rpk cluster config set cloud_storage_enable_segment_uploads false
+# Segment uploads are paused and cloud storage housekeeping is not running.
+# New data is stored on the local volume, which may overflow.
+# To avoid overflow, allow gaps in the log.
+# In this example, data that is not uploaded to cloud storage may be
+# deleted if a disk fills before uploads are resumed.
+
+rpk topic alter-config <topic_name> --set redpanda.remote.allowgaps=true
+# Uploads are paused and gaps are allowed. Local retention is allowed
+# to delete data before it's uploaded, therefore some data loss is possible.
+...
+
+rpk cluster config set cloud_storage_enable_segment_uploads true
+# Uploads are resumed but there could be gaps in the offsets.
+# Wait until you see that the `redpanda_cloud_storage_paused_archivers` 
+# metric is equal to zero, indicating that uploads have resumed.
+
+# Disable the gap allowance previously set for the topic.
+rpk topic alter-config <topic_name> --set redpanda.remote.allowgaps=false
+```
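
The wait step in the gap-allowing example can be scripted. The following is a minimal sketch, not an official Redpanda tool: `wait_for_zero` is a hypothetical helper that repeatedly runs a command you supply and returns once that command prints `0`. How you read the `redpanda_cloud_storage_paused_archivers` value is left to the supplied command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a command until it prints 0, or give up.
# Usage: wait_for_zero <max_attempts> <sleep_seconds> <command> [args...]
wait_for_zero() {
  local attempts="$1" delay="$2" value i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Treat a failing command as "not zero yet" and keep polling.
    value=$("$@") || value=""
    if [ "$value" = "0" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage, where read_paused_archivers is a command you provide that
# prints the current metric value:
#   wait_for_zero 30 10 read_paused_archivers && echo "uploads resumed"
```

Bounding the number of attempts keeps the script from hanging indefinitely if uploads do not resume.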
+
 == Caching
 
 When a consumer fetches an offset range that isn't available locally in the Redpanda data directory, Redpanda downloads remote segments from object storage. These downloaded segments are stored in the object storage cache.