TS Safe pause and resume #1017
@Feediver1 reminder to add a blurb for this in the What's New!
@@ -1133,6 +1133,65 @@ rpk topic alter-config <topic_name> --set redpanda.remote.read=true

See also: xref:{topic-recovery-link}[Topic Recovery], xref:manage:kubernetes/k-remote-read-replicas.adoc[Remote Read Replicas]

== Safe pause and resume

Starting in version 25.1, when running Tiered Storage, you can safely pause and resume uploads to cloud storage without risking data consistency or loss. To pause or resume segment uploads to cloud storage, use the `cloud_storage_enable_segment_uploads` configuration property (default is `true`), which allows segment uploads to proceed normally after the pause completes and uploads resume.
We should say when and how this will be used:
"Starting in version 25.1, when running Tiered Storage, to troubleshoot and resolve temporary issues related to a cluster's interaction with cloud storage, you can safely ..."
And, just in case this does end up in docs, let's also have a yellow box warning: "We highly recommend using pause and resume only under the guidance of Redpanda support / customer success."
# there should not be any data loss.
```

For some applications where the newest data is more valuable than historical data, data accumulation can be worse than data loss. In such cases, where you must pause uploads but you cannot afford to accumulate data on disk and lose availability, there are a couple of less safe pause and resume mechanisms:
"In such cases, where you cannot afford to lose the most recently produced data by rejecting produce requests after producers have filled the local disks during the period of paused uploads, there is a less safe pause and resume mechanism that prioritizes the ability to receive new data over retaining data that cannot be uploaded when disks are full."
- Set the `cloud_storage_enable_remote_allow_gaps` cluster configuration property to `true`, which allows gaps for all topics in the cluster.
- Set the `redpanda.remote.allow_gaps` configuration property to `true`, which allows gaps for one specific topic. This topic-level configuration option overrides the cluster-level default.

When you pause uploads and set one of these properties to `true`, gaps may result. However, you can seamlessly resume uploads by specifying `*allow_gaps` to `true` at either the cluster or topic level. Otherwise, if set to `false`, uploads could stall if a gap occurs.
", gaps in the range of offsets stored in cloud storage may result."
rpk cluster config set cloud_storage_enable_segment_uploads false
# Segment uploads are paused and cloud storage housekeeping is not running.
# New data is stored on the local volume, which may overflow.
# To avoid overflow, allow for gaps to be created.
"To avoid overflow by allowing gaps in the log"
# Segment uploads are paused and cloud storage housekeeping is not running.
# New data is stored on the local volume, which may overflow.
# To avoid overflow, allow for gaps to be created.
# In this example, data that is not uploaded to cloud storage can be
may be deleted if a disk fills before uploads are resumed
- Set the `cloud_storage_enable_remote_allow_gaps` cluster configuration property to `true`, which allows gaps for all topics in the cluster.
- Set the `redpanda.remote.allow_gaps` configuration property to `true`, which allows gaps for one specific topic. This topic-level configuration option overrides the cluster-level default.

When you pause uploads and set one of these properties to `true`, gaps may result. However, you can seamlessly resume uploads by specifying `*allow_gaps` to `true` at either the cluster or topic level. Otherwise, if set to `false`, uploads could stall if a gap occurs.
@Lazin should the instruction here basically be to turn on allow gaps IF you see cloud storage uploads are stuck (you get errors, and want to alleviate the problem and continue)? If that's true, what does it look like (what messages would you see in the logs that can be resolved by allowing gaps)?
The following example shows a simple pause and resume with no gaps allowed:

```bash
rpk cluster config set cloud_storage_enable_segment_uploads false
Identified issues
- Custom Style Guide (code-formatting.adoc) - The line starts with a command `rpk cluster config set`. According to the style guide, sentences should not start with a command. It should be rephrased to integrate the command into a descriptive sentence.
Proposed fix
rpk cluster config set cloud_storage_enable_segment_uploads false
To disable segment uploads, use the command: `rpk cluster config set cloud_storage_enable_segment_uploads false`.
The original line starts with a command, which is against the style guide's recommendation. By rephrasing it to start with a descriptive phrase, we provide context for the command, making the documentation clearer and more informative. This change also aligns with best practices for technical writing, where context is provided before commands or code snippets.
# If the disks fill up, produce requests will be rejected.
...

rpk cluster config set cloud_storage_enable_segment_uploads true
Identified issues
- Custom Style Guide (code-formatting.adoc) - The line starts with a command `rpk cluster config set`. According to the style guide, sentences should not start with a command. It should be rephrased to integrate the command into a descriptive sentence.
Proposed fix
rpk cluster config set cloud_storage_enable_segment_uploads true
To enable segment uploads, use the command: `rpk cluster config set cloud_storage_enable_segment_uploads true`.
The original line starts with a command, which is against the style guide's recommendation. By rephrasing it to start with a purpose ('To enable segment uploads'), the command is integrated into a more descriptive sentence, providing context and improving readability. This change aligns with the style guide's requirements and maintains the technical accuracy of the command.
The following example shows how to pause and resume Tiered Storage uploads while allowing for gaps:

```bash
rpk cluster config set cloud_storage_enable_segment_uploads false
Identified issues
- Custom Style Guide (code-formatting.adoc) - The line starts with a command `rpk cluster config set`. According to the style guide, sentences should not start with a command. It should be rephrased to integrate the command into a descriptive sentence.
Proposed fix
rpk cluster config set cloud_storage_enable_segment_uploads false
To disable segment uploads, use the command: `rpk cluster config set cloud_storage_enable_segment_uploads false`.
The original line starts with a command, which is against the style guide's recommendation. By rephrasing it to start with a descriptive phrase, we provide context for the command, making the documentation clearer and more informative. This change also aligns with best practices for technical writing, where context is provided before commands or code snippets.
# In this example, data that is not uploaded to cloud storage can be
# deleted.

rpk topic alter-config $topic-name --set redpanda.remote.allowgaps=true
Identified issues
- Custom Style Guide (code-formatting.adoc) - The line starts with a command `rpk topic alter-config`. According to the style guide, sentences should not start with a command. It should be rephrased to integrate the command into a descriptive sentence.
Proposed fix
rpk topic alter-config $topic-name --set redpanda.remote.allowgaps=true
To alter the configuration of a topic, use the command `rpk topic alter-config $topic-name --set redpanda.remote.allowgaps=true`.
The original sentence starts with a command, which is against the style guide's recommendation. By rephrasing it to start with a descriptive phrase, we provide context and make the sentence more informative. This change also aligns with best practices for technical documentation, enhancing clarity and user understanding.
# to delete data before it's uploaded, therefore some data loss is possible.
...

rpk cluster config set cloud_storage_enable_segment_uploads true
Identified issues
- Custom Style Guide (code-formatting.adoc) - The line starts with a command `rpk cluster config set`. According to the style guide, sentences should not start with a command. It should be rephrased to integrate the command into a descriptive sentence.
Proposed fix
rpk cluster config set cloud_storage_enable_segment_uploads true
To enable segment uploads, use the command: `rpk cluster config set cloud_storage_enable_segment_uploads true`.
The original line starts with a command, which is against the style guide's recommendation. By rephrasing it to start with a purpose ('To enable segment uploads'), the command is integrated into a more descriptive sentence, providing context and improving readability. This change aligns with the style guide's requirements and maintains the technical accuracy of the command.
# metric is equal to zero, indicating that uploads have resumed.

# Disable the gap allowance previously set for the topic.
rpk topic alter-config $topic-name --set redpanda.remote.allowgaps=false
Identified issues
- Custom Style Guide (code-formatting.adoc) - The line starts with a command `rpk topic alter-config`. According to the style guide, sentences should not start with a command. It should be rephrased to integrate the command into a descriptive sentence.
Proposed fix
rpk topic alter-config $topic-name --set redpanda.remote.allowgaps=false
To alter the configuration of a topic, use the command `rpk topic alter-config $topic-name --set redpanda.remote.allowgaps=false`.
The original sentence starts with a command, which is against the style guide's recommendation. By rephrasing it to provide context, the sentence becomes more informative and aligns with the style guide. This change also helps in making the documentation more user-friendly by explaining the purpose of the command.
Starting in version 25.1, when running Tiered Storage, you can safely pause and resume uploads to cloud storage without risking data consistency or loss. To pause or resume segment uploads to cloud storage, use the `cloud_storage_enable_segment_uploads` configuration property (default is `true`), which allows segment uploads to proceed normally after the pause completes and uploads resume.

While uploads are paused, data accumulates locally, which can lead to full disks if the pause is prolonged. In such cases, Redpanda throttles produce requests, and rejects new Kafka produce requests to prevent data from being written. Additionally, this pauses cloud storage housekeeping, meaning segments are neither uploaded nor removed from cloud storage. However, it is still possible to consume data from cloud storage when you have paused uploads.
This reads as "while uploads are paused, Redpanda throttles produce requests ..."
I think it might be clearer if we change this from "In such cases, Redpanda ..." to "If the disks fill, Redpanda throttles ..."
@@ -1133,6 +1133,65 @@ rpk topic alter-config <topic_name> --set redpanda.remote.read=true

See also: xref:{topic-recovery-link}[Topic Recovery], xref:manage:kubernetes/k-remote-read-replicas.adoc[Remote Read Replicas]

== Pause and resume
Let's change this to "Pause and resume uploads"
Use the `redpanda_cloud_storage_paused_archivers` metric to monitor the status of paused uploads. It displays a non-zero value whenever uploads are paused.
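As a rough illustration of that check (not part of the PR), the following sketch sums every sample of `redpanda_cloud_storage_paused_archivers` from Prometheus text-format metrics piped in on stdin. The helper name `paused_archivers_total` is hypothetical, and the endpoint in the usage comment (admin port 9644, path `/public_metrics`) is an assumption about a default deployment. The sketch also assumes metric samples carry no trailing timestamp.

```bash
# Hypothetical helper: print the summed value of the paused-archivers metric
# read from stdin. Returns non-zero if the metric does not appear at all.
paused_archivers_total() {
  awk '
    # Match sample lines only; "# HELP" and "# TYPE" lines start with "#".
    /^redpanda_cloud_storage_paused_archivers([{ ]|$)/ { sum += $NF; found = 1 }
    END { if (!found) exit 1; printf "%d\n", sum }
  '
}

# Usage (assumed endpoint, adjust host and port for your cluster):
#   curl -s http://localhost:9644/public_metrics | paused_archivers_total
```

A non-zero total would suggest that uploads are still paused. A total of zero after re-enabling `cloud_storage_enable_segment_uploads` would suggest that uploads have resumed.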
NOTE: Do not use `redpanda.remote.read` or `redpanda.remote.write` to pause and resume segment uploads. Doing so can lead to a gap between local data and the data in the cloud storage. In such cases, it is possible that the oldest segment is not aligned with the last uploaded segment due to the gap.
Should this be a warning, since we're actively telling people never to do this?
Description
Resolves Doc-936
Review deadline: Thursday, March 20
Note: Once the metrics and configuration properties become available in the doc properties, I will add the links to the appropriate places.
Page previews
Safe pause and resume
PR Change Summary
Introduced the Safe Pause and Resume feature for Tiered Storage, allowing users to manage uploads to cloud storage without risking data loss or inconsistency.
Added documentation for the Safe Pause and Resume feature in Tiered Storage.
Explained the configuration properties for pausing and resuming uploads.
Provided guidelines for monitoring paused uploads and managing data consistency.
Modified Files
modules/manage/partials/tiered-storage.adoc