
K8SPXC-1405 Added a note about the retention policy with PITR #219

Merged
merged 9 commits into from
Apr 14, 2025
Binary file removed __pycache__/main.cpython-311.pyc
Binary file not shown.
70 changes: 44 additions & 26 deletions docs/backups-pitr.md
@@ -1,28 +1,42 @@
# Store binary logs for point-in-time recovery

Point-in-time recovery functionality allows users to roll back the cluster to a
specific transaction, time (or even skip a transaction in some cases).
Technically, this feature involves continuously saving binary log updates
[to the backup storage](backups-storage.md). Point-in-time recovery is off by
default and is supported by the Operator only with Percona XtraDB Cluster
Point-in-time recovery allows users to roll back the cluster to a
specific transaction or time. You can even skip a transaction if you don't need it anymore. To perform a point-in-time recovery, the Operator needs a backup and the binary logs (binlogs) of the server.

A binary log records all changes made to the database, such as updates, inserts, and deletes. It is used to synchronize data across servers and for point-in-time recovery.

Point-in-time recovery is off by
default and is supported by the Operator with Percona XtraDB Cluster
versions starting from 8.0.21-12.1.

To be used, it requires setting a number of keys in the `pitr` subsection
under the `backup` section of the [deploy/cr.yaml :octicons-link-external-16:](https://github.com/percona/percona-xtradb-cluster-operator/blob/main/deploy/cr.yaml) file:
After you [enable point-in-time recovery](#enable-point-in-time-recovery), the Operator spins up a separate point-in-time recovery Pod, which starts saving binary log updates
[to the backup storage](backups-storage.md).


## Considerations

1. You must use either S3-compatible or Azure-compatible storage for both binlogs and full backups to make point-in-time recovery work.
2. The Operator saves binlogs without any
cluster-based filtering. Therefore, either use a separate folder per cluster on the same bucket or use different buckets for binlogs.

* `backup.pitr.enabled` key should be set to `true`
Also, we recommend using an empty bucket or an empty folder on a bucket for binlogs when you enable point-in-time recovery. This bucket or folder should contain no binlogs or files from previous attempts or other clusters.
3. Don't [purge binlogs :octicons-link-external-16:](https://dev.mysql.com/doc/refman/8.0/en/purge-binary-logs.html)
before they are transferred to the backup storage. Doing so breaks point-in-time recovery
4. Disable the [retention policy](operator.md#backupschedulekeep) because it is incompatible with point-in-time recovery. To clean up the storage, configure the [bucket lifecycle :octicons-link-external-16:](https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html) on the storage.

* `backup.pitr.storageName` key should point to the name of the storage already
configured in the `storages` subsection
## Enable point-in-time recovery

!!! note
Both binlog and full backup should use s3-compatible storage to make
point-in-time recovery work!
To use point-in-time recovery, set the following keys in the `pitr` subsection
under the `backup` section of the [deploy/cr.yaml :octicons-link-external-16:](https://github.com/percona/percona-xtradb-cluster-operator/blob/main/deploy/cr.yaml) manifest:

* `timeBetweenUploads` key specifies the number of seconds between running the
binlog uploader.
* `backup.pitr.enabled` - set it to `true`

The following example shows how the `pitr` subsection looks like:
* `backup.pitr.storageName` - specify the same storage name that you have defined in the `storages` subsection

* `timeBetweenUploads` - specify the number of seconds between running the
binlog uploader

The following example shows what the `pitr` subsection looks like if you use S3 storage:

```yaml
backup:
@@ -33,16 +47,20 @@ backup:
timeBetweenUploads: 60
```

!!! note
For how to restore a database to a specific point in time, see [Restore the cluster with point-in-time recovery](backups-restore.md#restore-the-cluster-with-point-in-time-recovery).

## Binary logs statistics

The point-in-time recovery Pod exposes statistics metrics for binlogs. They provide insights into the success and failure rates of binlog operations, the timeliness of processing and uploads, and potential gaps or inconsistencies in binlog data.

The available metrics are:

Point-in-time recovery will be done for binlogs without any
cluster-based filtering. Therefore it is recommended to use a separate
storage, bucket, or directory to store binlogs for the cluster.
Also, it is recommended to have empty bucket/directory which holds binlogs
(with no binlogs or files from previous attempts or other clusters) when
you enable point-in-time recovery.
* `pxc_binlog_collector_success_total` - The total number of successful binlog collection cycles. It helps monitor how often the binlog collector successfully processes and uploads binary logs.
* `pxc_binlog_collector_gap_detected_total` - The total number of gaps detected in the binlog sequence during collection. It highlights potential issues with missing or skipped binlogs, which could impact replication or recovery.
* `pxc_binlog_collector_last_processing_timestamp` - The timestamp of the last successful binlog collection operation.
* `pxc_binlog_collector_last_upload_timestamp` - The timestamp of the last successful binlog upload to the storage.
* `pxc_binlog_collector_uploaded_total` - The total number of successfully uploaded binlogs.

!!! note
You can connect to this Pod using the `<pitr-pod-service>:8080/metrics` endpoint to gather these metrics and further analyze them.

[Purging binlogs :octicons-link-external-16:](https://dev.mysql.com/doc/refman/8.0/en/purge-binary-logs.html)
before they are transferred to backup storage will break point-in-time recovery.
Note that the statistics data is not kept when the point-in-time recovery Pod restarts. This means that the counters like `pxc_binlog_collector_success_total` are reset.
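
As an illustration of how these metrics could be consumed, here is a minimal Python sketch that parses the text returned by the `/metrics` endpoint and flags two conditions: a detected binlog gap and a stale last upload. It is not part of the Operator. It assumes the endpoint returns the Prometheus text exposition format, that the timestamp metrics are Unix timestamps in seconds, and that label values contain no spaces. The function names and the 300-second threshold are hypothetical.

```python
PREFIX = "pxc_binlog_collector_"


def parse_metrics(text):
    """Return {metric_name: float} for binlog collector metrics.

    Assumes Prometheus text exposition format: comment lines start
    with '#', sample lines look like '<name>[{labels}] <value>'.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        # Drop an optional label set such as {pod="..."} from the name.
        name = parts[0].split("{", 1)[0]
        if not name.startswith(PREFIX):
            continue
        try:
            metrics[name] = float(parts[1])
        except ValueError:
            continue
    return metrics


def check_binlog_health(metrics, now, max_upload_age_seconds=300):
    """Return a list of warnings derived from the parsed metrics.

    Assumption: last_upload_timestamp is a Unix timestamp in seconds.
    The default threshold is an arbitrary, hypothetical value.
    """
    warnings = []
    if metrics.get(PREFIX + "gap_detected_total", 0) > 0:
        warnings.append("binlog gap detected")
    last_upload = metrics.get(PREFIX + "last_upload_timestamp")
    if last_upload is None:
        warnings.append("no upload timestamp reported")
    elif now - last_upload > max_upload_age_seconds:
        warnings.append("last upload is stale")
    return warnings
```

To use it, fetch the body of `<pitr-pod-service>:8080/metrics` with any HTTP client, pass the text to `parse_metrics`, and pass the result together with `time.time()` to `check_binlog_health`. Because the counters reset when the Pod restarts, a zero value for `pxc_binlog_collector_gap_detected_total` only covers the period since the last restart.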