K8SPXC-1405 Added a note about the retention policy with PITR (#219)

nastena1606 · web-flow · commit f282b65c0e20 · 2025-04-14T19:41:11.000+02:00
K8SPXC-1544 Added info about Prometheus metrics
diff --git a/docs/backups-pitr.md b/docs/backups-pitr.md
@@ -1,28 +1,42 @@
 # Store binary logs for point-in-time recovery
 
-Point-in-time recovery functionality allows users to roll back the cluster to a
-specific transaction, time (or even skip a transaction in some cases).
-Technically, this feature involves continuously saving binary log updates
-[to the backup storage](backups-storage.md). Point-in-time recovery is off by
-default and is supported by the Operator only with Percona XtraDB Cluster
+Point-in-time recovery allows users to roll back the cluster to a
+specific transaction or time. You can even skip a transaction if you don't need it anymore. To make a point-in-time recovery, the Operator needs a backup and binary logs (binlogs) of the server to. 
+
+A binary log records all changes made to the database, such as updates, inserts, and deletes. It is used to synchronize data across servers for and point-in-time recovery. 
+
+Point-in-time recovery is off by
+default and is supported by the Operator with Percona XtraDB Cluster
 versions starting from 8.0.21-12.1.
 
-To be used, it requires setting a number of keys in the `pitr` subsection
-under the `backup` section of the [deploy/cr.yaml :octicons-link-external-16:](https://github.com/percona/percona-xtradb-cluster-operator/blob/main/deploy/cr.yaml) file:
+After you [enable point-in-time recovery](#enable-point-in-time-recovery), the Operator spins up a separate point-in-time recovery Pod, which starts saving binary log updates
+[to the backup storage](backups-storage.md). 
+
+
+## Considerations
+
+1. You must use either s3-compatible or Azure-compatible storage for both binlog and full backup to make the point-in-time recovery work
+2. The Operator saves binlogs without any
+    cluster-based filtering. Therefore, either use a separate folder per cluster on the same bucket or use different buckets for binlogs. 
 
-* `backup.pitr.enabled` key should be set to `true`
+    Also,we recommend to have an empty bucket or a folder on a bucket for binlogs when you enable point-in-time recovery. This bucket/folder should not contain no binlogs nor files from previous attempts or other clusters. 
+3. Don't [purge binlogs :octicons-link-external-16:](https://dev.mysql.com/doc/refman/8.0/en/purge-binary-logs.html)
+    before they are transferred to the backup storage. Doing so breaks point-in-time recovery
+4. Disable the [retention policy](operator.md#backupschedulekeep) as it is incompatible with the point-in-time recovery. To clean up the storage, configure the [Bucket lifecycle :octicons-link-external-16:](https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html) on the storage
 
-* `backup.pitr.storageName` key should point to the name of the storage already
-    configured in the `storages` subsection
+## Enable point-in-time recovery
 
-    !!! note
-        Both binlog and full backup should use s3-compatible storage to make
-        point-in-time recovery work!
+To use point-in-time recovery, set the following keys in the `pitr` subsection
+under the `backup` section of the [deploy/cr.yaml :octicons-link-external-16:](https://github.com/percona/percona-xtradb-cluster-operator/blob/main/deploy/cr.yaml) manifest:
 
-* `timeBetweenUploads` key specifies the number of seconds between running the
-    binlog uploader.
+* `backup.pitr.enabled` - set it to `true`
 
-The following example shows how the `pitr` subsection looks like:
+* `backup.pitr.storageName` - specify the same storage name that you have defined in the `storages` subsection
+
+* `timeBetweenUploads`- specify the number of seconds between running the
+    binlog uploader
+
+The following example shows how the `pitr` subsection looks like if you use the S3 storage:
 
 ```yaml
 backup:
@@ -33,16 +47,20 @@ backup:
     timeBetweenUploads: 60
 ```
 
-!!! note
+For how to restore a database to a specific point in time, see [Restore the cluster with point-in-time recovery](backups-restore.md#restore-the-cluster-with-point-in-time-recovery).
+
+## Binary logs statistics
+
+The point-in-time recovery Pod has statistics metrics for binlogs. They provide insights into the success and failure rates of binlog operations, timeliness of processing and uploads and potential gaps or inconsistencies in binlog data.
+
+The available metrics are:
 
-    Point-in-time recovery will be done for binlogs without any
-    cluster-based filtering. Therefore it is recommended to use a separate
-    storage, bucket, or directory to store binlogs for the cluster.
-    Also, it is recommended to have empty bucket/directory which holds binlogs
-    (with no binlogs or files from previous attempts or other clusters) when
-    you enable point-in-time recovery.
+* `pxc_binlog_collector_success_total` - The total number of successful binlog collection cycles. It helps monitor how often the binlog collector successfully processes and uploads binary logs.
+* `pxc_binlog_collector_gap_detected_total` - Tracks the total number of gaps detected in the binlog sequence during collection. Highlights potential issues with missing or skipped binlogs, which could impact replication or recovery.
+* `pxc_binlog_collector_last_processing_timestamp` - Records the timestamp of the last successful binlog collection operation.
+* `pxc_binlog_collector_last_upload_timestamp` - Records the timestamp of the last successful binlog upload to the storage
+* `pxc_binlog_collector_uploaded_total` - The total number of successfully uploaded binlogs
 
-!!! note
+You can connect to this Pod using the `<pitr-pod-service>:8080/metrics` endpoint to gather these metrics and further analyze them.
 
-    [Purging binlogs :octicons-link-external-16:](https://dev.mysql.com/doc/refman/8.0/en/purge-binary-logs.html)
-    before they are transferred to backup storage will break point-in-time recovery.
+Note that the statistics data is not kept when the point-in-time recovery Pod restarts. This means that the counters like `pxc_binlog_collector_success_total` are reset.