From 837246f914414ad87f6251bdeb90f155ea48a9d2 Mon Sep 17 00:00:00 2001 From: Anastasia Alexadrova Date: Tue, 25 Mar 2025 09:35:25 +0100 Subject: [PATCH 1/4] K8SPXC-1366 Documented new options Updated the About backups section with info about multiple backups, backup flow, backups in an unhealthy cluster modified: docs/backups.md modified: docs/operator.md --- __pycache__/main.cpython-311.pyc | Bin 1705 -> 0 bytes docs/backups.md | 68 +++++++++++++++++++++++++------ docs/operator.md | 21 ++++++++++ 3 files changed, 76 insertions(+), 13 deletions(-) delete mode 100644 __pycache__/main.cpython-311.pyc diff --git a/__pycache__/main.cpython-311.pyc b/__pycache__/main.cpython-311.pyc deleted file mode 100644 index 7ca444f13de0ee776cf1836f2afdf89010b29c18..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 1705 zcmcgsO>7%Q6rR~#|2C5qq7((S;HGMbY3)^2A+^MWLo38j6_U8&C!wV4ops{G-nC}e zZCIQtha5O?C{j^_1jrWxRW2Mka_n&@O4Vp3q)ME)8Rg~^Z+30xhe*9K)_(Ke``*mF zdGnrsAta4py!rD_^nrxXUqa~((8IBJ5~hcUB8sbM7Ex&(&0>m+C@cRh93~7L`b8ka zB)+{xfQP7wmLfYK#p`Gj=B>VE6(8bZ=Oesyh>ufc9Ut+KX$&Gsc;r3XUe1&omf==x z>rhAvocAJpd(!}L5$qlaxQRAF+v;09@S&2x=X%M)3l3K%yyV=6SBjILUdwseTceAv zTX){lw51Bu^L3LMwx#C{yQYbXmgL|Bu_eUlkbdeXDO< zm716;%c0I}U+gLB@%NYx$7zwyq< zf;*ODDzILvpl2+-X1+b_C5Oignm(SH^W!O*(@?ap6>kNu{B}i5)*0QiW;s%*aCVJBD0zn>M_R%JS7mwq71psBQ4n0-R zZmVZo=UNU|FYKrno~oC&)k|ExyrW)TTl`af;hQ`6tybyR(Vl5HeQINPJAK+upWaQL zyl?R2(ANXIWN70ebWUr96YaMXoQ(bc0VnVN(ct99^dL)dmyO8_!0uyzhka8-hj++I>?JrmjN}mtP2q!dl7*U0 i8&z|hy#~CfVdo-1TgDjgqSRXWYbz=iryY;sZvO(0NQ_qi diff --git a/docs/backups.md b/docs/backups.md index 2983db52..77616669 100644 --- a/docs/backups.md +++ b/docs/backups.md @@ -1,25 +1,67 @@ # Providing Backups -The Operator usually stores Percona XtraDB Cluster backups outside the -Kubernetes cluster, on [Amazon S3 or S3-compatible storage :octicons-link-external-16:](https://en.wikipedia.org/wiki/Amazon_S3#S3_API_and_competing_services), -or on [Azure Blob Storage 
:octicons-link-external-16:](https://azure.microsoft.com/en-us/services/storage/blobs/): +It's important to back up your database to keep your data safe. +Backups protect your system against data loss and corruption and help ensure business continuity. They are also a quick way to restore the database if something goes wrong. + +A backup starts after you create a Backup object. You can create a Backup object in two ways: + +* manually at any moment. This way you start an [on-demand backup](backups-ondemand.md). +* instruct the Operator to create it automatically according to a schedule that you define for it. This is a [scheduled backup](backups-scheduled.md). + +The Operator does physical backups using the [Percona XtraBackup :octicons-link-external-16:](https://docs.percona.com/percona-xtrabackup/2.4/index.html) tool and the [SST :octicons-link-external-16:](https://galeracluster.com/library/documentation/sst.html) method. + +## Backup storage + +You can store backups outside the Kubernetes cluster, in one of the supported cloud storage services: + +* [Amazon S3 or S3-compatible storage :octicons-link-external-16:](https://en.wikipedia.org/wiki/Amazon_S3#S3_API_and_competing_services), +* [Azure Blob Storage :octicons-link-external-16:](https://azure.microsoft.com/en-us/services/storage/blobs/): ![image](assets/images/backup-cloud.svg) -But storing backups on [Persistent Volumes :octicons-link-external-16:](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) inside the Kubernetes cluster is also possible: +If you're running a Kubernetes cluster on premises, you can also store backups inside the cluster on a [Persistent Volume :octicons-link-external-16:](https://kubernetes.io/docs/concepts/storage/persistent-volumes/). This is useful, for example, if you don't use remote backup storage or if cloud storage costs are high for you.
![image](assets/images/backup-pv.svg) -The Operator does logical backups, querying Percona XtraDB Cluster for the -database data and writing the retrieved data to the backup storage. The backups -are done using the [Percona XtraBackup :octicons-link-external-16:](https://docs.percona.com/percona-xtrabackup/2.4/index.html) tool. +## Workflow + +After you create a Backup object, the Operator creates a backup Pod that runs Percona XtraBackup. It also creates a path in the storage for the backup data. + +The backup Pod copies the data files from Percona XtraDB Cluster to the backup storage. The Percona XtraDB Cluster Pod that provides the data enters the Donor state and stops receiving requests. + +Backups are resource-intensive and can degrade performance. To limit this impact, the Operator takes backups from one of the secondary Percona XtraDB Cluster Pods. The exception is a single-Pod deployment, where the same Pod handles all tasks. + +After the data files are copied, the Operator marks the backup Pod as "Completed" and deletes it. The Operator also updates the status of the Backup object. + + +## Multiple backups + +You can run several backups. For example, you can schedule weekly backups to one storage and daily backups to another. You can also run an on-demand backup as a precaution before maintenance work. + +Several backups run in parallel by default if they happen at the same time. If they overload your cluster, turn off parallel backups by setting the `backup.allowParallel` option in the `deploy/cr.yaml` file to `false`. The Operator then queues the backups and runs them one at a time. + +The Operator ensures the sequence by creating a lock for a running backup. It releases the lock after the backup either succeeds or fails and starts the next one from the queue. The lock is also released if you delete a running backup. + +You can fine-tune the queue by assigning a waiting time for a backup to start.
Use the `spec.backup.startingDeadlineSeconds` option in the `deploy/cr.yaml` file to set this time for all backups. You can also override it for a specific backup by defining the `startingDeadlineSeconds` option within the backup configuration. The per-backup setting takes precedence. + +If the backup doesn't start within the defined time, the Operator automatically marks it as "failed". + +## Backup suspension for an unhealthy database cluster + +Your database cluster can become unhealthy, for example, when one of the Pods crashes and restarts. The Operator monitors the database cluster state while a backup is running. If the cluster becomes unhealthy, the Operator suspends the backup to reduce the load on the cluster. + +To reduce the load even further, you can limit how long a backup remains suspended. Use the `spec.backup.suspendedDeadlineSeconds` option in the `deploy/cr.yaml` file to set this time for all backups, or set it in the `deploy/backup/backup.yaml` configuration file for a specific backup. The setting in the `deploy/backup/backup.yaml` file takes precedence. + +After this duration expires, the Operator automatically marks this backup as "failed". + +If the cluster recovers and reports the Ready status before then, the Operator resumes the backup and tries to finish it. + +Note that if some files were already saved on the storage when a backup was suspended, this backup fails because the Operator doesn't support overwriting these files. If this happens, delete the failed backup and restart it. + +If you want to run backups in an unhealthy cluster, set the `spec.unsafeFlags.backupIfUnhealthy` option in the `deploy/cr.yaml` file to `true`. Use this option with caution because it can affect cluster performance. + + -The Operator allows doing backups in two ways: -* *Scheduled backups* are configured in the - [deploy/cr.yaml :octicons-link-external-16:](https://github.com/percona/percona-xtradb-cluster-operator/blob/main/deploy/cr.yaml) - file to be executed automatically in proper time.
-* *On-demand backups* can be done manually at any moment and are configured in - the [deploy/backup/backup.yaml :octicons-link-external-16:](https://raw.githubusercontent.com/percona/percona-xtradb-cluster-operator/main/deploy/backup/backup.yaml). diff --git a/docs/operator.md b/docs/operator.md index dc01d47f..77b1ff7c 100644 --- a/docs/operator.md +++ b/docs/operator.md @@ -2304,6 +2304,27 @@ The timeout value in seconds, after which backup job will automatically fail. | ----------- | ---------- | | :material-numeric-1-box: int | `3600` | +### `backup.startingDeadlineSeconds` + +The maximum time in seconds for a backup to start. The Operator compares the timestamp of the Backup object against the current time. If the backup doesn't start within this time, the Operator automatically marks it as "failed". + +You can override this setting for a specific backup in the `deploy/backup/backup.yaml` configuration file. + +| Value type | Example | +| ----------- | ---------- | +| :material-numeric-1-box: int | `300` | + +### `backup.suspendedDeadlineSeconds` + +The maximum time in seconds for a backup to remain suspended. The Operator compares the timestamp when the backup job was suspended against the current time. After the defined suspension time expires, the Operator automatically marks the backup as "failed". + +You can override this setting for a specific backup in the `deploy/backup/backup.yaml` configuration file. + +| Value type | Example | +| ----------- | ---------- | +| :material-numeric-1-box: int | `1200` | + + ### `backup.imagePullSecrets.name` The [Kubernetes imagePullSecrets :octicons-link-external-16:](https://kubernetes.io/docs/concepts/configuration/secret/#using-imagepullsecrets) for the specified image.
From c600cfd9fc40db91dfa0481c74d2fd5cb7d010b3 Mon Sep 17 00:00:00 2001 From: Anastasia Alexadrova Date: Tue, 25 Mar 2025 13:18:15 +0100 Subject: [PATCH 2/4] Updated PXB link --- docs/backups.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/backups.md b/docs/backups.md index 77616669..8ab47f71 100644 --- a/docs/backups.md +++ b/docs/backups.md @@ -8,7 +8,7 @@ A backup starts after you create a Backup object. You can create a Backup object * manually at any moment. This way you start an [on-demand backup](backups-ondemand.md). * instruct the Operator to create it automatically according to a schedule that you define for it. This is a [scheduled backup](backups-scheduled.md). -The Operator does physical backups using the [Percona XtraBackup :octicons-link-external-16:](https://docs.percona.com/percona-xtrabackup/2.4/index.html) tool and the [SST :octicons-link-external-16:](https://galeracluster.com/library/documentation/sst.html) method. +The Operator does physical backups using the [Percona XtraBackup :octicons-link-external-16:](https://docs.percona.com/percona-xtrabackup/8.0/index.html) tool and the [SST :octicons-link-external-16:](https://galeracluster.com/library/documentation/sst.html) method. ## Backup storage From d5651f31ac1e73c6606141f8b74a5a28b56d8643 Mon Sep 17 00:00:00 2001 From: Anastasia Alexadrova Date: Tue, 25 Mar 2025 21:09:44 +0100 Subject: [PATCH 3/4] Updated a note about saved backup files --- docs/backups.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/backups.md b/docs/backups.md index 8ab47f71..0f58cbd2 100644 --- a/docs/backups.md +++ b/docs/backups.md @@ -56,7 +56,7 @@ After this duration expires, the Operator automatically marks this backup as "fa If the cluster recovers and reports the Ready status before then, the Operator resumes the backup and tries to finish it.
-Note that if some files were already saved on the storage when a backup was suspended, this backup fails because the Operator doesn't support overwriting these files. If this happens, delete the failed backup and restart it. +Note that if some files were already saved on the storage when a backup was suspended, the Operator deletes them and reruns the backup. If you want to run backups in an unhealthy cluster, set the `spec.unsafeFlags.backupIfUnhealthy` option in the `deploy/cr.yaml` file to `true`. Use this option with caution because it can affect cluster performance. From dfd3e7af9278518e7fdc3c9cafcd66415aec26de Mon Sep 17 00:00:00 2001 From: Anastasia Alexadrova Date: Wed, 26 Mar 2025 11:40:39 +0100 Subject: [PATCH 4/4] Updated after the review --- docs/backups.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/backups.md b/docs/backups.md index 0f58cbd2..986bbaac 100644 --- a/docs/backups.md +++ b/docs/backups.md @@ -42,7 +42,7 @@ Several backups run in parallel by default if they happen at the same time. If t The Operator ensures the sequence by creating a lock for a running backup. It releases the lock after the backup either succeeds or fails and starts the next one from the queue. The lock is also released if you delete a running backup. -You can fine-tune the queue by assigning a waiting time for a backup to start. Use the `spec.backup.startingDeadlineSeconds` option in the `deploy/cr.yaml` file to set this time for all backups. You can also override it for a specific backup by defining the `startingDeadlineSeconds` option within the backup configuration. The per-backup setting takes precedence. +You can fine-tune the queue by assigning a waiting time for a backup to start. Use the `spec.backup.startingDeadlineSeconds` option in the `deploy/cr.yaml` file to set this time for all backups. You can also override it for a specific on-demand backup by defining the `startingDeadlineSeconds` option within the backup configuration.
The per-backup setting takes precedence. If the backup doesn't start within the defined time, the Operator automatically marks it as "failed".
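The deadline options documented in this series follow two rules: a value set for a specific backup overrides the cluster-wide value from `deploy/cr.yaml`, and a backup is marked as "failed" once the deadline has passed since a reference timestamp (the Backup object's timestamp for `startingDeadlineSeconds`, the suspension time for `suspendedDeadlineSeconds`). The Python sketch below models these rules for illustration only. It is not the Operator's code: the functions `effective_deadline` and `deadline_expired` are hypothetical, and it assumes that an unset value means "no deadline" and that the boundary is exclusive.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_deadline(cluster_value: Optional[int],
                       backup_value: Optional[int]) -> Optional[int]:
    """Pick the deadline (in seconds) that applies to one backup.

    Hypothetical helper: a value set for a specific backup takes precedence
    over the cluster-wide value. None stands for "not set" (assumption).
    """
    return backup_value if backup_value is not None else cluster_value


def deadline_expired(reference: datetime, now: datetime,
                     deadline_seconds: Optional[int]) -> bool:
    """Return True if more than `deadline_seconds` have passed since `reference`.

    For startingDeadlineSeconds, `reference` would be the Backup object's
    timestamp; for suspendedDeadlineSeconds, the moment the backup was
    suspended. Assumptions: an unset deadline never expires, and the
    boundary is exclusive.
    """
    if deadline_seconds is None:
        return False
    return now - reference > timedelta(seconds=deadline_seconds)


if __name__ == "__main__":
    created = datetime(2025, 3, 25, 9, 0, 0, tzinfo=timezone.utc)
    now = created + timedelta(seconds=400)

    # Only the cluster-wide value (300 s) is set: 400 s have passed, so it fails.
    print(deadline_expired(created, now, effective_deadline(300, None)))  # True

    # A per-backup override of 600 s wins: the backup may still start.
    print(deadline_expired(created, now, effective_deadline(300, 600)))  # False
```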
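With `backup.allowParallel` turned off, the queue-and-lock behavior described in the "Multiple backups" section can be modeled as a small state machine. The following Python sketch is a hypothetical model, not the Operator's implementation: the `Backup` and `SequentialBackupQueue` classes, their state names, and the `tick` polling step are assumptions made for illustration. It shows the three documented ways the lock is released (success, failure, deletion) and how a queued backup fails when it misses its starting deadline.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Backup:
    """Hypothetical stand-in for a Backup object."""
    name: str
    created_at: float                      # seconds since some epoch
    starting_deadline: Optional[int] = None
    state: str = "New"                     # New -> Running -> Succeeded/Failed


class SequentialBackupQueue:
    """Hypothetical model of backups with parallel runs turned off.

    A single lock guards the running backup. Queued backups wait in FIFO
    order and fail if they don't start within their starting deadline.
    """

    def __init__(self) -> None:
        self.queue: Deque[Backup] = deque()
        self.backups: Dict[str, Backup] = {}
        self.lock_holder: Optional[str] = None

    def submit(self, backup: Backup) -> None:
        self.backups[backup.name] = backup
        self.queue.append(backup)

    def tick(self, now: float) -> None:
        # Fail queued backups whose starting deadline has passed.
        for b in list(self.queue):
            if (b.starting_deadline is not None
                    and now - b.created_at > b.starting_deadline):
                b.state = "Failed"
                self.queue.remove(b)
        # Start the next backup only if no other backup holds the lock.
        if self.lock_holder is None and self.queue:
            nxt = self.queue.popleft()
            nxt.state = "Running"
            self.lock_holder = nxt.name

    def finish(self, name: str, succeeded: bool) -> None:
        # The lock is released whether the backup succeeds or fails.
        self.backups[name].state = "Succeeded" if succeeded else "Failed"
        if self.lock_holder == name:
            self.lock_holder = None

    def delete(self, name: str) -> None:
        # Deleting a running backup also releases the lock.
        b = self.backups.pop(name)
        if self.lock_holder == name:
            self.lock_holder = None
        elif b in self.queue:
            self.queue.remove(b)


if __name__ == "__main__":
    q = SequentialBackupQueue()
    q.submit(Backup("daily", created_at=0, starting_deadline=300))
    q.submit(Backup("weekly", created_at=0, starting_deadline=300))
    q.tick(now=10)                     # "daily" takes the lock, "weekly" waits
    q.finish("daily", succeeded=True)
    q.tick(now=20)                     # the lock is free, so "weekly" starts
    print(q.lock_holder)               # weekly
```

Running the script prints `weekly`: the second backup starts only after the first one releases the lock. The polling `tick` method keeps the sketch simple; a Kubernetes controller typically reacts to events in its reconcile loop instead.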