|
| 1 | +//NOTE TO CONTRIBUTORS: |
| 2 | +// |
| 3 | +//If you update any of the content in this assembly file, be sure to also make the same changes in the assemblies in the following directory: backup_and_restore/control_plane_backup_and_restore/disaster_recovery/. |
| 4 | + |
| 5 | +:_mod-docs-content-type: ASSEMBLY |
| 6 | +[id="etcd-disaster-recovery"] |
| 7 | +include::_attributes/common-attributes.adoc[] |
| 8 | += Disaster recovery |
| 9 | +:context: etcd-disaster-recovery |
| 10 | + |
| 11 | +toc::[] |
| 12 | + |
| 13 | +The disaster recovery documentation explains how administrators can recover from several disaster situations that might occur with their {product-title} cluster. As an administrator, you might need to follow one or more of the following procedures to return your cluster to a working state. |
| 14 | + |
| 15 | +[IMPORTANT] |
| 16 | +==== |
| 17 | +Disaster recovery requires you to have at least one healthy control plane host. |
| 18 | +==== |
| 19 | + |
| 20 | +[id="etcd-dr-quorum"] |
| 21 | +== Quorum restoration |
| 22 | + |
| 23 | +You can use the `quorum-restore.sh` script to restore etcd quorum on clusters that are offline due to quorum loss. When quorum is lost, the {product-title} API becomes read-only. After quorum is restored, the {product-title} API returns to read/write mode. |
| 24 | + |
| 25 | +// Restoring etcd quorum for high availability clusters |
| 26 | +include::modules/dr-restoring-etcd-quorum-ha.adoc[leveloffset=+2] |
| 27 | + |
| 28 | +[role="_additional-resources"] |
| 29 | +.Additional resources |
| 30 | +* xref:../../installing/installing_bare_metal/upi/installing-bare-metal.adoc#installing-bare-metal[Installing a user-provisioned cluster on bare metal] |
| 31 | +* xref:../../installing/installing_bare_metal/bare-metal-expanding-the-cluster.adoc#replacing-a-bare-metal-control-plane-node_bare-metal-expanding[Replacing a bare-metal control plane node] |
| 32 | + |
| 33 | +[NOTE] |
| 34 | +==== |
| 35 | +If you have a majority of your control plane nodes still available and have an etcd quorum, xref:../../backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.adoc#replacing-unhealthy-etcd-member[replace a single unhealthy etcd member]. |
| 36 | +==== |
| 37 | + |
| 38 | +[id="etcd-dr-restore"] |
| 39 | +== Restoring to a previous cluster state |
| 40 | + |
| 41 | +To restore the cluster to a previous state, you must have already backed up the `etcd` data by creating a snapshot. You use this snapshot to restore the cluster state. For more information, see "Backing up etcd data". |
| 42 | + |
| 43 | +If applicable, you might also need to xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[recover from expired control plane certificates]. |
| 44 | + |
| 45 | +[WARNING] |
| 46 | +==== |
| 47 | +Restoring to a previous cluster state is a destructive and destabilizing action to take on a running cluster. Use this procedure only as a last resort. |
| 48 | + |
| 49 | +Before performing a restore, see "About restoring to a previous cluster state" for more information on the impact to the cluster. |
| 50 | +==== |
| 51 | + |
| 52 | +// About restoring to a previous cluster state |
| 53 | +include::modules/dr-restoring-cluster-state-about.adoc[leveloffset=+2] |
| 54 | + |
| 55 | +// Restoring to a previous cluster state for a single node |
| 56 | +include::modules/dr-restoring-cluster-state-sno.adoc[leveloffset=+2] |
| 57 | + |
| 58 | +// Restoring to a previous cluster state |
| 59 | +include::modules/dr-restoring-cluster-state.adoc[leveloffset=+2] |
| 60 | + |
| 61 | +// Restoring a cluster from etcd backup manually |
| 62 | +include::modules/manually-restoring-cluster-etcd-backup.adoc[leveloffset=+2] |
| 63 | + |
| 64 | +[role="_additional-resources"] |
| 65 | +.Additional resources |
| 66 | +* xref:../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[Backing up etcd data] |
| 67 | +* xref:../../installing/installing_bare_metal/upi/installing-bare-metal.adoc#installing-bare-metal[Installing a user-provisioned cluster on bare metal] |
| 68 | +* xref:../../networking/accessing-hosts.adoc#accessing-hosts-on-aws_accessing-hosts[Accessing hosts on Amazon Web Services in an installer-provisioned infrastructure cluster] |
| 69 | +* xref:../../installing/installing_bare_metal/bare-metal-expanding-the-cluster.adoc#replacing-a-bare-metal-control-plane-node_bare-metal-expanding[Replacing a bare-metal control plane node] |
| 70 | + |
| 71 | +include::modules/dr-scenario-cluster-state-issues.adoc[leveloffset=+2] |
| 72 | + |
| 73 | +// Recovering from expired control plane certificates |
| 74 | +include::modules/dr-recover-expired-control-plane-certs.adoc[leveloffset=+1] |
| 75 | + |
| 76 | +//Testing restore procedures |
| 77 | +include::modules/dr-testing-restore-procedures.adoc[leveloffset=+1] |
| 78 | + |
| 79 | +[role="_additional-resources"] |
| 80 | +.Additional resources |
| 81 | +* xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state] |