
Commit b2b6553

using modules instead of xrefs
1 parent 3fa9e24 commit b2b6553

3 files changed: +84 −15 lines changed
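The pattern the commit applies to all three assemblies is the same: a definition-list entry that pointed at another assembly with `xref:` is replaced by an `include::` directive that pulls the module content directly into the assembly, with `leveloffset` shifting the module's headings to fit. A minimal sketch of how the directive behaves, assuming the modules follow the usual convention of a single `=` title (the module names below are illustrative, not from this repository):

----
// A module whose own title is written as "= Example procedure"
// is pulled in at leveloffset=+1, so it renders as a "==" section here.
include::modules/example-procedure.adoc[leveloffset=+1]

// A nested sub-procedure is pulled in one level deeper.
include::modules/example-sub-procedure.adoc[leveloffset=+2]
----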

etcd/etcd-backup-restore/etcd-backup.adoc

+10-2
@@ -19,6 +19,14 @@ Back up your cluster's etcd data by performing a single invocation of the backup
 
 After you have an etcd backup, you can xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[restore to a previous cluster state].
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[Backing up etcd data]:: To back up etcd, you create an etcd snapshot and back up the resources for the static pods. You can save the backup and used it later if you need to restore etcd.
+// Backing up etcd data
+include::modules/backup-etcd.adoc[leveloffset=+1]
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#creating-automated-etcd-backups_backup-etcd[Creating automated etcd backups]:: You can automate recurring and single backups. Automated backups is a Technology Preview feature.
+[role="_additional-resources"]
+.Additional resources
+* xref:../../hosted_control_planes/hcp_high_availability/hcp-recovering-etcd-cluster.adoc#hcp-recovering-etcd-cluster[Recovering an unhealthy etcd cluster]
+
+// Creating automated etcd backups
+include::modules/etcd-creating-automated-backups.adoc[leveloffset=+1]
+include::modules/creating-single-etcd-backup.adoc[leveloffset=+2]
+include::modules/creating-recurring-etcd-backups.adoc[leveloffset=+2]
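For context on what the included `backup-etcd` module covers: the hunk's anchor text notes that the backup is a single invocation of the backup script on a control plane node. A hedged sketch of that invocation, with the node name and backup directory as placeholders and the script path taken from the upstream procedure (treat it as an assumption here):

[source,terminal]
----
$ oc debug node/<control_plane_node>
sh-4.4# chroot /host
sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup
----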

etcd/etcd-backup-restore/etcd-disaster-recovery.adoc

+45-7
@@ -13,24 +13,62 @@ The disaster recovery documentation provides information for administrators on h
 Disaster recovery requires you to have at least one healthy control plane host.
 ====
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/quorum-restoration.adoc#dr-quorum-restoration[Quorum restoration]:: This solution handles situations where you have lost the majority of your control plane hosts, leading to etcd quorum loss and the cluster going offline. This solution does not require an etcd backup.
-+
+== Quorum restoration
+
+You can use the `quorum-restore.sh` script to restore etcd quorum on clusters that are offline due to quorum loss. When quorum is lost, the {product-title} API becomes read-only. After quorum is restored, the {product-title} API returns to read/write mode.
+
+// Restoring etcd quorum for high availability clusters
+include::modules/dr-restoring-etcd-quorum-ha.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../installing/installing_bare_metal/upi/installing-bare-metal.adoc#installing-bare-metal[Installing a user-provisioned cluster on bare metal]
+* xref:../../installing/installing_bare_metal/bare-metal-expanding-the-cluster.adoc#replacing-a-bare-metal-control-plane-node_bare-metal-expanding[Replacing a bare-metal control plane node]
+
 [NOTE]
 ====
 If you have a majority of your control plane nodes still available and have an etcd quorum, xref:../../etcd/etcd-backup-restore/replace-unhealthy-etcd-member.adoc#replace-unhealthy-etcd-member[replace a single unhealthy etcd member].
 ====
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]:: This solution handles situations where you want to restore your cluster to a previous state, for example, if an administrator deletes something critical. If you have taken an etcd backup, you can restore your cluster to a previous state.
-+
+== Restoring to a previous cluster state
+
+To restore the cluster to a previous state, you must have previously backed up the `etcd` data by creating a snapshot. You will use this snapshot to restore the cluster state. For more information, see "Backing up etcd data".
+
 If applicable, you might also need to xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[recover from expired control plane certificates].
 +
 [WARNING]
 ====
 Restoring to a previous cluster state is a destructive and destablizing action to take on a running cluster. This procedure should only be used as a last resort.
 
-Before performing a restore, see xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-scenario-2-restoring-cluster-state-about_dr-restoring-cluster-state[About restoring to a previous cluster state] for more information on the impact to the cluster.
+Before performing a restore, see "About restoring to a previous cluster state" for more information on the impact to the cluster.
 ====
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[Recovering from expired control plane certificates]:: This solution handles situations where your control plane certificates have expired. For example, if you shut down your cluster before the first certificate rotation, which occurs 24 hours after installation, your certificates will not be rotated and will expire. You can follow this procedure to recover from expired control plane certificates.
+// About restoring to a previous cluster state
+include::modules/dr-restoring-cluster-state-about.adoc[leveloffset=+2]
+
+// Restoring to a previous cluster state for a single node
+include::modules/dr-restoring-cluster-state-sno.adoc[leveloffset=+2]
+
+// Restoring to a previous cluster state
+include::modules/dr-restoring-cluster-state.adoc[leveloffset=+2]
+
+// Restoring a cluster from etcd backup manually
+include::modules/manually-restoring-cluster-etcd-backup.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[Backing up etcd data]
+* xref:../../installing/installing_bare_metal/upi/installing-bare-metal.adoc#installing-bare-metal[Installing a user-provisioned cluster on bare metal]
+* xref:../../networking/accessing-hosts.adoc#accessing-hosts[Creating a bastion host to access {product-title} instances and the control plane nodes with SSH]
+* xref:../../installing/installing_bare_metal/bare-metal-expanding-the-cluster.adoc#replacing-a-bare-metal-control-plane-node_bare-metal-expanding[Replacing a bare-metal control plane node]
+
+include::modules/dr-scenario-cluster-state-issues.adoc[leveloffset=+2]
+
+// Recovering from expired control plane certificates
+include::modules/dr-recover-expired-control-plane-certs.adoc[leveloffset=+1]
+
+include::modules/dr-testing-restore-procedures.adoc[leveloffset=+1]
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/about-disaster-recovery.adoc#dr-testing-restore-procedures_about-dr[Testing restore procedures]:: Test the restore procedure to ensure that your automation and workload handle the new cluster state gracefully.
+[role="_additional-resources"]
+.Additional resources
+* xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]
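The new "Quorum restoration" section names the `quorum-restore.sh` script. As a hedged sketch of the invocation that the included `dr-restoring-etcd-quorum-ha` module describes (the SSH access method, host choice, and script path are assumptions based on the upstream procedure, not quoted from this commit):

[source,terminal]
----
$ ssh -i <ssh_key_path> core@<surviving_control_plane_host>
[core@<surviving_control_plane_host> ~]$ sudo -E /usr/local/bin/quorum-restore.sh
----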

etcd/etcd-backup-restore/replace-unhealthy-etcd-member.adoc

+29-6
@@ -17,12 +17,35 @@ If the control plane certificates are not valid on the member being replaced, th
 If a control plane node is lost and a new one is created, the etcd cluster Operator handles generating the new TLS certificates and adding the node as an etcd member.
 ====
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.adoc#restore-identify-unhealthy-etcd-member_replacing-unhealthy-etcd-member[Identifying an unhealthy etcd member]:: Identify an unhealthy etcd member in your cluster.
+// Identifying an unhealthy etcd member
+include::modules/restore-identify-unhealthy-etcd-member.adoc[leveloffset=+1]
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.adoc#restore-determine-state-etcd-member_replacing-unhealthy-etcd-member[Determining the state of the unhealthy etcd member]:: Confirm why your etcd member is unhealthy by determining its state:
+// Determining the state of the unhealthy etcd member
+include::modules/restore-determine-state-etcd-member.adoc[leveloffset=+1]
 
-* The machine for the etcd member is not running or its node is not ready
-* The etcd pod for the etcd member is crashlooping
-* The machine for a bare metal etcd member is not running or its node is not ready
+== Replacing the unhealthy etcd member
 
-xref:../../backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.adoc#replacing-the-unhealthy-etcd-member[Replacing the unhealthy etcd member]:: Replace your etcd member by completing steps that are specific to the state of the etcd member.
+Depending on the state of your unhealthy etcd member, use one of the following procedures:
+
+* Replacing an unhealthy etcd member whose machine is not running or whose node is not ready
+* Installing a primary control plane node on an unhealthy cluster
+* Replacing an unhealthy etcd member whose etcd pod is crashlooping
+* Replacing an unhealthy stopped baremetal etcd member
+
+// Replacing an unhealthy etcd member whose machine is not running or whose node is not ready
+include::modules/restore-replace-stopped-etcd-member.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../machine_management/control_plane_machine_management/cpmso-troubleshooting.adoc#cpmso-ts-etcd-degraded_cpmso-troubleshooting[Recovering a degraded etcd Operator]
+* link:https://docs.redhat.com/en/documentation/assisted_installer_for_openshift_container_platform/2024/html/installing_openshift_container_platform_with_the_assisted_installer/expanding-the-cluster#installing-primary-control-plane-node-unhealthy-cluster_expanding-the-cluster[Installing a primary control plane node on an unhealthy cluster]
+
+// Replacing an unhealthy etcd member whose etcd pod is crashlooping
+include::modules/restore-replace-crashlooping-etcd-member.adoc[leveloffset=+2]
+
+// Replacing an unhealthy baremetal stopped etcd member
+include::modules/restore-replace-stopped-baremetal-etcd-member.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../machine_management/deleting-machine.adoc#machine-lifecycle-hook-deletion-etcd_deleting-machine[Quorum protection with machine lifecycle hooks]
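The `restore-identify-unhealthy-etcd-member` and `restore-determine-state-etcd-member` modules included above cover finding the failing member and classifying its state (machine not running, node not ready, or pod crashlooping). A hedged sketch of the kind of checks involved; these are standard `oc` queries against the etcd cluster Operator and the openshift-etcd namespace, not quoted from the modules:

[source,terminal]
----
# Report which etcd members the etcd cluster Operator considers available.
$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}{end}'

# Check whether an etcd pod is crashlooping and whether its node is Ready.
$ oc get pods -n openshift-etcd -o wide
$ oc get nodes
----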
