docs/cloud/high-availability/failovers.mdx
Lines changed: 35 additions & 10 deletions
@@ -26,8 +26,8 @@ This lets Workflow Executions continue with minimal interruptions or data loss.
 You can also [manually initiate failovers](/cloud/high-availability/failovers) based on your situational monitoring or for testing.
 
 Returning control from the replica to the primary is called a <ToolTipTerm term="failback" />.
-The replica is active for a brief duration during an incident.
-After the incident, Temporal fails back to the primary.
+After a Temporal-managed failover, Temporal automatically fails back to the original region once it is healthy.
+See [Returning to the primary with failbacks](#failbacks) for details on automatic and manual failback options.
 
 ## Failovers
 
@@ -45,9 +45,9 @@ This process is known as failover.
 
 Failovers prevent data loss and application interruptions.
 Existing Workflows continue, and new Workflows start as the incident is addressed.
-Once the incident is resolved, Temporal Cloud performs a "failback," shifting Workflow Execution processing back to the original Namespace.
 
 Temporal Cloud handles failovers automatically, ensuring continuity without manual intervention.
+Once the incident is resolved, Temporal Cloud automatically performs a [failback](#failbacks), shifting Workflow Execution processing back to the original region.
 
 <CaptionedImage src="/img/cloud/high-availability/failover.png" title="On failover, the replica becomes active and the Namespace endpoint directs access to it." />
 
@@ -283,7 +283,7 @@ Temporal manages retries for the failover workflow.
 In the rare event that an internal error prevents the failover from completing, the Temporal on-call team is automatically paged to intervene and force the failover to completion.
 
 Temporal fails over the primary to the replica.
-When you're ready to fail back, follow these failover instructions to move the primary back to the original.
+See [Returning to the primary with failbacks](#failbacks) for details on how and when failback occurs.
 
 ### Post-failover event information {#info}
 
@@ -294,16 +294,41 @@ After failover, the replica becomes active, taking over in the isolation domain
 You don't need to monitor Temporal Cloud's failover response in real time.
 Whenever there is a failover event, Temporal Cloud [notifies you via email](/cloud/notifications#admin-notifications)
 
-### Returning to the primary with failbacks
+### Returning to the primary with failbacks {#failbacks}
 
-After Temporal-initiated failovers, Temporal Cloud shifts Workflow Execution processing back to the original region or isolation domain that was active before the incident once the incident is resolved.
-This is called a "failback".
+After a Temporal-managed (automatic) failover, Temporal Cloud automatically fails back to the original region once it is healthy.
+Follow [Temporal's status page](https://status.temporal.io) for updates on the original region's health.
 
-:::note
+#### After a Temporal-managed failover
 
-To failback a manually-initiated failover, follow the [Manual Failover](#manual-failovers) directions to failover back to the original primary.
+When Temporal triggers an automatic failover due to an outage, Temporal will also trigger an automatic failback to the original region once the region recovers.
+No action is required from you.
 
-:::
+If you prefer to manage failback yourself, you have two options:
+
+- **Opt out of automatic failback (manage failback manually):**
+[Disable Temporal-managed failovers](#disabling-temporal-initiated) on the Namespace.
+When you're ready to fail back to the original region, [trigger a failover](#manual-failovers) to that region and then re-enable Temporal-managed failovers.
+
+- **Stay on the new region permanently ("fail forward"):**
+[Trigger a failover](#manual-failovers) to the region that is already active.
+This tells Temporal that you want to treat the new region as your primary for as long as it's healthy.
+Temporal-managed automatic failovers remain enabled, so Temporal will still protect you if the new region has an outage.
+
+#### After a user-triggered failover
+
+If you triggered a failover yourself during an outage (instead of relying on a Temporal-managed failover), Temporal will _not_ automatically fail back for you.
+You must [trigger a failover](#manual-failovers) back to the original region when it is healthy.
+Monitor [Temporal's status page](https://status.temporal.io) for updates on region health.
+
+Automatic failback is only available after Temporal-managed (automatic) failovers.
+
+#### How to check whether your Namespace will be automatically failed back
+
+If you're not sure whether your Namespace will be automatically failed back, check the list of failovers in the Temporal Cloud Web UI on your Namespace's detail page:
+
+- If the most recent failover was **Temporal-triggered**, then Temporal will automatically fail back the Namespace when the original region is healthy.
+- If the most recent failover was **user-triggered**, then the Namespace will _not_ be automatically failed back. You must trigger the failback yourself.
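
The failback rule added in this change can be modeled as a small decision function. The following is only a sketch in Python, under stated assumptions: `FailoverEvent`, `Trigger`, `will_auto_fail_back`, and the region names are hypothetical and do not correspond to any Temporal API. In practice the failover list is read from the Namespace's detail page in the Temporal Cloud Web UI.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Sequence


class Trigger(Enum):
    """Who initiated a failover (hypothetical labels mirroring the Web UI list)."""
    TEMPORAL = "temporal-triggered"
    USER = "user-triggered"


@dataclass(frozen=True)
class FailoverEvent:
    """One row of a Namespace's failover list (hypothetical record shape)."""
    started_at: datetime
    trigger: Trigger
    target_region: str


def will_auto_fail_back(
    history: Sequence[FailoverEvent],
    managed_failovers_enabled: bool = True,
) -> bool:
    """Return True if Temporal is expected to fail the Namespace back on its own.

    Rule from the text: automatic failback follows only when the most recent
    failover was Temporal-triggered. Disabling Temporal-managed failovers
    opts the Namespace out of automatic failback.
    """
    if not history or not managed_failovers_enabled:
        return False
    latest = max(history, key=lambda event: event.started_at)
    return latest.trigger is Trigger.TEMPORAL


# Example: an automatic failover followed by a user "fail forward".
history = [
    FailoverEvent(datetime(2025, 1, 1, 12, 0), Trigger.TEMPORAL, "region-b"),
    FailoverEvent(datetime(2025, 1, 1, 13, 0), Trigger.USER, "region-b"),
]
print(will_auto_fail_back(history))      # False: latest failover is user-triggered
print(will_auto_fail_back(history[:1]))  # True: latest failover is Temporal-triggered
```

Only the most recent event matters in this model, which is why a user-triggered failover to the already active region ("fail forward") is enough to turn automatic failback off.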