Commit 2db2673

Clarifying the behavior of automatic failovers (#4391)
Co-authored-by: Brian MacDonald <brian.macdonald@temporal.io>
1 parent a0efa3b commit 2db2673

1 file changed

Lines changed: 35 additions & 10 deletions

docs/cloud/high-availability/failovers.mdx

@@ -26,8 +26,8 @@ This lets Workflow Executions continue with minimal interruptions or data loss.
 You can also [manually initiate failovers](/cloud/high-availability/failovers) based on your situational monitoring or for testing.

 Returning control from the replica to the primary is called a <ToolTipTerm term="failback" />.
-The replica is active for a brief duration during an incident.
-After the incident, Temporal fails back to the primary.
+After a Temporal-managed failover, Temporal automatically fails back to the original region once it is healthy.
+See [Returning to the primary with failbacks](#failbacks) for details on automatic and manual failback options.

 ## Failovers

@@ -45,9 +45,9 @@ This process is known as failover.

 Failovers prevent data loss and application interruptions.
 Existing Workflows continue, and new Workflows start as the incident is addressed.
-Once the incident is resolved, Temporal Cloud performs a "failback," shifting Workflow Execution processing back to the original Namespace.

 Temporal Cloud handles failovers automatically, ensuring continuity without manual intervention.
+Once the incident is resolved, Temporal Cloud automatically performs a [failback](#failbacks), shifting Workflow Execution processing back to the original region.

 <CaptionedImage src="/img/cloud/high-availability/failover.png" title="On failover, the replica becomes active and the Namespace endpoint directs access to it." />

@@ -283,7 +283,7 @@ Temporal manages retries for the failover workflow.
 In the rare event that an internal error prevents the failover from completing, the Temporal on-call team is automatically paged to intervene and force the failover to completion.

 Temporal fails over the primary to the replica.
-When you're ready to fail back, follow these failover instructions to move the primary back to the original.
+See [Returning to the primary with failbacks](#failbacks) for details on how and when failback occurs.

 ### Post-failover event information {#info}

@@ -294,16 +294,41 @@ After failover, the replica becomes active, taking over in the isolation domain
 You don't need to monitor Temporal Cloud's failover response in real time.
 Whenever there is a failover event, Temporal Cloud [notifies you via email](/cloud/notifications#admin-notifications)

-### Returning to the primary with failbacks
+### Returning to the primary with failbacks {#failbacks}

-After Temporal-initiated failovers, Temporal Cloud shifts Workflow Execution processing back to the original region or isolation domain that was active before the incident once the incident is resolved.
-This is called a "failback".
+After a Temporal-managed (automatic) failover, Temporal Cloud automatically fails back to the original region once it is healthy.
+Follow [Temporal's status page](https://status.temporal.io) for updates on the original region's health.

-:::note
+#### After a Temporal-managed failover

-To failback a manually-initiated failover, follow the [Manual Failover](#manual-failovers) directions to failover back to the original primary.
+When Temporal triggers an automatic failover due to an outage, Temporal will also trigger an automatic failback to the original region once the region recovers.
+No action is required from you.

-:::
+If you prefer to manage failback yourself, you have two options:
+
+- **Opt out of automatic failback (manage failback manually):**
+[Disable Temporal-managed failovers](#disabling-temporal-initiated) on the Namespace.
+When you're ready to fail back to the original region, [trigger a failover](#manual-failovers) to that region and then re-enable Temporal-managed failovers.
+
+- **Stay on the new region permanently ("fail forward"):**
+[Trigger a failover](#manual-failovers) to the region that is already active.
+This tells Temporal that you want to treat the new region as your primary for as long as it's healthy.
+Temporal-managed automatic failovers remain enabled, so Temporal will still protect you if the new region has an outage.
+
+#### After a user-triggered failover
+
+If you triggered a failover yourself during an outage (instead of relying on a Temporal-managed failover), Temporal will _not_ automatically fail back for you.
+You must [trigger a failover](#manual-failovers) back to the original region when it is healthy.
+Monitor [Temporal's status page](https://status.temporal.io) for updates on region health.
+
+Automatic failback is only available after Temporal-managed (automatic) failovers.
+
+#### How to check whether your Namespace will be automatically failed back
+
+If you're not sure whether your Namespace will be automatically failed back, check the list of failovers in the Temporal Cloud Web UI on your Namespace's detail page:
+
+- If the most recent failover was **Temporal-triggered**, then Temporal will automatically fail back the Namespace when the original region is healthy.
+- If the most recent failover was **user-triggered**, then the Namespace will _not_ be automatically failed back. You must trigger the failback yourself.

 ## Disabling Temporal-initiated failovers {#disabling-temporal-initiated}

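The failback rules added in this commit reduce to a small decision. Only the most recent failover in the Namespace's list matters. The "Opt out" bullet adds one more condition: disabling Temporal-managed failovers also suppresses automatic failback.

Below is a minimal Python sketch of that logic. `NamespaceModel`, `Failover`, `Trigger`, `auto_failback_target`, and the region names `region-a`/`region-b` are all hypothetical, chosen for illustration. The sketch is an in-memory model only and does not call any Temporal Cloud API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Trigger(Enum):
    TEMPORAL = "temporal"  # Temporal-managed (automatic) failover
    USER = "user"          # failover you triggered yourself


@dataclass
class Failover:
    from_region: str
    to_region: str
    trigger: Trigger


@dataclass
class NamespaceModel:
    """Hypothetical in-memory model of a Namespace; not a Temporal Cloud API."""
    active_region: str
    managed_failovers_enabled: bool = True
    failovers: list[Failover] = field(default_factory=list)

    def fail_over(self, to_region: str, trigger: Trigger) -> None:
        # Record who triggered the failover and which region was active before it.
        self.failovers.append(Failover(self.active_region, to_region, trigger))
        self.active_region = to_region


def auto_failback_target(ns: NamespaceModel) -> Optional[str]:
    """Return the region Temporal would fail back to, or None if no automatic failback."""
    if not ns.failovers or not ns.managed_failovers_enabled:
        # No failover yet, or you opted out by disabling Temporal-managed failovers.
        return None
    latest = ns.failovers[-1]
    if latest.trigger is Trigger.TEMPORAL:
        return latest.from_region  # the original region, once it is healthy
    return None  # user-triggered: you must fail back yourself


if __name__ == "__main__":
    # Outage: Temporal fails over automatically, so it will also fail back.
    ns = NamespaceModel(active_region="region-a")
    ns.fail_over("region-b", Trigger.TEMPORAL)
    print(auto_failback_target(ns))  # region-a

    # "Fail forward": trigger a failover to the region that is already active.
    ns.fail_over("region-b", Trigger.USER)
    print(auto_failback_target(ns))  # None
    print(ns.managed_failovers_enabled)  # True
```

The model treats "fail forward" as a user-triggered failover to the region that is already active. That way it follows from the same most-recent-failover rule. The Namespace stays in `region-b` and no automatic failback is pending. Temporal-managed failovers stay enabled for future outages.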