time-skipping integration with cross-cluster replication #10138

Open
feiyang3cat wants to merge 3 commits into temporalio:main from feiyang3cat:ts/replicaiton-new


@feiyang3cat feiyang3cat commented Apr 30, 2026

What changed?

  1. Time skipping now regenerates timer tasks idempotently.
  2. PartialRefresh now calls the time-skipping regeneration.
  3. Added passive timer-queue logic for time-skipping timer tasks.

Why?

The goal is to make time skipping work correctly under all replication patterns (state-based and event-based).

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

@feiyang3cat feiyang3cat requested review from a team as code owners April 30, 2026 22:36
@feiyang3cat feiyang3cat changed the title from "timeskipping support replication" to "wip: fit the time-skipping feature in replication" Apr 30, 2026
@feiyang3cat feiyang3cat force-pushed the ts/replicaiton-new branch 6 times, most recently from c9e46d3 to 384f53a Compare May 1, 2026 00:57
// RegenerateTimerTasksForTimeSkipping regenerates the timer tasks for time skipping.
// This function is not idempotent, but when called twice, logically the timerTasks regenerated will have the same contents,
// and the only difference is the TaskID.
// TODO@time-skipping: currently not safe to call in replication context
Contributor Author

RegenerateTimerTasksForTimeSkipping has been made idempotent for replication purposes.
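
For illustration only, here is a minimal sketch of one way regeneration can be made idempotent: two timer tasks are treated as the same when their logical contents match, ignoring TaskID, so a second call adds nothing. The names `timerKey`, `taskSet`, and `Regenerate` are hypothetical and are not taken from this PR or from the Temporal codebase.

```go
package main

import "fmt"

// timerKey is a hypothetical logical identity for a timer task.
// It deliberately excludes any TaskID, which may differ between calls.
type timerKey struct {
	Kind         string
	FireUnixNano int64
}

// taskSet is a hypothetical stand-in for the pending timer tasks of one workflow.
type taskSet struct {
	pending map[timerKey]struct{}
}

func newTaskSet() *taskSet {
	return &taskSet{pending: make(map[timerKey]struct{})}
}

// Regenerate adds a task for every desired key that is not already pending
// and returns how many tasks were newly added.
func (s *taskSet) Regenerate(desired []timerKey) int {
	added := 0
	for _, k := range desired {
		if _, exists := s.pending[k]; exists {
			continue // same logical task already present: skip it
		}
		s.pending[k] = struct{}{}
		added++
	}
	return added
}

func main() {
	desired := []timerKey{
		{Kind: "user-timer", FireUnixNano: 1_000},
		{Kind: "time-skipping", FireUnixNano: 2_000},
	}
	s := newTaskSet()
	fmt.Println(s.Regenerate(desired)) // 2: both tasks are new
	fmt.Println(s.Regenerate(desired)) // 0: the second call changes nothing
}
```

With this shape, a replication path that happens to call regeneration more than once ends up with the same pending set as a single call.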

@feiyang3cat feiyang3cat force-pushed the ts/replicaiton-new branch 2 times, most recently from 2880ec6 to 8977071 Compare May 1, 2026 01:41
Comment thread service/history/workflow/task_refresher.go Outdated
@feiyang3cat feiyang3cat changed the title from "wip: fit the time-skipping feature in replication" to "Integrate the time-skipping feature into the replication process" May 1, 2026
@feiyang3cat feiyang3cat changed the title from "Integrate the time-skipping feature into the replication process" to "wip: Integrate the time-skipping feature into the replication process" May 1, 2026
return nil
}
return ms.taskGenerator.RegenerateTimerTasksForTimeSkipping()
case historyi.TransactionPolicyPassive:
Contributor Author

@feiyang3cat feiyang3cat May 1, 2026

@yux0 @robholland
Right now, on closeTransaction, only the active cluster calls regenerateTimerTasks.
Should regenerateTimerTasks also be called on the passive cluster so that event-based replication works correctly?

@feiyang3cat feiyang3cat changed the title from "wip: Integrate the time-skipping feature into the replication process" to "Integrate the time-skipping feature into the replication process" May 1, 2026
@feiyang3cat feiyang3cat changed the title from "Integrate the time-skipping feature into the replication process" to "wip:Integrate the time-skipping feature into the replication process" May 1, 2026
@feiyang3cat feiyang3cat changed the title from "wip:Integrate the time-skipping feature into the replication process" to "draft:Integrate the time-skipping feature into the replication process" May 1, 2026
@feiyang3cat feiyang3cat force-pushed the ts/replicaiton-new branch 2 times, most recently from 91c75c9 to 77a5b4f Compare May 4, 2026 17:48
@feiyang3cat feiyang3cat changed the title from "draft:Integrate the time-skipping feature into the replication process" to "time-skipping integration with cross-cluster replication" May 4, 2026
@feiyang3cat feiyang3cat force-pushed the ts/replicaiton-new branch from 77a5b4f to 273f09c Compare May 4, 2026 17:52
return nil
}

// applyIncomingTimeSkippingInfo is used in state-based replication
Contributor Author

This is used for state-based replication.

@@ -235,17 +233,32 @@ func (t *timerQueueStandbyTaskExecutor) discardChasmTask(
)
}

Contributor Author

@feiyang3cat feiyang3cat May 4, 2026

In the active cluster, this TimeSkippingTimerTask checks whether time skipping should be disabled (the relevant fields are in mutable state). In the passive cluster, it seems to me it should either be a no-op or do something similar to executeUserTimerTimeoutTask with processTimer?
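
To make the two options concrete, here is a minimal sketch that contrasts them. It assumes a verify-style standby handler can compare the task against replicated state and either discard the task or ask for a retry. Every name below (`timeSkippingTask`, `standbyState`, `TransitionSeq`, `noopOnStandby`, `verifyOnStandby`) is hypothetical and does not come from this PR or from the Temporal codebase.

```go
package main

import "fmt"

// standbyAction is a hypothetical outcome of handling a task on the passive cluster.
type standbyAction int

const (
	actionDiscard    standbyAction = iota // drop the task, nothing left to do
	actionRetryLater                      // replicated state has not caught up yet
)

// timeSkippingTask is a hypothetical timer task tagged with the transition it belongs to.
type timeSkippingTask struct {
	TransitionSeq int64
}

// standbyState is a hypothetical view of the replicated mutable state.
type standbyState struct {
	Running              bool
	AppliedTransitionSeq int64
}

// noopOnStandby models option 1: the passive cluster always drops the task.
func noopOnStandby(_ timeSkippingTask, _ standbyState) standbyAction {
	return actionDiscard
}

// verifyOnStandby models option 2: drop the task only once the workflow is
// closed or the replicated state already reflects the task's transition.
func verifyOnStandby(t timeSkippingTask, s standbyState) standbyAction {
	if !s.Running || s.AppliedTransitionSeq >= t.TransitionSeq {
		return actionDiscard
	}
	return actionRetryLater
}

func main() {
	task := timeSkippingTask{TransitionSeq: 5}
	behind := standbyState{Running: true, AppliedTransitionSeq: 4}
	caughtUp := standbyState{Running: true, AppliedTransitionSeq: 5}

	fmt.Println(noopOnStandby(task, behind) == actionDiscard)
	fmt.Println(verifyOnStandby(task, behind) == actionRetryLater)
	fmt.Println(verifyOnStandby(task, caughtUp) == actionDiscard)
}
```

The no-op variant is simpler. The verify variant keeps the task around until the passive side has seen the corresponding state, which may matter if the task has to fire after a failover.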

}

// ApplyWorkflowExecutionTimeSkippingTransitionedEvent applies the WorkflowExecutionTimeSkippingTransitionedEvent to the mutable state.
func (ms *MutableStateImpl) ApplyWorkflowExecutionTimeSkippingTransitionedEvent(ctx context.Context, event *historypb.HistoryEvent) error {
Contributor Author

Question for @yux0 @robholland:

  1. For state-based replication: partial refresh is called, so the timer tasks are generated.
  2. For event-based replication: how should the timer tasks be generated?
    (Currently, the ApplyXXXEvents functions don't generate timer tasks in the active cluster; the active cluster calls RegenerateTimerTasksForTimeSkipping at closeTransaction -> prepareTasks.)

Should we call RegenerateTimerTasksForTimeSkipping for both the Active and Passive policies?

func (ms *MutableStateImpl) closeTransactionRegenerateTimerTasksForTimeSkipping(
	transactionPolicy historyi.TransactionPolicy,
) error {
	switch transactionPolicy {
	case historyi.TransactionPolicyActive:
		if !ms.IsWorkflowExecutionRunning() {
			return nil
		}
		return ms.taskGenerator.RegenerateTimerTasksForTimeSkipping()
	case historyi.TransactionPolicyPassive:
		return nil
	default:
		return serviceerror.NewInternalf("unknown transaction policy: %v", transactionPolicy)
	}
}
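
For comparison, a self-contained sketch of what calling regeneration under both policies could look like. This is only one possible shape, it assumes regeneration is idempotent, and it uses hypothetical stand-ins (`transactionPolicy`, `fakeState`, `closeTransactionRegenerate`) rather than the real MutableStateImpl and historyi types.

```go
package main

import (
	"errors"
	"fmt"
)

// transactionPolicy is a hypothetical stand-in for historyi.TransactionPolicy.
type transactionPolicy int

const (
	policyActive transactionPolicy = iota
	policyPassive
)

// fakeState is a hypothetical stand-in for mutable state plus its task generator.
type fakeState struct {
	running     bool
	regenerated int // counts how often regeneration was invoked
}

func (s *fakeState) regenerate() error {
	s.regenerated++
	return nil
}

var errUnknownPolicy = errors.New("unknown transaction policy")

// closeTransactionRegenerate regenerates timer tasks for both policies,
// but only while the workflow is still running.
func closeTransactionRegenerate(s *fakeState, p transactionPolicy) error {
	switch p {
	case policyActive, policyPassive:
		if !s.running {
			return nil
		}
		return s.regenerate()
	default:
		return fmt.Errorf("%w: %d", errUnknownPolicy, p)
	}
}

func main() {
	s := &fakeState{running: true}
	fmt.Println(closeTransactionRegenerate(s, policyPassive), s.regenerated)
	fmt.Println(closeTransactionRegenerate(s, transactionPolicy(99)))
}
```

Whether the passive branch should do this at all is exactly the open question above; the sketch only shows that the change to the switch itself would be small.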

Member

I am still unclear on why we need to support event-based replication for a new feature.

State-based replication is fully enabled in cloud and enabled by default in OSS, so by the time this feature is available, OSS users will be using state-based replication as well. S2C migration for any new release (>= 1.32) also has to use state-based replication, given the potential usage of chasm executions (which will be enabled by default).

@feiyang3cat feiyang3cat force-pushed the ts/replicaiton-new branch from 273f09c to 070939e Compare May 4, 2026 21:52
@feiyang3cat feiyang3cat force-pushed the ts/replicaiton-new branch from 070939e to 236252a Compare May 4, 2026 22:14
@feiyang3cat feiyang3cat force-pushed the ts/replicaiton-new branch from 236252a to 3249657 Compare May 4, 2026 22:52