time-skipping integration with cross-cluster replication#10138
time-skipping integration with cross-cluster replication#10138feiyang3cat wants to merge 3 commits intotemporalio:mainfrom
Conversation
c9e46d3 to
384f53a
Compare
| // RegenerateTimerTasksForTimeSkipping regenerates the timer tasks for time skipping. | ||
| // This function is not idempotent, but when called twice, logically the timerTasks regenerated will have the same contents, | ||
| // and the only difference is the TaskID. | ||
| // TODO@time-skipping: currently not safe to call in replication context |
There was a problem hiding this comment.
RegenerateTimerTasksForTimeSkipping is made idempotent for replication purpose
2880ec6 to
8977071
Compare
| return nil | ||
| } | ||
| return ms.taskGenerator.RegenerateTimerTasksForTimeSkipping() | ||
| case historyi.TransactionPolicyPassive: |
There was a problem hiding this comment.
@yux0 @robholland
right now, when closeTransaction, the active cluster calls regenerateTimerTasks.
And should regenerateTimerTasks be called for passive cluster as well so that event-state based replication can work correctly?
91c75c9 to
77a5b4f
Compare
77a5b4f to
273f09c
Compare
| return nil | ||
| } | ||
|
|
||
| // applyIncomingTimeSkippingInfo is used in state-based replication |
There was a problem hiding this comment.
this is used for state-based replication
| @@ -235,17 +233,32 @@ func (t *timerQueueStandbyTaskExecutor) discardChasmTask( | |||
| ) | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
in the active cluster, this TimeSkippingTimerTask will check time-skipping should be disabled (the fields are in mutable state); it seems to me in the passive cluster, it shall be noop OR something similar to executeUserTimerTimeoutTask with processTimer?
| } | ||
|
|
||
| // ApplyWorkflowExecutionTimeSkippingTransitionedEvent applies the WorkflowExecutionTimeSkippingTransitionedEvent to the mutable state. | ||
| func (ms *MutableStateImpl) ApplyWorkflowExecutionTimeSkippingTransitionedEvent(ctx context.Context, event *historypb.HistoryEvent) error { |
There was a problem hiding this comment.
Question for @yux0 @robholland :
- for state-based replication: partial refresh is called so the timer tasks are generated
- for event-based replication: how could the timer tasks be generated
(currently, the ApplyXXXEvents doesn't generate timer tasks in the active cluster and the active cluster calls RegenerateTimerTasksForTimeSkipping at the time the time of closeTransaction -> prepareTasks
and should we call RegenerateTimerTasksForTimeSkipping both for Active and Passive policy?
func (ms *MutableStateImpl) closeTransactionRegenerateTimerTasksForTimeSkipping(
transactionPolicy historyi.TransactionPolicy,
) error {
switch transactionPolicy {
case historyi.TransactionPolicyActive:
if !ms.IsWorkflowExecutionRunning() {
return nil
}
return ms.taskGenerator.RegenerateTimerTasksForTimeSkipping()
case historyi.TransactionPolicyPassive:
return nil
default:
return serviceerror.NewInternalf("unknown transaction policy: %v", transactionPolicy)
}
}
There was a problem hiding this comment.
I am still unclear on why we need to support event-based replication for a new feature.
State-based replication is fully enabled in cloud. Enabled by default in oss, so when this feature is available, oss users will be using state-based replication as well. S2C migration for any new release (>= 1.32) have to use state-based replication as well given the potential usage of chasm executions (which will be enabled by default).
273f09c to
070939e
Compare
070939e to
236252a
Compare
236252a to
3249657
Compare
What changed?
Why?
the goal is to make sure time skipping works correctly under all replication patterns (state-based, event-based)
How did you test it?