CHASM: Non-Workflow Mutable State P1 #7595

Merged
merged 2 commits into from
May 5, 2025

@yycptt yycptt commented Apr 10, 2025

What changed?

Changes are mainly on replication side:

  • Update mutable state GetCurrentVersion/StartVersion/CloseVersion() methods
  • Update mutable state executionInfo.LastEventTaskID and make sure it's updated even if a transition doesn't generate any events.
  • Update state-based replication logic to handle the no-event case (more changes in CHASM: Non-Workflow Mutable State P3 State Replicator #7700).
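
To illustrate the second bullet, here is a minimal sketch of a transition that advances the clock even when it produces no events. The `mutableState` type, the `nextTaskID` field, and the `closeTransition` method are hypothetical names chosen for illustration, and the rule "a transition without events still consumes one ID" is an assumption, not something confirmed by this PR.

```go
package main

import "fmt"

// executionInfo is a hypothetical stand-in for the persisted execution info.
// Only the field named in this PR description is modeled.
type executionInfo struct {
	LastEventTaskID int64
}

// mutableState is a simplified model for illustration, not the real type.
type mutableState struct {
	executionInfo executionInfo
	nextTaskID    int64 // hypothetical monotonic ID source
}

// closeTransition finishes one state transition that produced eventCount
// events. It advances LastEventTaskID even when eventCount is zero.
func (ms *mutableState) closeTransition(eventCount int) {
	// Assumption: one ID is consumed per event, and a transition without
	// events still consumes one ID so the clock keeps moving.
	consumed := int64(eventCount)
	if consumed == 0 {
		consumed = 1
	}
	ms.nextTaskID += consumed
	ms.executionInfo.LastEventTaskID = ms.nextTaskID
}

func main() {
	ms := &mutableState{}
	ms.closeTransition(2) // transition with two events
	before := ms.executionInfo.LastEventTaskID
	ms.closeTransition(0) // transition with no events
	after := ms.executionInfo.LastEventTaskID
	fmt.Println("clock advanced without events:", after > before)
}
```

The point of the sketch is only that the update is unconditional: the clock is written on every transition rather than inside an "if there are events" branch.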

Why?

  • CHASM runs may not have events at all. We need to make sure the logic continues to work in that case.

How did you test it?

  • Existing tests
  • Functional tests will be added later, once CHASM is ready, to ensure things work end to end.

Potential risks

Documentation

Is hotfix candidate?

@yycptt yycptt requested review from yux0 and xwduan April 10, 2025 00:07
if ms.IsWorkflowExecutionRunning() {
// Do NOT use ms.IsWorkflowExecutionRunning() for the check.
// Zombie workflow is not considered running but also not closed.
if ms.executionState.State != enumsspb.WORKFLOW_EXECUTION_STATE_COMPLETED {
Member Author

I need to double-check all the callers. This is breaking the NDC workflow right now, which may call it when the workflow is in zombie state.

Member Author

Guaranteeing that callers never call this when the workflow is a zombie would require a big refactoring. For now, I special-cased the zombie state and return LastWriteVersion instead.
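
A rough sketch of the special case described above, assuming the method under discussion is the close-version getter. The `state` enum, the struct fields, and `getCloseVersion` are hypothetical simplifications and do not mirror the real mutable state type or its error values.

```go
package main

import (
	"errors"
	"fmt"
)

// state is a hypothetical, simplified execution state enum.
type state int

const (
	stateRunning state = iota
	stateCompleted
	stateZombie
)

// mutableState models only what the sketch needs; all fields are assumptions.
type mutableState struct {
	state            state
	lastWriteVersion int64
	closeVersion     int64
}

var errNotClosed = errors.New("execution is not closed")

// getCloseVersion returns the close version for completed executions.
// A zombie is neither running nor closed, so it is handled separately
// and gets the last write version instead of an error.
func (ms *mutableState) getCloseVersion() (int64, error) {
	if ms.state == stateZombie {
		return ms.lastWriteVersion, nil
	}
	// Check the state directly rather than a "running" helper, because
	// a "not running" answer would also be true for zombies.
	if ms.state != stateCompleted {
		return 0, errNotClosed
	}
	return ms.closeVersion, nil
}

func main() {
	zombie := &mutableState{state: stateZombie, lastWriteVersion: 7}
	version, err := zombie.getCloseVersion()
	fmt.Println(version, err)

	running := &mutableState{state: stateRunning}
	_, err = running.getCloseVersion()
	fmt.Println(err)
}
```

Handling the zombie case inside the getter keeps existing callers working without the larger refactoring mentioned in the comment.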

transactionPolicy historyi.TransactionPolicy,
workflowEventsSeq []*persistence.WorkflowEvents,
) {
if transactionPolicy != historyi.TransactionPolicyActive {
Contributor

Are we replicating the vector clock from active to passive?

Member Author

Yes, LastRunningClock (was LastEventTaskID) is part of executionInfo and won't be sanitized during replication. We only check the clock when the failover versions are the same, so we know the clocks come from the same cluster and are comparable.
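
The comparison rule in this reply can be sketched as follows. The `versionedClock` type and `compareClocks` function are hypothetical; the sketch only shows that clock values are compared when the failover versions match, and it makes no claim about what the real code does when they differ.

```go
package main

import "fmt"

// versionedClock pairs a failover version with a cluster-local clock value.
// The type is hypothetical and exists only for this illustration.
type versionedClock struct {
	FailoverVersion  int64
	LastRunningClock int64
}

// compareClocks compares two clock values, but only when both carry the
// same failover version. If the versions differ, the clocks may come from
// different clusters, so ok is false and result is meaningless.
func compareClocks(local, incoming versionedClock) (result int, ok bool) {
	if local.FailoverVersion != incoming.FailoverVersion {
		return 0, false
	}
	switch {
	case local.LastRunningClock < incoming.LastRunningClock:
		return -1, true
	case local.LastRunningClock > incoming.LastRunningClock:
		return 1, true
	default:
		return 0, true
	}
}

func main() {
	local := versionedClock{FailoverVersion: 1, LastRunningClock: 100}
	sameVersion := versionedClock{FailoverVersion: 1, LastRunningClock: 120}
	otherVersion := versionedClock{FailoverVersion: 2, LastRunningClock: 5}

	result, ok := compareClocks(local, sameVersion)
	fmt.Println(result, ok)

	result, ok = compareClocks(local, otherVersion)
	fmt.Println(result, ok)
}
```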

@yycptt yycptt marked this pull request as ready for review April 15, 2025 22:17
@yycptt yycptt requested a review from a team as a code owner April 15, 2025 22:17
@yycptt yycptt requested review from yux0 and xwduan April 28, 2025 23:43
@yycptt yycptt changed the title from "[CHASM] Support Non-Workflow Mutable State: Part 1" to "CHASM: Non-Workflow Mutable State P1" May 2, 2025
@yycptt yycptt force-pushed the non-workflow-ms branch from 8dcaef2 to ed3e2cf Compare May 2, 2025 21:41
if ms.executionInfo.VersionHistories != nil {
return ms.currentVersion
}

if ms.transitionHistoryEnabled && len(ms.executionInfo.TransitionHistory) != 0 {
Contributor

Should we check this first, before checking VersionHistories?

Member Author

I think it's not necessary. I guess those checks exist only because we want to support very old workflows in the DB that were created before VersionHistory and xdc replication were a thing.
So for almost all workflows, VersionHistories will not be nil and the logic will just return ms.currentVersion.

I added the new logic just to be consistent with what we already have, but TBH I don't really expect execution to ever reach it...
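
For readers following the ordering question, here is a hedged sketch of the fallback chain being discussed. Only the two `if` conditions come from the diff fragment above. The simplified types, the value returned from the transition-history branch, and the `emptyVersion` fallback are assumptions made so that the example compiles.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the persisted types.
type versionHistories struct{}

type transition struct{}

type executionInfo struct {
	VersionHistories  *versionHistories
	TransitionHistory []transition
}

type mutableState struct {
	executionInfo            *executionInfo
	currentVersion           int64
	transitionHistoryEnabled bool
}

// emptyVersion is a hypothetical sentinel for "no version known".
const emptyVersion int64 = 0

// getCurrentVersion mirrors the order of checks in the diff fragment.
func (ms *mutableState) getCurrentVersion() int64 {
	// Expected path for almost all executions: version histories exist.
	if ms.executionInfo.VersionHistories != nil {
		return ms.currentVersion
	}
	// Fallback for executions without version histories that do have
	// transition history. Assumption: the same field is returned here.
	if ms.transitionHistoryEnabled && len(ms.executionInfo.TransitionHistory) != 0 {
		return ms.currentVersion
	}
	// Assumption: very old records with neither history fall through to
	// a sentinel value.
	return emptyVersion
}

func main() {
	typical := &mutableState{
		executionInfo:  &executionInfo{VersionHistories: &versionHistories{}},
		currentVersion: 42,
	}
	fmt.Println(typical.getCurrentVersion())

	legacy := &mutableState{executionInfo: &executionInfo{}, currentVersion: 42}
	fmt.Println(legacy.getCurrentVersion())
}
```

Under these assumptions, swapping the first two checks would not change the result for the typical case, which is consistent with the reply that the reordering is not necessary.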

@yycptt yycptt merged commit 2890e29 into temporalio:main May 5, 2025
51 checks passed
@yycptt yycptt deleted the non-workflow-ms branch May 5, 2025 19:12
3 participants