Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHASM] Support Non-Workflow Mutable State: Part 1 #7595

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

yycptt
Copy link
Member

@yycptt yycptt commented Apr 10, 2025

What changed?

Changes are mainly on replication side:

  • Update mutable state GetCurrentVersion/StartVersion/CloseVersion() methods
  • Update mutable state executionInfo.LastEventTaskID. and make sure it's updated even if a transition doesn't generate any events.
  • Update state based replication logic to handle no event case.

Why?

  • CHASM runs may not have events at all. We need to make sure logic continue to work in that case.

How did you test it?

Potential risks

Documentation

Is hotfix candidate?

@yycptt yycptt requested review from yux0 and xwduan April 10, 2025 00:07

// TODO: we may need to handle the case where mutable state only starts to generate events during middle of its execution.
// In that case targetVersionHistories maybe empty and sourceVersionHistories is not empty.
// The LCA logic doesn't work in that case today.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use transition history in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no quite following. This method is specifically for dealing with events, and see which events are missing on the target side. Transition history doesn't give any info re. events.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distinguish this case seems straight forward:
target transition history is not empty but target version history is empty?

if err != nil {
return 0, err

if !versionhistory.IsEmptyVersionHistory(versionHistory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IsCurrentVersionHistoryEmpty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to get the current version history (then to get the first item) if it's not empty.

}
}

return common.EmptyVersion, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this expected in some case to return empty version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we have bunch of logic probably trying to handle backward compatibility (by checking for version history is nil) and return EmptyVersion. Like GetCurrentVersion(). If you have any context on that, plz let me know.

I'd say for any new workflows/chasm runs, it's not expected to reach this line (EmptyVersion can still be returned above for local namespace). For very old workflows, i am not sure.

if ms.IsWorkflowExecutionRunning() {
// Do NOT use ms.IsWorkflowExecutionRunning() for the check.
// Zombie workflow is not considered running but also not closed.
if ms.executionState.State != enumsspb.WORKFLOW_EXECUTION_STATE_COMPLETED {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to double check all the callers. This is breaking ndc workflow right now, which may call it when workflow is in zombie state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guaranteeing caller not calling this when workflow is zombie will require a big refactoring. For now, I special handled zombie case and returns LastWriteVersion instead.

) {
return historyi.TransactionPolicyActive, serviceerror.NewInternal("Workflow cannot suppress workflow by older workflow")
}

// if workflow is in zombie or finished state, keep as is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move those lines seems changed semantic a little bit, what's the reason of moving?


// TODO: we may need to handle the case where mutable state only starts to generate events during middle of its execution.
// In that case targetVersionHistories maybe empty and sourceVersionHistories is not empty.
// The LCA logic doesn't work in that case today.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distinguish this case seems straight forward:
target transition history is not empty but target version history is empty?

transactionPolicy historyi.TransactionPolicy,
workflowEventsSeq []*persistence.WorkflowEvents,
) {
if transactionPolicy != historyi.TransactionPolicyActive {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we replicating the vector clock from active to passive?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants