[Reset] Handle completion events from restarted children#7295

Closed
gow wants to merge 2 commits into main from cg/reset_before_child_init_7

gow (Contributor) commented Feb 7, 2025

What changed?

This PR does two things:

  1. Cleaning up the entries in ChildrenInitializedPostResetPoint
  2. Verifying that the results are indeed from the correct restarted child before accepting the completion event.

Why?

  • Cleans up the mutable state and reduces bloat.
  • Avoids a race condition where an old child completes between the time the new child's Init is recorded and the time the new child is actually started.
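
The check can be pictured roughly as below. This is a minimal, self-contained sketch and not the code in this PR: the `parentState` type, the `startedRunID` map, and `acceptCompletion` are hypothetical names, and comparing run IDs is only one assumed way of telling the old child from the restarted one.

```go
package main

import "fmt"

// parentState is a hypothetical stand-in for the parent's mutable state.
type parentState struct {
	// tracked holds the keys ("<workflow type>:<workflow ID>") of children
	// that were re-initiated after the reset point.
	tracked map[string]struct{}
	// startedRunID records the run the parent actually started for each key.
	// It is empty while the new child is initiated but not yet started (assumption).
	startedRunID map[string]string
}

// acceptCompletion reports whether a completion event from the given run
// should be processed by the parent.
func (p *parentState) acceptCompletion(key, completedRunID string) bool {
	if _, ok := p.tracked[key]; !ok {
		// Not a restarted child, so there is nothing extra to verify.
		return true
	}
	if p.startedRunID[key] != completedRunID {
		// The result comes from a run this parent did not start,
		// for example the old child finishing during the race window.
		return false
	}
	// The result is from the restarted child: stop tracking it.
	delete(p.tracked, key)
	return true
}

func main() {
	p := &parentState{
		tracked:      map[string]struct{}{"wt:child-1": {}},
		startedRunID: map[string]string{},
	}
	// Old child completes before the new child has started.
	fmt.Println(p.acceptCompletion("wt:child-1", "old-run"))
	// New child starts, then completes.
	p.startedRunID["wt:child-1"] = "new-run"
	fmt.Println(p.acceptCompletion("wt:child-1", "new-run"))
	fmt.Println(len(p.tracked))
}
```

In this sketch the entry is removed only once a matching completion is accepted, which corresponds to the cleanup described in item 1 above.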

How did you test it?

Added unit tests + manual testing.

Potential risks

N/A

Documentation

Pending

Is hotfix candidate?

No

gow requested a review from a team as a code owner February 7, 2025 20:00
gow marked this pull request as draft February 7, 2025 20:00
gow requested a review from yycptt February 7, 2025 20:00
gow marked this pull request as ready for review February 7, 2025 22:18
    return nil, consts.ErrChildExecutionNotFound
}
initiatedAttr := initiatedEvent.GetStartChildWorkflowExecutionInitiatedEventAttributes()
childID := fmt.Sprintf("%s:%s", initiatedAttr.GetWorkflowType().Name, initiatedAttr.GetWorkflowId())
Member

nit: we really should create a util function for this concatenation logic. I think I've seen it in 3 different places.
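
A helper along these lines would cover the suggestion. It is only a sketch: the name `childExecutionKey` is hypothetical and not something that exists in the codebase; the format string is the one used in the diff above.

```go
package main

import "fmt"

// childExecutionKey builds the "<workflow type>:<workflow ID>" key used to
// track children initialized after the reset point.
// The function name is hypothetical; only the format comes from the PR.
func childExecutionKey(workflowType, workflowID string) string {
	return fmt.Sprintf("%s:%s", workflowType, workflowID)
}

func main() {
	fmt.Println(childExecutionKey("OrderWorkflow", "order-123"))
}
```

Call sites would then pass `initiatedAttr.GetWorkflowType().Name` and `initiatedAttr.GetWorkflowId()` instead of repeating the `fmt.Sprintf` call.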

childrenInitializedAfterResetPoint := mutableState.GetChildrenInitializedPostResetPoint()
if len(childrenInitializedAfterResetPoint) > 0 {
    // This parent was reset and it also has some children that potentially were restarted.
    initiatedEvent, err := mutableState.GetChildExecutionInitiatedEvent(ctx, parentInitiatedID)
Member

hmm I don't think we need to load the initiatedEvent. Both child workflow type and child workflowID are available in ChildExecutionInfo in mutable state I think.

}
// The results are from the child that this parent initiated. We should stop tracking the child and process the result now.
delete(childrenInitializedAfterResetPoint, childID)
mutableState.SetChildrenInitializedPostResetPoint(childrenInitializedAfterResetPoint)
Member

plz update ms.approximateSize when setting the reset point.
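
One way the size bookkeeping could look is sketched below. Everything here is a stand-in: the `mutableState` struct, the `mapSize` estimate (sum of key lengths), and the setter name are assumptions for illustration, not the server's actual implementation.

```go
package main

import "fmt"

// mutableState is a hypothetical stand-in for the server's mutable state.
type mutableState struct {
	childrenInitializedPostResetPoint map[string]struct{}
	approximateSize                   int
}

// mapSize estimates the footprint of the map as the total length of its keys.
// The real size accounting may differ; this is an assumption for the sketch.
func mapSize(m map[string]struct{}) int {
	total := 0
	for k := range m {
		total += len(k)
	}
	return total
}

// setChildrenInitializedPostResetPoint swaps in the new map and adjusts
// approximateSize by the difference between the old and new estimates.
// If callers mutate the stored map in place before calling this, the old
// size has to be captured before that mutation instead.
func (ms *mutableState) setChildrenInitializedPostResetPoint(children map[string]struct{}) {
	ms.approximateSize -= mapSize(ms.childrenInitializedPostResetPoint)
	ms.childrenInitializedPostResetPoint = children
	ms.approximateSize += mapSize(children)
}

func main() {
	ms := &mutableState{}
	ms.setChildrenInitializedPostResetPoint(map[string]struct{}{
		"wt:child-1": {},
		"wt:child-2": {},
	})
	fmt.Println(ms.approximateSize)

	// A child completed, so a smaller map replaces the old one.
	ms.setChildrenInitializedPostResetPoint(map[string]struct{}{
		"wt:child-2": {},
	})
	fmt.Println(ms.approximateSize)
}
```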

@github-actions

This PR was marked as stale. Please update or close it.

github-actions bot added the stale label Jun 11, 2025
yiminc closed this Aug 11, 2025
yiminc deleted the cg/reset_before_child_init_7 branch August 11, 2025 17:20

3 participants