Fix canceling external workflows that are children in testsuite #1968

Merged
yuandrew merged 11 commits into temporalio:master from yuandrew:issue-1961-fix
Feb 26, 2026

Conversation

@yuandrew
Contributor

@yuandrew yuandrew commented Jun 9, 2025

What was changed

Dispatch a new coroutine with workflow context when canceling an external workflow in the workflow testsuite.

Why?

Fix a testsuite-specific error.

Checklist

  1. Closes "Getting a strange error when writing tests which does not occur in real temporal deployment" (#1961)

  2. How was this tested:
    Added test


Note

Medium Risk
Changes test-suite cancellation scheduling for child workflows by invoking workflowCancelHandler inside the dispatcher coroutine, which could affect ordering/timing of cancellations in unit tests but is limited to the test environment.

Overview
Prevents a testsuite-only panic when a child workflow cancels itself via RequestCancelExternalWorkflow by deferring the workflowCancelHandler (and onChildWorkflowCanceledListener) into a new dispatcher coroutine when the workflow is a syncWorkflowDefinition child.

Adds a regression test that starts a child workflow with a cancellable timer, requests cancellation via RequestCancelExternalWorkflow, and yields to ensure the cancel callback is processed without crashing.
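
To make the mechanism above concrete, here is a minimal, self-contained sketch in plain Go. It is a toy model, not the SDK's code: `toyDispatcher`, `cancelDirect`, `cancelDeferred`, and the event strings are all hypothetical names invented for illustration, and an error return stands in for the panic. It only shows the shape of the fix: a handler that requires workflow context fails when called directly, and succeeds when it is scheduled as a dispatcher coroutine, with the listener notified from that same coroutine.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutsideContext models the failure seen without the fix.
var errOutsideContext = errors.New("illegal access from outside of workflow context")

// toyDispatcher is a hypothetical stand-in for a coroutine dispatcher.
// It only tracks whether code is currently running inside one of its coroutines.
type toyDispatcher struct {
	queue       []func()
	inCoroutine bool
}

// NewCoroutine queues fn to run later inside the dispatcher.
func (d *toyDispatcher) NewCoroutine(fn func()) {
	d.queue = append(d.queue, fn)
}

// RunUntilIdle runs every queued coroutine with "workflow context" active.
func (d *toyDispatcher) RunUntilIdle() {
	for len(d.queue) > 0 {
		fn := d.queue[0]
		d.queue = d.queue[1:]
		d.inCoroutine = true
		fn()
		d.inCoroutine = false
	}
}

// cancelHandler models a handler that is only legal inside a coroutine.
func (d *toyDispatcher) cancelHandler(events *[]string) error {
	if !d.inCoroutine {
		return errOutsideContext
	}
	*events = append(*events, "cancel handler ran")
	return nil
}

// cancelDirect models the old path: the handler is invoked from a plain
// callback, outside any coroutine, so it fails.
func cancelDirect(d *toyDispatcher) ([]string, error) {
	var events []string
	err := d.cancelHandler(&events)
	return events, err
}

// cancelDeferred models the fixed path: the handler and the listener
// notification both run inside a newly dispatched coroutine.
func cancelDeferred(d *toyDispatcher) ([]string, error) {
	var events []string
	var err error
	d.NewCoroutine(func() {
		if err = d.cancelHandler(&events); err != nil {
			return
		}
		events = append(events, "listener notified")
	})
	d.RunUntilIdle()
	return events, err
}

func main() {
	_, err := cancelDirect(&toyDispatcher{})
	fmt.Println("direct:", err)

	events, err := cancelDeferred(&toyDispatcher{})
	fmt.Println("deferred:", events, err)
}
```

In this model the listener only fires after the handler has run inside the coroutine, which mirrors the ordering discussed in the review thread below.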

Written by Cursor Bugbot for commit 9a72279.

@yuandrew yuandrew requested a review from a team as a code owner June 9, 2025 15:57
@Quinn-With-Two-Ns
Contributor

Do you know why the feature tests are failing for your PR?

@yuandrew
Contributor Author

Seems like it was a transient issue, features tests are passing now after merging with master

Comment thread internal/internal_workflow_testsuite.go Outdated
env.postCallback(func() {
	env.onChildWorkflowCanceledListener(env.workflowInfo)
}, false)
sd.dispatcher.NewCoroutine(sd.rootCtx, "cancel-self", true, func(ctx Context) {
Contributor

isn't running the postCallback enough?

Contributor Author

@yuandrew yuandrew Jul 14, 2025

Added a comment explaining why this would be needed, not sure it's worth a larger refactor, lmk if you think otherwise

// The way testWorkflowEnvironment is setup today, we close the child workflow dispatcher before calling
// the workflowCancelHandler. A larger refactor would be needed to handle this similar to non-test code.
// Maybe worth doing when https://github.com/temporalio/go-sdk/issues/50 is tackled.

@yuandrew yuandrew changed the title from "Fix canceling external workflow in Selector" to "Fix canceling external workflows that are children in testsuite" Jul 14, 2025
Comment thread internal/internal_workflow_testsuite.go Outdated
// the workflowCancelHandler. A larger refactor would be needed to handle this similar to non-test code.
// Maybe worth doing when https://github.com/temporalio/go-sdk/issues/50 is tackled.
if sd, ok := env.workflowDef.(*syncWorkflowDefinition); ok && env.isChildWorkflow() {
	sd.dispatcher.NewCoroutine(sd.rootCtx, "cancel-self", true, func(ctx Context) {
Contributor

Will this run immediately? Any compatibility concern with running this in a coroutine now and changing the sequence with onChildWorkflowCanceledListener?

Contributor Author

good callout, the onChildWorkflowCanceledListener logic should only run during the coroutine we spawn

Comment thread internal/internal_workflow_testsuite.go
Comment thread internal/internal_workflow_testsuite.go
Comment thread internal/workflow_testsuite_test.go

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread internal/internal_workflow_testsuite.go
Comment thread internal/workflow_testsuite_test.go Outdated
}

// Give the workflow time to finish canceling the child workflow
return Sleep(ctx, 1*time.Second)
Contributor

Can we poll the other workflow until it gets cancelled if we really need to give it time to cancel? Maybe poll for a generous overall time (5 seconds) and then either return or intentionally fail the test?

Contributor Author

My comment was wrong. This is actually an important part of the test: it uses ctx to trigger the illegal access from outside of workflow context that occurs without the fix. Updated the comment.

@yuandrew yuandrew merged commit 0b10a3a into temporalio:master Feb 26, 2026
22 of 24 checks passed
@yuandrew yuandrew deleted the issue-1961-fix branch February 26, 2026 20:26
Development

Successfully merging this pull request may close these issues.

Getting a strange error when writing tests which does not occur in real temporal deployment

3 participants