-
Notifications
You must be signed in to change notification settings - Fork 896
fix(test): eliminate data race in TestBatchWorkflowV2_TuneSignal #7892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -150,23 +150,28 @@ func TestBatchWorkflowV2_TuneSignal(t *testing.T) { | |
| var mu sync.Mutex | ||
| var capturedParams []BatchParams | ||
|
|
||
| // firstActivityDone is kept open while the test runs, which prevents the | ||
| // first activity mock goroutine from returning before the workflow has had | ||
| // a chance to process the tune signal. The Cadence test framework delivers | ||
| // CanceledError to the activity future independently of the goroutine, so | ||
| // blocking here does not prevent the workflow from completing. The channel | ||
| // is closed by t.Cleanup once the assertions are done. | ||
| // firstActivityDone gates the first activity mock goroutine so it does not | ||
| // return before the workflow has had a chance to process the tune signal. | ||
| // The Cadence test framework delivers CanceledError to the activity future | ||
| // independently of the goroutine, so blocking here does not prevent the | ||
| // workflow from completing. | ||
| firstActivityDone := make(chan struct{}) | ||
| t.Cleanup(func() { close(firstActivityDone) }) | ||
|
|
||
| // activityWG tracks running activity mock goroutines so we can wait for | ||
| // them to finish before reading capturedParams, avoiding a data race. | ||
| var activityWG sync.WaitGroup | ||
|
|
||
| env.OnActivity(batchActivityV2Name, mock.Anything, mock.Anything). | ||
| Return(func(_ context.Context, p BatchParams) (HeartBeatDetails, error) { | ||
| activityWG.Add(1) | ||
| defer activityWG.Done() | ||
|
|
||
| mu.Lock() | ||
| capturedParams = append(capturedParams, p) | ||
| n := len(capturedParams) | ||
| mu.Unlock() | ||
| if n == 1 { | ||
| // Block until the test is done. This prevents the future from | ||
| // Block until the workflow is done. This prevents the future from | ||
| // resolving with nil before the workflow processes the tune signal, | ||
| // which would cause the workflow to take the early-exit path. | ||
| <-firstActivityDone | ||
|
|
@@ -193,6 +198,11 @@ func TestBatchWorkflowV2_TuneSignal(t *testing.T) { | |
| assert.Equal(t, 8, result.SuccessCount) | ||
| assert.Equal(t, 3, result.CurrentPage) | ||
|
|
||
| // Release the first activity goroutine and wait for all activity goroutines | ||
| // to finish before reading capturedParams. | ||
| close(firstActivityDone) | ||
| activityWG.Wait() | ||
|
|
||
| mu.Lock() | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If all activities have finished, we don't need this lock, right?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removed the final read-side lock. We still need the mutex while the two activity goroutines append into |
||
| captured := make([]BatchParams, len(capturedParams)) | ||
| copy(captured, capturedParams) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Bug: WaitGroup.Add(1) called inside goroutine instead of before it
At line 166,
activityWG.Add(1)is called inside the activity callback goroutine rather than before it starts. Per thesync.WaitGroupdocumentation, "calls with a positive delta that occur when the counter is zero must happen before a Wait." If the Go scheduler delays the goroutine start,activityWG.Wait()(line 204) could observe a zero counter and return beforeAdd(1)has executed, readingcapturedParamswhile the goroutine is still writing to it.In practice this is very unlikely because the Cadence test framework invokes the second activity synchronously during
ExecuteWorkflow(which completes beforeWaitis called), and the first activity must have started to block on the channel. Still, it violates the documented contract and could theoretically cause the race detector to fire under extreme scheduling.A cleaner approach: add both counts eagerly before the mock is invoked, or use
activityWG.Add(1)inside a wrapper that is called synchronously by the framework before spawning the goroutine.Suggested fix:
Was this helpful? React with 👍 / 👎 | Reply
gitar fixto apply this suggestion