
Drain polled tasks on shutdown#2261

Open
yuandrew wants to merge 11 commits into temporalio:main from yuandrew:shutdown-refactor

Conversation

Contributor

@yuandrew yuandrew commented Mar 31, 2026

What was changed

Refactored worker shutdown to use a two-stage approach: pollers shut down first, then the task dispatcher drains any remaining tasks before exiting. This ensures tasks polled during shutdown are processed rather than silently dropped.

Key changes:

  • Added pollerWG to baseWorker to track poller goroutines separately from the global stopWG
  • A closer goroutine waits for all pollers to finish, then closes taskQueueCh
  • runTaskDispatcher now ranges over taskQueueCh
  • pollTask always sends to taskQueueCh
  • Removed the 5s timeout hack in doPoll from Fix test flakes (#2253), which is no longer needed

Why?

PR #2199 changed shutdown to let the server complete in-flight polls instead of cancelling them. This exposed a pre-existing race: when a poller received a task during shutdown, the task could be silently dropped. The dispatcher had the same issue — it could exit on stopCh before reading pending tasks from the channel.

This aligns the Go SDK's shutdown with how Core SDK handles it:

  1. Set a flag so pollers stop polling after their current attempt
  2. Close channels from pollers → task processing
  3. Wait for all in-flight tasks to complete & channels to be empty

Checklist

  1. Closes #1197 (Drain polled tasks on shutdown)

  2. How was this tested:

    • New unit test TestTaskNotDroppedDuringShutdown — verifies a task polled during shutdown is processed, not dropped
    • Existing TestDoPollGracefulShutdown — validates both graceful and legacy poll completion
  3. Any docs updates needed?
    No


Note

Medium Risk
Changes worker shutdown ordering and channel lifecycle, which is concurrency-sensitive and could impact task processing/Stop() behavior under load or broken connections.

Overview
Refactors worker shutdown into a two-stage drain: pollers are tracked via a new pollerWG, and once all pollers exit the worker closes taskQueueCh so the dispatcher can range and drain remaining tasks before exiting.

Adjusts polling/shutdown semantics to avoid dropped tasks: pollTask now always sends polled tasks to taskQueueCh (no stop-select drop), runTaskDispatcher processes queued polled tasks even after limiter cancellation, and basePoller.doPoll removes the prior 5s shutdown timer hack and simply waits for the poll to complete when workerPollCompleteOnShutdown is enabled.

Adds TestTaskNotDroppedDuringShutdown to validate a task returned during shutdown is still dispatched/processed, and tweaks an integration-test sleep comment/timing guidance.

Reviewed by Cursor Bugbot for commit 57fdffe.

@yuandrew yuandrew changed the title Allow for task finishing on shutdown Drain polled tasks on shutdown Mar 31, 2026
@yuandrew yuandrew marked this pull request as ready for review March 31, 2026 23:36
@yuandrew yuandrew requested a review from a team as a code owner March 31, 2026 23:36
select {
case bw.taskQueueCh <- &polledTask{task: task, permit: slotPermit}:
didSendTask = true
case <-bw.stopCh:
Member


Just checking the stop channel is still used elsewhere since we removed it in two spots

Contributor Author


yep! still used in plenty of other places

Contributor Author

yuandrew commented Apr 1, 2026

recheck

@yuandrew yuandrew force-pushed the shutdown-refactor branch from b930672 to 29b355a Compare April 1, 2026 23:28
bw.limiterContextCancel()

// Wait for pollers to finish. pollTaskServiceTimeOut bounds this if the connection is broken.
bw.pollerWG.Wait()


Unbounded pollerWG.Wait() bypasses user-configured stopTimeout

Medium Severity

bw.pollerWG.Wait() in Stop() blocks without any timeout, and runs before awaitWaitGroup(&bw.stopWG, bw.options.stopTimeout). Combined with doPoll now waiting unconditionally on <-doneC (bounded only by pollTaskServiceTimeOut = 70s), Stop() can block for up to 70 seconds before the user's stopTimeout even begins counting. Previously, a 5-second fallback cancellation bounded this. In failure scenarios (broken gRPC connection, unresponsive server), total Stop() duration becomes ~70s + stopTimeout instead of just stopTimeout.



@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Comment thread internal/internal_task_pollers.go Outdated
