Skip to content

No longer cancel pending polls on shutdown#2199

Open
yuandrew wants to merge 6 commits intotemporalio:masterfrom
yuandrew:shutdown-poll-complete
Open

No longer cancel pending polls on shutdown#2199
yuandrew wants to merge 6 commits intotemporalio:masterfrom
yuandrew:shutdown-poll-complete

Conversation

@yuandrew
Copy link
Contributor

@yuandrew yuandrew commented Feb 21, 2026

What was changed

Plumbed worker_instance_key to all poll calls, and TaskQueueType and TaskQueue to the ShutdownWorker RPC call

If worker_poll_complete_on_shutdown namespace capability exists, then don't cancel outstanding poll requests on shutdown. Instead expect server to return an empty result.

Why?

During worker shutdown, the SDK cancels outstanding poll requests client-side by racing them against the shutdown token. This
creates a race condition: the server may dispatch a task to a poll at the same instant the SDK kills the connection, orphaning the
task until start_to_close timeout.

Checklist

  1. Closes

  2. How was this tested:

added test

  1. Any docs updates needed?

Note

Medium Risk
Touches core worker polling/shutdown behavior and RPC payloads; although gated by a namespace capability, misbehavior could impact task delivery or shutdown latency.

Overview
When the namespace advertises worker_poll_complete_on_shutdown, workers now stop canceling in-flight long-poll RPCs during shutdown and instead wait for polls to complete (expecting the server to return empty responses), reducing the chance of orphaned tasks.

This plumbs WorkerInstanceKey into Nexus polling, and expands ShutdownWorker to include the worker’s TaskQueue and active TaskQueueTypes; integration/dev-server config and tests are updated accordingly, including new unit/integration coverage for graceful shutdown behavior.

Written by Cursor Bugbot for commit 1d252b2. This will update automatically on new commits. Configure here.

@yuandrew yuandrew requested a review from a team as a code owner February 21, 2026 02:22
@Sushisource
Copy link
Member

LGTM but are we going to wait to merge this until there's a CLI tag we can use for an integration test?

@yuandrew
Copy link
Contributor Author

I don't think we have to, since it doesn't seem like this is integration-testable? Since we don't control server to coordinate this race

@cretz
Copy link
Contributor

cretz commented Feb 23, 2026

I don't think we have to, since it doesn't seem like this is integration-testable? Since we don't control server to coordinate this race

I think we should at least wait until we can manually test that it works with a released server before merging. No benefit to merging now before server impl, but a lot of potential downside if logic is wrong. I can understand if it's internal enough to not have a good thing to assert, but I'd still see if we can find something, e.g. gRPC interceptors, to confirm server behaves as expected.

yuandrew added 2 commits March 9, 2026 19:38
# Conflicts:
#	contrib/datadog/go.mod
#	contrib/opentelemetry/go.mod
#	contrib/opentelemetry/go.sum
#	test/go.mod
#	test/go.sum
@yuandrew
Copy link
Contributor Author

Tested locally with server built on main, need to wait for OSS release to get integration test to pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants