No longer cancel pending polls on shutdown#2199
No longer cancel pending polls on shutdown#2199yuandrew wants to merge 6 commits intotemporalio:masterfrom
Conversation
|
LGTM but are we going to wait to merge this until there's a CLI tag we can use for an integration test? |
|
I don't think we have to, since it doesn't seem like this is integration-testable? Since we don't control server to coordinate this race |
I think we should at least wait until we can manually test that it works with a released server before merging. No benefit to merging now before server impl, but a lot of potential downside if logic is wrong. I can understand if it's internal enough to not have a good thing to assert, but I'd still see if we can find something, e.g. gRPC interceptors, to confirm server behaves as expected. |
# Conflicts: # contrib/datadog/go.mod # contrib/opentelemetry/go.mod # contrib/opentelemetry/go.sum # test/go.mod # test/go.sum
|
Tested locally with server built on main, need to wait for OSS release to get integration test to pass |
What was changed
Plumbed
worker_instance_keyto all poll calls, andTaskQueueTypeandTaskQueueto theShutdownWorkerRPC callIf
worker_poll_complete_on_shutdownnamespace capability exists, then don't cancel outstanding poll requests on shutdown. Instead expect server to return an empty result.Why?
During worker shutdown, the SDK cancels outstanding poll requests client-side by racing them against the shutdown token. This
creates a race condition: the server may dispatch a task to a poll at the same instant the SDK kills the connection, orphaning the
task until start_to_close timeout.
Checklist
Closes
How was this tested:
added test
Note
Medium Risk
Touches core worker polling/shutdown behavior and RPC payloads; although gated by a namespace capability, misbehavior could impact task delivery or shutdown latency.
Overview
When the namespace advertises
worker_poll_complete_on_shutdown, workers now stop canceling in-flight long-poll RPCs during shutdown and instead wait for polls to complete (expecting the server to return empty responses), reducing the chance of orphaned tasks.This plumbs
WorkerInstanceKeyinto Nexus polling, and expandsShutdownWorkerto include the worker’sTaskQueueand activeTaskQueueTypes; integration/dev-server config and tests are updated accordingly, including new unit/integration coverage for graceful shutdown behavior.Written by Cursor Bugbot for commit 1d252b2. This will update automatically on new commits. Configure here.