Commit b8d9428

fix: clean up Temporal server-side versioning data on TWD deletion (#240)
- Add a finalizer to `TemporalWorkerDeployment` to run Temporal server-side cleanup before K8s deletion
- Add a finalizer to `TemporalConnection` to prevent it from being deleted while any TWD still references it
- On TWD deletion, set current version to unversioned, clear ramping version, and delete registered versions

## Problem

When a `TemporalWorkerDeployment` CRD is deleted (e.g., switching back to plain Deployments), the Temporal server retains the build ID routing configuration. The matching service continues routing new tasks to the deleted build ID's physical queue, while unversioned workers poll a different physical queue. Tasks sit in `Scheduled` state indefinitely with no errors.

A secondary race condition exists: Helm deletes both the `TemporalConnection` and `TWD` in the same upgrade. Without the connection, the controller cannot talk to Temporal to clean up. This is solved by adding a finalizer to the `TemporalConnection` that blocks its deletion until all referencing TWDs are gone.

## Changes

**`internal/controller/worker_controller.go`:**

**TWD finalizer (`temporal.io/worker-deployment-cleanup`):**
- Added to all TWD resources during normal reconciliation
- On deletion, triggers `handleDeletion()` which:
  1. Sets the current version to unversioned (`BuildID: ""`) -- the critical step that unblocks task dispatch
  2. Clears any ramping version
  3. Deletes all registered versions with `SkipDrainage: true`
  4. Attempts to delete the deployment record itself
  5. Removes the connection finalizer if no other TWDs reference it
  6. Removes its own finalizer, allowing K8s to complete deletion

**TemporalConnection finalizer (`temporal.io/connection-in-use`):**
- Added to the `TemporalConnection` during normal TWD reconciliation via `ensureConnectionFinalizer()`
- Prevents the connection from being deleted while any TWD still references it
- Removed by `removeConnectionFinalizerIfUnused()` during TWD deletion, after checking no other TWDs in the same namespace reference the connection
- Guarantees the connection is always available during TWD cleanup -- no race condition with Helm deleting both resources simultaneously

**RBAC updates:**
- Added `update;patch` verbs for `temporalconnections` (was `get;list;watch`)
- Added `update` verb for `temporalconnections/finalizers`

## Deletion flow

```
Helm upgrade (TWD disabled)
    |
    v
Helm deletes TWD CRD + TemporalConnection CRD simultaneously
    |
    +--> TemporalConnection: has finalizer, K8s sets DeletionTimestamp but blocks deletion
    |
    +--> TWD: has finalizer, K8s sets DeletionTimestamp, triggers Reconcile
            |
            v
         handleDeletion() runs:
           1. Fetches TemporalConnection (guaranteed to exist via finalizer)
           2. Connects to Temporal server
           3. Sets current version to unversioned
           4. Deletes versions
           5. Removes connection finalizer (no other TWDs reference it)
           6. Removes TWD finalizer
            |
            v
         TWD deleted by K8s
            |
            v
         TemporalConnection: no more finalizers, deleted by K8s
```

Issue #55
Closes #166

---------

Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
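The finalizer bookkeeping described above (steps 5 and 6 of `handleDeletion()`) can be illustrated with a minimal, dependency-free sketch. It is not the controller's code: the `Connection` and `WorkerDeployment` structs, the helper names `ensureFinalizer`, `removeFinalizer`, `connectionStillInUse`, and `releaseOnDeletion`, and the `ConnectionName` field are all hypothetical stand-ins. Only the two finalizer strings come from the commit message. The real controller presumably works on the CRD types through the Kubernetes API rather than on in-memory slices.

```go
package main

// Finalizer names as given in the commit message.
const (
	twdFinalizer        = "temporal.io/worker-deployment-cleanup"
	connectionFinalizer = "temporal.io/connection-in-use"
)

// Connection is a hypothetical, simplified stand-in for TemporalConnection.
type Connection struct {
	Namespace  string
	Name       string
	Finalizers []string
}

// WorkerDeployment is a hypothetical, simplified stand-in for a TWD.
// ConnectionName is an assumed field naming the referenced connection.
type WorkerDeployment struct {
	Namespace      string
	Name           string
	ConnectionName string
	Finalizers     []string
}

// ensureFinalizer returns finalizers with name appended if it is not present.
func ensureFinalizer(finalizers []string, name string) []string {
	for _, f := range finalizers {
		if f == name {
			return finalizers
		}
	}
	return append(finalizers, name)
}

// removeFinalizer returns a copy of finalizers without name.
func removeFinalizer(finalizers []string, name string) []string {
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	return kept
}

// connectionStillInUse reports whether any TWD other than the one being
// deleted references conn from the same namespace.
func connectionStillInUse(conn Connection, deleting WorkerDeployment, all []WorkerDeployment) bool {
	for _, twd := range all {
		if twd.Namespace == deleting.Namespace && twd.Name == deleting.Name {
			continue // the TWD being deleted does not count as a user
		}
		if twd.Namespace == conn.Namespace && twd.ConnectionName == conn.Name {
			return true
		}
	}
	return false
}

// releaseOnDeletion sketches the last two deletion steps: drop the connection
// finalizer only if no other TWD needs the connection, then drop the TWD's
// own finalizer so Kubernetes can finish deleting it.
func releaseOnDeletion(conn *Connection, deleting *WorkerDeployment, all []WorkerDeployment) {
	if !connectionStillInUse(*conn, *deleting, all) {
		conn.Finalizers = removeFinalizer(conn.Finalizers, connectionFinalizer)
	}
	deleting.Finalizers = removeFinalizer(deleting.Finalizers, twdFinalizer)
}
```

In this sketch, deleting one of two TWDs that share a connection leaves `temporal.io/connection-in-use` in place. Deleting the last one removes it, which is the ordering the deletion flow relies on.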
1 parent 35371a9 commit b8d9428

5 files changed

Lines changed: 625 additions & 16 deletions

helm/temporal-worker-controller-crds/templates/temporal.io_temporalworkerdeployments.yaml

Lines changed: 16 additions & 0 deletions
@@ -3753,6 +3753,10 @@ spec:
         type: integer
       signerName:
         type: string
+      userAnnotations:
+        additionalProperties:
+          type: string
+        type: object
     required:
     - keyType
     - signerName
@@ -3947,6 +3951,18 @@ spec:
       x-kubernetes-list-map-keys:
       - name
       x-kubernetes-list-type: map
+    workloadRef:
+      properties:
+        name:
+          type: string
+        podGroup:
+          type: string
+        podGroupReplicaKey:
+          type: string
+      required:
+      - name
+      - podGroup
+      type: object
   required:
   - containers
   type: object

helm/temporal-worker-controller/templates/rbac.yaml

Lines changed: 9 additions & 15 deletions
@@ -107,17 +107,8 @@ rules:
   - temporal.io
   resources:
   - temporalconnections
+  - workerresourcetemplates
   verbs:
-  - get
-  - list
-  - watch
-- apiGroups:
-  - temporal.io
-  resources:
-  - temporalworkerdeployments
-  verbs:
-  - create
-  - delete
   - get
   - list
   - patch
@@ -126,28 +117,31 @@ rules:
 - apiGroups:
   - temporal.io
   resources:
+  - temporalconnections/finalizers
   - temporalworkerdeployments/finalizers
   verbs:
   - update
 - apiGroups:
   - temporal.io
   resources:
-  - temporalworkerdeployments/status
-  - workerresourcetemplates/status
+  - temporalworkerdeployments
   verbs:
+  - create
+  - delete
   - get
+  - list
   - patch
   - update
+  - watch
 - apiGroups:
   - temporal.io
   resources:
-  - workerresourcetemplates
+  - temporalworkerdeployments/status
+  - workerresourcetemplates/status
   verbs:
   - get
-  - list
   - patch
   - update
-  - watch
 # GENERATED RULES END
 # Rules for managing resources attached via WorkerResourceTemplate.
 # Driven entirely by workerResourceTemplate.allowedResources in values.yaml.
