Commit b1b7257

jaypipes and carlydf committed: document downgrade concerns for CRD rename migrate (#312)

Adds a section explaining rollback procedures to the guide for the v1.6 -> v1.7 migration that involves the rename of the TemporalWorkerDeployment CRD to WorkerDeployment.

Signed-off-by: Jay Pipes <jay.pipes@temporal.io>
Co-authored-by: Carly de Frondeville <cdefrondeville@berkeley.edu>

1 parent: f14adb9

2 files changed: 251 additions & 1 deletion

---

docs/migration-crd-rename-downgrade.md (new file, 241 additions)
# Downgrade from CRD Rename Guide

Starting with Chart Version v0.26.0 (App Version v1.7.0), the Temporal Worker Controller renames its two primary CRDs and one field reference:

| Old name | New name |
|---|---|
| `TemporalWorkerDeployment` | `WorkerDeployment` |
| `TemporalConnection` | `Connection` |
| `WorkerResourceTemplate.spec.temporalWorkerDeploymentRef` | `WorkerResourceTemplate.spec.workerDeploymentRef` |

The upgrade path is straightforward. See the [upgrade guide](migration-crd-rename.md) for more details.
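
If you are unsure which chart and app versions are currently installed, a quick check (a sketch; the namespace placeholder is whatever you used at install time) is:

```bash
# List Helm releases in the namespace along with their chart and app versions.
helm list -n <TWC_NAMESPACE>
```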
## Downgrading from v1.7 to v1.6

There are some important things to consider if you want to roll back
(downgrade) the installed version of the Temporal Worker Controller after
upgrading to v1.7.0.
> **Warning**: You **should not perform a rollback/downgrade of the Temporal
> Worker Controller CRDs Helm Chart**. Doing so is a potentially
> **destructive** operation that can cause your Temporal Worker Deployments to
> be deleted.
>
> See [here][crd-pruning] for more details.

[crd-pruning]: https://github.com/temporalio/temporal-worker-controller/blob/main/docs/crd-management.md#crd-rollback-and-field-pruning
To downgrade the Temporal Worker Controller itself, run:

```bash
helm rollback <RELEASE_NAME> <REVISION_NUMBER>
```
Where `<RELEASE_NAME>` is the Helm release associated with the Temporal Worker
Controller Helm Chart (**not** the CRDs chart) and `<REVISION_NUMBER>` is the
Helm release revision number to roll back to. You can get this information by
running:

```bash
helm history -n <TWC_NAMESPACE> <TWC_RELEASE_NAME>
```
Where `<TWC_NAMESPACE>` is the Kubernetes Namespace you installed the Temporal
Worker Controller in and `<TWC_RELEASE_NAME>` is the name of the Helm release
associated with the Temporal Worker Controller Helm Chart.
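
For example, assuming a hypothetical release named `temporal-worker-controller` in the `temporal-system` namespace, where revision 3 was the last v1.6 deployment:

```bash
# Inspect the release history, then roll the controller back to revision 3.
helm history -n temporal-system temporal-worker-controller
helm rollback -n temporal-system temporal-worker-controller 3
```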
Once you have downgraded the Temporal Worker Controller, you will need to take
some corrective actions depending on how far down the migration path you went
when upgrading to the v1.7 Temporal Worker Controller release.

---

### Scenario 1: You did not migrate your resources
If you upgraded the Temporal Worker Controller to v1.7 -- i.e. you successfully
completed Step 2 of the [upgrade guide](migration-crd-rename.md) -- but **did
not** complete Step 3 (migrating your resources), execute the following
`kubectl` commands to remove the CRD rename validation guard on the old
`TemporalWorkerDeployment` and `TemporalConnection` Custom Resource
Definitions:

```bash
kubectl patch crd temporalworkerdeployments.temporal.io --type='json' -p='[{"op": "remove", "path": "/spec/versions/0/schema/openAPIV3Schema/x-kubernetes-validations/1"}]'
kubectl patch crd temporalconnections.temporal.io --type='json' -p='[{"op": "remove", "path": "/spec/versions/0/schema/openAPIV3Schema/properties/spec/x-kubernetes-validations"}]'
```
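
As a sanity check (a sketch, not part of the official procedure), you can print the validation rules that remain on each CRD after the patch; the rename guard should no longer appear:

```bash
# Print the remaining x-kubernetes-validations rules on each patched CRD.
kubectl get crd temporalworkerdeployments.temporal.io -o jsonpath='{.spec.versions[0].schema.openAPIV3Schema.x-kubernetes-validations}'
kubectl get crd temporalconnections.temporal.io -o jsonpath='{.spec.versions[0].schema.openAPIV3Schema.properties.spec.x-kubernetes-validations}'
```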
You will also need to manually remove the `migration-guard` finalizer that the
v1.7 controller added to your `TemporalWorkerDeployment` and
`TemporalConnection` resources. The merge patches below replace each
resource's finalizer list with only the `temporal.io/delete-protection`
finalizer, which removes the migration guard while leaving deletion protection
in place.

Get a list of all the original `TemporalWorkerDeployment` object names and UIDs:

```bash
kubectl get -n <NAMESPACE> temporalworkerdeployments.temporal.io -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\n"}{end}'
```
For each of the `TemporalWorkerDeployments` listed above:

```bash
kubectl patch -n <NAMESPACE> temporalworkerdeployments/<TWD_NAME> --type=merge -p='{"metadata":{"finalizers":["temporal.io/delete-protection"]}}'
```
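
If you have many resources, a small bash loop (a convenience sketch, assuming all of them should receive the same patch) can apply it across the namespace:

```bash
# Apply the finalizer patch to every TemporalWorkerDeployment in the namespace.
for twd in $(kubectl get -n <NAMESPACE> temporalworkerdeployments.temporal.io -o jsonpath='{.items[*].metadata.name}'); do
  kubectl patch -n <NAMESPACE> "temporalworkerdeployments/${twd}" --type=merge \
    -p='{"metadata":{"finalizers":["temporal.io/delete-protection"]}}'
done
```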
Get a list of all the original `TemporalConnection` object names and UIDs:

```bash
kubectl get -n <NAMESPACE> temporalconnections.temporal.io -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\n"}{end}'
```
For each of the `TemporalConnections` listed above:

```bash
kubectl patch -n <NAMESPACE> temporalconnections/<TC_NAME> --type=merge -p='{"metadata":{"finalizers":["temporal.io/delete-protection"]}}'
```
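
To verify that the migration guard is gone, you can print each resource's remaining finalizers (a sanity-check sketch); only `temporal.io/delete-protection` should be listed:

```bash
# Show the finalizers remaining on each resource.
kubectl get -n <NAMESPACE> temporalworkerdeployments.temporal.io -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'
kubectl get -n <NAMESPACE> temporalconnections.temporal.io -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'
```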
---

### Scenario 2: You did migrate your resources

If you upgraded the Temporal Worker Controller to v1.7 and completed Step 3 of
the [upgrade guide](migration-crd-rename.md) (i.e. you successfully migrated
your resources), you will need to manually restore the OwnerReferences for
your Kubernetes Deployments to point at the original
`TemporalWorkerDeployment` resources.
To do so, first get a list of all the original `TemporalWorkerDeployment`
object names and UIDs:

```bash
kubectl get -n <NAMESPACE> temporalworkerdeployments.temporal.io -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\n"}{end}'
```
Then get a list of all the Kubernetes `Deployments` that are now owned by the
new `WorkerDeployment` resources:

```bash
kubectl get deployments -n <NAMESPACE> -o json | jq -r '
  .items[] | select(
    .metadata.ownerReferences // [] | any(.kind == "WorkerDeployment")
  ) | .metadata.name
'
```
Then, for each of the Kubernetes Deployments listed above, execute the
following `kubectl` command to reset the OwnerReferences of the Kubernetes
Deployments back to the original `TemporalWorkerDeployment` custom resources:

```bash
kubectl patch -n <NAMESPACE> deployment <DEPLOYMENT_NAME> --type='merge' -p '
{
  "metadata": {
    "ownerReferences": [
      {
        "apiVersion": "temporal.io/v1alpha1",
        "kind": "TemporalWorkerDeployment",
        "name": "<TWD_NAME>",
        "uid": "<TWD_UID>",
        "controller": true,
        "blockOwnerDeletion": true
      }
    ]
  }
}'
```
Replace `<TWD_NAME>` and `<TWD_UID>` with the name and UID of the matching
`TemporalWorkerDeployment` custom resource that you printed out earlier. It is
important that the UID string is correct: if Kubernetes garbage collection
does not recognize the owner UID, it will treat those `Deployments` as
orphaned and delete them.
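
To avoid transcription errors, you can look the UID up by name and substitute it directly (a sketch; `<TWD_NAME>` is the resource you are restoring ownership to):

```bash
# Fetch the UID for a named TemporalWorkerDeployment and store it for the patch.
TWD_UID="$(kubectl get -n <NAMESPACE> temporalworkerdeployments/<TWD_NAME> -o jsonpath='{.metadata.uid}')"
echo "${TWD_UID}"
```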
Confirm that your `Deployments` are now owned by the original
`TemporalWorkerDeployment` resources:

```bash
kubectl get deployments -n <NAMESPACE> -o json | jq -r '
  .items[] | select(
    .metadata.ownerReferences // [] | any(.kind == "TemporalWorkerDeployment")
  ) | .metadata.name
'
```
If you completed Step 4 of the [upgrade guide](migration-crd-rename.md) and
modified `WorkerResourceTemplate` resources, you will also need to reset the
`OwnerReferences` for those resources. First, list the affected
`WorkerResourceTemplates`:

```bash
kubectl get workerresourcetemplates -n <NAMESPACE> -o json | jq -r '
  .items[] | select(
    .metadata.ownerReferences // [] | any(.kind == "WorkerDeployment")
  ) | .metadata.name
'
```
Then, for each of the `WorkerResourceTemplate` resources listed above, execute
the following `kubectl` command to reset its OwnerReferences back to the
original `TemporalWorkerDeployment` custom resources:

```bash
kubectl patch -n <NAMESPACE> wrt <WRT_NAME> --type='merge' -p '
{
  "metadata": {
    "ownerReferences": [
      {
        "apiVersion": "temporal.io/v1alpha1",
        "kind": "TemporalWorkerDeployment",
        "name": "<TWD_NAME>",
        "uid": "<TWD_UID>",
        "controller": true,
        "blockOwnerDeletion": true
      }
    ]
  }
}'
```
Again, replace `<TWD_NAME>` and `<TWD_UID>` with the name and UID of the
matching `TemporalWorkerDeployment` custom resource that you printed out
earlier. As before, if Kubernetes garbage collection does not recognize the
owner UID, it will treat those `WorkerResourceTemplates` as orphaned and
delete them.
Confirm that your `WorkerResourceTemplates` are now owned by the original
`TemporalWorkerDeployment` resources:

```bash
kubectl get workerresourcetemplates -n <NAMESPACE> -o json | jq -r '
  .items[] | select(
    .metadata.ownerReferences // [] | any(.kind == "TemporalWorkerDeployment")
  ) | .metadata.name
'
```
Now you can safely delete the `WorkerDeployment` and `Connection` resources
without deleting any `Deployments` or `WorkerResourceTemplates`. Before
deleting the `WorkerDeployment` and `Connection` resources, you will need to
remove the `deletion-protection` finalizer that the v1.7 controller added to
them:

```bash
kubectl patch -n <NAMESPACE> workerdeployments/<WD_NAME> --type=merge -p='{"metadata":{"finalizers":[]}}'
kubectl patch -n <NAMESPACE> connections/<TC_NAME> --type=merge -p='{"metadata":{"finalizers":[]}}'
```
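
With the finalizers cleared, the new-style resources can then be deleted; this is a sketch assuming you have already confirmed that no `Deployments` or `WorkerResourceTemplates` are still owned by them:

```bash
# Delete the renamed resources now that nothing depends on them.
kubectl delete -n <NAMESPACE> workerdeployments/<WD_NAME>
kubectl delete -n <NAMESPACE> connections/<TC_NAME>
```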
You'll notice that because you did not roll back the CRD chart, there is still
a deprecation warning on the `TemporalWorkerDeployment` and
`TemporalConnection` resources. This can be safely ignored. If you have
already migrated ownership away from all `WorkerDeployment` resources, you
could _carefully_ roll back the CRD chart to v0.25.0. Rolling the CRDs back
while you still have `WorkerDeployment` resources is very risky, because
**rolling back the CRDs will delete all `WorkerDeployment` and `Connection`
instances**, and **any `Deployments` and `WorkerResourceTemplates` owned by
the `WorkerDeployment` resources will be deleted along with them**.
To recap, here is how to confirm that no `WorkerDeployment` owns any
`Deployments` or `WorkerResourceTemplates` in any namespace:

```bash
kubectl get deployments -A -o json | jq -r '
  .items[] | select(
    .metadata.ownerReferences // [] | any(.kind == "WorkerDeployment")
  ) | .metadata.name
'
kubectl get workerresourcetemplates -A -o json | jq -r '
  .items[] | select(
    .metadata.ownerReferences // [] | any(.kind == "WorkerDeployment")
  ) | .metadata.name
'
```
And here is how to confirm that you no longer have any `WorkerDeployment` or
`Connection` resources in any namespace:

```bash
kubectl get workerdeployments -A
kubectl get connections -A
```

docs/migration-crd-rename.md (10 additions & 1 deletion)

````diff
@@ -17,7 +17,10 @@

 ## Migration steps

-> **Dev / non-production environments:** If you don't need to preserve any worker state, the simplest path is to delete all your `TemporalWorkerDeployment` and `TemporalConnection` resources while the v1.6 controller is still running. At that point no migration-guard finalizer has been added yet, so deletion completes after the v1.6 finalizer completes. Note that all related Worker Deployment state in the Temporal server will also be deleted. Then upgrade the controller and create fresh `WorkerDeployment` and `Connection` resources.
+> **Dev / non-production environments:** If you don't need to preserve worker state, you can delete your `TemporalWorkerDeployment` and `TemporalConnection` resources while the v1.6 controller is still running. This will cause the controller to remove the associated Worker Deployment state in Temporal, leaving Task Queues unversioned. Once cleanup completes, upgrade the controller and recreate them as `WorkerDeployment` and `Connection` resources.
+>
+> In most cases, following the migration steps below is simpler.
+

 ### Step 1: Upgrade the CRDs chart

@@ -119,3 +122,9 @@
 Ready=False reason=DeletingPendingMigration
 message: "This TemporalWorkerDeployment is marked for deletion. Create a WorkerDeployment with the same name and spec to complete migration; deletion will proceed automatically once migration is confirmed."
 ```
+
+## Downgrading from v1.7 to v1.6
+
+There are some critical things to consider if you want to roll back
+(downgrade) the installed version of the Temporal Worker Controller after upgrading to v1.7.0.
+Please see the [Downgrade Guide](migration-crd-rename-downgrade.md) for details.
````
