Skip to content

Commit 53bf8f9

Browse files
carlydfclaude
andcommitted
Address code review feedback on docs and resource naming
- Truncate owned resource names to 47 chars (safe for Deployments; avoids per-resource-type special cases if Deployment is ever un-banned) - Fix docs: replace "active Build ID" with "worker version with running workers" throughout; "active" is reserved for Ramping/Current versions - Fix docs: owned-resource deletion is due to versioned Deployment sunset, not a separate "version delete" operation - Fix docs: scaleTargetRef injection applies to any resource type with that field, not just HPA; clarify webhook rejects non-null values because controller owns them - Fix docs: remove undocumented/untested BYO TLS path; cert-manager is required - Fix docs: expand TWOR abbreviation to full name throughout; remove ⏳ autoscaling bullet from README and clarify TWOR is the path for metric/backlog-based autoscaling - Add note on how to inspect the banned kinds list (BANNED_KINDS env var) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 8ae649c commit 53bf8f9

5 files changed

Lines changed: 65 additions & 52 deletions

File tree

README.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55

66
> 🚀 **Public Preview**: This project is in [Public Preview](https://docs.temporal.io/evaluate/development-production-features/release-stages) and ready for production use cases*. Core functionality is complete with stable APIs.
77
>
8-
> *Dynamic auto-scaling based on workflow load is not yet implemented. Use cases must work with fixed worker replica counts.
8+
> *Autoscaling based on Temporal task queue depth is not yet built in. You can attach Horizontal Pod Autoscalers or KEDA ScaledObjects to each versioned Deployment via [`TemporalWorkerOwnedResource`](docs/owned-resources.md).
99
1010
**The Temporal Worker Controller makes it simple and safe to deploy Temporal workers on Kubernetes.**
1111

@@ -105,8 +105,7 @@ helm install temporal-worker-controller \
105105
- ✅ **Deletion of resources** associated with drained Worker Deployment Versions
106106
- ✅ **Multiple rollout strategies**: `Manual`, `AllAtOnce`, and `Progressive` rollouts
107107
- ✅ **Gate workflows** - Test new versions with a [pre-deployment test](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#adding-a-pre-deployment-test) before routing real traffic to them
108-
- ✅ **Per-version attached resources** - Attach HPAs, PodDisruptionBudgets, or any namespaced Kubernetes resource to each active worker version via [`TemporalWorkerOwnedResource`](docs/owned-resources.md)
109-
- ⏳ **Temporal-aware auto-scaling** - Scaling based on workflow task queue depth is not yet implemented
108+
- ✅ **Per-version attached resources** - Attach HPAs, KEDA ScaledObjects, PodDisruptionBudgets, or any namespaced Kubernetes resource to each worker version with running workers via [`TemporalWorkerOwnedResource`](docs/owned-resources.md) — this is also the recommended path for metric-based and backlog-based autoscaling
110109

111110

112111
## 💡 Why Use This?

docs/owned-resources.md

Lines changed: 43 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,44 +1,67 @@
11
# TemporalWorkerOwnedResource
22

3-
`TemporalWorkerOwnedResource` (TWOR) lets you attach arbitrary Kubernetes resources — HPAs, PodDisruptionBudgets, custom CRDs — to each active versioned Deployment managed by a `TemporalWorkerDeployment`. The controller creates one copy of the resource per active Build ID, automatically wired to the correct versioned Deployment.
3+
`TemporalWorkerOwnedResource` lets you attach arbitrary Kubernetes resources — HPAs, PodDisruptionBudgets, KEDA ScaledObjects, custom CRDs — to each worker version that has running workers. The controller creates one copy of the resource per worker version with a running Deployment, automatically wired to the correct versioned Deployment.
44

55
## Why you need this
66

77
The Temporal Worker Controller creates one Kubernetes `Deployment` per worker version (Build ID). If you attach an HPA directly to a single Deployment, it breaks as versions roll over — the old HPA still targets the old Deployment, the new Deployment has no HPA, and you have to manage cleanup yourself.
88

9-
`TemporalWorkerOwnedResource` solves this by treating the attached resource as a template. The controller renders one instance per active Build ID, injects the correct versioned Deployment name, and cleans up automatically when a version is deleted (via Kubernetes owner reference garbage collection).
9+
`TemporalWorkerOwnedResource` solves this by treating the attached resource as a template. The controller renders one instance per worker version with running workers, injects the correct versioned Deployment name, and cleans up automatically when the versioned Deployment is deleted (e.g., during the sunset process after traffic has drained).
10+
11+
This is also the recommended mechanism for metric-based or backlog-based autoscaling: attach a KEDA `ScaledObject` (or a standard HPA with custom metrics) to your workers and the controller keeps one per running worker version, each pointing at the right Deployment.
1012

1113
## How it works
1214

1315
1. You create a `TemporalWorkerOwnedResource` that references a `TemporalWorkerDeployment` and contains the resource spec in `spec.object`.
1416
2. The validating webhook checks that you have permission to manage that resource type yourself (SubjectAccessReview), and that the resource kind isn't on the banned list.
15-
3. On each reconcile loop, the controller renders one copy of `spec.object` per active Build ID, injects fields (see below), and applies it via Server-Side Apply.
17+
3. On each reconcile loop, the controller renders one copy of `spec.object` per worker version with a running Deployment, injects fields (see below), and applies it via Server-Side Apply.
1618
4. Each copy is owned by the corresponding versioned `Deployment`, so it is garbage-collected automatically when that Deployment is deleted.
17-
5. `TWOR.status.versions` is updated with the applied/failed status for each Build ID.
19+
5. `TemporalWorkerOwnedResource.status.versions` is updated with the applied/failed status for each Build ID.
1820

1921
## Auto-injection
2022

21-
The controller auto-injects two fields when you set them to `null` in `spec.object`. Setting them to `null` is the explicit signal that you want injection — if you omit the field entirely, nothing is injected; if you set a non-null value, the webhook rejects the object.
23+
The controller auto-injects two fields when you set them to `null` in `spec.object`. Setting them to `null` is the explicit signal that you want injection:
24+
- If you omit the field entirely, nothing is injected.
25+
- If you set a non-null value, the webhook rejects the object because the controller owns these fields.
2226

2327
| Field | Injected value |
2428
|-------|---------------|
25-
| `spec.scaleTargetRef` (HPA) | `{apiVersion: apps/v1, kind: Deployment, name: <versioned-deployment-name>}` |
26-
| `spec.selector.matchLabels` (any) | `{temporal.io/build-id: <buildID>, temporal.io/deployment-name: <twdName>}` |
29+
| `spec.scaleTargetRef` (any resource with this field) | `{apiVersion: apps/v1, kind: Deployment, name: <versioned-deployment-name>}` |
30+
| `spec.selector.matchLabels` (any resource with this field) | `{temporal.io/build-id: <buildID>, temporal.io/deployment-name: <twdName>}` |
31+
32+
The `scaleTargetRef` injection applies to any resource type that has a `scaleTargetRef` field — not just HPAs. KEDA `ScaledObjects` and other autoscaler CRDs use the same field and benefit from the same injection.
2733

2834
## Resource naming
2935

30-
Each per-Build-ID copy is named `<twdName>-<tworName>-<buildID>`, cleaned for DNS and truncated to 253 characters. Use `kubectl get hpa` (or whatever kind you attached) after a reconcile to see the created resources.
36+
Each per-Build-ID copy is given a unique, DNS-safe name derived from the `(twdName, tworName, buildID)` triple. Names are capped at 47 characters to be safe for all Kubernetes resource types, including Deployment (which has pod-naming constraints that effectively limit deployment names to ~47 characters). The name always ends with an 8-character hash of the full triple, so uniqueness is guaranteed even when the human-readable prefix is truncated.
37+
38+
Use `kubectl get <kind>` after a reconcile to see the created resources and their names.
39+
40+
## Banned resource kinds
41+
42+
Certain resource kinds are blocked by default to prevent misuse (e.g., using `TemporalWorkerOwnedResource` to spin up arbitrary workloads). The default banned list is:
43+
44+
```
45+
Deployment, StatefulSet, Job, Pod, CronJob
46+
```
47+
48+
The banned list is configured via `ownedResourceConfig.bannedKinds` in Helm values and is visible as the `BANNED_KINDS` environment variable on the controller pod:
49+
50+
```bash
51+
kubectl get pod -n <controller-namespace> -l app.kubernetes.io/name=temporal-worker-controller \
52+
-o jsonpath='{.items[0].spec.containers[0].env[?(@.name=="BANNED_KINDS")].value}'
53+
```
3154

3255
## RBAC
3356

3457
### What the webhook checks
3558

36-
When you create or update a TWOR, the webhook performs SubjectAccessReviews to verify:
59+
When you create or update a `TemporalWorkerOwnedResource`, the webhook performs SubjectAccessReviews to verify:
3760

3861
1. **You** (the requesting user) can create/update the embedded resource type in that namespace.
3962
2. **The controller's service account** can create/update the embedded resource type in that namespace.
4063

41-
If either check fails, the TWOR is rejected. This prevents privilege escalation — you cannot use TWOR to create resources you don't already have permission to create yourself.
64+
If either check fails, the request is rejected. This prevents privilege escalation — you cannot use `TemporalWorkerOwnedResource` to create resources you don't already have permission to create yourself.
4265

4366
### What to configure in Helm
4467

@@ -58,26 +81,13 @@ Add entries for any other resource types you want to attach (e.g., KEDA `ScaledO
5881

5982
### What to configure for your users
6083

61-
Users who create TWORs also need RBAC permission to manage the embedded resource type directly. For example, to let a team create TWORs that embed HPAs, they need the standard `autoscaling` permissions in their namespace — there is nothing TWOR-specific to configure for this.
84+
Users who create `TemporalWorkerOwnedResources` also need RBAC permission to manage the embedded resource type directly. For example, to let a team create `TemporalWorkerOwnedResources` that embed HPAs, they need the standard `autoscaling` permissions in their namespace — there is nothing `TemporalWorkerOwnedResource`-specific to configure for this.
6285

6386
## Webhook TLS
6487

65-
The TWOR validating webhook requires TLS. The recommended approach is to install [cert-manager](https://cert-manager.io/docs/installation/) before deploying the controller — the Helm chart handles everything else automatically (`certmanager.enabled: true` is the default).
66-
67-
If cert-manager is not available in your cluster, set `certmanager.enabled: false` and provide:
68-
1. A `kubernetes.io/tls` Secret named `webhook-server-cert` in the controller namespace, containing `tls.crt` and `tls.key` for the webhook server. The certificate must have DNS SANs:
69-
- `<release-name>-webhook-service.<namespace>.svc`
70-
- `<release-name>-webhook-service.<namespace>.svc.cluster.local`
71-
2. The base64-encoded CA certificate that signed `tls.crt`, passed as `certmanager.caBundle` in Helm values.
72-
73-
```bash
74-
helm install temporal-worker-controller oci://docker.io/temporalio/temporal-worker-controller \
75-
--namespace temporal-system \
76-
--set certmanager.enabled=false \
77-
--set certmanager.caBundle="$(base64 -w0 ca.crt)"
78-
```
88+
The `TemporalWorkerOwnedResource` validating webhook requires TLS. Install [cert-manager](https://cert-manager.io/docs/installation/) before deploying the controller — the Helm chart handles everything else automatically (`certmanager.enabled: true` is the default).
7989

80-
## Example: HPA per Build ID
90+
## Example: HPA per worker version
8191

8292
```yaml
8393
apiVersion: temporal.io/v1alpha1
@@ -90,7 +100,8 @@ spec:
90100
workerRef:
91101
name: my-worker
92102
93-
# The resource template. The controller creates one copy per active Build ID.
103+
# The resource template. The controller creates one copy per worker version
104+
# with a running Deployment.
94105
object:
95106
apiVersion: autoscaling/v2
96107
kind: HorizontalPodAutoscaler
@@ -111,7 +122,7 @@ spec:
111122

112123
See [examples/twor-hpa.yaml](../examples/twor-hpa.yaml) for an example pre-configured for the helloworld demo.
113124

114-
## Example: PodDisruptionBudget per Build ID
125+
## Example: PodDisruptionBudget per worker version
115126

116127
```yaml
117128
apiVersion: temporal.io/v1alpha1
@@ -135,11 +146,12 @@ spec:
135146
## Checking status
136147

137148
```bash
138-
# See all TWORs and which TWD they reference
139-
kubectl get twor -n my-namespace
149+
# See all TemporalWorkerOwnedResources and which TWD they reference
150+
kubectl get temporalworkerownedresource -n my-namespace
140151
141152
# See per-Build-ID apply status
142-
kubectl get twor my-worker-hpa -n my-namespace -o jsonpath='{.status.versions}' | jq .
153+
kubectl get temporalworkerownedresource my-worker-hpa -n my-namespace \
154+
-o jsonpath='{.status.versions}' | jq .
143155
144156
# See the created HPAs
145157
kubectl get hpa -n my-namespace

internal/demo/README.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -123,29 +123,29 @@ kubectl get twd
123123

124124
### Testing TemporalWorkerOwnedResource (per-version HPA)
125125

126-
`TemporalWorkerOwnedResource` (TWOR) lets you attach Kubernetes resources — HPAs, PodDisruptionBudgets, etc. — to each active versioned Deployment. The controller creates one copy per active Build ID and wires it to the correct Deployment automatically.
126+
`TemporalWorkerOwnedResource` lets you attach Kubernetes resources — HPAs, PodDisruptionBudgets, etc. — to each worker version with running workers. The controller creates one copy per worker version with a running Deployment and wires it to the correct Deployment automatically.
127127

128-
The TWOR validating webhook enforces that you have permission to create the embedded resource type yourself, and it requires TLS (provided by cert-manager, installed in step 3 above).
128+
The `TemporalWorkerOwnedResource` validating webhook enforces that you have permission to create the embedded resource type yourself, and it requires TLS (provided by cert-manager, installed in step 3 above).
129129

130130
After deploying the helloworld worker (step 5), apply the example HPA:
131131

132132
```bash
133133
kubectl apply -f examples/twor-hpa.yaml
134134
```
135135

136-
Watch the controller create an HPA for each active Build ID:
136+
Watch the controller create an HPA for each worker version with running workers:
137137

138138
```bash
139-
# See TWOR status (Applied: true once the controller reconciles)
140-
kubectl get twor
139+
# See TemporalWorkerOwnedResource status (Applied: true once the controller reconciles)
140+
kubectl get temporalworkerownedresource
141141
142142
# See the per-Build-ID HPAs
143143
kubectl get hpa
144144
```
145145

146-
You should see one HPA per active worker version, with `scaleTargetRef` automatically pointing at the correct versioned Deployment.
146+
You should see one HPA per worker version with running workers, with `scaleTargetRef` automatically pointing at the correct versioned Deployment.
147147

148-
When you deploy a new worker version (e.g., step 8), the controller creates a new HPA for the new Build ID and keeps the old one until that version is deleted.
148+
When you deploy a new worker version (e.g., step 8), the controller creates a new HPA for the new Build ID and keeps the old one until that versioned Deployment is deleted during the sunset process.
149149

150150
See [docs/owned-resources.md](../../docs/owned-resources.md) for full documentation.
151151

internal/k8s/ownedresources.go

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -39,10 +39,12 @@ func OwnedResourceFieldManager(twor *temporaliov1alpha1.TemporalWorkerOwnedResou
3939

4040
const (
4141
// ownedResourceMaxNameLen is the maximum length of a generated owned resource name.
42-
// 253 is the RFC 1123 DNS subdomain limit used by most Kubernetes resource types
43-
// (HPA, PDB, and most CRDs). Resources with stricter limits (e.g. 63-char DNS
44-
// labels) are rare for the autoscaler/policy use cases TWOR targets.
45-
ownedResourceMaxNameLen = 253
42+
// 47 is chosen to be safe for Deployment resources: a pod name is composed of
43+
// "{deployment-name}-{rs-hash}-{pod-hash}" where the hashes add ~17 characters,
44+
// and pod names must be ≤63 characters. Using 47 as the limit ensures that the
45+
// generated names work even if the user un-bans Deployment as an owned resource
46+
// kind, avoiding special-casing per resource type.
47+
ownedResourceMaxNameLen = 47
4648

4749
// ownedResourceHashLen is the number of hex characters used for the uniqueness suffix.
4850
// 8 hex chars = 4 bytes = 32 bits, giving negligible collision probability

internal/k8s/ownedresources_test.go

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ func expectedOwnedResourceName(twdName, tworName, buildID string) string {
2525
h := sha256.Sum256([]byte(twdName + tworName + buildID))
2626
hashSuffix := hex.EncodeToString(h[:4])
2727
raw := CleanStringForDNS(twdName + "-" + tworName + "-" + buildID)
28-
prefix := strings.TrimRight(TruncateString(raw, 253-9), "-")
28+
prefix := strings.TrimRight(TruncateString(raw, 47-9), "-")
2929
return prefix + "-" + hashSuffix
3030
}
3131

@@ -34,14 +34,14 @@ func TestComputeOwnedResourceName(t *testing.T) {
3434
got := ComputeOwnedResourceName("my-worker", "my-hpa", "image-abc123")
3535
// Should start with the human-readable prefix
3636
assert.True(t, strings.HasPrefix(got, "my-worker-my-hpa-image-abc123-"), "got: %q", got)
37-
// Should be ≤ 253 chars
38-
assert.LessOrEqual(t, len(got), 253)
37+
// Should be ≤ 47 chars
38+
assert.LessOrEqual(t, len(got), 47)
3939
})
4040

4141
t.Run("special chars are cleaned for DNS", func(t *testing.T) {
4242
got := ComputeOwnedResourceName("my_worker", "my/hpa", "image:latest")
4343
assert.True(t, strings.HasPrefix(got, "my-worker-my-hpa-image-latest-"), "got: %q", got)
44-
assert.LessOrEqual(t, len(got), 253)
44+
assert.LessOrEqual(t, len(got), 47)
4545
})
4646

4747
t.Run("deterministic — same inputs always produce same name", func(t *testing.T) {
@@ -57,7 +57,7 @@ func TestComputeOwnedResourceName(t *testing.T) {
5757
assert.NotEqual(t, name1, name2)
5858
})
5959

60-
t.Run("very long names are still ≤ 253 chars and distinct per buildID", func(t *testing.T) {
60+
t.Run("very long names are still ≤ 47 chars and distinct per buildID", func(t *testing.T) {
6161
longTWD := strings.Repeat("w", 63)
6262
longTWOR := strings.Repeat("r", 253) // maximum k8s object name
6363
buildID1 := "build-" + strings.Repeat("a", 57)
@@ -66,8 +66,8 @@ func TestComputeOwnedResourceName(t *testing.T) {
6666
n1 := ComputeOwnedResourceName(longTWD, longTWOR, buildID1)
6767
n2 := ComputeOwnedResourceName(longTWD, longTWOR, buildID2)
6868

69-
assert.LessOrEqual(t, len(n1), 253, "name1 length: %d", len(n1))
70-
assert.LessOrEqual(t, len(n2), 253, "name2 length: %d", len(n2))
69+
assert.LessOrEqual(t, len(n1), 47, "name1 length: %d", len(n1))
70+
assert.LessOrEqual(t, len(n2), 47, "name2 length: %d", len(n2))
7171
assert.NotEqual(t, n1, n2, "names must differ even when prefix is fully truncated")
7272
})
7373

0 commit comments

Comments
 (0)