Commit 8ae649c

carlydf and claude committed
Add TemporalWorkerOwnedResource docs, HPA example, and cert-manager setup
- docs/owned-resources.md: full TWOR reference (auto-injection, RBAC, webhook TLS, examples)
- examples/twor-hpa.yaml: ready-to-apply HPA example for the helloworld demo
- helm/webhook.yaml + values.yaml: add certmanager.caBundle for BYO TLS without cert-manager
- internal/demo/README.md: add cert-manager install step and TWOR demo walkthrough
- README.md + docs/README.md: add cert-manager prerequisite, TWOR feature bullet, and doc link

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 27987b8 commit 8ae649c

7 files changed

Lines changed: 252 additions & 3 deletions

README.md

Lines changed: 4 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -78,6 +78,7 @@ When you update the image, the controller automatically:
7878
- Helm [v3.0+](https://github.com/helm/helm/releases) if deploying via our Helm chart
7979
- [Temporal Server](https://docs.temporal.io/) (Cloud or self-hosted [v1.29.1](https://github.com/temporalio/temporal/releases/tag/v1.29.1))
8080
- Basic familiarity with Temporal [Workers](https://docs.temporal.io/workers), [Workflows](https://docs.temporal.io/workflows), and [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning)
81+
- **[cert-manager](https://cert-manager.io/docs/installation/)** *(required for `TemporalWorkerOwnedResource`)* — the controller installs a validating webhook for TWOR objects that requires TLS. cert-manager handles certificate provisioning automatically. If cert-manager is not available in your cluster, see [Webhook TLS without cert-manager](docs/owned-resources.md#webhook-tls) for the manual setup.
8182

8283
### 🔧 Installation
8384

@@ -104,7 +105,8 @@ helm install temporal-worker-controller \
104105
- ✅ **Deletion of resources** associated with drained Worker Deployment Versions
105106
- ✅ **Multiple rollout strategies**: `Manual`, `AllAtOnce`, and `Progressive` rollouts
106107
- ✅ **Gate workflows** - Test new versions with a [pre-deployment test](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#adding-a-pre-deployment-test) before routing real traffic to them
107-
-**Load-based auto-scaling** - Not yet implemented (use fixed replica counts)
108+
- ✅ **Per-version attached resources** - Attach HPAs, PodDisruptionBudgets, or any namespaced Kubernetes resource to each active worker version via [`TemporalWorkerOwnedResource`](docs/owned-resources.md)
109+
- ⏳ **Temporal-aware auto-scaling** - Scaling based on workflow task queue depth is not yet implemented
108110

109111

110112
## 💡 Why Use This?
@@ -137,6 +139,7 @@ The Temporal Worker Controller eliminates this operational overhead by automatin
137139
| [Configuration](docs/configuration.md) | Complete configuration reference |
138140
| [Concepts](docs/concepts.md) | Key concepts and terminology |
139141
| [Limits](docs/limits.md) | Technical constraints and limitations |
142+
| [TemporalWorkerOwnedResource](docs/owned-resources.md) | Attach HPAs, PDBs, and other resources to each versioned Deployment |
140143

141144
## 🔧 Worker Configuration
142145

docs/README.md

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ Technical constraints and limitations of the Temporal Worker Controller system,
3030
### [Ownership](ownership.md)
3131
How the controller gets permission to manage a Worker Deployment, and how a human client can take or give back control.
3232

33+
### [TemporalWorkerOwnedResource](owned-resources.md)
34+
How to attach HPAs, PodDisruptionBudgets, and other Kubernetes resources to each active versioned Deployment. Covers the auto-injection model, RBAC setup, webhook TLS, and examples.
35+
3336
---
3437

3538
*Note: This documentation structure is designed to grow with the project.*

docs/owned-resources.md

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
1+
# TemporalWorkerOwnedResource
2+
3+
`TemporalWorkerOwnedResource` (TWOR) lets you attach arbitrary Kubernetes resources — HPAs, PodDisruptionBudgets, custom CRDs — to each active versioned Deployment managed by a `TemporalWorkerDeployment`. The controller creates one copy of the resource per active Build ID, automatically wired to the correct versioned Deployment.
4+
5+
## Why you need this
6+
7+
The Temporal Worker Controller creates one Kubernetes `Deployment` per worker version (Build ID). If you attach an HPA directly to a single Deployment, it breaks as versions roll over — the old HPA still targets the old Deployment, the new Deployment has no HPA, and you have to manage cleanup yourself.
8+
9+
`TemporalWorkerOwnedResource` solves this by treating the attached resource as a template. The controller renders one instance per active Build ID, injects the correct versioned Deployment name, and cleans up automatically when a version is deleted (via Kubernetes owner reference garbage collection).
10+
11+
## How it works
12+
13+
1. You create a `TemporalWorkerOwnedResource` that references a `TemporalWorkerDeployment` and contains the resource spec in `spec.object`.
14+
2. The validating webhook checks that you have permission to manage that resource type yourself (SubjectAccessReview), and that the resource kind isn't on the banned list.
15+
3. On each reconcile loop, the controller renders one copy of `spec.object` per active Build ID, injects fields (see below), and applies it via Server-Side Apply.
16+
4. Each copy is owned by the corresponding versioned `Deployment`, so it is garbage-collected automatically when that Deployment is deleted.
17+
5. `TWOR.status.versions` is updated with the applied/failed status for each Build ID.
18+
19+
## Auto-injection
20+
21+
The controller auto-injects two fields when you set them to `null` in `spec.object`. Setting them to `null` is the explicit signal that you want injection — if you omit the field entirely, nothing is injected; if you set a non-null value, the webhook rejects the object.
22+
23+
| Field | Injected value |
24+
|-------|---------------|
25+
| `spec.scaleTargetRef` (HPA) | `{apiVersion: apps/v1, kind: Deployment, name: <versioned-deployment-name>}` |
26+
| `spec.selector.matchLabels` (any) | `{temporal.io/build-id: <buildID>, temporal.io/deployment-name: <twdName>}` |
27+
28+
## Resource naming
29+
30+
Each per-Build-ID copy is named `<twdName>-<tworName>-<buildID>`, cleaned for DNS and truncated to 253 characters. Use `kubectl get hpa` (or whatever kind you attached) after a reconcile to see the created resources.
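
To predict the name of a generated resource, the naming rule above can be approximated in shell. This is a minimal sketch: `twor_resource_name` is a hypothetical helper, and the cleaning step (lowercasing and replacing characters outside `a-z`, `0-9`, `.`, and `-` with `-`) is an assumption about what "cleaned for DNS" means, not the controller's actual implementation.

```bash
#!/usr/bin/env bash

# Hypothetical helper: approximates "<twdName>-<tworName>-<buildID>",
# cleaned for DNS and truncated to 253 characters.
# The exact cleaning rules used by the controller are an assumption here.
twor_resource_name() {
  local twd="$1" twor="$2" build_id="$3"
  printf '%s-%s-%s' "$twd" "$twor" "$build_id" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9.-]/-/g' \
    | cut -c1-253
}

# Example: guess the HPA name for one Build ID of the helloworld demo.
twor_resource_name helloworld helloworld-hpa v1.2_abc
```

If the printed name does not match what `kubectl get hpa` shows, trust the cluster: the controller's own sanitization is authoritative.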
31+
32+
## RBAC
33+
34+
### What the webhook checks
35+
36+
When you create or update a TWOR, the webhook performs SubjectAccessReviews to verify:
37+
38+
1. **You** (the requesting user) can create/update the embedded resource type in that namespace.
39+
2. **The controller's service account** can create/update the embedded resource type in that namespace.
40+
41+
If either check fails, the TWOR is rejected. This prevents privilege escalation — you cannot use TWOR to create resources you don't already have permission to create yourself.
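
Before creating a TWOR, you can roughly reproduce both checks with `kubectl auth can-i`. This is a sketch under assumptions: `check_twor_rbac` and `sa_user` are hypothetical helpers, the controller's service account name and namespace are placeholders you must look up in your own install, and `--as` only works if your user is allowed to impersonate that service account. The webhook's own SubjectAccessReviews remain the source of truth.

```bash
#!/usr/bin/env bash

# Builds the Kubernetes username for a service account.
sa_user() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Hypothetical pre-flight check mirroring the two webhook checks:
#   1. the current user can create/update the embedded resource type
#   2. the controller's service account can do the same
# Arguments: <namespace> <resource.group> <sa-namespace> <sa-name>
check_twor_rbac() {
  local ns="$1" resource="$2" sa_ns="$3" sa_name="$4" verb
  for verb in create update; do
    kubectl auth can-i "$verb" "$resource" -n "$ns" || return 1
    kubectl auth can-i "$verb" "$resource" -n "$ns" \
      --as="$(sa_user "$sa_ns" "$sa_name")" || return 1
  done
}

# Usage (service account name is an assumption; substitute your own):
#   check_twor_rbac my-namespace horizontalpodautoscalers.autoscaling \
#     temporal-system <controller-service-account>
```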
42+
43+
### What to configure in Helm
44+
45+
`ownedResourceConfig.rbac.rules` controls what resource types the controller's ClusterRole permits it to manage. The defaults cover HPAs and PodDisruptionBudgets:
46+
47+
```yaml
48+
ownedResourceConfig:
49+
rbac:
50+
rules:
51+
- apiGroups: ["autoscaling"]
52+
resources: ["horizontalpodautoscalers"]
53+
- apiGroups: ["policy"]
54+
resources: ["poddisruptionbudgets"]
55+
```
56+
57+
Add entries for any other resource types you want to attach (e.g., KEDA `ScaledObjects`). For development clusters you can set `rbac.wildcard: true` to grant access to all resource types, but this is not recommended for production.
58+
59+
### What to configure for your users
60+
61+
Users who create TWORs also need RBAC permission to manage the embedded resource type directly. For example, to let a team create TWORs that embed HPAs, they need the standard `autoscaling` permissions in their namespace — there is nothing TWOR-specific to configure for this.
62+
63+
## Webhook TLS
64+
65+
The TWOR validating webhook requires TLS. The recommended approach is to install [cert-manager](https://cert-manager.io/docs/installation/) before deploying the controller — the Helm chart handles everything else automatically (`certmanager.enabled: true` is the default).
66+
67+
If cert-manager is not available in your cluster, set `certmanager.enabled: false` and provide:
68+
1. A `kubernetes.io/tls` Secret named `webhook-server-cert` in the controller namespace, containing `tls.crt` and `tls.key` for the webhook server. The certificate must have DNS SANs:
69+
- `<release-name>-webhook-service.<namespace>.svc`
70+
- `<release-name>-webhook-service.<namespace>.svc.cluster.local`
71+
2. The base64-encoded CA certificate that signed `tls.crt`, passed as `certmanager.caBundle` in Helm values.
72+
73+
```bash
74+
helm install temporal-worker-controller oci://docker.io/temporalio/temporal-worker-controller \
75+
--namespace temporal-system \
76+
--set certmanager.enabled=false \
77+
--set certmanager.caBundle="$(base64 -w0 ca.crt)"
78+
```
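
The `-w0` flag above is specific to GNU `base64`. As a hedged sketch, the helpers below print the two required SANs for a given release and namespace, and encode the CA file on a single line in a way that should also work with BSD/macOS `base64`. Both function names are hypothetical and not part of the chart.

```bash
#!/usr/bin/env bash

# Prints the two DNS SANs the webhook server certificate must carry.
# Arguments: <release-name> <namespace>
webhook_sans() {
  local release="$1" namespace="$2"
  printf '%s-webhook-service.%s.svc\n' "$release" "$namespace"
  printf '%s-webhook-service.%s.svc.cluster.local\n' "$release" "$namespace"
}

# Base64-encodes a file as a single line without relying on GNU-only flags.
# Argument: <path-to-ca-certificate>
ca_bundle_b64() {
  base64 < "$1" | tr -d '\n'
}

# Usage:
#   webhook_sans temporal-worker-controller temporal-system
#   helm install ... --set certmanager.caBundle="$(ca_bundle_b64 ca.crt)"
```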
79+
80+
## Example: HPA per Build ID
81+
82+
```yaml
83+
apiVersion: temporal.io/v1alpha1
84+
kind: TemporalWorkerOwnedResource
85+
metadata:
86+
name: my-worker-hpa
87+
namespace: my-namespace
88+
spec:
89+
# Reference the TemporalWorkerDeployment to attach to.
90+
workerRef:
91+
name: my-worker
92+
93+
# The resource template. The controller creates one copy per active Build ID.
94+
object:
95+
apiVersion: autoscaling/v2
96+
kind: HorizontalPodAutoscaler
97+
spec:
98+
# null tells the controller to auto-inject the versioned Deployment reference.
99+
# Do not set this to a real value — the webhook will reject it.
100+
scaleTargetRef: null
101+
minReplicas: 2
102+
maxReplicas: 10
103+
metrics:
104+
- type: Resource
105+
resource:
106+
name: cpu
107+
target:
108+
type: Utilization
109+
averageUtilization: 70
110+
```
111+
112+
See [examples/twor-hpa.yaml](../examples/twor-hpa.yaml) for an example pre-configured for the helloworld demo.
113+
114+
## Example: PodDisruptionBudget per Build ID
115+
116+
```yaml
117+
apiVersion: temporal.io/v1alpha1
118+
kind: TemporalWorkerOwnedResource
119+
metadata:
120+
name: my-worker-pdb
121+
namespace: my-namespace
122+
spec:
123+
workerRef:
124+
name: my-worker
125+
object:
126+
apiVersion: policy/v1
127+
kind: PodDisruptionBudget
128+
spec:
129+
minAvailable: 1
130+
# null tells the controller to auto-inject {temporal.io/build-id, temporal.io/deployment-name}.
131+
selector:
132+
matchLabels: null
133+
```
134+
135+
## Checking status
136+
137+
```bash
138+
# See all TWORs and which TWD they reference
139+
kubectl get twor -n my-namespace
140+
141+
# See per-Build-ID apply status
142+
kubectl get twor my-worker-hpa -n my-namespace -o jsonpath='{.status.versions}' | jq .
143+
144+
# See the created HPAs
145+
kubectl get hpa -n my-namespace
146+
```

examples/twor-hpa.yaml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
# TemporalWorkerOwnedResource — HPA example
2+
#
3+
# This example attaches a HorizontalPodAutoscaler to each active versioned Deployment
4+
# of the "helloworld" TemporalWorkerDeployment (from the local demo).
5+
#
6+
# Prerequisites:
7+
# - The helloworld demo is running (skaffold run --profile helloworld-worker)
8+
# - The controller was installed with cert-manager (certmanager.enabled: true, the default)
9+
# - You have permission to create HPAs in this namespace
10+
#
11+
# Apply:
12+
# kubectl apply -f examples/twor-hpa.yaml
13+
#
14+
# Verify:
15+
# kubectl get twor # shows Applied status per Build ID
16+
# kubectl get hpa # shows one HPA per active Build ID
17+
#
18+
# See docs/owned-resources.md for full documentation.
19+
apiVersion: temporal.io/v1alpha1
20+
kind: TemporalWorkerOwnedResource
21+
metadata:
22+
name: helloworld-hpa
23+
# Deploy into the same namespace as the helloworld TemporalWorkerDeployment.
24+
# The default Skaffold setup does not set a namespace, so resources land in "default".
25+
namespace: default
26+
spec:
27+
# Must match the name of the TemporalWorkerDeployment.
28+
workerRef:
29+
name: helloworld
30+
31+
# The resource template. The controller creates one copy per active Build ID,
32+
# naming each one "<twdName>-<tworName>-<buildID>".
33+
object:
34+
apiVersion: autoscaling/v2
35+
kind: HorizontalPodAutoscaler
36+
spec:
37+
# Setting scaleTargetRef to null tells the controller to auto-inject the
38+
# correct versioned Deployment reference for each Build ID. Do not set this
39+
# to a real value — the webhook will reject it.
40+
scaleTargetRef: null
41+
minReplicas: 1
42+
maxReplicas: 3
43+
metrics:
44+
- type: Resource
45+
resource:
46+
name: cpu
47+
target:
48+
type: Utilization
49+
averageUtilization: 70

helm/temporal-worker-controller/templates/webhook.yaml

Lines changed: 6 additions & 0 deletions
@@ -34,6 +34,9 @@ webhooks:
3434
- name: vtemporalworkerownedresource.kb.io
3535
admissionReviewVersions: ["v1"]
3636
clientConfig:
37+
{{- if and (not .Values.certmanager.enabled) .Values.certmanager.caBundle }}
38+
caBundle: {{ .Values.certmanager.caBundle }}
39+
{{- end }}
3740
service:
3841
name: {{ .Release.Name }}-webhook-service
3942
namespace: {{ .Release.Namespace }}
@@ -64,6 +67,9 @@ webhooks:
6467
- name: vtemporalworkerdeployment.kb.io
6568
admissionReviewVersions: ["v1"]
6669
clientConfig:
70+
{{- if and (not .Values.certmanager.enabled) .Values.certmanager.caBundle }}
71+
caBundle: {{ .Values.certmanager.caBundle }}
72+
{{- end }}
6773
service:
6874
name: {{ .Release.Name }}-webhook-service
6975
namespace: {{ .Release.Namespace }}

helm/temporal-worker-controller/values.yaml

Lines changed: 8 additions & 2 deletions
@@ -130,10 +130,16 @@ certmanager:
130130
# enabled controls creation of the cert-manager Issuer and Certificate used for
131131
# webhook TLS. cert-manager must be installed in the cluster.
132132
# See https://cert-manager.io/docs/installation/
133-
# Set to false only if you are providing your own TLS certificate in a Secret
134-
# named "webhook-server-cert" in the controller namespace.
133+
# Set to false only if you are providing your own TLS certificate (see caBundle below).
135134
enabled: true
136135

136+
# caBundle is only used when enabled: false (i.e. you are managing webhook TLS yourself).
137+
# Set this to the base64-encoded PEM CA certificate that signed the TLS certificate in
138+
# the "webhook-server-cert" Secret. The Kubernetes API server uses this to verify the
139+
# webhook server's TLS certificate.
140+
# Leave empty when enabled: true — cert-manager injects the CA bundle automatically.
141+
caBundle: ""
142+
137143
# Not yet supported
138144
prometheus:
139145
enabled: false

internal/demo/README.md

Lines changed: 36 additions & 0 deletions
@@ -10,6 +10,7 @@ This guide will help you set up and run the Temporal Worker Controller locally u
1010
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
1111
- Temporal Cloud account with API key or mTLS certificates
1212
- Understanding of [Worker Versioning concepts](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning) (Pinned and Auto-Upgrade versioning behaviors)
13+
- **[cert-manager](https://cert-manager.io/docs/installation/)** — required for the `TemporalWorkerOwnedResource` validating webhook (TLS). Install it once into your Minikube cluster before deploying the controller (see step 3 below).
1314

1415
> **Note**: This demo specifically showcases **Pinned** workflow behavior. All workflows in the demo will remain on the worker version where they started, demonstrating how the controller safely manages multiple worker versions simultaneously during deployments.
1516
@@ -63,6 +64,13 @@ This guide will help you set up and run the Temporal Worker Controller locally u
6364
- Note: Do not set both mTLS and API key for the same connection. If both are present, the TemporalConnection Custom Resource
6465
Instance will not get installed in the k8s environment.
6566

67+
3. Install cert-manager into Minikube (required for the TWOR validating webhook):
68+
```bash
69+
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
70+
# Wait for cert-manager pods to be ready before continuing
71+
kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=120s
72+
```
73+
6674
4. Build and deploy the Controller image to the local k8s cluster:
6775
```bash
6876
skaffold run --profile worker-controller
@@ -113,6 +121,34 @@ kubectl logs -n temporal-system deployments/temporal-worker-controller-manager -
113121
kubectl get twd
114122
```
115123

124+
### Testing TemporalWorkerOwnedResource (per-version HPA)
125+
126+
`TemporalWorkerOwnedResource` (TWOR) lets you attach Kubernetes resources — HPAs, PodDisruptionBudgets, etc. — to each active versioned Deployment. The controller creates one copy per active Build ID and wires it to the correct Deployment automatically.
127+
128+
The TWOR validating webhook enforces that you have permission to create the embedded resource type yourself, and it requires TLS (provided by cert-manager, installed in step 3 above).
129+
130+
After deploying the helloworld worker (step 5), apply the example HPA:
131+
132+
```bash
133+
kubectl apply -f examples/twor-hpa.yaml
134+
```
135+
136+
Watch the controller create an HPA for each active Build ID:
137+
138+
```bash
139+
# See TWOR status (Applied: true once the controller reconciles)
140+
kubectl get twor
141+
142+
# See the per-Build-ID HPAs
143+
kubectl get hpa
144+
```
145+
146+
You should see one HPA per active worker version, with `scaleTargetRef` automatically pointing at the correct versioned Deployment.
147+
148+
When you deploy a new worker version (e.g., step 8), the controller creates a new HPA for the new Build ID and keeps the old one until that version is deleted.
149+
150+
See [docs/owned-resources.md](../../docs/owned-resources.md) for full documentation.
151+
116152
### Cleanup
117153

118154
To clean up the demo:
