Enable Controller-managed versioned scaling resources with WorkerResourceTemplate (#217)
## What was changed
### New CRD: `WorkerResourceTemplate` (WRT)
Adds a new `WorkerResourceTemplate` CRD that lets users attach arbitrary
namespaced Kubernetes resources (HPAs, PDBs, custom scalers, etc.) to a
`TemporalWorkerDeployment`. The controller creates one copy of the
resource per active worker version and auto-injects `scaleTargetRef`,
`selector.matchLabels`, and metric selector labels so that each copy
points at the correct versioned Deployment.
#### Key behaviors:
- One copy per active Build ID, named `{twdName}-{wrtName}-{buildID}`
(uniquely truncated to 47 chars, DNS-safe)
- Auto-injects `spec.scaleTargetRef` (when set to `{}`) to reference the
versioned Deployment → enables per-version autoscaling with HPAs and any
other scaler that uses `scaleTargetRef`
- Auto-injects `selector.matchLabels` (when set to `{}`) with the
correct per-version labels → enables per-version PDBs, as well as any
CRD that uses `selector.matchLabels` to target versioned Deployments
- Auto-appends `worker_deployment_name`, `worker_deployment_build_id`,
and `temporal_namespace` to
`spec.metrics[*].external.metric.selector.matchLabels` whenever
`matchLabels` is present (including `{}`). User labels such as
`task_type: "Activity"` coexist alongside the injected keys. If
`matchLabels` is absent, nothing is injected for that metric entry.
- Applied via Server-Side Apply with field manager
`"temporal-worker-controller"`
- An owner reference on each resource copy points to the
`WorkerResourceTemplate`, so Kubernetes garbage collection deletes all
copies when the WRT is deleted
- Apply status written back to
`WorkerResourceTemplate.status.versions[*]` (Applied, Message, BuildID)
- Resource spec lives in `spec.template` (raw JSON/YAML embedded object)
- Target TWD referenced via `spec.temporalWorkerDeploymentRef.name`
#### Validating Webhook
A `WorkerResourceTemplateValidator` webhook enforces:
- `apiVersion` and `kind` required; `metadata.name`/`metadata.namespace`
forbidden (controller sets these)
- Allowed resource kinds configurable via `ALLOWED_KINDS` env var
(default: `HorizontalPodAutoscaler`)
- `minReplicas` must not be 0 (currently required for
`approximate_task_queue_backlog` metric-based autoscaling to work when
the queue is idle; we plan to relax this in the future)
- `scaleTargetRef` must be absent or `{}` (opt-in sentinel); non-empty
value rejected (controller owns injection)
- `selector.matchLabels` must be absent or `{}` (opt-in sentinel);
non-empty value rejected (controller owns injection)
- `metrics[*].external.metric.selector.matchLabels` must not contain the
controller-owned keys `worker_deployment_name`,
`worker_deployment_build_id`, or `temporal_namespace`; user labels (e.g.
`task_type`) are allowed
- SubjectAccessReview (SAR) check: the requesting user must be able to
create/update the embedded resource type
- SAR check: the controller service account must be able to
create/update the embedded resource type
- `spec.temporalWorkerDeploymentRef.name` is immutable after creation
### Helm chart updates
- `helm/temporal-worker-controller-crds/templates/temporal.io_workerresourcetemplates.yaml`
(new CRD manifest)
- `helm/temporal-worker-controller/templates/webhook.yaml` (always-on
`WorkerResourceTemplate` `ValidatingWebhookConfiguration`;
`TemporalWorkerDeployment` webhook now behind `webhook.enabled`)
- `helm/temporal-worker-controller/templates/certmanager.yaml`
(cert-manager `Issuer` + `Certificate` for TLS, default enabled)
- `helm/temporal-worker-controller/Chart.yaml` (cert-manager added as
optional subchart dependency; opt in via `certmanager.install: true`)
- `helm/temporal-worker-controller/templates/manager.yaml` (cert
volume/port always present; `ALLOWED_KINDS`, `POD_NAMESPACE`,
`SERVICE_ACCOUNT_NAME` env vars)
- `helm/temporal-worker-controller/templates/rbac.yaml`
(`WorkerResourceTemplate` + SAR rules in manager `ClusterRole`;
editor/viewer roles; configurable attached-resource RBAC)
- `helm/temporal-worker-controller/values.yaml`
(`workerResourceConfig.allowedResources`, default: HPA; piped to
`ALLOWED_KINDS` and to the controller RBAC)
### Integration tests
New integration test subtests added to the existing envtest suite, all
running through the shared `testTemporalWorkerDeploymentCreation`
table-test runner:
- `WorkerResourceTemplate` (7 tests): Deployment owner ref,
`matchLabels` injection, multiple `WorkerResourceTemplate`s on same
`TemporalWorkerDeployment`, metric selector label injection, multiple
active versions, apply failure → Applied:false, SSA idempotency
- Rollout gaps (5 tests): Progressive ramp to Current,
ConnectionSpecHash annotation repair, gate input from ConfigMap, gate
input from Secret, multiple deprecated versions
- Webhook admission (5 tests, separate Ginkgo suite): Spec rejection,
SAR pass, SAR fail (user), SAR fail (controller SA),
`temporalWorkerDeploymentRef.name` immutability
## Why?
HPA autoscaling for versioned Temporal workers requires a separate HPA
per worker version, each targeting only that version's Deployment with
the correct `scaleTargetRef` and label selectors. Without this CRD,
users have no way to create per-version resources that the controller
lifecycle-manages alongside the versioned Deployments.
## Checklist
1. Closes #207
2. How was this tested:
- Full envtest integration test suite: new subtests covering WRT
lifecycle, previously uncovered rollout scenarios, and webhook admission
via the real HTTP admission path
- Unit tests: webhook validator, SSA naming/injection helpers, planner
integration
- All tests pass: `KUBEBUILDER_ASSETS=.../bin/k8s/1.27.1-darwin-arm64 go test -tags test_dep ./...`
3. Any docs updates needed?
- `docs/worker-resource-template.md` added: concept overview, HPA
example with cert-manager setup, RBAC configuration guide
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
> 🚀 **Public Preview**: This project is in [Public Preview](https://docs.temporal.io/evaluate/development-production-features/release-stages) and ready for production use cases*. Core functionality is complete with stable APIs.
->
-> *Dynamic auto-scaling based on workflow load is not yet implemented. Use cases must work with fixed worker replica counts.

**The Temporal Worker Controller makes it simple and safe to deploy Temporal workers on Kubernetes.**
📦 **Automatic version management** - Registers versions with Temporal, manages routing rules, and tracks version lifecycle
🎯 **Smart traffic routing** - New workflows automatically get routed to your target worker version
🛡️ **Progressive rollouts** - Catch incompatible changes early with small traffic percentages before they spread
-⚡ **Easy rollbacks** - Instantly route traffic back to a previous version if issues are detected
+⚡ **Easy rollbacks** - Instantly route traffic back to a previous version if issues are detected
+📈 **Per-version autoscaling** - Attach HPAs or other custom scalers to each versioned Deployment via [`WorkerResourceTemplate`](docs/worker-resource-templates.md)

## Quick Example

@@ -78,6 +77,7 @@ When you update the image, the controller automatically:
- Helm [v3.0+](https://github.com/helm/helm/releases) if deploying via our Helm chart
- [Temporal Server](https://docs.temporal.io/) (Cloud or self-hosted [v1.29.1](https://github.com/temporalio/temporal/releases/tag/v1.29.1))
- Basic familiarity with Temporal [Workers](https://docs.temporal.io/workers), [Workflows](https://docs.temporal.io/workflows), and [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning)
+- **TLS for the validating webhook** *(required for `WorkerResourceTemplate`)* — the recommended path is [cert-manager](https://cert-manager.io/docs/installation/), which handles certificate provisioning automatically. Install it separately or as a subchart of the controller chart (`certmanager.install: true`). If you prefer to manage TLS yourself, see [Webhook TLS](docs/worker-resource-templates.md#webhook-tls).

### 🔧 Installation

@@ -118,7 +118,7 @@ See [docs/crd-management.md](docs/crd-management.md) for upgrade, rollback, and
- ✅ **Deletion of resources** associated with drained Worker Deployment Versions
- ✅ **Multiple rollout strategies**: `Manual`, `AllAtOnce`, and `Progressive` rollouts
- ✅ **Gate workflows** - Test new versions with a [pre-deployment test](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#adding-a-pre-deployment-test) before routing real traffic to them
-- ⏳ **Load-based auto-scaling** - Not yet implemented (use fixed replica counts)
+- ✅ **Per-version attached resources** - Attach HPAs, PodDisruptionBudgets, or any namespaced Kubernetes resource to each worker version with running workers via [`WorkerResourceTemplate`](docs/worker-resource-templates.md) — this is also the recommended path for metric-based and backlog-based autoscaling


## 💡 Why Use This?
@@ -143,16 +143,17 @@ The Temporal Worker Controller eliminates this operational overhead by automatin

## 📖 Documentation

-| Document | Description |
-|----------|-------------|
-|[Migration Guide](docs/migration-to-versioned.md)| Step-by-step guide for migrating from traditional deployments |
-|[Reversion Guide](docs/migration-to-unversioned.md)| Step-by-step guide for migrating back to unversioned deployment |
-|[CD Rollouts](docs/cd-rollouts.md)| Helm, kubectl, ArgoCD, and Flux integration for steady-state rollouts |
-|[Architecture](docs/architecture.md)| Technical deep-dive into how the controller works |