Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 22 additions & 28 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
> ⚠️ This project is 100% experimental. Please do not attempt to install the controller in any production and/or shared environment.

The goal of the Temporal Worker Controller is to make it easy to run workers on Kubernetes while leveraging
[Worker Versioning](https://docs.temporal.io/workers#worker-versioning).
[Worker Deployments](https://docs.temporal.io/production-deployment/worker-deployments).

## Why

Expand Down Expand Up @@ -33,9 +33,9 @@ Temporal APIs to update the routing config of Temporal Worker Deployments to rou
## Terminology
Note that in Temporal, **Worker Deployment** is sometimes referred to as **Deployment**, but since the controller makes
significant references to Kubernetes Deployment resource, within this repository we will stick to these terms:
- **Worker Deployment Version**: A version of a deployment or service. It can have multiple Workers, but they all run the same build. Sometimes shortened to "version" or "deployment version."
- **Worker Deployment**: A deployment or service across multiple versions. In a rainbow deploy, a worker deployment can have multiple active deployment versions running at once.
- **Deployment**: A Kubernetes Deployment resource. A Deployment is "versioned" if it is running versioned Temporal workers/pollers.
- **Worker Deployment Version**: A version of a deployment or service that runs [Temporal Workers](https://docs.temporal.io/workers). It can have multiple Workers, but they all run the same build. Sometimes shortened to "version" or "deployment version."
- **Worker Deployment**: A deployment or service across multiple deployment versions. In a rainbow deploy, a Worker Deployment can have multiple active Deployment Versions running at once.
- **Deployment**: A [Kubernetes Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) resource. A Deployment is "versioned" if it is running versioned Temporal workers/pollers.

## Features

Expand All @@ -45,10 +45,6 @@ significant references to Kubernetes Deployment resource, within this repository
- [x] `Manual`, `AllAtOnce`, and `Progressive` rollouts of new versions
- [x] Ability to specify a "gate" workflow that must succeed on the new version before routing real traffic to that version
- [ ] Autoscaling of versioned Deployments
- [ ] Canary analysis of new worker versions
- [ ] Optional cancellation after timeout for workflows on old versions
- [ ] Passing `ContinueAsNew` signal to workflows on old versions


## Usage

Expand All @@ -65,18 +61,16 @@ Each of these will be automatically set by the controller, and must not be manua

## How It Works

Note: These sequence diagrams have not been fully converted to versioning v0.31 terminology.

Every `TemporalWorkerDeployment` resource manages one or more standard `Deployment` resources. Each deployment manages pods
which in turn poll Temporal for tasks routed to their respective versions.
Every `TemporalWorkerDeployment` resource manages one or more standard `Deployment` resources. Each Deployment manages pods
which in turn poll Temporal for tasks routed to their respective worker versions.

```mermaid
flowchart TD
subgraph "K8s Namespace 'ns'"
twd[TemporalWorkerDeployment 'foo']

subgraph "Current/default version"
d5["Deployment foo-v5 (Version foo/ns.v5)"]
d5["Deployment foo-v5, Version{DeploymentName: foo/ns, BuildId: v5}"]
rs5["ReplicaSet foo-v5"]
p5a["Pod foo-v5-a"]
p5b["Pod foo-v5-b"]
Expand All @@ -88,7 +82,7 @@ flowchart TD
end

subgraph "Deprecated versions"
d1["Deployment foo-v1 (Version foo/ns.v1)"]
d1["Deployment foo-v1 Version{DeploymentName: foo/ns, BuildId: v1}"]
rs1["ReplicaSet foo-v1"]
p1a["Pod foo-v1-a"]
p1b["Pod foo-v1-b"]
Expand All @@ -104,24 +98,24 @@ flowchart TD
twd --> dN
twd --> d5

p1a -. "poll version foo/ns.v1" .-> server
p1b -. "poll version foo/ns.v1" .-> server
p1a -. "poll version {foo/ns, v1}" .-> server
p1b -. "poll version {foo/ns, v1}" .-> server

p5a -. "poll version foo/ns.v5" .-> server
p5b -. "poll version foo/ns.v5" .-> server
p5c -. "poll version foo/ns.v5" .-> server
p5a -. "poll version {foo/ns, v5}" .-> server
p5b -. "poll version {foo/ns, v5}" .-> server
p5c -. "poll version {foo/ns, v5}" .-> server

server["Temporal Server"]
```

### Worker Lifecycle

When a new worker deployment version is deployed, the worker controller detects it and automatically begins the process
of making that version the new **current** (aka default) version of the worker deployment it is a part of. This could happen
of making that version the new **Current Version** of the worker deployment it is a part of. This could happen
immediately if `rollout.strategy = AllAtOnce`, or gradually if `rollout.strategy = Progressive`.

As older pinned workflows finish executing and deprecated deployment versions become drained, the worker controller also
frees up resources by sunsetting the `Deployment` resources polling those versions.
As older pinned workflows finish executing and deprecated deployment versions become **Drained**, the worker controller
frees up resources by sunsetting the `Deployment` resources running workers that poll those versions.

Here is an example of a progressive cut-over strategy gated on the success of the `HelloWorld` workflow:
```yaml
Expand All @@ -147,14 +141,14 @@ sequenceDiagram
Dev->>K8s: Create TemporalWorkerDeployment "foo" (v1)
K8s-->>Ctl: Notify TemporalWorkerDeployment "foo" created
Ctl->>K8s: Create Deployment "foo-v1"
Ctl->>T: Register "foo/ns.v1" as new current version of "foo/ns"
Ctl->>T: Register build "v1" as new current version of "foo/ns"
Dev->>K8s: Update TemporalWorker "foo" (v2)
K8s-->>Ctl: Notify TemporalWorker "foo" updated
Ctl->>K8s: Create Deployment "foo-v2"
Ctl->>T: Register "foo/ns.v2" as new current version of "foo/ns"
Ctl->>T: Register build "v2" as new current version of "foo/ns"

loop Poll Temporal API
Ctl-->>T: Wait for "foo/ns.v1" to be drained (no open pinned wfs)
Ctl-->>T: Wait for version {foo/ns, v1} to be drained (no open pinned wfs)
end

Ctl->>K8s: Delete Deployment "foo-v1"
Expand All @@ -166,9 +160,9 @@ This project is in very early stages; as such external code contributions are no

Bug reports and feature requests are welcome! Please [file an issue](https://github.com/jlegrone/worker-controller/issues/new).

You may also reach out to `@jlegrone` on the [Temporal Slack](https://t.mp/slack) if you have questions, suggestions, or are
interested in making other contributions.
You may also reach out to [#safe-deploys](https://temporalio.slack.com/archives/C07MDJ6S3HP) or @jlegrone on the
[Temporal Slack](https://t.mp/slack) if you have questions, suggestions, or are interested in making other contributions.

## Development

For local development setup and running the controller locally, see the [development guide](internal/README.md).
For local development setup and running the controller locally, see the [local demo guide](internal/demo/README.md).
Loading