Skip to content

Commit 6e8e1d2

Browse files
authored
Revert "Use ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack (#220)" (#233)
<!--- Note to EXTERNAL Contributors --> <!-- Thanks for opening a PR! If it is a significant code change, please **make sure there is an open issue** for this. We work best with you when we have accepted the idea first before you code. --> <!--- For ALL Contributors 👇 --> ## What was changed This reverts commit fced0ad. ## Why? Temporary to reduce merge conflicts with a bug fix we are about to merge to main and pick into some old releases. ## Checklist <!--- add/delete as needed ---> 1. Closes <!-- add issue number here --> 2. How was this tested: <!--- Please describe how you tested your changes/how we can test them --> 3. Any docs updates needed? <!--- update README if applicable or point out where to update docs.temporal.io -->
1 parent fced0ad commit 6e8e1d2

15 files changed

Lines changed: 145 additions & 292 deletions

api/v1alpha1/worker_types.go

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -196,12 +196,6 @@ type TemporalWorkerDeploymentStatus struct {
196196
// +optional
197197
LastModifierIdentity string `json:"lastModifierIdentity,omitempty"`
198198

199-
// ManagerIdentity is the identity that has exclusive rights to modify this Worker Deployment's routing config.
200-
// When set, clients whose identity does not match will be blocked from making routing changes.
201-
// Empty by default. Use `temporal worker deployment manager-identity set/unset` to change.
202-
// +optional
203-
ManagerIdentity string `json:"managerIdentity,omitempty"`
204-
205199
// VersionCount is the total number of versions currently known by the worker deployment.
206200
// This includes current, target, ramping, and deprecated versions.
207201
// +optional

docs/ownership.md

Lines changed: 39 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -2,52 +2,57 @@
22

33
## Problem
44

5-
If the worker controller is managing a Worker Deployment (i.e. updating its routing config), but a user makes a manual
6-
change via the CLI, SDK, or gRPC API instead of via the `TemporalWorkerDeployment` CRD interface, the controller should
7-
not clobber the user's change.
5+
If a worker controller is managing a Worker Deployment (ie. the controller is updating the RoutingConfig of the Worker
6+
Deployment), but the user changes something via the CLI (ie. rolls back to the previous current version, or stops the
7+
new target version from ramping because an issue was detected), the controller should not clobber what the human did.
88

9-
Once the user has finished their manual intervention, they need a way to hand ownership back to the controller.
9+
At some point, after this human has handled their urgent rollback, they will want to let the controller know that it is
10+
authorized to resume making changes to the Routing Config of the Worker Deployment.
1011

1112
## Solution
1213

13-
The controller uses the Temporal server's `ManagerIdentity` field on Worker Deployments to coordinate exclusive
14-
ownership of routing changes.
14+
_Once it is available in OSS v1.29, the controller will be able to coordinate with other users via the `ManagerIdentity`
15+
field of a Worker Deployment. This runbook will be updated when that is available and implemented by the controller._
1516

16-
When `ManagerIdentity` is set on a Worker Deployment, only clients whose identity matches `ManagerIdentity` can make
17-
routing changes (set current version, set ramping version). The controller's identity is visible in the
18-
`managerIdentity` field of the `TemporalWorkerDeployment` status.
17+
In the meantime, the controller will watch the `LastModifierIdentity` field of a Worker Deployment to detect whether
18+
another user has made a change. If another user made a change to the Worker Deployment, the controller will not make
19+
any more changes to ensure a human's change is not clobbered.
1920

20-
### How the controller claims ownership
21+
Once you are done making your own changes to the Worker Deployment's current and ramping versions, and you are ready
22+
for the Worker Controller to take over, you can update the metadata to indicate that.
2123

22-
The first time the controller plans a routing change for a Worker Deployment (i.e. when `ManagerIdentity` is empty),
23-
it calls `SetManagerIdentity` to claim ownership before applying the change. Subsequent routing changes succeed because
24-
the controller's identity already matches `ManagerIdentity`.
24+
There is no Temporal server support for Worker Deployment Version-level metadata, so you'll have to set this value on
25+
the Current Version of your Worker Deployment.
2526

26-
### Taking manual control
27-
28-
To take manual control away from the controller, set `ManagerIdentity` to your own identity:
27+
Note: The controller decodes this metadata value as a string. Be sure to set the value to the string "true" (not the boolean true).
2928

3029
```bash
31-
temporal worker deployment manager-identity set \
30+
temporal worker deployment update-metadata-version \
3231
--deployment-name $MY_DEPLOYMENT \
33-
--self
32+
--build-id $CURRENT_VERSION_BUILD_ID \
33+
--metadata 'temporal.io/ignore-last-modifier="true"'
3434
```
35-
36-
The `--self` flag sets `ManagerIdentity` to the identity of the caller (auto-generated by the CLI if not explicitly
37-
provided via `--identity`; similarly, the SDK uses its own auto-generated or configured identity). After this, the
38-
controller's routing change attempts will fail and it will retry on a backoff until ownership is returned.
39-
40-
You can then make routing changes freely (the server enforces `ManagerIdentity` for all clients, not just the
41-
controller).
42-
43-
### Returning ownership to the controller
44-
45-
When you are done with your manual changes and want the controller to resume, clear `ManagerIdentity`:
46-
35+
Alternatively, if your CLI supports JSON input:
4736
```bash
48-
temporal worker deployment manager-identity unset \
49-
--deployment-name $MY_DEPLOYMENT
37+
temporal worker deployment update-metadata-version \
38+
--deployment-name $MY_DEPLOYMENT \
39+
--build-id $CURRENT_VERSION_BUILD_ID \
40+
--metadata-json '{"temporal.io/ignore-last-modifier":"true"}'
41+
```
42+
In the rare case that you have a nil Current Version when you are passing back ownership, you should set it on your Ramping Version
43+
```bash
44+
temporal worker deployment update-metadata-version \
45+
--deployment-name $MY_DEPLOYMENT \
46+
--build-id $RAMPING_VERSION_BUILD_ID \
47+
--metadata 'temporal.io/ignore-last-modifier="true"'
48+
```
49+
Or with JSON:
50+
```bash
51+
temporal worker deployment update-metadata-version \
52+
--deployment-name $MY_DEPLOYMENT \
53+
--build-id $RAMPING_VERSION_BUILD_ID \
54+
--metadata-json '{"temporal.io/ignore-last-modifier":"true"}'
5055
```
5156

52-
On the next reconcile, the controller will detect that `ManagerIdentity` is empty, claim it for itself, and resume
53-
managing routing changes.
57+
In the even rarer case that you have nil Current Version and nil Ramping Version, you'll need to use the CLI or SDK to
58+
set a Current or Ramping Version and then do as instructed above.

internal/controller/clientpool/clientpool.go

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -97,15 +97,13 @@ type NewClientOptions struct {
9797
TemporalNamespace string
9898
K8sNamespace string
9999
Spec v1alpha1.TemporalConnectionSpec
100-
Identity string
101100
}
102101

103102
func (cp *ClientPool) fetchClientUsingMTLSSecret(secret corev1.Secret, opts NewClientOptions) (*sdkclient.Options, *ClientPoolKey, *ClientAuth, error) {
104103
clientOpts := sdkclient.Options{
105104
Logger: cp.logger,
106105
HostPort: opts.Spec.HostPort,
107106
Namespace: opts.TemporalNamespace,
108-
Identity: opts.Identity,
109107
}
110108

111109
var pemCert []byte
@@ -172,7 +170,6 @@ func (cp *ClientPool) fetchClientUsingAPIKeySecret(secret corev1.Secret, opts Ne
172170
Logger: cp.logger,
173171
HostPort: opts.Spec.HostPort,
174172
Namespace: opts.TemporalNamespace,
175-
Identity: opts.Identity,
176173
ConnectionOptions: sdkclient.ConnectionOptions{
177174
TLS: &tls.Config{},
178175
},
@@ -201,7 +198,6 @@ func (cp *ClientPool) fetchClientUsingNoCredentials(opts NewClientOptions) (*sdk
201198
Logger: cp.logger,
202199
HostPort: opts.Spec.HostPort,
203200
Namespace: opts.TemporalNamespace,
204-
Identity: opts.Identity,
205201
}
206202

207203
key := ClientPoolKey{

internal/controller/execplan.go

Lines changed: 0 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,6 @@ import (
1212

1313
"github.com/go-logr/logr"
1414
temporaliov1alpha1 "github.com/temporalio/temporal-worker-controller/api/v1alpha1"
15-
"github.com/temporalio/temporal-worker-controller/internal/planner"
1615
"github.com/temporalio/temporal-worker-controller/internal/temporal"
1716
enumspb "go.temporal.io/api/enums/v1"
1817
sdkclient "go.temporal.io/sdk/client"
@@ -151,46 +150,12 @@ func (r *TemporalWorkerDeploymentReconciler) startTestWorkflows(ctx context.Cont
151150
return nil
152151
}
153152

154-
func (r *TemporalWorkerDeploymentReconciler) shouldClaimManagerIdentity(vcfg *planner.VersionConfig) bool {
155-
return vcfg.ManagerIdentity == ""
156-
}
157-
158-
func (r *TemporalWorkerDeploymentReconciler) claimManagerIdentity(
159-
ctx context.Context,
160-
l logr.Logger,
161-
workerDeploy *temporaliov1alpha1.TemporalWorkerDeployment,
162-
deploymentHandler sdkclient.WorkerDeploymentHandle,
163-
vcfg *planner.VersionConfig,
164-
) error {
165-
resp, err := deploymentHandler.SetManagerIdentity(ctx, sdkclient.WorkerDeploymentSetManagerIdentityOptions{
166-
Self: true,
167-
ConflictToken: vcfg.ConflictToken,
168-
Identity: getControllerIdentity(),
169-
})
170-
if err != nil {
171-
l.Error(err, "unable to claim manager identity")
172-
r.Recorder.Eventf(workerDeploy, corev1.EventTypeWarning, ReasonManagerIdentityClaimFailed,
173-
"Failed to claim manager identity: %v", err)
174-
return err
175-
}
176-
l.Info("claimed manager identity", "identity", getControllerIdentity())
177-
// Use the updated conflict token for the subsequent routing config change.
178-
vcfg.ConflictToken = resp.ConflictToken
179-
return nil
180-
}
181-
182153
func (r *TemporalWorkerDeploymentReconciler) updateVersionConfig(ctx context.Context, l logr.Logger, workerDeploy *temporaliov1alpha1.TemporalWorkerDeployment, deploymentHandler sdkclient.WorkerDeploymentHandle, p *plan) error {
183154
vcfg := p.UpdateVersionConfig
184155
if vcfg == nil {
185156
return nil
186157
}
187158

188-
if r.shouldClaimManagerIdentity(vcfg) {
189-
if err := r.claimManagerIdentity(ctx, l, workerDeploy, deploymentHandler, vcfg); err != nil {
190-
return fmt.Errorf("unable to claim manager identity: %w", err)
191-
}
192-
}
193-
194159
if vcfg.SetCurrent {
195160
l.Info("registering new current version", "buildID", vcfg.BuildID)
196161
if _, err := deploymentHandler.SetCurrentVersion(ctx, sdkclient.WorkerDeploymentSetCurrentVersionOptions{

internal/controller/genplan.go

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,7 +80,15 @@ func (r *TemporalWorkerDeploymentReconciler) generatePlan(
8080
ScaleDeployments: make(map[*corev1.ObjectReference]uint32),
8181
}
8282

83+
// Check if we need to force manual strategy due to external modification
8384
rolloutStrategy := w.Spec.RolloutStrategy
85+
if w.Status.LastModifierIdentity != getControllerIdentity() &&
86+
w.Status.LastModifierIdentity != serverDeleteVersionIdentity &&
87+
w.Status.LastModifierIdentity != "" &&
88+
!temporalState.IgnoreLastModifier {
89+
l.Info(fmt.Sprintf("Forcing Manual rollout strategy since Worker Deployment was modified by a user with a different identity '%s'; to allow controller to make changes again, set 'temporal.io/ignore-last-modifier=true' in the metadata of your Current or Ramping Version; see ownership runbook at docs/ownership.md for more details.", w.Status.LastModifierIdentity))
90+
rolloutStrategy.Strategy = temporaliov1alpha1.UpdateManual
91+
}
8492

8593
// Resolve gate input if gate is configured
8694
var gateInput []byte

internal/controller/reconciler_events_test.go

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -627,20 +627,20 @@ func TestUpdateVersionConfig_EmitsEventOnFailure(t *testing.T) {
627627
{
628628
name: "SetCurrentFailed",
629629
handle: &stubWDHandle{setCurrentErr: errors.New("simulated SetCurrentVersion failure")},
630-
config: &planner.VersionConfig{BuildID: "build-abc", SetCurrent: true, ManagerIdentity: "some-manager"},
630+
config: &planner.VersionConfig{BuildID: "build-abc", SetCurrent: true},
631631
expectedReason: ReasonVersionPromotionFailed,
632632
},
633633
{
634634
name: "SetRampingFailed",
635635
handle: &stubWDHandle{setRampingErr: errors.New("simulated SetRampingVersion failure")},
636-
config: &planner.VersionConfig{BuildID: "build-abc", RampPercentage: 25, ManagerIdentity: "some-manager"},
636+
config: &planner.VersionConfig{BuildID: "build-abc", RampPercentage: 25},
637637
expectedReason: ReasonVersionPromotionFailed,
638638
},
639639
{
640640
// SetCurrentVersion succeeds; UpdateVersionMetadata fails.
641641
name: "MetadataUpdateFailed",
642642
handle: &stubWDHandle{updateMetaErr: errors.New("simulated UpdateVersionMetadata failure")},
643-
config: &planner.VersionConfig{BuildID: "build-abc", SetCurrent: true, ManagerIdentity: "some-manager"},
643+
config: &planner.VersionConfig{BuildID: "build-abc", SetCurrent: true},
644644
expectedReason: ReasonMetadataUpdateFailed,
645645
},
646646
}

internal/controller/state_mapper.go

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,6 @@ func (m *stateMapper) mapToStatus(targetBuildID string) *v1alpha1.TemporalWorker
3434
}
3535

3636
status.LastModifierIdentity = m.temporalState.LastModifierIdentity
37-
status.ManagerIdentity = m.temporalState.ManagerIdentity
3837

3938
// Get build IDs directly from temporal state
4039
currentBuildID := m.temporalState.CurrentBuildID

internal/controller/util.go

Lines changed: 9 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -18,16 +18,15 @@ import (
1818
// status API and may change between releases. Do not write alerting or automation
1919
// that depends on these strings.
2020
const (
21-
ReasonPlanGenerationFailed = "PlanGenerationFailed"
22-
ReasonPlanExecutionFailed = "PlanExecutionFailed"
23-
ReasonDeploymentCreateFailed = "DeploymentCreateFailed"
24-
ReasonDeploymentDeleteFailed = "DeploymentDeleteFailed"
25-
ReasonDeploymentScaleFailed = "DeploymentScaleFailed"
26-
ReasonDeploymentUpdateFailed = "DeploymentUpdateFailed"
27-
ReasonTestWorkflowStartFailed = "TestWorkflowStartFailed"
28-
ReasonVersionPromotionFailed = "VersionPromotionFailed"
29-
ReasonMetadataUpdateFailed = "MetadataUpdateFailed"
30-
ReasonManagerIdentityClaimFailed = "ManagerIdentityClaimFailed"
21+
ReasonPlanGenerationFailed = "PlanGenerationFailed"
22+
ReasonPlanExecutionFailed = "PlanExecutionFailed"
23+
ReasonDeploymentCreateFailed = "DeploymentCreateFailed"
24+
ReasonDeploymentDeleteFailed = "DeploymentDeleteFailed"
25+
ReasonDeploymentScaleFailed = "DeploymentScaleFailed"
26+
ReasonDeploymentUpdateFailed = "DeploymentUpdateFailed"
27+
ReasonTestWorkflowStartFailed = "TestWorkflowStartFailed"
28+
ReasonVersionPromotionFailed = "VersionPromotionFailed"
29+
ReasonMetadataUpdateFailed = "MetadataUpdateFailed"
3130
)
3231

3332
const (

internal/controller/worker_controller.go

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -184,7 +184,6 @@ func (r *TemporalWorkerDeploymentReconciler) Reconcile(ctx context.Context, req
184184
K8sNamespace: workerDeploy.Namespace,
185185
TemporalNamespace: workerDeploy.Spec.WorkerOptions.TemporalNamespace,
186186
Spec: temporalConnection.Spec,
187-
Identity: getControllerIdentity(),
188187
})
189188
if err != nil {
190189
l.Error(err, "invalid Temporal auth secret")

internal/planner/planner.go

Lines changed: 2 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,6 @@ type Plan struct {
2727
ShouldCreateDeployment bool
2828
VersionConfig *VersionConfig
2929
TestWorkflows []WorkflowConfig
30-
3130
// Build IDs of versions from which the controller should
3231
// remove IgnoreLastModifierKey from the version metadata
3332
RemoveIgnoreLastModifierBuilds []string
@@ -46,11 +45,6 @@ type VersionConfig struct {
4645
SetCurrent bool
4746
// Acceptable values [0,100]
4847
RampPercentage int32
49-
50-
// ManagerIdentity is the current manager identity of the worker deployment in Temporal.
51-
// An empty string indicates the controller should claim the identity before applying
52-
// any routing config changes.
53-
ManagerIdentity string
5448
}
5549

5650
// WorkflowConfig defines a workflow to be started
@@ -552,14 +546,9 @@ func getVersionConfigDiff(
552546
}
553547
}
554548

555-
managerIdentity := ""
556-
if temporalState != nil {
557-
managerIdentity = temporalState.ManagerIdentity
558-
}
559549
vcfg := &VersionConfig{
560-
ConflictToken: conflictToken,
561-
BuildID: status.TargetVersion.BuildID,
562-
ManagerIdentity: managerIdentity,
550+
ConflictToken: conflictToken,
551+
BuildID: status.TargetVersion.BuildID,
563552
}
564553

565554
// If there is no current version and presence of unversioned pollers is not confirmed for all

0 commit comments

Comments
 (0)