Skip to content

Flush SDK client cache on PermissionDenied/Unauthenticated errors#300

Merged
carlydf merged 1 commit intomainfrom
flush-sdk-client-on-access-denied
Apr 25, 2026
Merged

Flush SDK client cache on PermissionDenied/Unauthenticated errors#300
carlydf merged 1 commit intomainfrom
flush-sdk-client-on-access-denied

Conversation

@carlydf
Copy link
Copy Markdown
Collaborator

@carlydf carlydf commented Apr 24, 2026

Summary

  • Adds ClientPool.EvictClient(key) to close and remove a cached SDK client by key
  • Detects PermissionDenied and Unauthenticated errors from Temporal SDK calls in the reconcile loop
  • Evicts the stale client from the pool at both error sites (GetWorkerDeploymentState and executePlan), so the next reconcile re-reads credentials from the K8s secret and re-dials

Without this, a rotated API key or revoked mTLS cert causes a permanent stuck-retry loop that only recovers with a controller restart.

Closes #295
Closes #113

Test plan

  • go test ./internal/controller/clientpool/... — new TestEvictClient_RemovesAndClosesClient and TestEvictClient_NoopWhenKeyAbsent pass
  • go test ./internal/controller/... — existing controller tests still pass
  • go build ./... — compiles clean

🤖 Generated with Claude Code

When the Temporal server returns an access-denied error (PermissionDenied
or Unauthenticated), the stale cached client is now evicted from the pool
so the next reconcile re-reads credentials from the K8s secret and re-dials.

Closes #295
Closes #113

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@carlydf carlydf requested review from a team and jlegrone as code owners April 24, 2026 23:46
@veeral-patel veeral-patel self-requested a review April 24, 2026 23:49
@carlydf carlydf enabled auto-merge (squash) April 24, 2026 23:52
@carlydf carlydf disabled auto-merge April 25, 2026 00:04
@carlydf carlydf merged commit b4913c9 into main Apr 25, 2026
17 checks passed
@carlydf carlydf deleted the flush-sdk-client-on-access-denied branch April 25, 2026 00:04
carlydf added a commit that referenced this pull request Apr 25, 2026
## Summary

Follow-up to #300. Solves #295 more gracefully.

- Replaces the static `secret.Data` capture in the
`NewAPIKeyDynamicCredentials` closure with a live K8s secret read via a
new `fetchAPIKeyFromSecret` helper
- A rotated API key now takes effect on the next outgoing Temporal RPC —
no permission-denied cycle needed to evict and re-dial the cached client
- The k8s client is controller-runtime's cache-backed client, so reads
hit the local informer cache (cheap in-memory lookup, not a raw API
server call)
- `fetchAPIKeyFromSecret` is extracted as a testable method; new test
verifies the live-read and rotation behavior directly

## Test plan

- [ ] `go test ./internal/controller/clientpool/...` — new
`TestFetchAPIKey_CredentialClosureReadsLiveSecret` passes (verifies
initial read and post-rotation read return correct values)
- [ ] `go test ./internal/controller/...` — existing tests still pass
- [ ] `go build ./...` — compiles clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
shashwatsuri pushed a commit to shashwatsuri/temporal-worker-controller that referenced this pull request Apr 28, 2026
…mporalio#300)

## Summary

- Adds `ClientPool.EvictClient(key)` to close and remove a cached SDK
client by key
- Detects `PermissionDenied` and `Unauthenticated` errors from Temporal
SDK calls in the reconcile loop
- Evicts the stale client from the pool at both error sites
(`GetWorkerDeploymentState` and `executePlan`), so the next reconcile
re-reads credentials from the K8s secret and re-dials

Without this, a rotated API key or revoked mTLS cert causes a permanent
stuck-retry loop that only recovers with a controller restart.

Closes temporalio#295
Closes temporalio#113

## Test plan

- [ ] `go test ./internal/controller/clientpool/...` — new
`TestEvictClient_RemovesAndClosesClient` and
`TestEvictClient_NoopWhenKeyAbsent` pass
- [ ] `go test ./internal/controller/...` — existing controller tests
still pass
- [ ] `go build ./...` — compiles clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
shashwatsuri pushed a commit to shashwatsuri/temporal-worker-controller that referenced this pull request Apr 28, 2026
## Summary

Follow-up to temporalio#300. Solves temporalio#295 more gracefully.

- Replaces the static `secret.Data` capture in the
`NewAPIKeyDynamicCredentials` closure with a live K8s secret read via a
new `fetchAPIKeyFromSecret` helper
- A rotated API key now takes effect on the next outgoing Temporal RPC —
no permission-denied cycle needed to evict and re-dial the cached client
- The k8s client is controller-runtime's cache-backed client, so reads
hit the local informer cache (cheap in-memory lookup, not a raw API
server call)
- `fetchAPIKeyFromSecret` is extracted as a testable method; new test
verifies the live-read and rotation behavior directly

## Test plan

- [ ] `go test ./internal/controller/clientpool/...` — new
`TestFetchAPIKey_CredentialClosureReadsLiveSecret` passes (verifies
initial read and post-rotation read return correct values)
- [ ] `go test ./internal/controller/...` — existing tests still pass
- [ ] `go build ./...` — compiles clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] API key credentials stale after Kubernetes Secret rotation Controller should flush SDK client cache when access denied error occurs

2 participants