Add event recording and status conditions for worker deployments#203
Merged
carlydf
reviewed
Feb 21, 2026
carlydf
reviewed
Feb 21, 2026
Collaborator
carlydf
left a comment
Also, `make fmt-imports` will fix some of your lint errors.
… usage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5632069 to
5900793
carlydf
reviewed
Feb 24, 2026
Collaborator
carlydf
left a comment
looking good! just did initial review, we should still add a functional test once these comments are addressed.
I found https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#events and https://book.kubebuilder.io/reference/raising-events#creating-events helpful while reviewing.
Shivs11
reviewed
Feb 27, 2026
carlydf
reviewed
Mar 3, 2026
Collaborator
carlydf
left a comment
really close from my perspective. will push a commit showing what I mean about the stricter string types for EventType and ConditionType.
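As an illustration of what stricter string types can buy, here is a minimal sketch. It is not the code from this PR: the `EventType` values are assumed to be the two standard Kubernetes event types, the condition names come from the PR description, and `formatEvent` and `describeCondition` are hypothetical helpers.

```go
package main

import "fmt"

// Distinct named string types: a value of one type cannot be passed
// where the other is expected without an explicit conversion.
type EventType string
type ConditionType string

const (
	// Assumption: EventType mirrors the two Kubernetes event types.
	EventTypeNormal  EventType = "Normal"
	EventTypeWarning EventType = "Warning"
)

const (
	// Condition names as listed in the PR description.
	ConditionTemporalConnectionHealthy ConditionType = "TemporalConnectionHealthy"
	ConditionRolloutReady              ConditionType = "RolloutReady"
)

// formatEvent is a hypothetical helper that accepts only EventType values.
func formatEvent(t EventType, reason, msg string) string {
	return fmt.Sprintf("%s %s: %s", t, reason, msg)
}

// describeCondition is a hypothetical helper that accepts only ConditionType values.
func describeCondition(t ConditionType, ok bool) string {
	status := "False"
	if ok {
		status = "True"
	}
	return fmt.Sprintf("%s=%s", t, status)
}

func main() {
	fmt.Println(formatEvent(EventTypeWarning, "ExampleReason", "something went wrong"))
	fmt.Println(describeCondition(ConditionRolloutReady, true))

	// The next line would not compile, because ConditionType is not EventType:
	// formatEvent(ConditionRolloutReady, "ExampleReason", "mixed up")
}
```

One caveat: untyped string literals still convert implicitly, so `describeCondition("Typo", true)` compiles. The named types catch mix-ups between declared constants, not arbitrary literals.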
"Registration" already has a meaning in Temporal versioning (a worker polling for the first time creates a version record). "Promotion" better describes setting a version as current or ramping, which moves it forward in the rollout lifecycle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…est scenarios ranked by trigger difficulty
carlydf
reviewed
Mar 4, 2026
carlydf
reviewed
Mar 4, 2026
…on-nil with empty name
…lientCreationFailed when Temporal had a problem
Shivs11
added a commit
that referenced
this pull request
Mar 20, 2026
)
## What was changed
- WISOTT
- Note: This fixes an unfortunate regression that was introduced with [this](#203).

## Why?
- Bug fix!

## Checklist
1. Closes
2. How was this tested:
3. Any docs updates needed?
carlydf
added a commit
that referenced
this pull request
Mar 23, 2026
## Summary
- Adds `clientpool_test.go` with 8 unit tests covering the auth code paths that had no test coverage
- Two tests are explicit regression guards for the bugs fixed in #227 and #232
- Makes `dialFn` and `systemCertPoolFn` injectable on `ClientPool` (no behavior change in production) to enable testing without network I/O or OS trust store dependencies

## Regression tests
**`TestFetchMTLS_CACertAppendsToSystemPool`**: guards against the PR #212 bug (fixed in #227): `fetchClientUsingMTLSSecret` used `x509.NewCertPool()` (empty) instead of `x509.SystemCertPool()`, silently dropping system root CAs and breaking Temporal Cloud connections. The test injects a fake system pool and verifies both the injected system CAs and the custom `ca.crt` are present in the returned pool. This test fails if the fix is reverted.

**`TestDialAndUpsert_APIKeySkipsCheckHealth`**: guards against the PR #203 bug (fixed in #232): `DialAndUpsertClient` called `CheckHealth` unconditionally, which fails on Temporal Cloud with namespace-scoped API keys. The test uses an injected mock client and asserts `CheckHealth` is never called for `AuthModeAPIKey`. This test fails if the fix is reverted.

## Test plan
- [x] `go test ./internal/controller/clientpool/... -v`: all 8 tests pass
- [x] `go build ./...`: no compilation errors
- [x] Manually revert the PR #227 fix → `TestFetchMTLS_CACertAppendsToSystemPool` fails
- [x] Manually revert the PR #232 fix → `TestDialAndUpsert_APIKeySkipsCheckHealth` fails

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
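The system-pool regression described above can be sketched with the standard library alone. This is an illustration under assumptions, not the controller's code: `buildRootPool`, `fakePoolFn`, `poolFromPEM`, and `newTestCA` are hypothetical names, and the injected function plays the role the commit message assigns to `systemCertPoolFn`.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// buildRootPool starts from the pool returned by systemPoolFn and appends
// the custom CA, instead of starting from an empty x509.NewCertPool().
func buildRootPool(systemPoolFn func() (*x509.CertPool, error), caPEM []byte) (*x509.CertPool, error) {
	pool, err := systemPoolFn()
	if err != nil {
		return nil, fmt.Errorf("load system cert pool: %w", err)
	}
	if pool == nil {
		pool = x509.NewCertPool()
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid certificate found in CA PEM")
	}
	return pool, nil
}

// newTestCA returns a throwaway self-signed CA certificate in PEM form.
func newTestCA(commonName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// poolFromPEM builds a pool holding exactly the given PEM certificates.
func poolFromPEM(pems ...[]byte) *x509.CertPool {
	p := x509.NewCertPool()
	for _, b := range pems {
		p.AppendCertsFromPEM(b)
	}
	return p
}

// fakePoolFn stands in for x509.SystemCertPool so no OS trust store is needed.
func fakePoolFn(pems ...[]byte) func() (*x509.CertPool, error) {
	return func() (*x509.CertPool, error) { return poolFromPEM(pems...), nil }
}

func main() {
	sysCA, err := newTestCA("fake-system-root")
	if err != nil {
		panic(err)
	}
	customCA, err := newTestCA("custom-ca")
	if err != nil {
		panic(err)
	}
	pool, err := buildRootPool(fakePoolFn(sysCA), customCA)
	if err != nil {
		panic(err)
	}
	fmt.Println("contains both CAs:", pool.Equal(poolFromPEM(sysCA, customCA)))
}
```

In production the injected function would presumably be `x509.SystemCertPool`. Injecting it keeps the check independent of whatever roots the host happens to trust.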
shashwatsuri
pushed a commit
to shashwatsuri/temporal-worker-controller
that referenced
this pull request
Apr 28, 2026
…poralio#203)

## What changed:
Added Kubernetes events and status conditions (TemporalConnectionHealthy, RolloutReady) to the worker controller reconciliation loop.

## Why:
Reconciliation failures were only visible in controller logs; events and conditions let users diagnose issues directly via kubectl.

1. Closes temporalio#28
2. How was this tested: added unit tests and functional tests
3. Any docs updates needed? N/A
4. Is this risky? Explain

Making a change to the CRD (adding conditions) opens up the risk that users could upgrade the controller but fail to upgrade their CRD. In this case, it is ok if new features are silently ignored, but we don't want the controller to panic or fail to successfully do the actions that were available in the previous CRD version. I believe that this change is safe even if someone forgets to upgrade their CRD, because when this new controller runs against a v1.2.0 CRD:

- No panic. The controller calls r.Status().Update(ctx, twd) with conditions populated in memory. The API server validates against the CRD schema and prunes unknown fields (standard behavior for structural schemas without x-kubernetes-preserve-unknown-fields). The status write succeeds with a 200 and the conditions are silently dropped before storage.
- Kubernetes Events work fine. Events are written as separate events.k8s.io/v1 resources, completely independent of the TWD CRD schema. All r.Recorder.Eventf(...) calls will succeed normally.
- Conditions simply don't persist. kubectl get twd foo -o yaml will show no conditions field. The controller sets them in memory on every reconcile, tries to write, and the API server drops them.

Functionally the controller does the right thing, it just can't communicate the health status via conditions until the CRD is upgraded.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Carly de Frondeville <cdefrondeville@berkeley.edu>
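Because the controller sets conditions in memory on every reconcile, it helps to see how a condition list is usually updated. The sketch below follows the common Kubernetes convention of changing the transition timestamp only when the status flips. It uses a simplified stand-in for `metav1.Condition`; `setCondition`, `at`, and the reason strings are hypothetical, and a real controller would more likely use `meta.SetStatusCondition` from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition inserts the condition, or updates the existing entry with the
// same Type. LastTransitionTime moves only when Status actually changes, so
// repeated reconciles with the same outcome do not churn the timestamp.
func setCondition(conds []Condition, next Condition, now time.Time) []Condition {
	for i := range conds {
		if conds[i].Type != next.Type {
			continue
		}
		if conds[i].Status != next.Status {
			conds[i].Status = next.Status
			conds[i].LastTransitionTime = now
		}
		conds[i].Reason = next.Reason
		conds[i].Message = next.Message
		return conds
	}
	next.LastTransitionTime = now
	return append(conds, next)
}

// at returns a fixed timestamp so the example is deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

func main() {
	var conds []Condition

	// First reconcile: the connection is healthy.
	conds = setCondition(conds, Condition{
		Type: "TemporalConnectionHealthy", Status: "True", Reason: "ExampleConnected",
	}, at(100))

	// Later reconcile, same status: the timestamp is left alone.
	conds = setCondition(conds, Condition{
		Type: "TemporalConnectionHealthy", Status: "True", Reason: "ExampleConnected",
	}, at(200))

	// Status flips: the timestamp moves to the time of the flip.
	conds = setCondition(conds, Condition{
		Type: "TemporalConnectionHealthy", Status: "False", Reason: "ExampleDialFailed",
		Message: "could not reach the server",
	}, at(300))

	for _, c := range conds {
		fmt.Printf("%s=%s reason=%s since=%d\n", c.Type, c.Status, c.Reason, c.LastTransitionTime.Unix())
	}
}
```

Against an older CRD without a conditions field, a list like this would still be built on every reconcile. Per the description above, the API server would prune it on write.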
## What changed:
Added Kubernetes events and status conditions (TemporalConnectionHealthy, RolloutReady) to the worker controller reconciliation loop.

## Why:
Reconciliation failures were only visible in controller logs; events and conditions let users diagnose issues directly via kubectl.
1. Closes #28 (Add events to the TemporalWorkerDeployment CRD when there is a problem)
2. How was this tested: added unit tests and functional tests
3. Any docs updates needed? N/A
4. Is this risky? Explain
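Making a change to the CRD (adding conditions) opens up the risk that users could upgrade the controller but fail to upgrade their CRD. In this case, it is ok if new features are silently ignored, but we don't want the controller to panic or fail to successfully do the actions that were available in the previous CRD version. I believe that this change is safe even if someone forgets to upgrade their CRD, because when this new controller runs against a v1.2.0 CRD:

- No panic. The controller calls r.Status().Update(ctx, twd) with conditions populated in memory. The API server validates against the CRD schema and prunes unknown fields (standard behavior for structural schemas without x-kubernetes-preserve-unknown-fields). The status write succeeds with a 200 and the conditions are silently dropped before storage.
- Kubernetes Events work fine. Events are written as separate events.k8s.io/v1 resources, completely independent of the TWD CRD schema. All r.Recorder.Eventf(...) calls will succeed normally.
- Conditions simply don't persist. kubectl get twd foo -o yaml will show no conditions field. The controller sets them in memory on every reconcile, tries to write, and the API server drops them.

Functionally the controller does the right thing, it just can't communicate the health status via conditions until the CRD is upgraded.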