fix: append custom CA to system cert pool instead of replacing it#227
Merged
fix: append custom CA to system cert pool instead of replacing it#227
Conversation
PR #212 introduced ca.crt support for server verification but used x509.NewCertPool(), which replaces the system CA bundle entirely. This breaks connections to Temporal Cloud (public CA) when the mTLS secret contains a ca.crt from cert-manager (the client CA). Use x509.SystemCertPool() instead so the custom CA is appended to the system bundle, allowing both private and public CAs to be trusted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
carlydf
approved these changes
Mar 10, 2026
Collaborator
carlydf
left a comment
There was a problem hiding this comment.
Makes sense to me. Let's try this. To test without having to cut another potentially bad v1.2.x patch release, can we test it in our dogfooding test env by just using the main image?
Shivs11
added a commit
that referenced
this pull request
Mar 19, 2026
## Summary - PR #212 introduced `ca.crt` support for server certificate verification but used `x509.NewCertPool()`, which creates an **empty** CA pool — replacing the system CA bundle entirely - This breaks connections to Temporal Cloud (public CA) when the mTLS secret contains a `ca.crt` key from cert-manager (the CA that signed the **client** cert, not the server cert) - This fix uses `x509.SystemCertPool()` instead, so the custom CA is **appended** to the system bundle rather than replacing it ## Why this broke cert-manager always includes `ca.crt` in TLS secrets (the issuing CA). When connecting to Temporal Cloud: 1. The controller sees `ca.crt` in the secret (the self-signed client CA) 2. `NewCertPool()` creates an empty pool with **only** that CA 3. Temporal Cloud's server cert is signed by a public CA (e.g., DigiCert) 4. The public CA is no longer trusted → `x509: certificate signed by unknown authority` ## What this fixes - `SystemCertPool()` loads the system CA bundle first, then appends the custom CA - Both public CAs (Temporal Cloud) and private CAs (self-hosted) are trusted simultaneously - Falls back to `NewCertPool()` with a warning log if the system pool can't be loaded ## Affected versions - v1.2.1, v1.2.2, v1.2.3 — all contain the regression from PR #212 - Closes #223 ## Test plan - [ ] Deploy against Temporal Cloud with cert-manager mTLS secret (has `ca.crt`) — verify connection succeeds - [ ] Deploy against self-hosted Temporal with private CA — verify connection succeeds - [ ] Deploy with mTLS secret without `ca.crt` — verify fallback to system bundle works 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Shivs11
added a commit
that referenced
this pull request
Mar 20, 2026
## Summary - PR #212 introduced `ca.crt` support for server certificate verification but used `x509.NewCertPool()`, which creates an **empty** CA pool — replacing the system CA bundle entirely - This breaks connections to Temporal Cloud (public CA) when the mTLS secret contains a `ca.crt` key from cert-manager (the CA that signed the **client** cert, not the server cert) - This fix uses `x509.SystemCertPool()` instead, so the custom CA is **appended** to the system bundle rather than replacing it ## Why this broke cert-manager always includes `ca.crt` in TLS secrets (the issuing CA). When connecting to Temporal Cloud: 1. The controller sees `ca.crt` in the secret (the self-signed client CA) 2. `NewCertPool()` creates an empty pool with **only** that CA 3. Temporal Cloud's server cert is signed by a public CA (e.g., DigiCert) 4. The public CA is no longer trusted → `x509: certificate signed by unknown authority` ## What this fixes - `SystemCertPool()` loads the system CA bundle first, then appends the custom CA - Both public CAs (Temporal Cloud) and private CAs (self-hosted) are trusted simultaneously - Falls back to `NewCertPool()` with a warning log if the system pool can't be loaded ## Affected versions - v1.2.1, v1.2.2, v1.2.3 — all contain the regression from PR #212 - Closes #223 ## Test plan - [ ] Deploy against Temporal Cloud with cert-manager mTLS secret (has `ca.crt`) — verify connection succeeds - [ ] Deploy against self-hosted Temporal with private CA — verify connection succeeds - [ ] Deploy with mTLS secret without `ca.crt` — verify fallback to system bundle works 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
carlydf
added a commit
that referenced
this pull request
Mar 21, 2026
Introduces clientpool_test.go with 8 tests that directly cover the two regression patterns that slipped through CI: - TestFetchMTLS_CACertAppendsToSystemPool: regression test for PR #227 (empty cert pool replacing system CAs). Injects a fake system pool and verifies both injected system CAs and the custom ca.crt are trusted. - TestDialAndUpsert_APIKeySkipsCheckHealth: regression test for PR #232 (CheckHealth called unconditionally). Verifies CheckHealth is not invoked when AuthModeAPIKey is used. To make both code paths injectable in tests, ClientPool gains two new function fields (dialFn, systemCertPoolFn) wired to sdkclient.Dial and x509.SystemCertPool by default — no behavior change in production. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4 tasks
carlydf
added a commit
that referenced
this pull request
Mar 21, 2026
Introduces clientpool_test.go with 8 tests that directly cover the two regression patterns that slipped through CI: - TestFetchMTLS_CACertAppendsToSystemPool: regression test for PR #227 (empty cert pool replacing system CAs). Injects a fake system pool and verifies both injected system CAs and the custom ca.crt are trusted. - TestDialAndUpsert_APIKeySkipsCheckHealth: regression test for PR #232 (CheckHealth called unconditionally). Verifies CheckHealth is not invoked when AuthModeAPIKey is used. To make both code paths injectable in tests, ClientPool gains two new function fields (dialFn, systemCertPoolFn) wired to sdkclient.Dial and x509.SystemCertPool by default — no behavior change in production. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
carlydf
added a commit
that referenced
this pull request
Mar 23, 2026
## Summary - Adds `clientpool_test.go` with 8 unit tests covering the auth code paths that had no test coverage - Two tests are explicit regression guards for the bugs fixed in #227 and #232 - Makes `dialFn` and `systemCertPoolFn` injectable on `ClientPool` (no behavior change in production) to enable testing without network I/O or OS trust store dependencies ## Regression tests **`TestFetchMTLS_CACertAppendsToSystemPool`** — guards against the PR #212 bug (fixed in #227): `fetchClientUsingMTLSSecret` used `x509.NewCertPool()` (empty) instead of `x509.SystemCertPool()`, silently dropping system root CAs and breaking Temporal Cloud connections. The test injects a fake system pool and verifies both the injected system CAs and the custom `ca.crt` are present in the returned pool. This test fails if the fix is reverted. **`TestDialAndUpsert_APIKeySkipsCheckHealth`** — guards against the PR #203 bug (fixed in #232): `DialAndUpsertClient` called `CheckHealth` unconditionally, which fails on Temporal Cloud with namespace-scoped API keys. The test uses an injected mock client and asserts `CheckHealth` is never called for `AuthModeAPIKey`. This test fails if the fix is reverted. ## Test plan - [x] `go test ./internal/controller/clientpool/... -v` — all 8 tests pass - [x] `go build ./...` — no compilation errors - [x] Manually revert the PR #227 fix → `TestFetchMTLS_CACertAppendsToSystemPool` fails - [x] Manually revert the PR #232 fix → `TestDialAndUpsert_APIKeySkipsCheckHealth` fails 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
shashwatsuri
pushed a commit
to shashwatsuri/temporal-worker-controller
that referenced
this pull request
Apr 28, 2026
…mporalio#227) ## Summary - PR temporalio#212 introduced `ca.crt` support for server certificate verification but used `x509.NewCertPool()`, which creates an **empty** CA pool — replacing the system CA bundle entirely - This breaks connections to Temporal Cloud (public CA) when the mTLS secret contains a `ca.crt` key from cert-manager (the CA that signed the **client** cert, not the server cert) - This fix uses `x509.SystemCertPool()` instead, so the custom CA is **appended** to the system bundle rather than replacing it ## Why this broke cert-manager always includes `ca.crt` in TLS secrets (the issuing CA). When connecting to Temporal Cloud: 1. The controller sees `ca.crt` in the secret (the self-signed client CA) 2. `NewCertPool()` creates an empty pool with **only** that CA 3. Temporal Cloud's server cert is signed by a public CA (e.g., DigiCert) 4. The public CA is no longer trusted → `x509: certificate signed by unknown authority` ## What this fixes - `SystemCertPool()` loads the system CA bundle first, then appends the custom CA - Both public CAs (Temporal Cloud) and private CAs (self-hosted) are trusted simultaneously - Falls back to `NewCertPool()` with a warning log if the system pool can't be loaded ## Affected versions - v1.2.1, v1.2.2, v1.2.3 — all contain the regression from PR temporalio#212 - Closes temporalio#223 ## Test plan - [ ] Deploy against Temporal Cloud with cert-manager mTLS secret (has `ca.crt`) — verify connection succeeds - [ ] Deploy against self-hosted Temporal with private CA — verify connection succeeds - [ ] Deploy with mTLS secret without `ca.crt` — verify fallback to system bundle works 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
shashwatsuri
pushed a commit
to shashwatsuri/temporal-worker-controller
that referenced
this pull request
Apr 28, 2026
## Summary - Adds `clientpool_test.go` with 8 unit tests covering the auth code paths that had no test coverage - Two tests are explicit regression guards for the bugs fixed in temporalio#227 and temporalio#232 - Makes `dialFn` and `systemCertPoolFn` injectable on `ClientPool` (no behavior change in production) to enable testing without network I/O or OS trust store dependencies ## Regression tests **`TestFetchMTLS_CACertAppendsToSystemPool`** — guards against the PR temporalio#212 bug (fixed in temporalio#227): `fetchClientUsingMTLSSecret` used `x509.NewCertPool()` (empty) instead of `x509.SystemCertPool()`, silently dropping system root CAs and breaking Temporal Cloud connections. The test injects a fake system pool and verifies both the injected system CAs and the custom `ca.crt` are present in the returned pool. This test fails if the fix is reverted. **`TestDialAndUpsert_APIKeySkipsCheckHealth`** — guards against the PR temporalio#203 bug (fixed in temporalio#232): `DialAndUpsertClient` called `CheckHealth` unconditionally, which fails on Temporal Cloud with namespace-scoped API keys. The test uses an injected mock client and asserts `CheckHealth` is never called for `AuthModeAPIKey`. This test fails if the fix is reverted. ## Test plan - [x] `go test ./internal/controller/clientpool/... -v` — all 8 tests pass - [x] `go build ./...` — no compilation errors - [x] Manually revert the PR temporalio#227 fix → `TestFetchMTLS_CACertAppendsToSystemPool` fails - [x] Manually revert the PR temporalio#232 fix → `TestDialAndUpsert_APIKeySkipsCheckHealth` fails 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ca.crtsupport for server certificate verification but usedx509.NewCertPool(), which creates an empty CA pool — replacing the system CA bundle entirelyca.crtkey from cert-manager (the CA that signed the client cert, not the server cert)x509.SystemCertPool()instead, so the custom CA is appended to the system bundle rather than replacing itWhy this broke
cert-manager always includes
ca.crtin TLS secrets (the issuing CA). When connecting to Temporal Cloud:ca.crtin the secret (the self-signed client CA)NewCertPool()creates an empty pool with only that CAx509: certificate signed by unknown authorityWhat this fixes
SystemCertPool()loads the system CA bundle first, then appends the custom CANewCertPool()with a warning log if the system pool can't be loadedAffected versions
Test plan
ca.crt) — verify connection succeedsca.crt— verify fallback to system bundle works🤖 Generated with Claude Code