K8SPSMDB-1606: Add CA key persistence for manual TLS to support safe cert re-signing on SAN changes#2277
Conversation
| if cr.CompareVersion("1.17.0") < 0 { | ||
| secretObj.Labels = nil | ||
| caLabels = nil | ||
| } |
There was a problem hiding this comment.
this condition can be removed
There was a problem hiding this comment.
Removed the condition - 0a1aa64
It seems like this line slipped in by accident.
| if cr.CompareVersion("1.17.0") < 0 { | ||
| secretLabels = nil | ||
| } |
There was a problem hiding this comment.
we don't need this condition
There was a problem hiding this comment.
Removed the condition - 0a1aa64
It seems like this line slipped in by accident.
gkech
left a comment
There was a problem hiding this comment.
approving, but minor comment for future consideration
| } | ||
|
|
||
| // GetSANsFromCert extracts DNS SANs from a PEM-encoded TLS certificate. | ||
| func GetSANsFromCert(tlsCertPEM []byte) ([]string, error) { |
There was a problem hiding this comment.
Got confused by this function name. Either rename to GetDNSNamesFromCert or return all SAN types
The function returns only cert.DNSNames, not all SAN types. Renaming to match actual behavior per PR review feedback.
| } | ||
|
|
||
| // needsManualSSLUpdate checks if the TLS certificate SANs differ from the expected SANs. | ||
| func (r *ReconcilePerconaServerMongoDB) needsManualSSLUpdate(ctx context.Context, cr *api.PerconaServerMongoDB, sslSecret *corev1.Secret) bool { |
There was a problem hiding this comment.
@mayankshah1607 Thanks. Used ctx to add the missing error log below — that resolves your other comment too. Fixed in f502c50
| currentSANs, err := tls.GetDNSNamesFromCert(sslSecret.Data["tls.crt"]) | ||
| if err != nil { | ||
| return errors.Wrap(err, "create TLS internal secret") | ||
| return false |
There was a problem hiding this comment.
If we are swallowing the error, we should at least log it here (in that case we will need the context logger)
There was a problem hiding this comment.
@mayankshah1607 Thanks. Added an error log via the context logger. Kept the return false since the caller treats this as advisory and retrying wouldn't recover a corrupt cert. Fixed in f502c50
commit: 1aff7fd |
|
@myJamong thank you for the contribution |
CHANGE DESCRIPTION
Problem:
When cert-manager is not installed, the operator's manual TLS path has two issues: (1) it never detects SAN changes (e.g., when splitHorizons are added), because it skips reconciliation if TLS secrets already exist, and (2) if secrets are manually deleted to force regeneration, a completely new CA is generated without merging with the old one, causing TLS verification failures during SmartUpdate rolling restarts.
I also made an issue here: #2278
Cause:
The manual TLS code path (createSSLManually) only creates secrets when they don't exist and returns immediately if they do — there is no SAN change detection. Additionally, each call to tls.Issue() generates an independent CA, meaning ssl and ssl-internal use different CAs, and any regeneration produces a CA that existing pods cannot trust.
Solution:
Introduce a persistent CA secret ({name}-ca-cert) for manual TLS management, mirroring the cert-manager CA secret structure. The CA key is preserved so that when SANs change (e.g., splitHorizon additions), the operator re-signs TLS certificates using the same CA — no CA merge is needed and rolling restarts are safe. Key changes:
CHECKLIST
Jira
Needs Doc) and QA (Needs QA)?Tests
compare/*-oc.yml)?Config/Logging/Testability