TemporalConnection update propagation#102
Conversation
```go
Namespace:       workerDeploy.Spec.WorkerOptions.TemporalNamespace,
HostPort:        temporalConnection.Spec.HostPort,
MutualTLSSecret: temporalConnection.Spec.MutualTLSSecret,
```
@jlegrone - curious to hear your thoughts on this:
I added the name of the MutualTLSSecret as part of the key here, since a change to the secret name, for the same connection spec object, should trigger our worker controller to refresh the client it is using. This was required because I realized the old secret might not be expired, just replaced.
However, one case this doesn't cover is when someone replaces the contents of an existing secret without changing its name. In that scenario, the controller never gets a new client.
Could you solve the problem of content changing without name changing by storing a hash of the name + contents instead of just the name?
Or, if that's inefficient because it means you have to read the secret contents all the time just to generate the map key, we could use a different solution for content changes: if a call to the temporal APIs fails with an error that suggests "wrong credentials", we could try reloading the credentials once just in case they changed. Then, do a namespace describe to make sure they work, and if they do, restart all the workers.
I don't think this needs to be solved in this PR though, but if you don't solve the "changed content, not expired, same name" thing, do just create an issue about it.
Thanks for responding to this!
Yes, I did consider storing a hash of the name + contents, but I decided against it because in one of my earlier PRs we discussed that reading the contents of a secret on every reconciliation loop could be expensive.
That said, I like the idea of reloading credentials once when a temporal API error suggests "wrong credentials", but I don't think it should come in this PR. I can open a separate issue for it!
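For reference, the "hash of the name + contents" alternative discussed above could look roughly like the following. This is a hypothetical sketch, not code from the PR: `hashSecretNameAndData` is an invented helper name, and in practice it would require fetching the secret's data on every reconciliation, which is the cost the thread decided to avoid.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashSecretNameAndData (hypothetical) folds both the secret's name and its
// data into the key, so replacing a secret's contents without renaming it
// still produces a different key and would trigger a client refresh.
func hashSecretNameAndData(name string, data map[string][]byte) string {
	hasher := sha256.New()
	hasher.Write([]byte(name))
	// Iterate data keys in sorted order so the hash is deterministic.
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		hasher.Write([]byte(k))
		hasher.Write(data[k])
	}
	return hex.EncodeToString(hasher.Sum(nil))
}

func main() {
	oldKey := hashSecretNameAndData("temporal-cloud-mtls", map[string][]byte{"tls.crt": []byte("old-cert")})
	newKey := hashSecretNameAndData("temporal-cloud-mtls", map[string][]byte{"tls.crt": []byte("new-cert")})
	// Same secret name, different contents, different keys.
	fmt.Println(oldKey != newKey) // true
}
```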
```go
// If the connection spec hash has changed, update the deployment
currentHash := k8s.ComputeConnectionSpecHash(connection)
if currentHash != existingDeployment.Spec.Template.Annotations[k8s.ConnectionSpecHashAnnotation] {
```
Is it possible for the annotations map to ever be nil, e.g. if the user doesn't provide any? Could you make sure the case where the user provides no initial annotations is handled?
Here is the code which builds the pod annotation value:
```go
func ComputeConnectionSpecHash(connection temporaliov1alpha1.TemporalConnectionSpec) string {
	// HostPort is required, but MutualTLSSecret can be empty for non-mTLS connections.
	if connection.HostPort == "" {
		return ""
	}
	hasher := sha256.New()
	// Hash connection spec fields in deterministic order.
	_, _ = hasher.Write([]byte(connection.HostPort))
	_, _ = hasher.Write([]byte(connection.MutualTLSSecret))
	return hex.EncodeToString(hasher.Sum(nil))
}
```
Right now, pod annotations will actually never be nil. The `connection.HostPort == ""` check is defensive programming: in the unlikely case a client forgot to specify the host port in the TemporalConnection spec, the controller would error out earlier in the reconciliation loop.
Specifically, this would happen when it tries to create/fetch a client, since an empty host port would force the controller to attempt creating a new SDK client against an empty Temporal host port, which fails.
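On the nil-annotations question specifically: even if the map were nil, the comparison above would be safe, because indexing a nil map in Go returns the zero value rather than panicking. A minimal sketch (the annotation key string here is hypothetical, not the project's actual constant):

```go
package main

import "fmt"

func main() {
	// A pod template created with no annotations leaves the map nil.
	var annotations map[string]string

	// Indexing a nil map in Go is safe and returns the zero value ("" here),
	// so the hash comparison cannot panic even when no annotations exist.
	v := annotations["temporal.io/connection-spec-hash"] // hypothetical key name
	fmt.Printf("%q\n", v) // ""
}
```

Writing to a nil map, by contrast, does panic, so any code that stamps the annotation must initialize the map first.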
## What was changed
- Handle updates to TemporalConnection, for TWDs that are associated with one, so that these updates are propagated to deployments.
- This was achieved by adding the hashed contents of a TemporalConnection as pod annotations. If a TemporalConnection is updated, new deployments are created with the updated annotation. The updated secret is also mounted so that new workers start with the latest secrets.

## Why?
- Core functionality.

## Checklist
1. Closes temporalio#10
2. How was this tested:
   - Added unit tests and also tested this locally. The only thing I was not able to do was add an integration test, since I was not able to configure our temporaltest server with mTLS. Here's how I tested this locally:
     - I ran a worker pod which was using the `temporal-cloud-mtls` secret to connect to Temporal.
     - I updated the namespace the worker was connected to by changing the mTLS secret. This resulted in the controller and the worker pods throwing connection errors (expected).
     - I updated the local secret in the local k8s cluster (and also changed its name to `temporal-cloud-mtls-1`, since we want our controller to also refresh the client it's using).
     - Noticed that a rolling deployment was conducted and everything stabilized.

   Proof of a rolling update:
   <img width="1056" height="58" alt="Screenshot 2025-08-07 at 5 41 34 PM" src="https://github.com/user-attachments/assets/40720118-0bb5-4f85-aa80-2c7c257b65f2" />
3. Any docs updates needed?
   - None
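The rollout mechanism described above can be sketched as follows. This is a simplified illustration, not the controller's actual code: the annotation key and type names are invented for the example, and the real controller compares against `k8s.ConnectionSpecHashAnnotation` on the deployment's pod template.

```go
package main

import "fmt"

const hashAnnotation = "temporal.io/connection-spec-hash" // hypothetical annotation key

type podTemplate struct {
	Annotations map[string]string
}

// needsRollout reports whether the pod template must be updated: any
// difference between the stamped annotation and the freshly computed hash
// means the TemporalConnection changed, and updating the template triggers
// a Kubernetes rolling update of the worker pods.
func needsRollout(tmpl podTemplate, currentHash string) bool {
	return tmpl.Annotations[hashAnnotation] != currentHash
}

func main() {
	tmpl := podTemplate{Annotations: map[string]string{hashAnnotation: "abc123"}}
	fmt.Println(needsRollout(tmpl, "abc123")) // false: spec unchanged, no rollout
	fmt.Println(needsRollout(tmpl, "def456")) // true: connection spec changed, roll the pods
}
```

Because the hash lives on the pod template rather than the deployment object itself, Kubernetes' built-in rolling-update machinery handles the restart, which matches the rolling deployment observed in the local test.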