
Feat: Add server readiness probe #825

Merged
robholland merged 8 commits into temporalio:main from quangngotan95:feat/server-readiness-probe on Jan 8, 2026

Conversation

@quangngotan95
Contributor

What was changed

Adds configuration for the server pods' `readinessProbe`.
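
For illustration, a values-level sketch of the kind of configuration this adds (the key names under `server.frontend` are illustrative and may not match the chart's final schema):

server:
  frontend:
    # per-service readinessProbe settings rendered into the pod spec
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 1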

Why?

At the moment we only have a `livenessProbe` and no `readinessProbe` for server pods.
On startup, pods receive requests right away, before they are ready.

Checklist

  1. Closes [Bug] Missing readiness probes on web server #710

  2. How was this tested:

  • Helm template
  • Helm test
  • The changes have been running in our fork in production for a few months.

  3. Any docs updates needed?
    N/A

@quangngotan95 requested a review from a team as a code owner on January 8, 2026 at 10:52
Contributor

@robholland left a comment


I think readiness probably only makes sense for the frontend(-internal) services. Other services don't receive incoming requests via a Kubernetes mechanism, so readiness isn't really useful for them. For the frontend service there is a useful health check mechanism we can use over GRPC:

readinessProbe:
  grpc:
    port: <your-grpc-port>
    service: temporal.api.workflowservice.v1.WorkflowService

Note that we can't use a named port for GRPC probes for some reason, so the port will need to be interpolated in the template, the same way the RPC port number already is.
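
For illustration, a rough sketch of what that interpolation could look like in the frontend deployment template (the `.Values.server.frontend.service.port` path is an assumption, not necessarily the chart's actual key):

readinessProbe:
  grpc:
    # gRPC probes don't accept named ports, so the numeric RPC port
    # is templated in directly
    port: {{ .Values.server.frontend.service.port }}
    service: temporal.api.workflowservice.v1.WorkflowService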

@quangngotan95
Contributor Author

@robholland so you're saying, for example, that history pods don't use the k8s readinessProbe for anything and Temporal controls history nodes internally? We're setting a 300s initial delay on the history readinessProbe as a way to slow down history pod deployments 🤔

@robholland
Contributor

Yes, history health is managed internally via the membership system (ringpop I think?). To slow down history deployments (which is a totally valid goal) I would use minReadySeconds on the deployment spec. Feel free to add that to this PR ;) Please do add the readinessProbe for frontend(-internal) though, I think that will be useful for people.
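
For reference, a minimal sketch of that on the history Deployment (the 300-second value mirrors the initial delay mentioned above and is purely illustrative):

apiVersion: apps/v1
kind: Deployment
spec:
  # each new history pod must stay Ready this long before the rollout proceeds
  minReadySeconds: 300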

@quangngotan95
Contributor Author

Got it. We used minReadySeconds before, and while it works for the deployment rollout, the problem is that PodDisruptionBudget does not respect it. Hence there will still be multiple history pods recycling when one of our nodes goes down; that's why we switched to this readinessProbe, which works for both cases.
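
For context, the pattern described here looks roughly like this (the tcpSocket handler and port 7234 are assumptions for illustration; the relevant part is the long initialDelaySeconds, which both the rollout and the PodDisruptionBudget respect because the pod doesn't count as Ready until the probe passes):

readinessProbe:
  # delays Ready status, throttling history pod turnover during
  # rollouts and voluntary disruptions alike
  initialDelaySeconds: 300
  tcpSocket:
    port: 7234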

Outdated comment thread on charts/temporal/templates/server-deployment.yaml
robholland merged commit 841a2a2 into temporalio:main on Jan 8, 2026
4 checks passed
