Is your feature request related to a problem? Please describe.
During `helm upgrade` from chart 0.x to 1.0.0 (Temporal Server 1.29.x → 1.30.x), the schema migration Job and server Deployments are applied simultaneously. If the visibility schema migration involves slow DDL operations (e.g., `ALTER TABLE ... ADD COLUMN ... GENERATED ALWAYS AS ... STORED` on a large `executions_visibility` table), the new server pods start before the schema is ready, fail the schema version check, and enter `CrashLoopBackOff`.
Combined with Kubernetes' default `maxUnavailable: 25%` on single-replica Deployments (which rounds up to 1), the old server pods are terminated before the new ones are healthy, resulting in full downtime for the duration of the schema migration, which can take over an hour on large tables.
This contradicts the official upgrade documentation, which recommends:
"Before initiating the Temporal Server upgrade, use one of the recommended upgrade tools to update your database schema."
The Helm chart does not enforce this ordering.
Our experience
Upgrading from 1.29.2 to 1.30.3 (chart 1.0.0) required migrating the visibility schema from v1.9 to v1.13. On our staging PostgreSQL database, this took ~2 hours due to multiple `ALTER TABLE ... ADD COLUMN ... GENERATED ALWAYS AS ... STORED` statements (which rewrite the entire table in PostgreSQL) and `CREATE INDEX` operations across v1.10–v1.13.
During this entire period, all Temporal server pods (frontend, history, matching, worker) were down.
Describe the solution you'd like
Ensure the schema migration Job completes before the server Deployment pods are rolled out. Some possible approaches:
- Helm `pre-upgrade` hook: add a `helm.sh/hook: pre-upgrade` annotation to the schema Job so Helm waits for it to complete before applying the server Deployment changes (first sketch below).
- Init container on server pods: add an init container to the server Deployment that polls the schema version and blocks until it matches the expected version (second sketch below).
- Documentation: at minimum, document that users with large visibility tables should set `manageSchema: false`, run the schema migration manually while the old server is still running, and only then upgrade the server image, matching the recommended order in the server upgrade docs.
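A minimal sketch of the hook approach, assuming the chart keeps a schema-update Job roughly like its current one; the template helper, labels, schema path, and the `temporal-sql-tool` invocation are illustrative placeholders, not the chart's actual template:

```yaml
# Illustrative schema Job with Helm hook annotations. Helm runs
# pre-upgrade hooks and waits for them to succeed before applying the
# rest of the release, so the server Deployments only roll out once the
# visibility schema has been migrated.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "temporal.fullname" . }}-schema-update   # placeholder helper
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 10
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: update-visibility-schema
          image: "{{ .Values.admintools.image.repository }}:{{ .Values.admintools.image.tag }}"
          # Connection flags omitted; the schema path assumes the admin-tools image layout.
          command: ["temporal-sql-tool", "update-schema", "-d", "/etc/temporal/schema/postgresql/v12/visibility/versioned"]
```

One trade-off: hook resources live outside the normal release lifecycle, so the Job would no longer be diffed or rolled back together with the rest of the chart.

And a hypothetical init-container variant. It assumes Temporal's SQL visibility schema keeps its version in a `schema_version` table with a `curr_version` column, and that an image with `psql` is acceptable; the image, secret name, host, and target version are placeholders:

```yaml
# Hypothetical init container for each server Deployment: blocks pod
# startup until the visibility schema reports the expected version.
initContainers:
  - name: wait-for-visibility-schema
    image: postgres:16-alpine              # placeholder; anything with psql works
    env:
      - name: DB_HOST
        value: postgres.example.internal   # placeholder host
      - name: PGPASSWORD
        valueFrom:
          secretKeyRef:
            name: temporal-db-credentials  # placeholder secret
            key: password
      - name: EXPECTED_VERSION
        value: "1.13"                      # visibility schema version shipped with the new server
    command:
      - /bin/sh
      - -c
      - |
        until [ "$(psql -h "$DB_HOST" -U temporal -d temporal_visibility -tAc \
                  'SELECT curr_version FROM schema_version LIMIT 1')" = "$EXPECTED_VERSION" ]; do
          echo "visibility schema not at $EXPECTED_VERSION yet; waiting..."
          sleep 30
        done
```

This variant keeps the ordering inside Kubernetes itself, so it also covers the case where the Job and the Deployments are applied in the same Helm release.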
Additional context
- PostgreSQL `ALTER TABLE ... ADD COLUMN ... STORED` rewrites the entire table, making it especially slow on large `executions_visibility` tables.
- The v1.10–v1.13 visibility schema adds ~17 generated columns and ~17 indexes in total.
- Related: `Values.server.enabled` and #695 (requested decoupling schema jobs from server deployments, now closed).
- The workaround we used for production (sketched below): pin the server/web/admintools image tags to the old version in `values.yaml`, let the chart upgrade deploy the schema Job with the new admin-tools image, wait for the migration to complete, then remove the image pins in a second deploy.
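For reference, a sketch of the first deploy of that workaround, showing just the server and web pins in `values.yaml` and assuming the chart's usual `server.image` / `web.image` keys; 1.29.2 is the old server version from this upgrade and the UI tag is a placeholder:

```yaml
# First deploy: keep the running components on the old versions so the
# server pods do not roll while the chart's schema Job performs the
# migration; a second deploy later removes these pins.
server:
  image:
    repository: temporalio/server
    tag: 1.29.2                   # old server version, pinned during the migration
web:
  image:
    repository: temporalio/ui
    tag: "<previous-ui-version>"  # placeholder for whatever UI tag was already deployed
```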