Schema Job should complete before server pods start during helm upgrade #890

### Is your feature request related to a problem? Please describe.

During `helm upgrade` from chart 0.x to 1.0.0 (Temporal Server 1.29.x → 1.30.x), the schema migration Job and server Deployments are applied simultaneously. If the visibility schema migration involves slow DDL operations (e.g., `ALTER TABLE ... ADD COLUMN ... GENERATED ALWAYS AS ... STORED` on a large `executions_visibility` table), the new server pods start before the schema is ready, fail the schema version check, and enter `CrashLoopBackOff`.

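As a result, the old server pods can end up terminated before the new ones are healthy, resulting in **full downtime** for the duration of the schema migration, which can take over an hour on large tables. Note that the default `maxUnavailable: 25%` alone does not explain this. Kubernetes rounds a percentage `maxUnavailable` down and a percentage `maxSurge` up, so on a single-replica Deployment the defaults resolve to 0 unavailable (not 1) and 1 surge pod.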
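To check what a given `RollingUpdate` strategy allows on a small Deployment, the percentage-to-pod-count conversion can be modeled as below. This is a minimal Python sketch of the documented rounding rules (`maxSurge` rounds up, `maxUnavailable` rounds down), not Kubernetes code. The function name is illustrative. The fallback for the case where both values resolve to zero is modeled on the deployment controller's behavior and should be treated as an assumption.

```python
def resolve_fenceposts(replicas: int, max_surge: str, max_unavailable: str) -> tuple[int, int]:
    """Return (surge, unavailable) pod counts for a RollingUpdate strategy.

    Values are either absolute ("1") or percentages ("25%") of `replicas`.
    """
    def resolve(value: str, round_up: bool) -> int:
        if value.endswith("%"):
            scaled = replicas * int(value[:-1])
            # Integer ceil/floor of scaled / 100, avoiding float rounding issues.
            return -(-scaled // 100) if round_up else scaled // 100
        return int(value)

    surge = resolve(max_surge, round_up=True)               # maxSurge rounds up
    unavailable = resolve(max_unavailable, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # Assumed fallback (modeled on the controller) so a rollout can still progress.
        unavailable = 1
    return surge, unavailable


# Single-replica Deployment with the default 25%/25% settings:
print(resolve_fenceposts(1, "25%", "25%"))  # (1, 0): one extra pod, none taken down early
```

With these defaults the old pod is kept until a new pod becomes Ready, so a new pod stuck in `CrashLoopBackOff` would normally stall the rollout rather than remove the old pod.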

This contradicts the [official upgrade documentation](https://docs.temporal.io/self-hosted-guide/upgrade-server), which recommends:

> "Before initiating the Temporal Server upgrade, use one of the recommended upgrade tools to update your database schema."

The Helm chart does not enforce this ordering.

### Our experience

Upgrading from 1.29.2 to 1.30.3 (chart 1.0.0) required migrating the visibility schema from v1.9 to v1.13. On our staging PostgreSQL database, this took **~2 hours** due to multiple `ALTER TABLE ... ADD COLUMN ... GENERATED ALWAYS AS ... STORED` statements (which rewrite the entire table in PostgreSQL) and `CREATE INDEX` operations across v1.10–v1.13.

During this entire period, all Temporal server pods (frontend, history, matching, worker) were down.

### Describe the solution you'd like

Ensure the schema migration Job completes **before** the server Deployment pods are rolled out. Some possible approaches:

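1. **Helm `pre-upgrade` hook**: Add the `helm.sh/hook: pre-upgrade` annotation to the schema Job so that Helm waits for it to complete before applying the server Deployment changes. Helm's `--timeout` (default `5m0s`) also bounds how long it waits for hook Jobs, so it would need to be raised above the expected migration time.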

2. **Init container on server pods**: Add an init container to the server Deployment that polls the schema version and blocks until it matches the expected version.

3. **Documentation**: At minimum, document that users with large visibility tables should set `manageSchema: false`, run the schema migration manually while the old server is still running, and only then upgrade the server image. This matches the order recommended in the server upgrade docs.
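Approach 2 could be as small as a polling loop. The sketch below shows only the gate logic, under stated assumptions. `read_version` is a hypothetical callable that queries the database for the current visibility schema version; the actual query and driver are left out. A version at or above the expected one is accepted. Versions are compared as integer tuples because a plain string comparison would rank `"1.9"` above `"1.13"`.

```python
import time
from typing import Callable


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '1.13' into (1, 13) so versions compare numerically, not lexically."""
    return tuple(int(part) for part in v.strip().split("."))


def wait_for_schema(
    read_version: Callable[[], str],  # hypothetical: returns the current schema version
    expected: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until the schema version reaches `expected`; return False on timeout."""
    target = parse_version(expected)
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            if parse_version(read_version()) >= target:
                return True
        except Exception as exc:  # e.g. database unreachable or query blocked
            print(f"schema check failed, retrying: {exc}")
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)


# In an init container (read_version is hypothetical):
# sys.exit(0 if wait_for_schema(read_version, "1.13") else 1)
```

Exiting non-zero keeps the pod from starting its main container. Kubernetes restarts the failed init container with backoff, instead of the server starting against an old schema and crash-looping.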

### Additional context

- PostgreSQL `ALTER TABLE ... ADD COLUMN ... STORED` rewrites the entire table, making it especially slow on large `executions_visibility` tables.
- The v1.10–v1.13 visibility schema adds ~17 generated columns and ~17 indexes total.
- Related: #695 (requested decoupling schema jobs from server deployments, now closed).
- The workaround we used for production: pin server/web/admintools image tags to the old version in `values.yaml`, let the chart upgrade deploy the schema Job with the new admin-tools image, wait for migration to complete, then remove the image pins in a second deploy.
