Expected Behavior
We want to upgrade from Temporal 1.29.3 to 1.30.4. We expect this to be possible without downtime.
We are using Temporal self-hosted with Postgres.
Actual Behavior
The release contains this migration: https://github.com/temporalio/temporal/blob/main/schema/postgresql/v12/visibility/versioned/v1.10/add_temporal_reported_columns.sql
Our executions_visibility table has 35 million rows. When I run this migration on a test instance (sized to match our production instance; 16 cores, 128GB memory), with no other traffic, it takes almost an hour:
ALTER TABLE executions_visibility ADD COLUMN TemporalReportedProblems JSONB GENERATED ALWAYS AS (search_attributes->'TemporalReportedProblems') STORED;
CREATE INDEX by_temporal_reported_problems ON executions_visibility USING GIN (namespace_id, TemporalReportedProblems jsonb_path_ops);
ALTER TABLE
Time: 3430898.163 ms (57:10.898)
CREATE INDEX
Time: 34220.298 ms (00:34.220)
While it's doing this, it also uses 50% of the CPU available to the DB.
In addition to the length this takes, this also obtains an exclusive lock on the table, meaning we'd have to take downtime to run this.
Steps to Reproduce the Problem
- Have a large (?) visibility table.
- Run the linked migration.
Other migrations
These all do the same thing (add a column GENERATED ALWAYS ... STORED) so would have the same problem for us - locking and rewriting the entire table.
Could we consider using triggers for these instead, so we don't need to rewrite at migration time? The search attributes don't appear to exist prior to the migration. We can do this ourselves manually if needed.
Expected Behavior
We want to upgrade from Temporal 1.29.3 to 1.30.4. We expect this to be possible without downtime.
We are using Temporal self-hosted with Postgres.
Actual Behavior
The release contains this migration: https://github.com/temporalio/temporal/blob/main/schema/postgresql/v12/visibility/versioned/v1.10/add_temporal_reported_columns.sql
Our
executions_visibilitytable has 35 million rows. When I run this migration on a test instance (sized to match our production instance; 16 cores, 128GB memory), with no other traffic, it takes almost an hour:While it's doing this, it also uses 50% of the CPU available to the DB.
In addition to the length this takes, this also obtains an exclusive lock on the table, meaning we'd have to take downtime to run this.
Steps to Reproduce the Problem
Other migrations
These all do the same thing (add a column
GENERATED ALWAYS ... STORED) so would have the same problem for us - locking and rewriting the entire table.Could we consider using triggers for these instead, so we don't need to rewrite at migration time? The search attributes don't appear to exist prior to the migration. We can do this ourselves manually if needed.