MGMT-16090: upgrade assisted-service postgresql from 12 to 13 #8602
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,160 @@ | ||
| # PostgreSQL Major Version Upgrade | ||
|
|
||
| This document describes how assisted-service handles PostgreSQL major version upgrades in the kube-api (MCE/ACM) deployment mode. | ||
|
|
||
| ## Overview | ||
|
|
||
| PostgreSQL major version upgrades require data migration because the on-disk format changes between versions. The assisted-service leverages the [sclorg postgresql-container](https://github.com/sclorg/postgresql-container) built-in upgrade mechanism via the `POSTGRESQL_UPGRADE` environment variable. | ||
|
|
||
| ## The Problem | ||
|
|
||
| The sclorg containers support `POSTGRESQL_UPGRADE=hardlink` to trigger `pg_upgrade`, but this setting **cannot be set permanently**. The sclorg container intentionally fails when `POSTGRESQL_UPGRADE` is set but versions already match - this is a safety mechanism to prevent users from leaving it enabled. | ||
|
|
||
| ## Our Solution: Conditional Upgrade | ||
|
|
||
| We use a wrapper script (`internal/controller/controllers/postgres_startup.sh`) embedded via `//go:embed` that conditionally sets `POSTGRESQL_UPGRADE=hardlink` only when a version mismatch is detected. | ||
|
|
||
| The script: | ||
| 1. Checks if `PG_VERSION` file exists in the data directory | ||
| 2. Compares data version with container's `POSTGRESQL_VERSION` env var | ||
| 3. Sets `POSTGRESQL_UPGRADE=hardlink` only when versions differ | ||
| 4. Calls `run-postgresql` to start the database | ||
|
|
||
| This handles all scenarios correctly: | ||
| - **Fresh install**: No data → normal initialization | ||
| - **Restart (same version)**: Versions match → normal startup | ||
| - **Upgrade (version mismatch)**: Versions differ → enables pg_upgrade | ||
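The three scenarios above can be exercised without starting PostgreSQL. The following is a minimal sketch, assuming a hypothetical helper named `startup_mode` that mirrors the comparison the wrapper script performs. It is not part of `postgres_startup.sh`, and it only reports the decision instead of exporting `POSTGRESQL_UPGRADE`.

```bash
#!/bin/bash
# Sketch only: startup_mode is a hypothetical helper, not part of postgres_startup.sh.
# Prints "fresh", "normal", or "upgrade" for a given data directory and image version.
startup_mode() {
    local pgdata="$1" image_version="$2"
    if [ ! -f "$pgdata/PG_VERSION" ]; then
        echo "fresh"      # no data yet: normal initialization
    elif [ "$(cat "$pgdata/PG_VERSION")" = "$image_version" ]; then
        echo "normal"     # versions match: plain restart
    else
        echo "upgrade"    # versions differ: the wrapper would export POSTGRESQL_UPGRADE=hardlink
    fi
}
```

Pointing it at a scratch directory that contains a `PG_VERSION` file with `12` and passing `13` as the image version reports `upgrade`; an empty directory reports `fresh`.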
|
|
||
| ## How pg_upgrade Works | ||
|
|
||
| When `POSTGRESQL_UPGRADE=hardlink` is set and versions differ: | ||
|
|
||
| 1. **Detect Version Mismatch**: The sclorg `run-postgresql` script reads `PG_VERSION` from the data directory | ||
| 2. **Validate Source Version**: Checks that the data version matches `POSTGRESQL_PREV_VERSION` (e.g., PG13 image requires PG12 data) | ||
| 3. **Run pg_upgrade**: Executes `pg_upgrade --link` to upgrade the data in-place using hardlinks | ||
| 4. **Start PostgreSQL**: Normal postgres startup with upgraded data | ||
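Step 2 is the gate that decides whether `pg_upgrade` runs at all. The sketch below illustrates that check with a hypothetical function, `check_source_version`. It is a stand-in for the validation described above, not the sclorg implementation; the error text is copied from the message quoted later in this document.

```bash
#!/bin/bash
# Sketch only: check_source_version is a hypothetical stand-in for the
# validation performed by the sclorg scripts; it is not their actual code.
check_source_version() {
    local data_version="$1" prev_version="$2"
    if [ "$data_version" != "$prev_version" ]; then
        # Refuse to continue: this image cannot upgrade from that data version
        echo "With this container image you can only upgrade from data directory" >&2
        echo "of version '$prev_version', not '$data_version'." >&2
        return 1
    fi
    return 0
}
```

With a PG13 image (`POSTGRESQL_PREV_VERSION=12`), `check_source_version 12 12` succeeds, while `check_source_version 10 12` fails and writes the error to stderr.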
|
|
||
| ### sclorg Environment Variables | ||
|
|
||
| The sclorg container images define these environment variables (baked into each image): | ||
|
|
||
| | Variable | Description | Example | | ||
| |----------|-------------|---------| | ||
| | `POSTGRESQL_VERSION` | Current PostgreSQL version | `13` | | ||
| | `POSTGRESQL_PREV_VERSION` | Previous version this image can upgrade from | `12` | | ||
|
|
||
| You can verify these by inspecting the container: | ||
| ```bash | ||
| podman run --rm quay.io/sclorg/postgresql-13-c9s:latest env | grep POSTGRESQL | ||
| # POSTGRESQL_VERSION=13 | ||
| # POSTGRESQL_PREV_VERSION=12 | ||
| ``` | ||
|
|
||
| ### Hardlink Mode | ||
|
|
||
| The `--link` flag tells `pg_upgrade` to create hardlinks instead of copying files: | ||
|
|
||
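| - **Fast**: Run time scales with the number of files rather than the amount of data, so the upgrade typically completes in seconds | ||
| - **No Extra Storage**: Hardlinks share the same disk blocks as the original files | ||
| - **Near-Atomic**: Each hardlink is created atomically by the filesystem, although the upgrade as a whole is not; once the new cluster has started, the old data directory can no longer be used safely | ||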
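The storage claim follows from how hardlinks work: a link is a second name for the same inode, not a copy. A minimal sketch on a scratch directory, assuming a hypothetical helper `link_demo` and made-up file names (no real PostgreSQL files are involved):

```bash
#!/bin/bash
# Sketch only: demonstrates hardlink semantics on a scratch directory,
# not on a real PostgreSQL data directory. File names are made up.
link_demo() {
    local dir="$1"
    printf 'relation data\n' > "$dir/old_relfile"
    ln "$dir/old_relfile" "$dir/new_relfile"   # hardlink: a second name for the same inode
    # -ef is true when both paths refer to the same device and inode
    if [ "$dir/old_relfile" -ef "$dir/new_relfile" ]; then
        echo "shared"
    else
        echo "copied"
    fi
}
```

Running `link_demo "$(mktemp -d)"` prints `shared`: both names resolve to the same blocks on disk, so no data is duplicated.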
|
|
||
| ## Preserving Events and Logs | ||
|
|
||
| If you need to ensure 100% preservation of events and logs, snapshot your database PVC before upgrading: | ||
|
|
||
| ```bash | ||
| # Example: save the PVC manifest before the MCE upgrade (this exports the object definition only, not the data) | ||
| kubectl get pvc postgres -n multicluster-engine -o yaml > postgres-pvc-backup.yaml | ||
| # To preserve the data itself, use your storage class's snapshot feature if available | ||
| ``` | ||
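Where the cluster's CSI driver supports snapshots, a `VolumeSnapshot` object captures the volume contents rather than just the PVC definition. The sketch below prints such a manifest. The helper name `render_snapshot`, the snapshot name suffix, and the snapshot class `example-snapclass` are all assumptions for illustration; the PVC name and namespace are the ones used in the example above.

```bash
#!/bin/bash
# Sketch only: prints a VolumeSnapshot manifest for the database PVC.
# The snapshot name and the snapshot class are hypothetical placeholders.
render_snapshot() {
    local pvc="$1" namespace="$2" snapshot_class="$3"
    cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ${pvc}-pre-upgrade
  namespace: ${namespace}
spec:
  volumeSnapshotClassName: ${snapshot_class}
  source:
    persistentVolumeClaimName: ${pvc}
EOF
}

# Review the output, then pipe it to `kubectl apply -f -` on a cluster
# whose storage driver supports snapshots.
render_snapshot postgres multicluster-engine example-snapclass
```

Generating the manifest separately from applying it keeps the step reviewable; substitute a snapshot class that actually exists on the cluster.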
|
|
||
| ## Failure Handling | ||
|
|
||
| If the upgrade fails: | ||
|
|
||
| 1. The postgres container crashes | ||
| 2. Pod goes into `CrashLoopBackOff` | ||
| 3. Logs show the error from sclorg/pg_upgrade | ||
| 4. Manual investigation and recovery required | ||
|
|
||
| ### Recovery Options | ||
|
|
||
| If upgrade fails and data is unrecoverable: | ||
|
|
||
| ```bash | ||
| # 1. Check what went wrong | ||
| kubectl logs <pod-name> -c postgres -n multicluster-engine | ||
|
|
||
| # 2. If data is corrupt, delete the PVC to start fresh | ||
| kubectl delete pvc postgres -n multicluster-engine | ||
|
|
||
| # 3. Delete pod to force restart | ||
| kubectl delete pod <pod-name> -n multicluster-engine | ||
|
|
||
| # 4. New pod starts with fresh DB, controllers reconcile from CRs | ||
| ``` | ||
|
|
||
| Data loss on recovery: | ||
|
|
||
| | Data | Source | Recovery | | ||
| |------|--------|----------| | ||
| | Clusters | AgentClusterInstall CR | Reconciled from etcd | | ||
| | Hosts | Agent CR | Reconciled from etcd | | ||
| | InfraEnvs | InfraEnv CR | Reconciled from etcd | | ||
| | **Events** | PostgreSQL only | **Lost** | | ||
| | **Logs metadata** | PostgreSQL only | **Lost** | | ||
|
|
||
| ## Upgrade Path | ||
|
|
||
| PostgreSQL container images from [sclorg](https://github.com/sclorg/postgresql-container) include binaries for the previous major version, enabling single-step upgrades. Each image only supports upgrading from one specific previous version (`POSTGRESQL_PREV_VERSION`). | ||
|
|
||
| ### Available Images and Supported Upgrades | ||
|
|
||
| | Image | PG Version | Upgrades From | Base OS | | ||
| |-------|------------|---------------|---------| | ||
| postgresql-12-c8s | 12 | 10 | CentOS Stream 8 | | ||
| postgresql-13-c8s | 13 | 12 | CentOS Stream 8 | | ||
| postgresql-13-c9s | 13 | 12 | CentOS Stream 9 | | ||
| postgresql-15-c9s | 15 | 13 | CentOS Stream 9 | | ||
| postgresql-16-c9s | 16 | 15 | CentOS Stream 9 | | ||
| postgresql-17-c9s | 17 | 16 | CentOS Stream 9 | | ||
|
|
||
| Note: Upgrading from `postgresql-12-c8s` (CentOS Stream 8) to `postgresql-13-c9s` (CentOS Stream 9) is supported. See [Red Hat's fast upgrade documentation](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_using_database_servers/using-postgresql_configuring-and-using-database-servers#fast-upgrade-using-the-pg_upgrade-tool_migrating-to-a-rhel-9-version-of-postgresql). | ||
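Because each image upgrades from exactly one previous version, a larger jump has to pass through every intermediate image in turn. The sketch below encodes the "Upgrades From" column of the table above and prints the sequence of image versions to deploy. `next_version` and `upgrade_path` are hypothetical helpers written for illustration; they are not part of assisted-service or the sclorg images.

```bash
#!/bin/bash
# Sketch only: next_version and upgrade_path are hypothetical helpers that
# encode the "Upgrades From" column of the table above.
next_version() {
    case "$1" in
        10) echo 12 ;;
        12) echo 13 ;;
        13) echo 15 ;;
        15) echo 16 ;;
        16) echo 17 ;;
        *)  return 1 ;;   # no image in the table upgrades from this version
    esac
}

# Prints the image versions to deploy, in order, to get from $1 to $2.
# Fails if the target is not reachable through the chain.
upgrade_path() {
    local current="$1" target="$2" path=""
    while [ "$current" != "$target" ]; do
        current=$(next_version "$current") || return 1
        [ "$current" -le "$target" ] || return 1   # overshot: target is not on the chain
        path="${path:+$path }$current"
    done
    echo "$path"
}
```

For example, `upgrade_path 10 13` prints `12 13`: data at version 10 must first be started once under a PG12 image and then under a PG13 image. A request such as `upgrade_path 12 14` fails because no image in the table produces version 14.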
|
|
||
| ## How to Upgrade PostgreSQL Version | ||
|
|
||
| To upgrade to a new PostgreSQL version: | ||
|
|
||
| 1. Update `internal/controller/controllers/images.go` with the new image | ||
| 2. Update `deploy/olm-catalog/manifests/assisted-service-operator.clusterserviceversion.yaml`: | ||
| - Update `DATABASE_IMAGE` env var | ||
| - Update `relatedImages` section | ||
| 3. Update backplane-operator: | ||
| - `hack/bundle-automation/config.yaml` - image mapping | ||
| - `pkg/templates/charts/toggle/assisted-service/values.yaml` | ||
| - `pkg/templates/charts/toggle/assisted-service/templates/infrastructure-operator.yaml` | ||
|
|
||
| The wrapper script automatically detects version mismatches and triggers `pg_upgrade` when needed. | ||
|
|
||
| ## Deployment Strategy | ||
|
|
||
| The assisted-service deployment uses `Recreate` strategy (not `RollingUpdate`): | ||
|
|
||
| ```go | ||
| deploymentStrategy := appsv1.DeploymentStrategy{ | ||
| Type: appsv1.RecreateDeploymentStrategyType, | ||
| } | ||
| ``` | ||
|
|
||
| This ensures the old pod is terminated and releases the PVC before the new pod starts. With `RollingUpdate`, the new pod could wait indefinitely for a volume that the old pod still holds. | ||
|
|
||
| ## Version Skip Protection | ||
|
|
||
| The sclorg container validates that the source data version matches `POSTGRESQL_PREV_VERSION`. If a customer tries to skip versions (e.g., PG10 → PG13), the container fails with a clear error: | ||
|
|
||
| ``` | ||
| With this container image you can only upgrade from data directory | ||
| of version '12', not '10'. | ||
| ``` | ||
|
|
||
|
Review comment on lines +155 to +159:
Add language identifier to fenced code block. The fenced code block is missing a language identifier, which causes a linting warning and reduces syntax highlighting.
Proposed fix: add the `text` language identifier to the opening fence of the block containing:
With this container image you can only upgrade from data directory
of version '12', not '10'.
Tools: markdownlint-cli2 (0.18.1), 155-155: Fenced code blocks should have a language specified (MD040, fenced-code-language) |
||
| This prevents accidental data corruption from unsupported upgrade paths. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| #!/bin/bash | ||
| # postgres_startup.sh - Wrapper script for PostgreSQL container startup | ||
| # | ||
| # This script checks if a PostgreSQL major version upgrade is needed before starting | ||
| # the database. It compares the data directory version (PG_VERSION) with the container's | ||
| # PostgreSQL version and enables pg_upgrade only when necessary. | ||
| # | ||
| # See docs/dev/postgresql-upgrade.md for details. | ||
|
|
||
| set -e | ||
|
|
||
| PGDATA=/var/lib/pgsql/data/userdata | ||
|
|
||
| echo "=== PostgreSQL Startup Check ===" | ||
|
|
||
| if [ -f "$PGDATA/PG_VERSION" ]; then | ||
| DATA_VERSION=$(cat "$PGDATA/PG_VERSION") | ||
| echo "Data directory version: $DATA_VERSION" | ||
| echo "Container image version: $POSTGRESQL_VERSION" | ||
|
|
||
| if [ "$DATA_VERSION" != "$POSTGRESQL_VERSION" ]; then | ||
| echo "Version mismatch detected - enabling pg_upgrade (hardlink mode)" | ||
| export POSTGRESQL_UPGRADE=hardlink | ||
| else | ||
| echo "Versions match - normal startup" | ||
| fi | ||
| else | ||
| echo "No existing data directory - fresh initialization" | ||
| fi | ||
|
|
||
| exec run-postgresql |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| package controllers | ||
|
|
||
| import _ "embed" | ||
|
|
||
| // PostgresStartupScript is a wrapper script that conditionally enables pg_upgrade | ||
| // only when a PostgreSQL major version upgrade is detected. This avoids the issue | ||
| // where setting POSTGRESQL_UPGRADE=hardlink permanently causes container startup | ||
| // failures on normal restarts (when versions already match). | ||
| // | ||
| // See docs/dev/postgresql-upgrade.md for details. | ||
| // | ||
| //go:embed postgres_startup.sh | ||
| var PostgresStartupScript string |
Analysis chain
Web query: `quay.io/sclorg/postgresql-13-c9s latest image registry`
Result:
The sclorg CentOS Stream images live on Quay under quay.io/sclorg. The repo for PostgreSQL is maintained here and CentOS Stream 9 images include postgresql-13 (tag suffix c9s). See the project README. [1]
Note: upstream notes state the project removed some "latest" tags in favor of explicit c9s tags (so quay.io/sclorg/postgresql-13-c9s:latest may not be present). Use the explicit tag (c9s) or check the repository on Quay to confirm available tags.
Sources:
[1] sclorg/postgresql-container README (Quay repos described).
[2] sclorg/related discussion/announcements noting "latest" tag removal in favor of c9s.
Script executed (repository: openshift/assisted-service, length of output: 81)
Script executed: `cat -n config/manager/manager.yaml`, to check the manager.yaml file for context (repository: openshift/assisted-service, length of output: 12515)
Replace `:latest` tag with explicit `:c9s` tag.
The `quay.io/sclorg/postgresql-13-c9s:latest` reference is incorrect. The sclorg project removed "latest" tags in favor of explicit version tags. Use `quay.io/sclorg/postgresql-13-c9s:c9s` instead.
this image does exist.
@omer-vishlitzky Thank you for verifying! You're absolutely right: the image exists and can be pulled successfully. I apologize for the incorrect information in my review. The `quay.io/sclorg/postgresql-13-c9s:latest` tag is valid and your change is correct. No action needed here.