2 changes: 1 addition & 1 deletion Makefile
@@ -39,7 +39,7 @@ DEBUG_SERVICE_PORT := $(or ${DEBUG_SERVICE_PORT},40000)
SERVICE := $(or ${SERVICE},${ASSISTED_ORG}/assisted-service:${ASSISTED_TAG})
IMAGE_SERVICE := $(or ${IMAGE_SERVICE},${ASSISTED_ORG}/assisted-image-service:${ASSISTED_TAG})
ASSISTED_UI := $(or ${ASSISTED_UI},${ASSISTED_ORG}/assisted-installer-ui:${ASSISTED_TAG})
PSQL_IMAGE := $(or ${PSQL_IMAGE},quay.io/sclorg/postgresql-12-c8s:latest)
PSQL_IMAGE := $(or ${PSQL_IMAGE},quay.io/sclorg/postgresql-13-c9s:latest)
BUNDLE_IMAGE := $(or ${BUNDLE_IMAGE},${ASSISTED_ORG}/assisted-service-operator-bundle:${ASSISTED_TAG})
INDEX_IMAGE := $(or ${INDEX_IMAGE},${ASSISTED_ORG}/assisted-service-index:${ASSISTED_TAG})

2 changes: 1 addition & 1 deletion config/manager/manager.yaml
@@ -35,7 +35,7 @@ spec:
- name: IMAGE_SERVICE_IMAGE
value: quay.io/edge-infrastructure/assisted-image-service:latest
- name: DATABASE_IMAGE
value: quay.io/sclorg/postgresql-12-c8s:latest
value: quay.io/sclorg/postgresql-13-c9s:latest
Comment on lines 37 to +38
@coderabbitai coderabbitai bot Dec 19, 2025

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

quay.io/sclorg/postgresql-13-c9s latest image registry

💡 Result:

The sclorg CentOS-stream images live on Quay under quay.io/sclorg. The repo for PostgreSQL is maintained here and CentOS‑Stream‑9 images include postgresql-13 (tag suffix c9s). See the project README. [1]

Note: upstream notes state the project removed some "latest" tags in favor of explicit c9s tags (so quay.io/sclorg/postgresql-13-c9s:latest may not be present). Use the explicit tag (c9s) or check the repository on Quay to confirm available tags. Example pull commands:

  • podman pull quay.io/sclorg/postgresql-13-c9s:c9s
  • podman pull quay.io/sclorg/postgresql-15-c9s:latest (example of current latest-tag usage). [1][2]

Sources:
[1] sclorg/postgresql-container README (Quay repos described).
[2] sclorg/related discussion/announcements noting "latest" tag removal in favor of c9s.


🏁 Script executed:

# Search for any PostgreSQL documentation or constraints in the repo
find . -type f \( -name "*.md" -o -name "*.yaml" -o -name "*.yml" \) | head -20 | xargs grep -l -i "postgresql\|database.*image" 2>/dev/null | head -10

Repository: openshift/assisted-service

Length of output: 81


🏁 Script executed:

# Check the manager.yaml file for context
cat -n config/manager/manager.yaml

Repository: openshift/assisted-service

Length of output: 12515


Replace :latest tag with explicit :c9s tag.

The quay.io/sclorg/postgresql-13-c9s:latest reference is incorrect. The sclorg project removed "latest" tags in favor of explicit version tags. Use quay.io/sclorg/postgresql-13-c9s:c9s instead.

🤖 Prompt for AI Agents
In config/manager/manager.yaml around lines 37 to 38, the DATABASE_IMAGE value
uses the removed :latest tag; update the image value to use the explicit tag by
replacing quay.io/sclorg/postgresql-13-c9s:latest with
quay.io/sclorg/postgresql-13-c9s:c9s so the manifest references the supported
explicit tag.

Contributor Author

this image does exist.

❯ podman pull quay.io/sclorg/postgresql-13-c9s:latest
Trying to pull quay.io/sclorg/postgresql-13-c9s:latest...
Getting image source signatures
Copying blob 93d0d641f1b2 skipped: already exists
Copying blob 5dce88cbadbe done   |
Copying blob da80ec85690c done   |
Copying config 6975cf9771 done   |
Writing manifest to image destination
6975cf9771977fc69e643df3067ace9c5e0bd4656a36307be6c03be59020aa10

@omer-vishlitzky Thank you for verifying! You're absolutely right—the image exists and can be pulled successfully. I apologize for the incorrect information in my review. The quay.io/sclorg/postgresql-13-c9s:latest tag is valid and your change is correct. No action needed here.

- name: AGENT_IMAGE
value: quay.io/edge-infrastructure/assisted-installer-agent:latest
- name: CONTROLLER_IMAGE
Original file line number Diff line number Diff line change
@@ -181,7 +181,7 @@ spec:
name: controller
- image: quay.io/edge-infrastructure/assisted-image-service:latest
name: image-service
- image: quay.io/sclorg/postgresql-12-c8s:latest
- image: quay.io/sclorg/postgresql-13-c9s:latest
name: postgresql
- image: quay.io/edge-infrastructure/assisted-installer:latest
name: installer
Original file line number Diff line number Diff line change
@@ -1145,7 +1145,7 @@ spec:
- name: IMAGE_SERVICE_IMAGE
value: quay.io/edge-infrastructure/assisted-image-service:latest
- name: DATABASE_IMAGE
value: quay.io/sclorg/postgresql-12-c8s:latest
value: quay.io/sclorg/postgresql-13-c9s:latest
- name: AGENT_IMAGE
value: quay.io/edge-infrastructure/assisted-installer-agent:latest
- name: CONTROLLER_IMAGE
@@ -1284,7 +1284,7 @@ spec:
name: controller
- image: quay.io/edge-infrastructure/assisted-image-service:latest
name: image-service
- image: quay.io/sclorg/postgresql-12-c8s:latest
- image: quay.io/sclorg/postgresql-13-c9s:latest
name: postgresql
- image: quay.io/edge-infrastructure/assisted-installer:latest
name: installer
2 changes: 1 addition & 1 deletion deploy/podman/pod-persistent-disconnected.yml
@@ -8,7 +8,7 @@ spec:
containers:
- args:
- run-postgresql
image: quay.io/sclorg/postgresql-12-c8s:latest
image: quay.io/sclorg/postgresql-13-c9s:latest
name: db
envFrom:
- configMapRef:
2 changes: 1 addition & 1 deletion deploy/podman/pod-persistent.yml
@@ -8,7 +8,7 @@ spec:
containers:
- args:
- run-postgresql
image: quay.io/sclorg/postgresql-12-c8s:latest
image: quay.io/sclorg/postgresql-13-c9s:latest
name: db
envFrom:
- configMapRef:
2 changes: 1 addition & 1 deletion deploy/podman/pod.yml
@@ -8,7 +8,7 @@ spec:
containers:
- args:
- run-postgresql
image: quay.io/sclorg/postgresql-12-c8s:latest
image: quay.io/sclorg/postgresql-13-c9s:latest
name: db
envFrom:
- configMapRef:
2 changes: 1 addition & 1 deletion deploy/podman/pod_tls.yml
@@ -8,7 +8,7 @@ spec:
containers:
- args:
- run-postgresql
image: quay.io/sclorg/postgresql-12-c8s:latest
image: quay.io/sclorg/postgresql-13-c9s:latest
name: db
envFrom:
- configMapRef:
2 changes: 1 addition & 1 deletion deploy/postgres/postgres-deployment-ephemeral.yaml
@@ -15,7 +15,7 @@ spec:
spec:
containers:
- name: postgres
image: quay.io/sclorg/postgresql-12-c8s
image: quay.io/sclorg/postgresql-13-c9s
imagePullPolicy: "IfNotPresent"
ports:
- containerPort: 5432
2 changes: 1 addition & 1 deletion deploy/postgres/postgres-deployment.yaml
@@ -15,7 +15,7 @@ spec:
spec:
containers:
- name: postgres
image: quay.io/sclorg/postgresql-12-c8s
image: quay.io/sclorg/postgresql-13-c9s
imagePullPolicy: "IfNotPresent"
ports:
- containerPort: 5432
2 changes: 1 addition & 1 deletion docs/dev/README.md
@@ -36,7 +36,7 @@ $ podman ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
487e6c1bdb9a localhost/podman-pause:4.9.3-1708357294 4 minutes ago Up 3 minutes 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp 7c18ebd0915a-infra
8479b5eb8a8d quay.io/sclorg/postgresql-12-c8s:latest run-postgresql 4 minutes ago Up 3 minutes 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp assisted-installer-db
8479b5eb8a8d quay.io/sclorg/postgresql-13-c9s:latest run-postgresql 4 minutes ago Up 3 minutes 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp assisted-installer-db
ffb9013c4fab quay.io/edge-infrastructure/assisted-installer-ui:latest /deploy/start.sh 3 minutes ago Up 3 minutes 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp assisted-installer-ui
100c865abfd6 quay.io/edge-infrastructure/assisted-image-service:latest /assisted-image-s... 3 minutes ago Up 3 minutes 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp assisted-installer-image-service
78924b68f7af quay.io/edge-infrastructure/assisted-service:latest /assisted-service 3 minutes ago Up 3 minutes 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp, 0.0.0.0:8888->8888/tcp assisted-installer-service
160 changes: 160 additions & 0 deletions docs/dev/postgresql-upgrade.md
@@ -0,0 +1,160 @@
# PostgreSQL Major Version Upgrade

This document describes how assisted-service handles PostgreSQL major version upgrades in the kube-api (MCE/ACM) deployment mode.

## Overview

PostgreSQL major version upgrades require data migration because the on-disk format changes between versions. The assisted-service leverages the [sclorg postgresql-container](https://github.com/sclorg/postgresql-container) built-in upgrade mechanism via the `POSTGRESQL_UPGRADE` environment variable.

## The Problem

The sclorg containers support `POSTGRESQL_UPGRADE=hardlink` to trigger `pg_upgrade`, but this setting **cannot be left enabled permanently**. The sclorg container intentionally fails when `POSTGRESQL_UPGRADE` is set but the data version already matches the image version; this is a safety mechanism to prevent users from leaving it enabled.

## Our Solution: Conditional Upgrade

We use a wrapper script (`internal/controller/controllers/postgres_startup.sh`) embedded via `//go:embed` that conditionally sets `POSTGRESQL_UPGRADE=hardlink` only when a version mismatch is detected.

The script:
1. Checks if `PG_VERSION` file exists in the data directory
2. Compares data version with container's `POSTGRESQL_VERSION` env var
3. Sets `POSTGRESQL_UPGRADE=hardlink` only when versions differ
4. Calls `run-postgresql` to start the database

This handles all scenarios correctly:
- **Fresh install**: No data → normal initialization
- **Restart (same version)**: Versions match → normal startup
- **Upgrade (version mismatch)**: Versions differ → enables pg_upgrade
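
The decision the wrapper script makes can be sketched as a small standalone function. This is an illustrative sketch rather than the shipped script: the function name `startup_mode` and its arguments are hypothetical, and it only reports which scenario applies instead of starting the database.

```bash
#!/bin/bash
# Hypothetical helper: report which startup scenario applies.
# $1 = data directory, $2 = PostgreSQL major version of the image.
startup_mode() {
    local data_dir=$1 image_version=$2
    if [ ! -f "$data_dir/PG_VERSION" ]; then
        echo "fresh"      # no data yet: normal initialization
    elif [ "$(cat "$data_dir/PG_VERSION")" = "$image_version" ]; then
        echo "normal"     # versions match: normal startup
    else
        echo "upgrade"    # versions differ: POSTGRESQL_UPGRADE=hardlink is needed
    fi
}

# Usage with a throwaway directory standing in for the data directory.
demo_dir=$(mktemp -d)
startup_mode "$demo_dir" 13        # prints "fresh"
echo 12 > "$demo_dir/PG_VERSION"
startup_mode "$demo_dir" 13        # prints "upgrade"
echo 13 > "$demo_dir/PG_VERSION"
startup_mode "$demo_dir" 13        # prints "normal"
rm -rf "$demo_dir"
```

Keeping the decision in one place makes it easy to exercise all three scenarios without a running database.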

## How pg_upgrade Works

When `POSTGRESQL_UPGRADE=hardlink` is set and versions differ:

1. **Detect Version Mismatch**: The sclorg `run-postgresql` script reads `PG_VERSION` from the data directory
2. **Validate Source Version**: Checks that the data version matches `POSTGRESQL_PREV_VERSION` (e.g., PG13 image requires PG12 data)
3. **Run pg_upgrade**: Executes `pg_upgrade --link` to upgrade the data in-place using hardlinks
4. **Start PostgreSQL**: Normal postgres startup with upgraded data
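
The source-version validation in step 2 can be illustrated with a short sketch. The function name `check_upgrade_source` is hypothetical, and the real check lives inside the sclorg image scripts; this version only mirrors the comparison described above and the wording of the image's error message.

```bash
#!/bin/bash
# Hypothetical re-implementation of the source-version check, for illustration only.
# $1 = version found in PG_VERSION, $2 = POSTGRESQL_PREV_VERSION of the image.
check_upgrade_source() {
    local data_version=$1 prev_version=$2
    if [ "$data_version" != "$prev_version" ]; then
        echo "With this container image you can only upgrade from data directory" >&2
        echo "of version '$prev_version', not '$data_version'." >&2
        return 1
    fi
    return 0
}

# PG12 data with a PG13 image (POSTGRESQL_PREV_VERSION=12) is accepted.
check_upgrade_source 12 12 && echo "upgrade allowed"
# PG10 data with the same image is rejected.
check_upgrade_source 10 12 || echo "upgrade refused"
```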

### sclorg Environment Variables

The sclorg container images define these environment variables (baked into each image):

| Variable | Description | Example |
|----------|-------------|---------|
| `POSTGRESQL_VERSION` | Current PostgreSQL version | `13` |
| `POSTGRESQL_PREV_VERSION` | Previous version this image can upgrade from | `12` |

You can verify these by inspecting the container:
```bash
podman run --rm quay.io/sclorg/postgresql-13-c9s:latest env | grep POSTGRESQL
# POSTGRESQL_VERSION=13
# POSTGRESQL_PREV_VERSION=12
```

### Hardlink Mode

The `--link` flag tells `pg_upgrade` to create hardlinks instead of copying files:

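- **Fast**: Duration depends on the number of files rather than the amount of data, so it is typically much faster than copying
- **No Extra Storage**: Hardlinks share the same disk blocks as original files
- **Near-Atomic**: Each hardlink is created by a single atomic filesystem operation, although the upgrade as a whole is not atomic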
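
The storage behaviour of hardlinks can be checked with ordinary shell tools. The sketch below uses throwaway files with made-up names, not a real data directory, and relies on bash's `-ef` test, which is true when two paths refer to the same inode on the same device.

```bash
#!/bin/bash
# Illustration only: a hardlink is a second name for the same on-disk file.
work_dir=$(mktemp -d)
echo "table data" > "$work_dir/old_cluster_file"

# Create a hardlink, as pg_upgrade --link does for data files.
ln "$work_dir/old_cluster_file" "$work_dir/new_cluster_file"

# Both names point at the same inode, so no data was copied.
if [ "$work_dir/old_cluster_file" -ef "$work_dir/new_cluster_file" ]; then
    echo "same inode"
fi

# A write through one name is visible through the other.
echo "more data" >> "$work_dir/new_cluster_file"
cat "$work_dir/old_cluster_file"

# Remove the scratch directory when done: rm -rf "$work_dir"
```

Because both names refer to the same data, changes made through the new name also affect the old one, so the old data directory is not an independent backup after a hardlink upgrade.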

## Preserving Events and Logs

If you need to ensure 100% preservation of events and logs, snapshot your database PVC before upgrading:

```bash
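# Example: save the PVC manifest before the MCE upgrade
# (this records the PVC definition only, not the data stored on the volume)
kubectl get pvc postgres -n multicluster-engine -o yaml > postgres-pvc-backup.yaml
# To protect the data itself, use your storage class's snapshot feature if available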
```

## Failure Handling

If the upgrade fails:

1. The postgres container crashes
2. Pod goes into `CrashLoopBackOff`
3. Logs show the error from sclorg/pg_upgrade
4. Manual investigation and recovery required

### Recovery Options

If the upgrade fails and the data is unrecoverable:

```bash
# 1. Check what went wrong
kubectl logs <pod-name> -c postgres -n multicluster-engine

# 2. If data is corrupt, delete the PVC to start fresh
kubectl delete pvc postgres-assisted-service -n multicluster-engine

# 3. Delete pod to force restart
kubectl delete pod <pod-name> -n multicluster-engine

# 4. New pod starts with fresh DB, controllers reconcile from CRs
```

Data loss on recovery:

| Data | Source | Recovery |
|------|--------|----------|
| Clusters | AgentClusterInstall CR | Reconciled from etcd |
| Hosts | Agent CR | Reconciled from etcd |
| InfraEnvs | InfraEnv CR | Reconciled from etcd |
| **Events** | PostgreSQL only | **Lost** |
| **Logs metadata** | PostgreSQL only | **Lost** |

## Upgrade Path

PostgreSQL container images from [sclorg](https://github.com/sclorg/postgresql-container) include binaries for the previous major version, enabling single-step upgrades. Each image only supports upgrading from one specific previous version (`POSTGRESQL_PREV_VERSION`).

### Available Images and Supported Upgrades

| Image | PG Version | Upgrades From | Base OS |
|-------|------------|---------------|---------|
| postgresql-12-c8s | 12 | 10 | CentOS Stream 8 |
| postgresql-13-c8s | 13 | 12 | CentOS Stream 8 |
| postgresql-13-c9s | 13 | 12 | CentOS Stream 9 |
| postgresql-15-c9s | 15 | 13 | CentOS Stream 9 |
| postgresql-16-c9s | 16 | 15 | CentOS Stream 9 |
| postgresql-17-c9s | 17 | 16 | CentOS Stream 9 |

Note: Upgrading from `postgresql-12-c8s` (CentOS Stream 8) to `postgresql-13-c9s` (CentOS Stream 9) is supported. For the equivalent RHEL 8 to RHEL 9 procedure, see [Red Hat's fast upgrade documentation](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_using_database_servers/using-postgresql_configuring-and-using-database-servers#fast-upgrade-using-the-pg_upgrade-tool_migrating-to-a-rhel-9-version-of-postgresql).
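
Because each image upgrades from exactly one previous version, a multi-version upgrade is a chain of single steps. The sketch below derives that chain from the "Upgrades From" column of the table above; the helper names `next_version` and `upgrade_chain` are hypothetical and exist only for this illustration.

```bash
#!/bin/bash
# Hypothetical helper built from the table above: the next version each
# data directory version can be upgraded to in a single step.
next_version() {
    case $1 in
        10) echo 12 ;;
        12) echo 13 ;;
        13) echo 15 ;;
        15) echo 16 ;;
        16) echo 17 ;;
        *)  return 1 ;;
    esac
}

# Print the sequence of versions from $1 up to $2, one step per image.
# Returns non-zero if $2 cannot be reached from $1.
upgrade_chain() {
    local current=$1 target=$2 chain=$1
    while [ "$current" != "$target" ]; do
        current=$(next_version "$current") || return 1
        chain="$chain $current"
    done
    echo "$chain"
}

upgrade_chain 12 15   # prints "12 13 15"
upgrade_chain 10 13   # prints "10 12 13"
```

Each hop in the printed chain corresponds to one image from the table, started once against the existing data directory.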

## How to Upgrade PostgreSQL Version

To upgrade to a new PostgreSQL version:

1. Update `internal/controller/controllers/images.go` with the new image
2. Update `deploy/olm-catalog/manifests/assisted-service-operator.clusterserviceversion.yaml`:
- Update `DATABASE_IMAGE` env var
- Update `relatedImages` section
3. Update backplane-operator:
- `hack/bundle-automation/config.yaml` - image mapping
- `pkg/templates/charts/toggle/assisted-service/values.yaml`
- `pkg/templates/charts/toggle/assisted-service/templates/infrastructure-operator.yaml`

The wrapper script automatically detects version mismatches and triggers `pg_upgrade` when needed.

## Deployment Strategy

The assisted-service deployment uses `Recreate` strategy (not `RollingUpdate`):

```go
deploymentStrategy := appsv1.DeploymentStrategy{
	Type: appsv1.RecreateDeploymentStrategyType,
}
```

This ensures the old pod releases the PVC before the new pod starts, preventing a deadlock in which the new pod waits for a volume that the old pod still holds.

## Version Skip Protection

The sclorg container validates that the source data version matches `POSTGRESQL_PREV_VERSION`. If a customer tries to skip versions (e.g., PG10 → PG13), the container fails with a clear error:

```
With this container image you can only upgrade from data directory
of version '12', not '10'.
```

Comment on lines +155 to +159
⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The fenced code block is missing a language identifier, which causes a linting warning and reduces syntax highlighting.

🔎 Proposed fix
-```
+```text
 With this container image you can only upgrade from data directory
 of version '12', not '10'.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

155-155: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/dev/postgresql-upgrade.md around lines 155 to 159, the fenced code block
lacks a language identifier which triggers a lint warning and disables syntax
highlighting; update the opening fence to include a language identifier (for
example use ```text) so the block becomes ```text and leave the content
unchanged, ensuring the linter recognizes the code block language.

This prevents accidental data corruption from unsupported upgrade paths.
2 changes: 1 addition & 1 deletion docs/dev/testing.md
@@ -4,7 +4,7 @@ The assisted-installer tests are divided into 3 categories:

* **Unit tests** - Focused on a module/function level while other modules are mocked.
Unit tests are located in the package, where the code they are testing resides, using the pattern `<module_name>_test.go`.
Unit tests need a postgresql db container. The image for db container `quay.io/sclorg/postgresql-12-c8s:latest` is built from `https://github.com/sclorg/postgresql-container`
Unit tests need a postgresql db container. The image for db container `quay.io/sclorg/postgresql-13-c9s:latest` is built from `https://github.com/sclorg/postgresql-container`

* **Subsystem tests** - Focused on the component while mocking other components.
For example, assisted-service subsystem tests mock the agent responses.
2 changes: 1 addition & 1 deletion internal/common/common_unitest_db.go
@@ -131,7 +131,7 @@ func (c *K8SDBContext) Create() error {
Containers: []corev1.Container{
{
Name: "psql",
Image: "quay.io/sclorg/postgresql-12-c8s",
Image: "quay.io/sclorg/postgresql-13-c9s",
Ports: []corev1.ContainerPort{
{
Name: "tcp-5433",
2 changes: 1 addition & 1 deletion internal/common/testcontainers_db_context.go
@@ -20,7 +20,7 @@ func (c *TestContainersDBContext) Create() error {
var err error
c.dbContainer, err = testcontainers.GenericContainer(c.ctx, testcontainers.GenericContainerRequest{
ContainerRequest: testcontainers.ContainerRequest{
Image: "quay.io/sclorg/postgresql-12-c8s:latest",
Image: "quay.io/sclorg/postgresql-13-c9s:latest",
Env: map[string]string{"POSTGRESQL_ADMIN_PASSWORD": "admin"},
ExposedPorts: []string{fmt.Sprintf("%s/tcp", dbDefaultPort)},
Name: dbDockerName,
Original file line number Diff line number Diff line change
@@ -2068,6 +2068,11 @@ func newAssistedServiceDeployment(ctx context.Context, log logrus.FieldLogger, a
postgresContainer := corev1.Container{
Name: databaseName,
Image: DatabaseImage(),
// Use a wrapper script that conditionally enables pg_upgrade only when a version
// mismatch is detected. This allows automatic upgrades while avoiding failures
// on normal restarts. See docs/dev/postgresql-upgrade.md for details.
Command: []string{"/bin/bash", "-c"},
Args: []string{PostgresStartupScript},
Ports: []corev1.ContainerPort{
{
Name: databaseName,
2 changes: 1 addition & 1 deletion internal/controller/controllers/images.go
@@ -41,7 +41,7 @@ func ImageServiceImage() string {
}

func DatabaseImage() string {
return getEnvVar("DATABASE_IMAGE", "quay.io/sclorg/postgresql-12-c8s:latest")
return getEnvVar("DATABASE_IMAGE", "quay.io/sclorg/postgresql-13-c9s:latest")
}

func AgentImage() string {
31 changes: 31 additions & 0 deletions internal/controller/controllers/postgres_startup.sh
@@ -0,0 +1,31 @@
#!/bin/bash
# postgres_startup.sh - Wrapper script for PostgreSQL container startup
#
# This script checks if a PostgreSQL major version upgrade is needed before starting
# the database. It compares the data directory version (PG_VERSION) with the container's
# PostgreSQL version and enables pg_upgrade only when necessary.
#
# See docs/dev/postgresql-upgrade.md for details.

set -e

PGDATA=/var/lib/pgsql/data/userdata

echo "=== PostgreSQL Startup Check ==="

if [ -f "$PGDATA/PG_VERSION" ]; then
    DATA_VERSION=$(cat "$PGDATA/PG_VERSION")
    echo "Data directory version: $DATA_VERSION"
    echo "Container image version: $POSTGRESQL_VERSION"

    if [ "$DATA_VERSION" != "$POSTGRESQL_VERSION" ]; then
        echo "Version mismatch detected - enabling pg_upgrade (hardlink mode)"
        export POSTGRESQL_UPGRADE=hardlink
    else
        echo "Versions match - normal startup"
    fi
else
    echo "No existing data directory - fresh initialization"
fi

exec run-postgresql
13 changes: 13 additions & 0 deletions internal/controller/controllers/postgres_startup_script.go
@@ -0,0 +1,13 @@
package controllers

import _ "embed"

// PostgresStartupScript is a wrapper script that conditionally enables pg_upgrade
// only when a PostgreSQL major version upgrade is detected. This avoids the issue
// where setting POSTGRESQL_UPGRADE=hardlink permanently causes container startup
// failures on normal restarts (when versions already match).
//
// See docs/dev/postgresql-upgrade.md for details.
//
//go:embed postgres_startup.sh
var PostgresStartupScript string