diff --git a/ENVIRONMENT.rst b/ENVIRONMENT.rst index 4fbb947a9..a9be0922d 100644 --- a/ENVIRONMENT.rst +++ b/ENVIRONMENT.rst @@ -11,11 +11,11 @@ Environment Configuration Settings - **ETCD_KEY**: Etcd client certificate key. Can be empty if the key is part of certificate. - **PGHOME**: filesystem path where to put PostgreSQL home directory (/home/postgres by default) - **APIPORT**: TCP port to Patroni API connections (8008 by default) -- **BACKUP_SCHEDULE**: cron schedule for doing backups via WAL-E (if WAL-E is enabled, '00 01 * * *' by default) +- **BACKUP_SCHEDULE**: cron schedule for taking backups via WAL-G ('00 01 * * *' by default) - **CLONE_TARGET_TIMELINE**: timeline id of the backup for restore, 'latest' by default. - **CRONTAB**: anything that you want to run periodically as a cron job (empty by default) - **PGROOT**: a directory where we put the pgdata (by default /home/postgres/pgroot). One may adjust it to point to the mount point of the persistent volume, such as EBS. -- **WALE_TMPDIR**: directory to store WAL-E temporary files. PGROOT/../tmp by default, make sure it has a few GBs of free space. +- **WALG_TMPDIR** or **WALE_TMPDIR**: directory to store WAL-G temporary files. PGROOT/../tmp by default; make sure it has a few GB of free space. - **PGDATA**: location of PostgreSQL data directory, by default PGROOT/pgdata. - **PGUSER_STANDBY**: username for the replication user, 'standby' by default. - **PGPASSWORD_STANDBY**: a password for the replication user, 'standby' by default. @@ -47,22 +47,22 @@ Environment Configuration Settings - **SSL_RESTAPI_PRIVATE_KEY**: content of the REST Api SSL private key in the SSL_PRIVATE_KEY_FILE file (by default /run/certs/server.key). - **SSL_TEST_RELOAD**: whenever to test for certificate rotation and reloading (by default True if SSL_PRIVATE_KEY_FILE has been set). - **RESTAPI_CONNECT_ADDRESS**: when you configure Patroni RESTAPI in SSL mode some safe API (i.e. switchover) perform hostname validation. 
In this case it can be convenient to configure ``restapi.connect_address`` as a hostname instead of an IP. For example, you can configure it as "$(POD_NAME).". -- **WALE_BACKUP_THRESHOLD_MEGABYTES**: maximum size of the WAL segments accumulated after the base backup to consider WAL-E restore instead of pg_basebackup. -- **WALE_BACKUP_THRESHOLD_PERCENTAGE**: maximum ratio (in percents) of the accumulated WAL files to the base backup to consider WAL-E restore instead of pg_basebackup. -- **WALE_ENV_DIR**: directory where to store WAL-E environment variables +- **WALG_BACKUP_THRESHOLD_MEGABYTES** or **WALE_BACKUP_THRESHOLD_MEGABYTES**: maximum size of the WAL segments accumulated after the base backup to consider WAL-G restore instead of pg_basebackup. +- **WALG_BACKUP_THRESHOLD_PERCENTAGE** or **WALE_BACKUP_THRESHOLD_PERCENTAGE**: maximum ratio (in percent) of the accumulated WAL files to the base backup to consider WAL-G restore instead of pg_basebackup. +- **WALG_ENV_DIR** or **WALE_ENV_DIR**: directory in which to store WAL-G environment variables - **WAL_RESTORE_TIMEOUT**: timeout (in seconds) for restoring a single WAL file (at most 16 MB) from the backup location, 0 by default. A duration of 0 disables the timeout. -- **WAL_S3_BUCKET**: (optional) name of the S3 bucket used for WAL-E base backups. +- **WAL_S3_BUCKET**: (optional) name of the S3 bucket used for WAL-G base backups. - **AWS_ACCESS_KEY_ID**: (optional) aws access key - **AWS_SECRET_ACCESS_KEY**: (optional) aws secret key - **AWS_REGION**: (optional) region of S3 bucket - **AWS_ENDPOINT**: (optional) in format 'https://s3.AWS_REGION.amazonaws.com:443', if not specified will be calculated from AWS_REGION -- **WALE_S3_ENDPOINT**: (optional) in format 'https+path://s3.AWS_REGION.amazonaws.com:443', if not specified will be calculated from AWS_ENDPOINT or AWS_REGION -- **WALE_S3_PREFIX**: (optional) the full path to the backup location on S3 in the format s3://bucket-name/very/long/path. 
If not specified Spilo will generate it from WAL_S3_BUCKET. -- **WAL_GS_BUCKET**: ditto for the Google Cloud Storage (WAL-E supports both S3 and GCS). -- **WALE_GS_PREFIX**: (optional) the full path to the backup location on the Google Cloud Storage in the format gs://bucket-name/very/long/path. If not specified Spilo will generate it from WAL_GS_BUCKET. -- **GOOGLE_APPLICATION_CREDENTIALS**: credentials for WAL-E when running in Google Cloud. +- **WALG_S3_ENDPOINT** or **WALE_S3_ENDPOINT**: (optional) in format 'https+path://s3.AWS_REGION.amazonaws.com:443', if not specified will be calculated from AWS_ENDPOINT or AWS_REGION +- **WALG_S3_PREFIX** or **WALE_S3_PREFIX**: (optional) the full path to the backup location on S3 in the format s3://bucket-name/very/long/path. If not specified Spilo will generate it from WAL_S3_BUCKET. +- **WAL_GS_BUCKET**: ditto for the Google Cloud Storage (WAL-G supports both S3 and GCS). +- **WALG_GS_PREFIX** or **WALE_GS_PREFIX**: (optional) the full path to the backup location on the Google Cloud Storage in the format gs://bucket-name/very/long/path. If not specified Spilo will generate it from WAL_GS_BUCKET. +- **GOOGLE_APPLICATION_CREDENTIALS**: credentials for WAL-G when running in Google Cloud. - **WAL_SWIFT_BUCKET**: ditto for the OpenStack Object Storage (Swift) -- **SWIFT_AUTHURL**: see wal-e documentation https://github.com/wal-e/wal-e#swift +- **SWIFT_AUTHURL**: see wal-g documentation https://wal-g.readthedocs.io/STORAGES/#swift - **SWIFT_TENANT**: - **SWIFT_TENANT_ID**: - **SWIFT_USER**: @@ -79,7 +79,7 @@ Environment Configuration Settings - **SWIFT_PROJECT_ID**: - **SWIFT_PROJECT_DOMAIN_NAME**: - **SWIFT_PROJECT_DOMAIN_ID**: -- **WALE_SWIFT_PREFIX**: (optional) the full path to the backup location on the Swift Storage in the format swift://bucket-name/very/long/path. If not specified Spilo will generate it from WAL_SWIFT_BUCKET. 
+- **WALG_SWIFT_PREFIX** or **WALE_SWIFT_PREFIX**: (optional) the full path to the backup location on the Swift Storage in the format swift://bucket-name/very/long/path. If not specified Spilo will generate it from WAL_SWIFT_BUCKET. - **SSH_USERNAME**: (optional) the username for WAL backups. - **SSH_PORT**: (optional) the ssh port for WAL backups. - **SSH_PRIVATE_KEY_PATH**: (optional) the path to the private key used for WAL backups. @@ -109,18 +109,17 @@ Environment Configuration Settings - **KUBERNETES_BOOTSTRAP_LABELS**: a JSON describing names and values of labels used by Patroni as ``kubernetes.bootstrap_labels``. Default is empty. - **INITDB_LOCALE**: database cluster's default UTF-8 locale (en_US by default) - **ENABLE_WAL_PATH_COMPAT**: old Spilo images were generating wal path in the backup store using the following template ``/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/``, while new images adding one additional directory (``{PGVERSION}``) to the end. In order to avoid (unlikely) issues with restoring WALs (from S3/GC/and so on) when switching to ``spilo-13`` please set the ``ENABLE_WAL_PATH_COMPAT=true`` when deploying old cluster with ``spilo-13`` for the first time. After that the environment variable could be removed. Change of the WAL path also mean that backups stored in the old location will not be cleaned up automatically. -- **WALE_DISABLE_S3_SSE**, **WALG_DISABLE_S3_SSE**: by default wal-e/wal-g are configured to encrypt files uploaded to S3. In order to disable it you can set this environment variable to ``true``. +- **WALG_DISABLE_S3_SSE** or **WALE_DISABLE_S3_SSE**: by default wal-g is configured to encrypt files uploaded to S3. In order to disable it you can set this environment variable to ``true``. - **USE_OLD_LOCALES**: whether to use old locales from Ubuntu 18.04 in the Ubuntu 22.04-based image. Default is false. wal-g ----- -`wal-g` is used by default for Azure and SSH backups and restore. 
-In case of S3, `wal-e` is used for backups and `wal-g` for restore. - -- **USE_WALG_BACKUP**: (optional) Enforce using `wal-g` instead of `wal-e` for backups (Boolean) -- **USE_WALG_RESTORE**: (optional) Enforce using `wal-g` instead of `wal-e` for restores (Boolean) - +wal-g is used everywhere in Spilo to perform backups and restore from them. **Support for wal-e has been removed**. +Backward compatibility is ensured for environment variables containing **WALE**, the env-dir layout, and bootstrap method names. +This allows existing configurations and clusters to continue working without requiring immediate changes. +Regardless of which variable is set, all backups and restores will be performed using wal-g. +However, if both **WALE** and **WALG** variables are present, the latter will take precedence. - **WALG_DELTA_MAX_STEPS**, **WALG_DELTA_ORIGIN**, **WALG_DOWNLOAD_CONCURRENCY**, **WALG_UPLOAD_CONCURRENCY**, **WALG_UPLOAD_DISK_CONCURRENCY**, **WALG_DISK_RATE_LIMIT**, **WALG_NETWORK_RATE_LIMIT**, **WALG_COMPRESSION_METHOD**, **WALG_BACKUP_COMPRESSION_METHOD**, **WALG_BACKUP_FROM_REPLICA**, **WALG_SENTINEL_USER_DATA**, **WALG_PREVENT_WAL_OVERWRITE**: (optional) configuration options for wal-g. - **WALG_S3_CA_CERT_FILE**: (optional) TLS CA certificate for wal-g (see [wal-g configuration](https://github.com/wal-g/wal-g#configuration)) - **WALG_SSH_PREFIX**: (optional) the ssh prefix to store WAL backups at in the format ssh://host.example.com/path/to/backups/ See `Wal-g `__ documentation for details. 
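The WALE/WALG precedence rule described above (legacy **WALE** names still accepted, **WALG** names winning when both are set) can be sketched as a small lookup. This is an illustrative helper with a hypothetical name, not code from Spilo; the real resolution happens inside `configure_spilo.py`:

```python
import os


def effective_walg_value(name, env=None):
    """Resolve a WAL-G setting, accepting the legacy WALE_* spelling.

    When both spellings are present, the WALG_* variant takes
    precedence, matching the backward-compatibility rule above.
    (Hypothetical helper for illustration only.)
    """
    env = os.environ if env is None else env
    walg_name, wale_name = 'WALG_' + name, 'WALE_' + name
    if walg_name in env:
        return env[walg_name]
    return env.get(wale_name)


# Both spellings set: the WALG_* variant wins.
env = {'WALE_S3_PREFIX': 's3://bucket/old/path',
       'WALG_S3_PREFIX': 's3://bucket/new/path'}
print(effective_walg_value('S3_PREFIX', env))  # s3://bucket/new/path
```

Either way the value ends up consumed by wal-g; the old spelling only changes where the value is read from, not which tool runs.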
diff --git a/delivery.yaml b/delivery.yaml index d05d19a21..12ea76a70 100644 --- a/delivery.yaml +++ b/delivery.yaml @@ -3,7 +3,8 @@ allow_concurrent_steps: true build_env: &BUILD_ENV BASE_IMAGE: container-registry.zalando.net/library/ubuntu-22.04 - PGVERSION: 17 + PGVERSION: 18 + PGOLDVERSIONS: "16 17" MULTI_ARCH_REGISTRY: container-registry-test.zalando.net/acid pipeline: @@ -32,7 +33,7 @@ pipeline: docker buildx build --platform "linux/amd64,linux/arm64" \ --build-arg PGVERSION="$PGVERSION" \ --build-arg BASE_IMAGE="$BASE_IMAGE" \ - --build-arg PGOLDVERSIONS="14 15 16" \ + --build-arg PGOLDVERSIONS="$PGOLDVERSIONS" \ -t "$ECR_TEST_IMAGE" \ --push . @@ -61,7 +62,7 @@ pipeline: docker buildx build --platform "linux/amd64,linux/arm64" \ --build-arg PGVERSION="$PGVERSION" \ --build-arg BASE_IMAGE="$BASE_IMAGE" \ - --build-arg PGOLDVERSIONS="14 15 16" \ + --build-arg PGOLDVERSIONS="$PGOLDVERSIONS" \ -t "$ECR_TEST_IMAGE" \ --push . cdp-promote-image "$ECR_TEST_IMAGE" @@ -92,7 +93,7 @@ pipeline: docker buildx build --platform "linux/amd64,linux/arm64" \ --build-arg PGVERSION="$PGVERSION" \ --build-arg BASE_IMAGE="$BASE_IMAGE" \ - --build-arg PGOLDVERSIONS="14 15 16" \ + --build-arg PGOLDVERSIONS="$PGOLDVERSIONS" \ -t "$ECR_TEST_IMAGE" \ --push . 
cdp-promote-image "$ECR_TEST_IMAGE" diff --git a/postgres-appliance/Dockerfile b/postgres-appliance/Dockerfile index 7d15482d1..6608483ab 100644 --- a/postgres-appliance/Dockerfile +++ b/postgres-appliance/Dockerfile @@ -1,5 +1,5 @@ ARG BASE_IMAGE=ubuntu:22.04 -ARG PGVERSION=17 +ARG PGVERSION=18 ARG DEMO=false ARG COMPRESS=false ARG ADDITIONAL_LOCALES= @@ -18,7 +18,7 @@ FROM $BASE_IMAGE as dependencies-builder ARG DEMO -ENV WALG_VERSION=v3.0.5 +ENV WALG_VERSION=v3.0.8 COPY build_scripts/dependencies.sh /builddeps/ @@ -46,21 +46,19 @@ ARG PGVERSION ARG TIMESCALEDB_APACHE_ONLY=true ARG TIMESCALEDB_TOOLKIT=true ARG COMPRESS -ARG PGOLDVERSIONS="13 14 15 16" +ARG PGOLDVERSIONS="14 15 16 17" ARG WITH_PERL=false ARG DEB_PG_SUPPORTED_VERSIONS="$PGOLDVERSIONS $PGVERSION" # Install PostgreSQL, extensions and contribs -ENV POSTGIS_VERSION=3.5 \ - BG_MON_COMMIT=7f5887218790b263fe3f42f85f4ddc9c8400b154 \ +ENV POSTGIS_VERSION=3.6 \ + BG_MON_COMMIT=a73c6bcd10dfdf9feaf5eabab7eb6b12d167680d \ PG_AUTH_MON_COMMIT=fe099eef7662cbc85b0b79191f47f52f1e96b779 \ - PG_MON_COMMIT=ead1de70794ed62ca1e34d4022f6165ff36e9a91 \ - SET_USER=REL4_1_0 \ + PG_MON_COMMIT=88ac7b58348aa061c814982defc170644f368f39 \ PLPROFILER=REL4_2_5 \ - PG_PROFILE=4.7 \ - PAM_OAUTH2=v1.0.1 \ - PG_PERMISSIONS_COMMIT=f4b7c18676fa64236a1c8e28d34a35764e4a70e2 + PG_PROFILE=4.11 \ + PAM_OAUTH2=v1.0.1 WORKDIR /builddeps RUN bash base.sh @@ -68,15 +66,14 @@ RUN bash base.sh # Install wal-g COPY --from=dependencies-builder /builddeps/wal-g /usr/local/bin/ -COPY build_scripts/patroni_wale.sh build_scripts/compress_build.sh /builddeps/ +COPY build_scripts/patroni.sh build_scripts/compress_build.sh /builddeps/ -# Install patroni and wal-e -ENV PATRONIVERSION=4.0.4 -ENV WALE_VERSION=1.1.1 +# Install patroni +ENV PATRONIVERSION=4.1.0 WORKDIR / -RUN bash /builddeps/patroni_wale.sh +RUN bash /builddeps/patroni.sh RUN if [ "$COMPRESS" = "true" ]; then bash /builddeps/compress_build.sh; fi @@ -101,7 +98,7 @@ ENV LC_ALL=en_US.utf-8 \ 
RW_DIR=/run \ DEMO=$DEMO -ENV WALE_ENV_DIR=$RW_DIR/etc/wal-e.d/env \ +ENV WALG_ENV_DIR=$RW_DIR/etc/wal-e.d/env \ LOG_ENV_DIR=$RW_DIR/etc/log.d/env \ PGROOT=$PGHOME/pgdata/pgroot diff --git a/postgres-appliance/bootstrap/clone_with_wale.py b/postgres-appliance/bootstrap/clone_with_walg.py similarity index 91% rename from postgres-appliance/bootstrap/clone_with_wale.py rename to postgres-appliance/bootstrap/clone_with_walg.py index 9e0adc1c5..1d32823d0 100755 --- a/postgres-appliance/bootstrap/clone_with_wale.py +++ b/postgres-appliance/bootstrap/clone_with_walg.py @@ -25,7 +25,7 @@ def read_configuration(): parser.add_argument('--recovery-target-time', help='the timestamp up to which recovery will proceed (including time zone)', dest='recovery_target_time_string') - parser.add_argument('--dry-run', action='store_true', help='find a matching backup and build the wal-e ' + parser.add_argument('--dry-run', action='store_true', help='find a matching backup and build the wal-g ' 'command to fetch that backup without running it') args = parser.parse_args() @@ -40,8 +40,8 @@ def read_configuration(): return options(args.scope, args.datadir, recovery_target_time, args.dry_run) -def build_wale_command(command, datadir=None, backup=None): - cmd = ['wal-g' if os.getenv('USE_WALG_RESTORE') == 'true' else 'wal-e'] + [command] +def build_walg_command(command, datadir=None, backup=None): + cmd = ['wal-g', command] if command == 'backup-fetch': if datadir is None or backup is None: raise Exception("backup-fetch requires datadir and backup arguments") @@ -79,7 +79,7 @@ def choose_backup(backup_list, recovery_target_time): def list_backups(env): - backup_list_cmd = build_wale_command('backup-list') + backup_list_cmd = build_walg_command('backup-list') output = subprocess.check_output(backup_list_cmd, env=env) reader = csv.DictReader(fix_output(output), dialect='excel-tab') return list(reader) @@ -89,7 +89,7 @@ def get_clone_envdir(): from spilo_commons import get_patroni_config 
config = get_patroni_config() - restore_command = shlex.split(config['bootstrap']['clone_with_wale']['recovery_conf']['restore_command']) + restore_command = shlex.split(config['bootstrap']['clone_with_walg']['recovery_conf']['restore_command']) if len(restore_command) > 4 and restore_command[0] == 'envdir': return restore_command[1] raise Exception('Failed to find clone envdir') @@ -117,10 +117,9 @@ def get_possible_versions(): return [ver for _, ver in sorted(versions.items(), reverse=True)] -def get_wale_environments(env): - use_walg = env.get('USE_WALG_RESTORE') == 'true' - prefix = 'WALG_' if use_walg else 'WALE_' - # len('WALE__PREFIX') = 12 +def get_walg_environments(env): + prefix = 'WALG_' + # len('WALG__PREFIX') = 12 names = [name for name in env.keys() if name.endswith('_PREFIX') and name.startswith(prefix) and len(name) > 12] if len(names) != 1: raise Exception('Found find {0} {1}*_PREFIX environment variables, expected 1' @@ -141,7 +140,7 @@ def get_wale_environments(env): def find_backup(recovery_target_time, env): old_value = None - for name, value in get_wale_environments(env): + for name, value in get_walg_environments(env): logger.info('Trying %s for clone', value) if not old_value: old_value = env[name] @@ -164,12 +163,12 @@ def run_clone_from_s3(options): backup_name, update_envdir = find_backup(options.recovery_target_time, env) - backup_fetch_cmd = build_wale_command('backup-fetch', options.datadir, backup_name) + backup_fetch_cmd = build_walg_command('backup-fetch', options.datadir, backup_name) logger.info("cloning cluster %s using %s", options.name, ' '.join(backup_fetch_cmd)) if not options.dry_run: ret = subprocess.call(backup_fetch_cmd, env=env) if ret != 0: - raise Exception("wal-e backup-fetch exited with exit code {0}".format(ret)) + raise Exception("wal-g backup-fetch exited with exit code {0}".format(ret)) if update_envdir: # We need to update file in the clone envdir or restore_command will fail!
envdir = get_clone_envdir() diff --git a/postgres-appliance/bootstrap/maybe_pg_upgrade.py b/postgres-appliance/bootstrap/maybe_pg_upgrade.py index 4f36e6953..9252d7899 100644 --- a/postgres-appliance/bootstrap/maybe_pg_upgrade.py +++ b/postgres-appliance/bootstrap/maybe_pg_upgrade.py @@ -40,7 +40,7 @@ def perform_pitr(postgresql, cluster_version, bin_version, config): except Exception: logs = tail_postgres_logs() # Spilo has no other locales except en_EN.UTF-8, therefore we are safe here. - if int(cluster_version) >= 13 and 'recovery ended before configured recovery target was reached' in logs: + if 'recovery ended before configured recovery target was reached' in logs: # Starting from version 13 Postgres stopped promoting when recovery target wasn't reached. # In order to improve the user experience we reset all possible recovery targets and retry. recovery_conf = config[config['method']].get('recovery_conf', {}) @@ -103,7 +103,7 @@ def main(): except Exception as e: logger.error('Failed to update extensions: %r', e) - upgrade.analyze() + upgrade.analyze(bin_version) def call_maybe_pg_upgrade(): diff --git a/postgres-appliance/build_scripts/base.sh b/postgres-appliance/build_scripts/base.sh index ff885c5e9..bc4633d82 100644 --- a/postgres-appliance/build_scripts/base.sh +++ b/postgres-appliance/build_scripts/base.sh @@ -51,11 +51,9 @@ if [ "$WITH_PERL" != "true" ]; then equivs-build perl fi -curl -sL "https://github.com/zalando-pg/bg_mon/archive/$BG_MON_COMMIT.tar.gz" | tar xz +curl -sL "https://github.com/CyberDem0n/bg_mon/archive/$BG_MON_COMMIT.tar.gz" | tar xz curl -sL "https://github.com/zalando-pg/pg_auth_mon/archive/$PG_AUTH_MON_COMMIT.tar.gz" | tar xz -curl -sL "https://github.com/cybertec-postgresql/pg_permissions/archive/$PG_PERMISSIONS_COMMIT.tar.gz" | tar xz curl -sL "https://github.com/zubkov-andrei/pg_profile/archive/$PG_PROFILE.tar.gz" | tar xz -git clone -b "$SET_USER" https://github.com/pgaudit/set_user.git apt-get install -y \ postgresql-common \ 
@@ -85,10 +83,8 @@ for version in $DEB_PG_SUPPORTED_VERSIONS; do "postgresql-${version}-pgaudit" "postgresql-${version}-pldebugger" "postgresql-${version}-pglogical" - "postgresql-${version}-pglogical-ticker" "postgresql-${version}-plpgsql-check" "postgresql-${version}-pg-checksums" - "postgresql-${version}-pgl-ddl-deploy" "postgresql-${version}-pgq-node" "postgresql-${version}-postgis-${POSTGIS_VERSION%.*}" "postgresql-${version}-postgis-${POSTGIS_VERSION%.*}-scripts" @@ -97,10 +93,12 @@ for version in $DEB_PG_SUPPORTED_VERSIONS; do "postgresql-${version}-decoderbufs" "postgresql-${version}-pllua" "postgresql-${version}-pgvector" - "postgresql-${version}-roaringbitmap") + "postgresql-${version}-roaringbitmap" + "postgresql-${version}-pgfaceting") - if [ "$version" -ge 14 ]; then - EXTRAS+=("postgresql-${version}-pgfaceting") + if [ "$version" != "18" ]; then + EXTRAS+=("postgresql-${version}-pgl-ddl-deploy" + "postgresql-${version}-pglogical-ticker") fi if [ "$WITH_PERL" = "true" ]; then @@ -124,12 +122,36 @@ for version in $DEB_PG_SUPPORTED_VERSIONS; do "postgresql-server-dev-${version}" \ "postgresql-${version}-pgq3" \ "postgresql-${version}-pg-stat-kcache" \ + "postgresql-${version}-pg-permissions" \ + "postgresql-${version}-set-user" \ "${EXTRAS[@]}" - # Clean up timescaledb versions except the last 5 minor versions + # Clean up timescaledb versions - keep at least 5 minor versions, but ensure compatibility with the lowest/oldest PG version (where possible) + exclude_patterns=() versions=$(find "/usr/lib/postgresql/$version/lib/" -name 'timescaledb-2.*.so' | sed -rn 's/.*timescaledb-([1-9]+\.[0-9]+\.[0-9]+)\.so$/\1/p' | sort -rV) - latest_minor_versions=$(echo "$versions" | awk -F. '{print $1"."$2}' | uniq | head -n 5) + + # Calculate the number of versions dynamically based on the lowest PG version's latest minor + num_versions=5 + if [ -n "$first_latest_minor" ]; then + minor_versions=$(echo "$versions" | awk -F. 
'{print $1"."$2}' | uniq) + position=0 + found=0 + while IFS= read -r minor; do + position=$((position + 1)) + if [ "$minor" = "$first_latest_minor" ]; then + found=1 + break + fi + done <<< "$minor_versions" + + # if found, keep max(5, position) versions (so all versions have at least 1 version in common with lowest PG version) + if [ $found -eq 1 ] && [ $position -gt $num_versions ]; then + num_versions=$position + fi + fi + + latest_minor_versions=$(echo "$versions" | awk -F. '{print $1"."$2}' | uniq | head -n "$num_versions") for minor in $latest_minor_versions; do for full_version in $(echo "$versions" | grep "^$minor"); do exclude_patterns+=(! -name timescaledb-"${full_version}".so) @@ -138,6 +160,11 @@ for version in $DEB_PG_SUPPORTED_VERSIONS; do done find "/usr/lib/postgresql/$version/lib/" \( -name 'timescaledb-2.*.so' -o -name 'timescaledb-tsl-2.*.so' \) "${exclude_patterns[@]}" -delete + # Save the latest minor version from the first PG version + if [ -z "$first_latest_minor" ]; then + first_latest_minor=$(echo "$latest_minor_versions" | head -n 1) + fi + # Install 3rd party stuff if [ "${TIMESCALEDB_APACHE_ONLY}" != "true" ] && [ "${TIMESCALEDB_TOOLKIT}" = "true" ]; then @@ -156,11 +183,10 @@ for version in $DEB_PG_SUPPORTED_VERSIONS; do for n in bg_mon-${BG_MON_COMMIT} \ pg_auth_mon-${PG_AUTH_MON_COMMIT} \ - set_user \ - pg_permissions-${PG_PERMISSIONS_COMMIT} \ pg_profile-${PG_PROFILE} \ "${EXTRA_EXTENSIONS[@]}"; do - make -C "$n" USE_PGXS=1 clean install-strip + PATH="/usr/lib/postgresql/$version/bin:$PATH" make -C "$n" USE_PGXS=1 clean + PATH="/usr/lib/postgresql/$version/bin:$PATH" make -C "$n" USE_PGXS=1 install-strip done done diff --git a/postgres-appliance/build_scripts/dependencies.sh b/postgres-appliance/build_scripts/dependencies.sh index 65aa28055..2f1ef68cb 100644 --- a/postgres-appliance/build_scripts/dependencies.sh +++ b/postgres-appliance/build_scripts/dependencies.sh @@ -29,9 +29,9 @@ apt-get install -y curl ca-certificates mkdir 
/builddeps/wal-g if [ "$ARCH" = "amd64" ]; then - PKG_NAME='wal-g-pg-ubuntu-20.04-amd64' + PKG_NAME='wal-g-pg-22.04-amd64' else - PKG_NAME='wal-g-pg-ubuntu-20.04-aarch64' + PKG_NAME='wal-g-pg-22.04-aarch64' fi curl -sL "https://github.com/wal-g/wal-g/releases/download/$WALG_VERSION/$PKG_NAME.tar.gz" \ diff --git a/postgres-appliance/build_scripts/patroni_wale.sh b/postgres-appliance/build_scripts/patroni.sh similarity index 77% rename from postgres-appliance/build_scripts/patroni_wale.sh rename to postgres-appliance/build_scripts/patroni.sh index 04c792314..e9fe60a79 100644 --- a/postgres-appliance/build_scripts/patroni_wale.sh +++ b/postgres-appliance/build_scripts/patroni.sh @@ -1,8 +1,8 @@ #!/bin/bash -## ------------------------- -## Install patroni and wal-e -## ------------------------- +## ---------------- +## Install patroni +## ---------------- export DEBIAN_FRONTEND=noninteractive @@ -26,25 +26,17 @@ if [ "$DEMO" != "true" ]; then python3-etcd \ python3-consul \ python3-kazoo \ - python3-boto \ python3-boto3 \ python3-botocore \ python3-cachetools \ - python3-cffi \ - python3-gevent \ python3-pyasn1-modules \ python3-rsa \ - python3-s3transfer \ - python3-swiftclient + python3-s3transfer find /usr/share/python-babel-localedata/locale-data -type f ! 
-name 'en_US*.dat' -delete - pip3 install filechunkio protobuf \ - 'git+https://github.com/zalando-pg/wal-e.git@ipv6-imds#egg=wal-e[aws,google,swift]' \ + pip3 install protobuf \ 'git+https://github.com/zalando/pg_view.git@master#egg=pg-view' - - # https://github.com/wal-e/wal-e/issues/318 - sed -i 's/^\( for i in range(0,\) num_retries):.*/\1 100):/g' /usr/lib/python3/dist-packages/boto/utils.py else EXTRAS="" fi diff --git a/postgres-appliance/build_scripts/prepare.sh b/postgres-appliance/build_scripts/prepare.sh index 66c2a2cb8..8590e34c9 100644 --- a/postgres-appliance/build_scripts/prepare.sh +++ b/postgres-appliance/build_scripts/prepare.sh @@ -19,8 +19,6 @@ rm -fr /etc/cron.??* truncate --size 0 /etc/crontab if [ "$DEMO" != "true" ]; then - # Required for wal-e - apt-get install -y pv lzop # install etcdctl ETCDVERSION=3.3.27 curl -L https://github.com/coreos/etcd/releases/download/v${ETCDVERSION}/etcd-v${ETCDVERSION}-linux-"$(dpkg --print-architecture)".tar.gz \ diff --git a/postgres-appliance/launch.sh b/postgres-appliance/launch.sh index 1e3e54f30..07f17474a 100755 --- a/postgres-appliance/launch.sh +++ b/postgres-appliance/launch.sh @@ -51,10 +51,11 @@ chmod -R go-w "$PGROOT" chmod 01777 "$RW_DIR/tmp" chmod 0700 "$PGDATA" +WALG_ENV_DIR="${WALG_ENV_DIR:-$WALE_ENV_DIR}" if [ "$DEMO" = "true" ]; then python3 /scripts/configure_spilo.py patroni pgqd certificate pam-oauth2 elif python3 /scripts/configure_spilo.py all; then - CMD="/scripts/patroni_wait.sh -t 3600 -- envdir $WALE_ENV_DIR /scripts/postgres_backup.sh $PGDATA" + CMD="/scripts/patroni_wait.sh -t 3600 -- envdir $WALG_ENV_DIR /scripts/postgres_backup.sh $PGDATA" if [ "$(id -u)" = "0" ]; then su postgres -c "PATH=$PATH $CMD" & else diff --git a/postgres-appliance/major_upgrade/inplace_upgrade.py b/postgres-appliance/major_upgrade/inplace_upgrade.py index 0390cfbd4..51c193bb6 100644 --- a/postgres-appliance/major_upgrade/inplace_upgrade.py +++ b/postgres-appliance/major_upgrade/inplace_upgrade.py @@ 
-21,7 +21,7 @@ RSYNC_PORT = 5432 -def patch_wale_prefix(value, new_version): +def patch_walg_prefix(value, new_version): from spilo_commons import is_valid_pg_version if '/spilo/' in value and '/wal/' in value: # path crafted in the configure_spilo.py? @@ -51,20 +51,20 @@ def update_configs(new_version): write_patroni_config(config, True) - # update wal-e/wal-g envdir files + # update wal-g envdir files restore_command = shlex.split(config['postgresql'].get('recovery_conf', {}).get('restore_command', '')) if len(restore_command) > 6 and restore_command[0] == 'envdir': envdir = restore_command[1] try: for name in os.listdir(envdir): - # len('WALE__PREFIX') = 12 - if len(name) > 12 and name.endswith('_PREFIX') and name[:5] in ('WALE_', 'WALG_'): + # len('WALG__PREFIX') = 12 + if len(name) > 12 and name.endswith('_PREFIX') and name.startswith('WALG_'): name = os.path.join(envdir, name) try: with open(name) as f: value = f.read().strip() - new_value = patch_wale_prefix(value, new_version) + new_value = patch_walg_prefix(value, new_version) if new_value != value: write_file(new_value, name, True) except Exception as e: @@ -451,7 +451,7 @@ def restore_custom_statistics_target(self): except Exception: logger.error("Failed to execute '%s'", query) - def reanalyze(self): + def custom_stats_target_reanalyze(self): from patroni.postgresql.connection import get_connection_cursor if not self._statistics: @@ -470,12 +470,15 @@ def reanalyze(self): except Exception: logger.error("Failed to execute '%s'", query) + def full_reanalyze(self): + self.postgresql.analyze(self.desired_version) + def analyze(self): try: self.reset_custom_statistics_target() except Exception as e: logger.error('Failed to reset custom statistics targets: %r', e) - self.postgresql.analyze(True) + self.postgresql.analyze(self.desired_version, in_stages=True) try: self.restore_custom_statistics_target() except Exception as e: @@ -634,7 +637,10 @@ def do_upgrade(self): analyze_thread.join() - self.reanalyze() + 
if int(self.desired_version) < 18: + self.custom_stats_target_reanalyze() + else: + self.full_reanalyze() logger.info('Total upgrade time (with analyze): %s', time.time() - downtime_start) self.postgresql.bootstrap.call_post_bootstrap(self.config['bootstrap']) @@ -711,6 +717,7 @@ def rsync_replica(config, desired_version, primary_ip, pid): env = os.environ.copy() env['RSYNC_PASSWORD'] = postgresql.config.replication['password'] + primary_ip = f'[{primary_ip}]' if ':' in primary_ip else primary_ip if subprocess.call(['rsync', '--archive', '--delete', '--hard-links', '--size-only', '--omit-dir-times', '--no-inc-recursive', '--include=/data/***', '--include=/data_old/***', '--exclude=/data/pg_xlog/*', '--exclude=/data_old/pg_xlog/*', diff --git a/postgres-appliance/major_upgrade/pg_upgrade.py b/postgres-appliance/major_upgrade/pg_upgrade.py index ad1563e80..f3f05ea6b 100644 --- a/postgres-appliance/major_upgrade/pg_upgrade.py +++ b/postgres-appliance/major_upgrade/pg_upgrade.py @@ -207,8 +207,9 @@ def prepare_new_pgdata(self, version): locale = self.query("SELECT datcollate FROM pg_database WHERE datname='template1';")[0][0] encoding = self.query('SHOW server_encoding')[0][0] initdb_config = [{'locale': locale}, {'encoding': encoding}] - if self.query("SELECT current_setting('data_checksums')::bool")[0][0]: - initdb_config.append('data-checksums') + checksums_enabled = self.query("SELECT current_setting('data_checksums')::bool")[0][0] + if checksums_enabled == (int(version) < 18): + initdb_config.append('data-checksums' if checksums_enabled else 'no-data-checksums') logger.info('initdb config: %s', initdb_config) @@ -268,9 +269,12 @@ def do_upgrade(self): return self.pg_upgrade() and self.restore_shared_preload_libraries()\ and self.switch_pgdata() and self.cleanup_old_pgdata() - def analyze(self, in_stages=False): - vacuumdb_args = ['--analyze-in-stages'] if in_stages else [] - logger.info('Rebuilding statistics (vacuumdb%s)', (' ' + vacuumdb_args[0] if in_stages 
else '')) + def analyze(self, version, in_stages=False): + vacuumdb_args = [] + if in_stages: + vacuumdb_args = ['--analyze-in-stages'] if int(version) < 18 else ['--analyze-in-stages', + '--missing-stats-only'] + logger.info('Rebuilding statistics (vacuumdb%s)', (' ' + ' '.join(vacuumdb_args) if in_stages else '')) if 'username' in self.config.superuser: vacuumdb_args += ['-U', self.config.superuser['username']] vacuumdb_args += ['-Z', '-j'] diff --git a/postgres-appliance/scripts/basebackup.sh b/postgres-appliance/scripts/basebackup.sh index 7c8fc68dc..d47cd2923 100755 --- a/postgres-appliance/scripts/basebackup.sh +++ b/postgres-appliance/scripts/basebackup.sh @@ -19,6 +19,10 @@ done [[ -z $DATA_DIR || -z "$CONNSTR" || ! $RETRIES =~ ^[1-9]$ ]] && exit 1 +if [[ ! $CONNSTR =~ dbname= ]]; then + CONNSTR="${CONNSTR} dbname=postgres" +fi + if which pg_receivewal &> /dev/null; then PG_RECEIVEWAL=pg_receivewal PG_BASEBACKUP_OPTS=(-X none) @@ -95,6 +99,11 @@ else receivewal_pid=$(cat "$WAL_FAST/receivewal.pid") fi +PGVER=$(psql "$CONNSTR" -tAc "SELECT pg_catalog.current_setting('server_version_num')::int/10000" || echo 0) +if [[ $PGVER -ge 15 ]]; then + PG_BASEBACKUP_OPTS+=("--compress=server-lz4") +fi + ATTEMPT=0 while [[ $((ATTEMPT++)) -le $RETRIES ]]; do pg_basebackup --pgdata="${DATA_DIR}" "${PG_BASEBACKUP_OPTS[@]}" --dbname="${CONNSTR}" & diff --git a/postgres-appliance/scripts/callback_aws.py b/postgres-appliance/scripts/callback_aws.py index 7b46c618d..89300f824 100755 --- a/postgres-appliance/scripts/callback_aws.py +++ b/postgres-appliance/scripts/callback_aws.py @@ -1,54 +1,43 @@ #!/usr/bin/env python -import boto.ec2 -import boto.utils +from botocore.config import Config +import boto3 import logging import os import sys -import time +import requests logger = logging.getLogger(__name__) LEADER_TAG_VALUE = os.environ.get('AWS_LEADER_TAG_VALUE', 'master') -def retry(func): - def wrapped(*args, **kwargs): - count = 0 - while True: - try: - return func(*args, 
**kwargs) - except boto.exception.BotoServerError as e: - if count >= 10 or str(e.error_code) not in ('Throttling', 'RequestLimitExceeded'): - raise - logger.info('Throttling AWS API requests...') - time.sleep(2 ** count * 0.5) - count += 1 - - return wrapped - - def get_instance_metadata(): - return boto.utils.get_instance_identity()['document'] + response = requests.put( + url='http://169.254.169.254/latest/api/token', # AWS EC2 metadata service endpoint to get a token + headers={'X-aws-ec2-metadata-token-ttl-seconds': '60'} + ) + token = response.text + instance_identity = requests.get( + url='http://169.254.169.254/latest/dynamic/instance-identity/document', + headers={'X-aws-ec2-metadata-token': token} + ) + return instance_identity.json() -@retry def associate_address(ec2, allocation_id, instance_id): - return ec2.associate_address(instance_id=instance_id, allocation_id=allocation_id, allow_reassociation=True) + return ec2.associate_address(InstanceId=instance_id, AllocationId=allocation_id, AllowReassociation=True) -@retry def tag_resource(ec2, resource_id, tags): - return ec2.create_tags([resource_id], tags) + return ec2.create_tags(Resources=[resource_id], Tags=tags) -@retry def list_volumes(ec2, instance_id): - return ec2.get_all_volumes(filters={'attachment.instance-id': instance_id}) + return ec2.describe_volumes(Filters=[{'Name': 'attachment.instance-id', 'Values': [instance_id]}]) -@retry def get_instance(ec2, instance_id): - return ec2.get_only_instances([instance_id])[0] + return ec2.describe_instances(InstanceIds=[instance_id])['Reservations'][0]['Instances'][0] def main(): @@ -65,30 +54,35 @@ def main(): instance_id = metadata['instanceId'] - ec2 = boto.ec2.connect_to_region(metadata['region']) + config = Config( + region_name=metadata['region'], + retries={ + 'max_attempts': 10, + 'mode': 'standard' + } + ) + ec2 = boto3.client('ec2', config=config) if argc == 5 and role in ('primary', 'standby_leader') and action in ('on_start', 
'on_role_change'):
         associate_address(ec2, sys.argv[1], instance_id)
 
     instance = get_instance(ec2, instance_id)
 
-    tags = {'Role': LEADER_TAG_VALUE if role == 'primary' else role}
+    tags = [{'Key': 'Role', 'Value': LEADER_TAG_VALUE if role == 'primary' else role}]
     tag_resource(ec2, instance_id, tags)
 
-    tags.update({'Instance': instance_id})
+    tags.append({'Key': 'Instance', 'Value': instance_id})
 
     volumes = list_volumes(ec2, instance_id)
-    for v in volumes:
-        if 'Name' in v.tags:
+    for v in volumes['Volumes']:
+        if any(tag['Key'] == 'Name' for tag in v.get('Tags', [])):
             tags_to_update = tags
         else:
-            if v.attach_data.device == instance.root_device_name:
-                volume_device = 'root'
-            else:
-                volume_device = 'data'
-            tags_to_update = dict(tags, Name='spilo_{}_{}'.format(cluster, volume_device))
+            # default to 'data' so tags_to_update is always defined,
+            # even for a volume that reports no attachments
+            volume_device = 'data'
+            for attachment in v['Attachments']:
+                if attachment['Device'] == instance.get('RootDeviceName'):
+                    volume_device = 'root'
+            tags_to_update = tags + [{'Key': 'Name', 'Value': 'spilo_{}_{}'.format(cluster, volume_device)}]
 
-        tag_resource(ec2, v.id, tags_to_update)
+        tag_resource(ec2, v.get('VolumeId'), tags_to_update)
 
 
 if __name__ == '__main__':
diff --git a/postgres-appliance/scripts/configure_spilo.py b/postgres-appliance/scripts/configure_spilo.py
index afb2f2929..61f00acfa 100755
--- a/postgres-appliance/scripts/configure_spilo.py
+++ b/postgres-appliance/scripts/configure_spilo.py
@@ -34,12 +34,12 @@
 USE_KUBERNETES = os.environ.get('KUBERNETES_SERVICE_HOST') is not None
 KUBERNETES_DEFAULT_LABELS = '{"application": "spilo"}'
 PATRONI_DCS = ('kubernetes', 'zookeeper', 'exhibitor', 'consul', 'etcd3', 'etcd')
 
-AUTO_ENABLE_WALG_RESTORE = ('WAL_S3_BUCKET', 'WALE_S3_PREFIX', 'WALG_S3_PREFIX', 'WALG_AZ_PREFIX', 'WALG_SSH_PREFIX')
+AUTO_ENABLE_WALG_RESTORE = ('WAL_S3_BUCKET', 'WALG_S3_PREFIX', 'WALG_AZ_PREFIX', 'WALG_SSH_PREFIX')
 WALG_SSH_NAMES = ['WALG_SSH_PREFIX', 'SSH_PRIVATE_KEY_PATH', 'SSH_USERNAME', 'SSH_PORT']
 
 
 def parse_args():
-    sections = ['all', 'patroni', 'pgqd', 
'certificate', 'wal-e', 'crontab', + sections = ['all', 'patroni', 'pgqd', 'certificate', 'wal-g', 'crontab', 'pam-oauth2', 'pgbouncer', 'bootstrap', 'standby-cluster', 'log'] argp = argparse.ArgumentParser(description='Configures Spilo', epilog="Choose from the following sections:\n\t{}".format('\n\t'.join(sections)), @@ -174,14 +174,14 @@ def deep_update(a, b): {{#STANDBY_CLUSTER}} standby_cluster: create_replica_methods: - {{#STANDBY_WITH_WALE}} + {{#STANDBY_WITH_WALG}} - bootstrap_standby_with_wale - {{/STANDBY_WITH_WALE}} + {{/STANDBY_WITH_WALG}} - basebackup_fast_xlog - {{#STANDBY_WITH_WALE}} - restore_command: envdir "{{STANDBY_WALE_ENV_DIR}}" timeout "{{WAL_RESTORE_TIMEOUT}}" + {{#STANDBY_WITH_WALG}} + restore_command: envdir "{{STANDBY_WALG_ENV_DIR}}" timeout "{{WAL_RESTORE_TIMEOUT}}" /scripts/restore_command.sh "%f" "%p" - {{/STANDBY_WITH_WALE}} + {{/STANDBY_WITH_WALG}} {{#STANDBY_HOST}} host: {{STANDBY_HOST}} {{/STANDBY_HOST}} @@ -225,13 +225,13 @@ def deep_update(a, b): autovacuum_max_workers: 5 autovacuum_vacuum_scale_factor: 0.05 autovacuum_analyze_scale_factor: 0.02 - {{#CLONE_WITH_WALE}} - method: clone_with_wale - clone_with_wale: - command: envdir "{{CLONE_WALE_ENV_DIR}}" python3 /scripts/clone_with_wale.py + {{#CLONE_WITH_WALG}} + method: clone_with_walg + clone_with_walg: + command: envdir "{{CLONE_WALG_ENV_DIR}}" python3 /scripts/clone_with_walg.py --recovery-target-time="{{CLONE_TARGET_TIME}}" recovery_conf: - restore_command: envdir "{{CLONE_WALE_ENV_DIR}}" timeout "{{WAL_RESTORE_TIMEOUT}}" + restore_command: envdir "{{CLONE_WALG_ENV_DIR}}" timeout "{{WAL_RESTORE_TIMEOUT}}" /scripts/restore_command.sh "%f" "%p" recovery_target_timeline: "{{CLONE_TARGET_TIMELINE}}" {{#USE_PAUSE_AT_RECOVERY_TARGET}} @@ -246,7 +246,7 @@ def deep_update(a, b): {{^CLONE_TARGET_INCLUSIVE}} recovery_target_inclusive: false {{/CLONE_TARGET_INCLUSIVE}} - {{/CLONE_WITH_WALE}} + {{/CLONE_WITH_WALG}} {{#CLONE_WITH_BASEBACKUP}} method: clone_with_basebackup 
clone_with_basebackup: @@ -347,11 +347,11 @@ def deep_update(a, b): - hostssl all all all md5 {{/ALLOW_NOSSL}} - {{#USE_WALE}} + {{#USE_WALG}} recovery_conf: - restore_command: envdir "{{WALE_ENV_DIR}}" timeout "{{WAL_RESTORE_TIMEOUT}}" + restore_command: envdir "{{WALG_ENV_DIR}}" timeout "{{WAL_RESTORE_TIMEOUT}}" /scripts/restore_command.sh "%f" "%p" - {{/USE_WALE}} + {{/USE_WALG}} authentication: superuser: username: {{PGUSER_SUPERUSER}} @@ -369,29 +369,29 @@ def deep_update(a, b): on_role_change: '/scripts/on_role_change.sh {{HUMAN_ROLE}} true' {{/CALLBACK_SCRIPT}} create_replica_method: - {{#USE_WALE}} - - wal_e - {{/USE_WALE}} + {{#USE_WALG}} + - wal_g + {{/USE_WALG}} - basebackup_fast_xlog - {{#USE_WALE}} - wal_e: - command: envdir {{WALE_ENV_DIR}} bash /scripts/wale_restore.sh - threshold_megabytes: {{WALE_BACKUP_THRESHOLD_MEGABYTES}} - threshold_backup_size_percentage: {{WALE_BACKUP_THRESHOLD_PERCENTAGE}} + {{#USE_WALG}} + wal_g: + command: envdir {{WALG_ENV_DIR}} bash /scripts/walg_restore.sh + threshold_megabytes: {{WALG_BACKUP_THRESHOLD_MEGABYTES}} + threshold_backup_size_percentage: {{WALG_BACKUP_THRESHOLD_PERCENTAGE}} retries: 2 no_leader: 1 - {{/USE_WALE}} + {{/USE_WALG}} basebackup_fast_xlog: command: /scripts/basebackup.sh retries: 2 -{{#STANDBY_WITH_WALE}} +{{#STANDBY_WITH_WALG}} bootstrap_standby_with_wale: - command: envdir "{{STANDBY_WALE_ENV_DIR}}" bash /scripts/wale_restore.sh - threshold_megabytes: {{WALE_BACKUP_THRESHOLD_MEGABYTES}} - threshold_backup_size_percentage: {{WALE_BACKUP_THRESHOLD_PERCENTAGE}} + command: envdir "{{STANDBY_WALG_ENV_DIR}}" bash /scripts/walg_restore.sh + threshold_megabytes: {{WALG_BACKUP_THRESHOLD_MEGABYTES}} + threshold_backup_size_percentage: {{WALG_BACKUP_THRESHOLD_PERCENTAGE}} retries: 2 no_leader: 1 -{{/STANDBY_WITH_WALE}} +{{/STANDBY_WITH_WALG}} ''' @@ -409,12 +409,25 @@ def get_provider(): try: logging.info("Figuring out my environment (Google? AWS? Openstack? 
Local?)")
-        r = requests.get('http://169.254.169.254', timeout=2)
+        response = requests.put(
+            url='http://169.254.169.254/latest/api/token',
+            headers={'X-aws-ec2-metadata-token-ttl-seconds': '60'},
+            timeout=2
+        )
+        # the IMDSv2 token endpoint only exists on AWS; Google and Openstack
+        # metadata servers reject the PUT, so keep probing with an empty token
+        # instead of concluding that this is a local Docker setup
+        token = response.text if response.ok else ''
+        r = requests.get(
+            url='http://169.254.169.254',
+            headers={'X-aws-ec2-metadata-token': token} if token else {},
+            timeout=2
+        )
         if r.headers.get('Metadata-Flavor', '') == 'Google':
             return PROVIDER_GOOGLE
         else:
             # accessible on Openstack, will fail on AWS
-            r = requests.get('http://169.254.169.254/openstack/latest/meta_data.json')
+            r = requests.get('http://169.254.169.254/openstack/latest/meta_data.json', timeout=2)
             if r.ok:
                 # make sure the response is parsable - https://github.com/Azure/aad-pod-identity/issues/943 and
                 # https://github.com/zalando/spilo/issues/542
@@ -422,7 +435,11 @@
                 return PROVIDER_OPENSTACK
 
             # is accessible from both AWS and Openstack, Possiblity of misidentification if previous try fails
-            r = requests.get('http://169.254.169.254/latest/meta-data/ami-id')
+            r = requests.get(
+                url='http://169.254.169.254/latest/meta-data/ami-id',
+                headers={'X-aws-ec2-metadata-token': token} if token else {},
+                timeout=2
+            )
             return PROVIDER_AWS if r.ok else PROVIDER_UNSUPPORTED
     except (requests.exceptions.ConnectTimeout, requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout):
         logging.info("Could not connect to 169.254.169.254, assuming local Docker setup")
@@ -474,32 +491,21 @@
 
     return metadata
 
 
-def set_extended_wale_placeholders(placeholders, prefix):
-    """ checks that enough parameters are provided to configure cloning or standby with WAL-E """
+def set_extended_walg_placeholders(placeholders, prefix):
+    """ checks that enough parameters are provided to configure cloning or standby with WAL-G """
     for name in ('S3', 'GS', 'GCS', 'SWIFT', 
'AZ'): - if placeholders.get('{0}WALE_{1}_PREFIX'.format(prefix, name)) or\ - name in ('S3', 'GS', 'AZ') and placeholders.get('{0}WALG_{1}_PREFIX'.format(prefix, name)) or\ + if placeholders.get('{0}WALG_{1}_PREFIX'.format(prefix, name)) or\ placeholders.get('{0}WAL_{1}_BUCKET'.format(prefix, name)) and placeholders.get(prefix + 'SCOPE'): break else: return False scope = placeholders.get(prefix + 'SCOPE') dirname = 'env-' + prefix[:-1].lower() + ('-' + scope if scope else '') - placeholders[prefix + 'WALE_ENV_DIR'] = os.path.join(placeholders['RW_DIR'], 'etc', 'wal-e.d', dirname) - placeholders[prefix + 'WITH_WALE'] = True + placeholders[prefix + 'WALG_ENV_DIR'] = os.path.join(placeholders['RW_DIR'], 'etc', 'wal-e.d', dirname) + placeholders[prefix + 'WITH_WALG'] = True return name -def set_walg_placeholders(placeholders, prefix=''): - walg_supported = any(placeholders.get(prefix + n) for n in AUTO_ENABLE_WALG_RESTORE + - ('WAL_GS_BUCKET', 'WALE_GS_PREFIX', 'WALG_GS_PREFIX')) - default = placeholders.get('USE_WALG', False) - placeholders.setdefault(prefix + 'USE_WALG', default) - for name in ('USE_WALG_BACKUP', 'USE_WALG_RESTORE'): - value = str(placeholders.get(prefix + name, placeholders[prefix + 'USE_WALG'])).lower() - placeholders[prefix + name] = 'true' if value == 'true' and walg_supported else None - - def get_listen_ip(): """ Get IP to listen on for things that don't natively support detecting IPv4/IPv6 dualstack """ def has_dual_stack(): @@ -524,7 +530,16 @@ def has_dual_stack(): def get_placeholders(provider): - placeholders = dict(os.environ) + placeholders = {} + for key, value in os.environ.items(): + if "WALE" in key: + new_key = key.replace("WALE", "WALG") # backward compatibility + if new_key in os.environ: + # skip, because a real WALG env already exists + continue + placeholders[new_key] = value + else: + placeholders[key] = value placeholders.setdefault('PGHOME', os.path.expanduser('~')) placeholders.setdefault('APIPORT', '8008') @@ -532,7 +547,7 
@@ def get_placeholders(provider): placeholders.setdefault('BACKUP_NUM_TO_RETAIN', '5') placeholders.setdefault('CRONTAB', '[]') placeholders.setdefault('PGROOT', os.path.join(placeholders['PGHOME'], 'pgroot')) - placeholders.setdefault('WALE_TMPDIR', os.path.abspath(os.path.join(placeholders['PGROOT'], '../tmp'))) + placeholders.setdefault('WALG_TMPDIR', os.path.abspath(os.path.join(placeholders['PGROOT'], '../tmp'))) placeholders.setdefault('PGDATA', os.path.join(placeholders['PGROOT'], 'pgdata')) placeholders.setdefault('HUMAN_ROLE', 'zalandos') placeholders.setdefault('PGUSER_STANDBY', 'standby') @@ -555,8 +570,8 @@ def get_placeholders(provider): placeholders.setdefault('SSL_RESTAPI_CA_FILE', '') placeholders.setdefault('SSL_RESTAPI_CERTIFICATE_FILE', '') placeholders.setdefault('SSL_RESTAPI_PRIVATE_KEY_FILE', '') - placeholders.setdefault('WALE_BACKUP_THRESHOLD_MEGABYTES', 102400) - placeholders.setdefault('WALE_BACKUP_THRESHOLD_PERCENTAGE', 30) + placeholders.setdefault('WALG_BACKUP_THRESHOLD_MEGABYTES', 102400) + placeholders.setdefault('WALG_BACKUP_THRESHOLD_PERCENTAGE', 30) placeholders.setdefault('INITDB_LOCALE', 'en_US') placeholders.setdefault('CLONE_TARGET_TIMELINE', 'latest') # if Kubernetes is defined as a DCS, derive the namespace from the POD_NAMESPACE, if not set explicitely. 
@@ -569,8 +584,9 @@ def get_placeholders(provider): if placeholders['NAMESPACE'] not in ('default', '') else '') placeholders.setdefault('WAL_BUCKET_SCOPE_SUFFIX', '') placeholders.setdefault('WAL_RESTORE_TIMEOUT', '0') - placeholders.setdefault('WALE_ENV_DIR', os.path.join(placeholders['RW_DIR'], 'etc', 'wal-e.d', 'env')) - placeholders.setdefault('USE_WALE', False) + # the env dir path is still called "wal-e.d" for backwards compatibility: many existing deployments, scripts, + # or manifests expect this path, even though wal-e itself is not used (wal-g reads env vars from here too) + placeholders.setdefault('WALG_ENV_DIR', os.path.join(placeholders['RW_DIR'], 'etc', 'wal-e.d', 'env')) cpu_count = str(min(psutil.cpu_count(), 10)) placeholders.setdefault('WALG_DOWNLOAD_CONCURRENCY', cpu_count) placeholders.setdefault('WALG_UPLOAD_CONCURRENCY', cpu_count) @@ -588,7 +604,7 @@ def get_placeholders(provider): placeholders.setdefault('KUBERNETES_BOOTSTRAP_LABELS', '{}') placeholders.setdefault('USE_PAUSE_AT_RECOVERY_TARGET', False) placeholders.setdefault('CLONE_METHOD', '') - placeholders.setdefault('CLONE_WITH_WALE', '') + placeholders.setdefault('CLONE_WITH_WALG', '') placeholders.setdefault('CLONE_WITH_BASEBACKUP', '') placeholders.setdefault('CLONE_TARGET_TIME', '') placeholders.setdefault('CLONE_TARGET_INCLUSIVE', True) @@ -608,18 +624,17 @@ def get_placeholders(provider): else: placeholders['LOG_SHIP_HOURLY'] = '' - # see comment for wal-e bucket prefix + # use namespaces to set WAL bucket prefix scope naming the folder namespace-clustername for non-default namespace placeholders.setdefault('LOG_BUCKET_SCOPE_PREFIX', '{0}-'.format(placeholders['NAMESPACE']) if placeholders['NAMESPACE'] not in ('default', '') else '') - if placeholders['CLONE_METHOD'] == 'CLONE_WITH_WALE': + placeholders['CLONE_METHOD'] = placeholders['CLONE_METHOD'].replace('WALE', 'WALG') # backwards compatibility + if placeholders['CLONE_METHOD'] == 'CLONE_WITH_WALG': # modify placeholders and 
take care of error cases - name = set_extended_wale_placeholders(placeholders, 'CLONE_') + name = set_extended_walg_placeholders(placeholders, 'CLONE_') if name is False: - logging.warning('Cloning with WAL-E is only possible when CLONE_WALE_*_PREFIX ' - 'or CLONE_WALG_*_PREFIX or CLONE_WAL_*_BUCKET and CLONE_SCOPE are set.') - elif name == 'S3': - placeholders.setdefault('CLONE_USE_WALG', 'true') + logging.warning('Cloning with WAL-G is only possible when CLONE_WALG_*_PREFIX ' + 'or CLONE_WAL_*_BUCKET and CLONE_SCOPE are set.') elif placeholders['CLONE_METHOD'] == 'CLONE_WITH_BASEBACKUP': clone_scope = placeholders.get('CLONE_SCOPE') if clone_scope and placeholders.get('CLONE_HOST') \ @@ -632,14 +647,13 @@ def get_placeholders(provider): logging.warning("Clone method is set to basebackup, but no 'CLONE_SCOPE' " "or 'CLONE_HOST' or 'CLONE_USER' or 'CLONE_PASSWORD' specified") else: - if set_extended_wale_placeholders(placeholders, 'STANDBY_') == 'S3': - placeholders.setdefault('STANDBY_USE_WALG', 'true') + set_extended_walg_placeholders(placeholders, 'STANDBY_') - placeholders.setdefault('STANDBY_WITH_WALE', '') + placeholders.setdefault('STANDBY_WITH_WALG', '') placeholders.setdefault('STANDBY_HOST', '') placeholders.setdefault('STANDBY_PORT', '') placeholders.setdefault('STANDBY_PRIMARY_SLOT_NAME', '') - placeholders.setdefault('STANDBY_CLUSTER', placeholders['STANDBY_WITH_WALE'] or placeholders['STANDBY_HOST']) + placeholders.setdefault('STANDBY_CLUSTER', placeholders['STANDBY_WITH_WALG'] or placeholders['STANDBY_HOST']) if provider == PROVIDER_AWS and not USE_KUBERNETES: # AWS specific callback to tag the instances with roles @@ -647,17 +661,9 @@ def get_placeholders(provider): if placeholders.get('EIP_ALLOCATION'): placeholders['CALLBACK_SCRIPT'] += ' ' + placeholders['EIP_ALLOCATION'] - if any(placeholders.get(n) for n in AUTO_ENABLE_WALG_RESTORE): - placeholders.setdefault('USE_WALG_RESTORE', 'true') - if placeholders.get('WALG_AZ_PREFIX'): - 
placeholders.setdefault('USE_WALG_BACKUP', 'true') - if all(placeholders.get(n) for n in WALG_SSH_NAMES): - placeholders.setdefault('USE_WALG_BACKUP', 'true') - set_walg_placeholders(placeholders) - - placeholders['USE_WALE'] = any(placeholders.get(n) for n in AUTO_ENABLE_WALG_RESTORE + - ('WAL_SWIFT_BUCKET', 'WALE_SWIFT_PREFIX', 'WAL_GCS_BUCKET', - 'WAL_GS_BUCKET', 'WALE_GS_PREFIX', 'WALG_GS_PREFIX')) + # check if we have enough parameters to enable WAL-G + placeholders['USE_WALG'] = any(placeholders.get(n) for n in AUTO_ENABLE_WALG_RESTORE + + ('WAL_SWIFT_BUCKET', 'WAL_GS_BUCKET', 'WALG_GS_PREFIX')) if placeholders.get('WALG_BACKUP_FROM_REPLICA'): placeholders['WALG_BACKUP_FROM_REPLICA'] = str(placeholders['WALG_BACKUP_FROM_REPLICA']).lower() @@ -669,10 +675,9 @@ def get_placeholders(provider): placeholders.setdefault('postgresql', {}) placeholders['postgresql'].setdefault('parameters', {}) - placeholders['WALE_BINARY'] = 'wal-g' if placeholders.get('USE_WALG_BACKUP') == 'true' else 'wal-e' placeholders['postgresql']['parameters']['archive_command'] = \ - 'envdir "{WALE_ENV_DIR}" {WALE_BINARY} wal-push "%p"'.format(**placeholders) \ - if placeholders['USE_WALE'] else '/bin/true' + 'envdir "{WALG_ENV_DIR}" wal-g wal-push "%p"'.format(**placeholders) \ + if placeholders['USE_WALG'] else '/bin/true' cgroup_memory_limit_path = '/sys/fs/cgroup/memory/memory.limit_in_bytes' cgroup_v2_memory_limit_path = '/sys/fs/cgroup/memory.max' @@ -833,114 +838,104 @@ def write_log_environment(placeholders): write_file(log_env[var], os.path.join(log_env['LOG_ENV_DIR'], var), True) -def write_wale_environment(placeholders, prefix, overwrite): - s3_names = ['WALE_S3_PREFIX', 'WALG_S3_PREFIX', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', - 'WALE_S3_ENDPOINT', 'AWS_ENDPOINT', 'AWS_REGION', 'AWS_INSTANCE_PROFILE', 'WALE_DISABLE_S3_SSE', +def write_walg_environment(placeholders, prefix, overwrite): + s3_names = ['WALG_S3_PREFIX', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', + 
'WALG_S3_ENDPOINT', 'AWS_ENDPOINT', 'AWS_REGION', 'AWS_INSTANCE_PROFILE', 'WALG_S3_SSE_KMS_ID', 'WALG_S3_SSE', 'WALG_DISABLE_S3_SSE', 'AWS_S3_FORCE_PATH_STYLE', 'AWS_ROLE_ARN', 'AWS_WEB_IDENTITY_TOKEN_FILE', 'AWS_STS_REGIONAL_ENDPOINTS'] azure_names = ['WALG_AZ_PREFIX', 'AZURE_STORAGE_ACCOUNT', 'WALG_AZURE_BUFFER_SIZE', 'WALG_AZURE_MAX_BUFFERS', 'AZURE_ENVIRONMENT_NAME'] azure_auth_names = ['AZURE_STORAGE_ACCESS_KEY', 'AZURE_STORAGE_SAS_TOKEN', 'AZURE_CLIENT_ID', 'AZURE_CLIENT_SECRET', 'AZURE_TENANT_ID'] - gs_names = ['WALE_GS_PREFIX', 'WALG_GS_PREFIX', 'GOOGLE_APPLICATION_CREDENTIALS'] - swift_names = ['WALE_SWIFT_PREFIX', 'SWIFT_AUTHURL', 'SWIFT_TENANT', 'SWIFT_TENANT_ID', 'SWIFT_USER', + gs_names = ['WALG_GS_PREFIX', 'GOOGLE_APPLICATION_CREDENTIALS'] + swift_names = ['WALG_SWIFT_PREFIX', 'SWIFT_AUTHURL', 'SWIFT_TENANT', 'SWIFT_TENANT_ID', 'SWIFT_USER', 'SWIFT_USER_ID', 'SWIFT_USER_DOMAIN_NAME', 'SWIFT_USER_DOMAIN_ID', 'SWIFT_PASSWORD', 'SWIFT_AUTH_VERSION', 'SWIFT_ENDPOINT_TYPE', 'SWIFT_REGION', 'SWIFT_DOMAIN_NAME', 'SWIFT_DOMAIN_ID', 'SWIFT_PROJECT_NAME', 'SWIFT_PROJECT_ID', 'SWIFT_PROJECT_DOMAIN_NAME', 'SWIFT_PROJECT_DOMAIN_ID'] ssh_names = WALG_SSH_NAMES walg_names = ['WALG_DELTA_MAX_STEPS', 'WALG_DELTA_ORIGIN', 'WALG_DOWNLOAD_CONCURRENCY', 'WALG_UPLOAD_CONCURRENCY', 'WALG_UPLOAD_DISK_CONCURRENCY', 'WALG_DISK_RATE_LIMIT', - 'WALG_NETWORK_RATE_LIMIT', 'WALG_COMPRESSION_METHOD', 'USE_WALG_BACKUP', - 'USE_WALG_RESTORE', 'WALG_BACKUP_COMPRESSION_METHOD', 'WALG_BACKUP_FROM_REPLICA', + 'WALG_NETWORK_RATE_LIMIT', 'WALG_COMPRESSION_METHOD', + 'WALG_BACKUP_COMPRESSION_METHOD', 'WALG_BACKUP_FROM_REPLICA', 'WALG_SENTINEL_USER_DATA', 'WALG_PREVENT_WAL_OVERWRITE', 'WALG_S3_CA_CERT_FILE', 'WALG_LIBSODIUM_KEY', 'WALG_LIBSODIUM_KEY_PATH', 'WALG_LIBSODIUM_KEY_TRANSFORM', 'WALG_PGP_KEY', 'WALG_PGP_KEY_PATH', 'WALG_PGP_KEY_PASSPHRASE', 'no_proxy', 'http_proxy', 'https_proxy'] aws_imds_names = ['AWS_EC2_METADATA_SERVICE_ENDPOINT', 'AWS_EC2_METADATA_SERVICE_ENDPOINT_MODE'] - wale 
= defaultdict(lambda: '') - for name in ['PGVERSION', 'PGPORT', 'WALE_ENV_DIR', 'SCOPE', 'WAL_BUCKET_SCOPE_PREFIX', 'WAL_BUCKET_SCOPE_SUFFIX', - 'WAL_S3_BUCKET', 'WAL_GCS_BUCKET', 'WAL_GS_BUCKET', 'WAL_SWIFT_BUCKET', 'BACKUP_NUM_TO_RETAIN', + walg = defaultdict(lambda: '') + for name in ['PGVERSION', 'PGPORT', 'WALG_ENV_DIR', 'SCOPE', 'WAL_BUCKET_SCOPE_PREFIX', 'WAL_BUCKET_SCOPE_SUFFIX', + 'WAL_S3_BUCKET', 'WAL_GS_BUCKET', 'WAL_SWIFT_BUCKET', 'BACKUP_NUM_TO_RETAIN', 'ENABLE_WAL_PATH_COMPAT'] + s3_names + swift_names + gs_names + walg_names + azure_names + \ azure_auth_names + ssh_names: - wale[name] = placeholders.get(prefix + name, '') + walg[name] = placeholders.get(prefix + name, '') - if wale.get('WAL_S3_BUCKET') or wale.get('WALE_S3_PREFIX') or wale.get('WALG_S3_PREFIX'): - wale_endpoint = wale.pop('WALE_S3_ENDPOINT', None) - aws_endpoint = wale.pop('AWS_ENDPOINT', None) - aws_region = wale.pop('AWS_REGION', None) + if walg.get('WAL_S3_BUCKET') or walg.get('WALG_S3_PREFIX'): + walg_endpoint = walg.pop('WALG_S3_ENDPOINT', None) + aws_endpoint = walg.pop('AWS_ENDPOINT', None) + aws_region = walg.pop('AWS_REGION', None) - # for S3-compatible storage we want to specify WALE_S3_ENDPOINT and AWS_ENDPOINT, but not AWS_REGION - if aws_endpoint or wale_endpoint: + # for S3-compatible storage we want to specify WALG_S3_ENDPOINT and AWS_ENDPOINT, but not AWS_REGION + if aws_endpoint or walg_endpoint: convention = 'path' - if not wale_endpoint: - wale_endpoint = aws_endpoint.replace('://', '+path://') + if not walg_endpoint: + walg_endpoint = aws_endpoint.replace('://', '+path://') else: - match = re.match(r'^(\w+)\+(\w+)(://.+)$', wale_endpoint) + match = re.match(r'^(\w+)\+(\w+)(://.+)$', walg_endpoint) if match: convention = match.group(2) else: - logging.warning('Invalid WALE_S3_ENDPOINT, the format is protocol+convention://hostname:port, ' - 'but got %s', wale_endpoint) + logging.warning('Invalid WALG_S3_ENDPOINT, the format is protocol+convention://hostname:port, ' 
+                                'but got %s', walg_endpoint)
 
             if not aws_endpoint:
-                aws_endpoint = match.expand(r'\1\3') if match else wale_endpoint
-            wale.update(WALE_S3_ENDPOINT=wale_endpoint, AWS_ENDPOINT=aws_endpoint)
-            for name in ('WALE_DISABLE_S3_SSE', 'WALG_DISABLE_S3_SSE'):
-                if not wale.get(name):
-                    wale[name] = 'true'
-            wale['AWS_S3_FORCE_PATH_STYLE'] = 'true' if convention == 'path' else 'false'
-            if aws_region and wale.get('USE_WALG_BACKUP') == 'true':
-                wale['AWS_REGION'] = aws_region
+                aws_endpoint = match.expand(r'\1\3') if match else walg_endpoint
+            walg.update(WALG_S3_ENDPOINT=walg_endpoint, AWS_ENDPOINT=aws_endpoint)
+            # walg is a defaultdict that pre-fills every name with '', so
+            # setdefault() would never apply; test for an empty value instead
+            if not walg.get('WALG_DISABLE_S3_SSE'):
+                walg['WALG_DISABLE_S3_SSE'] = 'true'
+            walg['AWS_S3_FORCE_PATH_STYLE'] = 'true' if convention == 'path' else 'false'
+            if aws_region:
+                walg['AWS_REGION'] = aws_region
         elif not aws_region:  # try to determine region from the endpoint or bucket name
-            name = wale.get('WAL_S3_BUCKET') or wale.get('WALE_S3_PREFIX')
+            name = walg.get('WAL_S3_BUCKET') or walg.get('WALG_S3_PREFIX')
             match = re.search(r'.*(\w{2}-\w+-\d)-.*', name)
             if match:
                 aws_region = match.group(1)
             else:
                 aws_region = placeholders['instance_data']['zone'][:-1]
-            wale['AWS_REGION'] = aws_region
+            walg['AWS_REGION'] = aws_region
         else:
-            wale['AWS_REGION'] = aws_region
-
-        if not (wale.get('AWS_SECRET_ACCESS_KEY') and wale.get('AWS_ACCESS_KEY_ID')):
-            wale['AWS_INSTANCE_PROFILE'] = 'true'
+            walg['AWS_REGION'] = aws_region
 
-        if wale.get('WALE_DISABLE_S3_SSE') and not wale.get('WALG_DISABLE_S3_SSE'):
-            wale['WALG_DISABLE_S3_SSE'] = wale['WALE_DISABLE_S3_SSE']
+        if not (walg.get('AWS_SECRET_ACCESS_KEY') and walg.get('AWS_ACCESS_KEY_ID')):
+            walg['AWS_INSTANCE_PROFILE'] = 'true'
 
-        if wale.get('USE_WALG_BACKUP') and wale.get('WALG_DISABLE_S3_SSE') != 'true' and not wale.get('WALG_S3_SSE'):
-            wale['WALG_S3_SSE'] = 'AES256'
+        if walg.get('WALG_DISABLE_S3_SSE') != 'true' and not walg.get('WALG_S3_SSE'):
+            walg['WALG_S3_SSE'] = 'AES256'
 
         # write IMDS env vars for any prefix if defined
         for name in 
aws_imds_names:
             if placeholders.get(name):
-                wale[name] = placeholders.get(name)
+                walg[name] = placeholders.get(name)
 
         write_envdir_names = s3_names + walg_names + aws_imds_names
-    elif wale.get('WAL_GCS_BUCKET') or wale.get('WAL_GS_BUCKET') or\
-            wale.get('WALE_GCS_PREFIX') or wale.get('WALE_GS_PREFIX') or wale.get('WALG_GS_PREFIX'):
-        if wale.get('WALE_GCS_PREFIX'):
-            wale['WALE_GS_PREFIX'] = wale['WALE_GCS_PREFIX']
-        elif wale.get('WAL_GCS_BUCKET'):
-            wale['WAL_GS_BUCKET'] = wale['WAL_GCS_BUCKET']
+    elif walg.get('WAL_GS_BUCKET') or walg.get('WALG_GS_PREFIX'):
         write_envdir_names = gs_names + walg_names
-    elif wale.get('WAL_SWIFT_BUCKET') or wale.get('WALE_SWIFT_PREFIX'):
+    elif walg.get('WAL_SWIFT_BUCKET') or walg.get('WALG_SWIFT_PREFIX'):
         write_envdir_names = swift_names
-    elif wale.get("WALG_AZ_PREFIX"):
+    elif walg.get("WALG_AZ_PREFIX"):
         azure_auth = []
         auth_opts = 0
-        if wale.get('AZURE_STORAGE_ACCESS_KEY'):
+        if walg.get('AZURE_STORAGE_ACCESS_KEY'):
             azure_auth.append('AZURE_STORAGE_ACCESS_KEY')
             auth_opts += 1
-        if wale.get('AZURE_STORAGE_SAS_TOKEN'):
+        if walg.get('AZURE_STORAGE_SAS_TOKEN'):
             if auth_opts == 0:
                 azure_auth.append('AZURE_STORAGE_SAS_TOKEN')
             auth_opts += 1
-        if wale.get('AZURE_CLIENT_ID') and wale.get('AZURE_CLIENT_SECRET') and wale.get('AZURE_TENANT_ID'):
+        if walg.get('AZURE_CLIENT_ID') and walg.get('AZURE_CLIENT_SECRET') and walg.get('AZURE_TENANT_ID'):
             if auth_opts == 0:
                 azure_auth.extend(['AZURE_CLIENT_ID', 'AZURE_CLIENT_SECRET', 'AZURE_TENANT_ID'])
             auth_opts += 1
@@ -953,47 +948,51 @@ def write_wale_environment(placeholders, prefix, overwrite):
 
         write_envdir_names = azure_names + azure_auth + walg_names
 
-    elif wale.get("WALG_SSH_PREFIX"):
+    elif walg.get("WALG_SSH_PREFIX"):
         write_envdir_names = ssh_names + walg_names
     else:
         return
 
     prefix_env_name = write_envdir_names[0]
     store_type = prefix_env_name[5:].split('_')[0]
-    if not wale.get(prefix_env_name):  # WALE_*_PREFIX is not defined in the environment
-        bucket_path = 
'/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}'.format(**wale)
+    if not walg.get(prefix_env_name):  # WALG_*_PREFIX is not defined in the environment
+        bucket_path = '/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}'.format(**walg)
         prefix_template = '{0}://{{WAL_{1}_BUCKET}}{2}'.format(store_type.lower(), store_type, bucket_path)
-        wale[prefix_env_name] = prefix_template.format(**wale)
+        walg[prefix_env_name] = prefix_template.format(**walg)
 
-    # Set WALG_*_PREFIX for future compatibility
-    if store_type in ('S3', 'GS') and not wale.get(write_envdir_names[1]):
-        wale[write_envdir_names[1]] = wale[prefix_env_name]
+    # the old "set WALG_*_PREFIX for future compatibility" copy is dropped:
+    # WALG_*_PREFIX is now the primary prefix name, and write_envdir_names[1]
+    # points at a credentials variable that must not be overwritten with it
 
-    if not os.path.exists(wale['WALE_ENV_DIR']):
-        os.makedirs(wale['WALE_ENV_DIR'])
+    if not os.path.exists(walg['WALG_ENV_DIR']):
+        os.makedirs(walg['WALG_ENV_DIR'])
 
-    wale['WALE_LOG_DESTINATION'] = 'stderr'
-    for name in write_envdir_names + ['WALE_LOG_DESTINATION', 'PGPORT'] + ([] if prefix else ['BACKUP_NUM_TO_RETAIN']):
-        if wale.get(name):
-            path = os.path.join(wale['WALE_ENV_DIR'], name)
-            write_file(wale[name], path, overwrite)
+    walg['WALG_LOG_DESTINATION'] = 'stderr'
+    for name in write_envdir_names + ['WALG_LOG_DESTINATION', 'PGPORT'] + ([] if prefix else ['BACKUP_NUM_TO_RETAIN']):
+        if walg.get(name):
+            path = os.path.join(walg['WALG_ENV_DIR'], name)
+            write_file(walg[name], path, overwrite)
             adjust_owner(path, gid=-1)
 
-    if not os.path.exists(placeholders['WALE_TMPDIR']):
-        os.makedirs(placeholders['WALE_TMPDIR'])
-        os.chmod(placeholders['WALE_TMPDIR'], 0o1777)
+    if not os.path.exists(placeholders['WALG_TMPDIR']):
+        os.makedirs(placeholders['WALG_TMPDIR'])
+        os.chmod(placeholders['WALG_TMPDIR'], 0o1777)
 
-    write_file(placeholders['WALE_TMPDIR'], 
os.path.join(walg['WALG_ENV_DIR'], 'TMPDIR'), True)
 
 
-def update_and_write_wale_configuration(placeholders, prefix, overwrite):
-    set_walg_placeholders(placeholders, prefix)
-    write_wale_environment(placeholders, prefix, overwrite)
+def update_and_write_walg_configuration(placeholders, prefix, overwrite):
+    write_walg_environment(placeholders, prefix, overwrite)
 
 
 def write_clone_pgpass(placeholders, overwrite):
     pgpassfile = placeholders['CLONE_PGPASS']
     # pgpass is host:port:database:user:password
+    # escape_pgpass_value already escapes the colons of an IPv6 address with a
+    # backslash, which is the escaping libpq expects inside a pgpass file, so
+    # the host needs no extra bracketing here
     r = {'host': escape_pgpass_value(placeholders['CLONE_HOST']),
         'port': placeholders['CLONE_PORT'],
         'database': '*',
         'user': escape_pgpass_value(placeholders['CLONE_USER']),
@@ -1061,8 +1060,8 @@ def write_crontab(placeholders, overwrite):
             hash_dir = os.path.join(placeholders['RW_DIR'], 'tmp')
             lines += ['*/5 * * * * {0} /scripts/test_reload_ssl.sh {1}'.format(env, hash_dir)]
 
-    if bool(placeholders.get('USE_WALE')):
-        lines += [('{BACKUP_SCHEDULE} envdir "{WALE_ENV_DIR}" /scripts/postgres_backup.sh' +
+    if bool(placeholders.get('USE_WALG')):
+        lines += [('{BACKUP_SCHEDULE} envdir "{WALG_ENV_DIR}" /scripts/postgres_backup.sh' +
                    ' "{PGDATA}"').format(**placeholders)]
 
     if bool(placeholders.get('LOG_S3_BUCKET')):
@@ -1192,9 +1191,9 @@ def main():
         elif section == 'log':
             if bool(placeholders.get('LOG_S3_BUCKET')):
                 write_log_environment(placeholders)
-        elif section == 'wal-e':
-            if placeholders['USE_WALE']:
-                write_wale_environment(placeholders, '', args['force'])
+        elif section == 'wal-g':
+            if placeholders['USE_WALG']:
+                write_walg_environment(placeholders, '', args['force'])
         elif section == 'certificate':
             write_certificates(placeholders, args['force'])
write_restapi_certificates(placeholders, args['force']) @@ -1205,18 +1204,18 @@ def main(): elif section == 'pgbouncer': write_pgbouncer_configuration(placeholders, args['force']) elif section == 'bootstrap': - if placeholders['CLONE_WITH_WALE']: - update_and_write_wale_configuration(placeholders, 'CLONE_', args['force']) + if placeholders['CLONE_WITH_WALG']: + update_and_write_walg_configuration(placeholders, 'CLONE_', args['force']) if placeholders['CLONE_WITH_BASEBACKUP']: write_clone_pgpass(placeholders, args['force']) elif section == 'standby-cluster': - if placeholders['STANDBY_WITH_WALE']: - update_and_write_wale_configuration(placeholders, 'STANDBY_', args['force']) + if placeholders['STANDBY_WITH_WALG']: + update_and_write_walg_configuration(placeholders, 'STANDBY_', args['force']) else: raise Exception('Unknown section: {}'.format(section)) # We will abuse non zero exit code as an indicator for the launch.sh that it should not even try to create a backup - sys.exit(int(not placeholders['USE_WALE'])) + sys.exit(int(not placeholders['USE_WALG'])) def escape_pgpass_value(val): diff --git a/postgres-appliance/scripts/hypopg/after-create.sql b/postgres-appliance/scripts/hypopg/after-create.sql new file mode 100644 index 000000000..32d5d47cd --- /dev/null +++ b/postgres-appliance/scripts/hypopg/after-create.sql @@ -0,0 +1,2 @@ +GRANT SELECT ON hypopg_hidden_indexes TO admin; +GRANT SELECT ON hypopg_list_indexes TO admin; diff --git a/postgres-appliance/scripts/post_init.sh b/postgres-appliance/scripts/post_init.sh index c12e9c83b..b1f8f325a 100755 --- a/postgres-appliance/scripts/post_init.sh +++ b/postgres-appliance/scripts/post_init.sh @@ -139,16 +139,12 @@ CREATE TABLE IF NOT EXISTS public.postgres_log ( query_pos integer, location text, application_name text, + backend_type text, + leader_pid integer, + query_id bigint, CONSTRAINT postgres_log_check CHECK (false) NO INHERIT ); GRANT SELECT ON public.postgres_log TO admin;" -if [ "$PGVER" -ge 13 ]; then - 
echo "ALTER TABLE public.postgres_log ADD COLUMN IF NOT EXISTS backend_type text;"
-fi
-if [ "$PGVER" -ge 14 ]; then
-    echo "ALTER TABLE public.postgres_log ADD COLUMN IF NOT EXISTS leader_pid integer;"
-    echo "ALTER TABLE public.postgres_log ADD COLUMN IF NOT EXISTS query_id bigint;"
-fi
 
 # Sunday could be 0 or 7 depending on the format, we just create both
 LOG_SHIP_HOURLY=$(echo "SELECT text(current_setting('log_rotation_age') = '1h')" | psql -tAX -d postgres 2> /dev/null | tail -n 1)
diff --git a/postgres-appliance/scripts/postgres_backup.sh b/postgres-appliance/scripts/postgres_backup.sh
index 37ce37bc0..ce5850d3a 100755
--- a/postgres-appliance/scripts/postgres_backup.sh
+++ b/postgres-appliance/scripts/postgres_backup.sh
@@ -23,23 +23,13 @@
 else
     log "ERROR: Recovery state unknown: $IN_RECOVERY" && exit 1
 fi
 
-if [[ "$USE_WALG_BACKUP" == "true" ]]; then
-    readonly WAL_E="wal-g"
-    [[ -z $WALG_BACKUP_COMPRESSION_METHOD ]] || export WALG_COMPRESSION_METHOD=$WALG_BACKUP_COMPRESSION_METHOD
-    export PGHOST=/var/run/postgresql
-else
-    readonly WAL_E="wal-e"
-
-    # Ensure we don't have more workes than CPU's
-    POOL_SIZE=$(grep -c ^processor /proc/cpuinfo 2>/dev/null || 1)
-    [ "$POOL_SIZE" -gt 4 ] && POOL_SIZE=4
-    POOL_SIZE=(--pool-size "$POOL_SIZE")
-fi
+# only export a compression method when one is actually configured,
+# otherwise wal-g would see an empty WALG_COMPRESSION_METHOD value
+if [[ -n "${WALG_BACKUP_COMPRESSION_METHOD:-$WALE_BACKUP_COMPRESSION_METHOD}" ]]; then
+    export WALG_COMPRESSION_METHOD="${WALG_BACKUP_COMPRESSION_METHOD:-$WALE_BACKUP_COMPRESSION_METHOD}"
+fi
+export PGHOST=/var/run/postgresql
 
 # push a new base backup
 log "producing a new backup"
 # We reduce the priority of the backup for CPU consumption
-nice -n 5 $WAL_E backup-push "$PGDATA" "${POOL_SIZE[@]}"
+nice -n 5 wal-g backup-push "$PGDATA"
 
 # Collect all backups and sort them by modification time
 mapfile -t backup_records < <(wal-g backup-list 2>/dev/null |
diff --git a/postgres-appliance/scripts/restore_command.sh b/postgres-appliance/scripts/restore_command.sh
index a4bb88939..b861c6bdf 100755
--- a/postgres-appliance/scripts/restore_command.sh
+++ b/postgres-appliance/scripts/restore_command.sh
@@ -5,14 +5,14 @@
 if 
[[ "$ENABLE_WAL_PATH_COMPAT" = "true" ]]; then bash "$(readlink -f "${BASH_SOURCE[0]}")" "$@" exitcode=$? [[ $exitcode = 0 ]] && exit 0 - for wale_env in $(printenv -0 | tr '\n' ' ' | sed 's/\x00/\n/g' | sed -n 's/^\(WAL[EG]_[^=][^=]*_PREFIX\)=.*$/\1/p'); do - suffix=$(basename "${!wale_env}") + for walg_env in $(printenv -0 | tr '\n' ' ' | sed 's/\x00/\n/g' | sed -n 's/^\(WALG_[^=][^=]*_PREFIX\)=.*$/\1/p'); do + suffix=$(basename "${!walg_env}") if [[ -x "/usr/lib/postgresql/$suffix/bin/postgres" ]]; then - prefix=$(dirname "${!wale_env}") + prefix=$(dirname "${!walg_env}") if [[ $prefix =~ /spilo/ ]] && [[ $prefix =~ /wal$ ]]; then - printf -v "$wale_env" "%s" "$prefix" + printf -v "$walg_env" "%s" "$prefix" # shellcheck disable=SC2163 - export "$wale_env" + export "$walg_env" changed_env=true fi fi @@ -34,22 +34,6 @@ readonly wal_fast_source if [[ "$wal_destination" =~ /$wal_filename$ ]]; then # Patroni fetching missing files for pg_rewind export WALG_DOWNLOAD_CONCURRENCY=1 - POOL_SIZE=0 -else - POOL_SIZE=$WALG_DOWNLOAD_CONCURRENCY fi -[[ "$USE_WALG_RESTORE" == "true" ]] && exec wal-g wal-fetch "${wal_filename}" "${wal_destination}" - -[[ $POOL_SIZE -gt 8 ]] && POOL_SIZE=8 - -if [[ -z $WALE_S3_PREFIX ]]; then # non AWS environment? 
-    readonly wale_prefetch_source=${wal_dir}/.wal-e/prefetch/${wal_filename}
-    if [[ -f $wale_prefetch_source ]]; then
-        exec mv "${wale_prefetch_source}" "${wal_destination}"
-    else
-        exec wal-e wal-fetch -p $POOL_SIZE "${wal_filename}" "${wal_destination}"
-    fi
-else
-    exec bash /scripts/wal-e-wal-fetch.sh wal-fetch -p $POOL_SIZE "${wal_filename}" "${wal_destination}"
-fi
+exec wal-g wal-fetch "${wal_filename}" "${wal_destination}"
diff --git a/postgres-appliance/scripts/spilo_commons.py b/postgres-appliance/scripts/spilo_commons.py
index 0543bf771..981c9caa8 100644
--- a/postgres-appliance/scripts/spilo_commons.py
+++ b/postgres-appliance/scripts/spilo_commons.py
@@ -12,13 +12,13 @@
 # (min_version, max_version, shared_preload_libraries, extwlist.extensions)
 extensions = {
-    'timescaledb': (9.6, 17, True, True),
-    'pg_cron': (9.5, 17, True, False),
-    'pg_stat_kcache': (9.4, 17, True, False),
-    'pg_partman': (9.4, 17, False, True)
+    'timescaledb': (9.6, 18, True, True),
+    'pg_cron': (9.5, 18, True, False),
+    'pg_stat_kcache': (9.4, 18, True, False),
+    'pg_partman': (9.4, 18, False, True)
 }
 
 if os.environ.get('ENABLE_PG_MON') == 'true':
-    extensions['pg_mon'] = (11, 17, True, False)
+    extensions['pg_mon'] = (11, 18, True, False)
 
 
 def adjust_extensions(old, version, extwlist=False):
diff --git a/postgres-appliance/scripts/wal-e-wal-fetch.sh b/postgres-appliance/scripts/wal-e-wal-fetch.sh
deleted file mode 100755
index c5302aae7..000000000
--- a/postgres-appliance/scripts/wal-e-wal-fetch.sh
+++ /dev/null
@@ -1,225 +0,0 @@
-#!/bin/bash
-set -e
-
-date
-
-prefetch=8
-
-function load_aws_instance_profile() {
-    local CREDENTIALS_URL=http://169.254.169.254/latest/meta-data/iam/security-credentials/
-    local INSTANCE_PROFILE
-    INSTANCE_PROFILE=$(curl -s "$CREDENTIALS_URL")
-    # shellcheck source=/dev/null
-    source <(curl -s "$CREDENTIALS_URL$INSTANCE_PROFILE" | jq -r '"AWS_SECURITY_TOKEN=\"" + .Token + "\"\nAWS_SECRET_ACCESS_KEY=\"" + .SecretAccessKey + "\"\nAWS_ACCESS_KEY_ID=\"" + .AccessKeyId + "\""')
-}
-
-function load_region_from_aws_instance_profile() {
-    local AZ
-    AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
-    AWS_REGION=${AZ:0:-1}
-}
-
-function usage() {
-    echo "Usage: $0 wal-fetch [--prefetch PREFETCH] WAL_SEGMENT WAL_DESTINATION"
-    exit 1
-}
-
-while [[ $# -gt 0 ]]; do
-    case $1 in
-        --s3-prefix )
-            WALE_S3_PREFIX=$2
-            shift
-            ;;
-        -k|--aws-access-key-id )
-            AWS_ACCESS_KEY_ID=$2
-            shift
-            ;;
-        --aws-instance-profile )
-            AWS_INSTANCE_PROFILE=true
-            ;;
-        wal-fetch )
-            ;;
-        -p|--prefetch )
-            prefetch=$2
-            shift
-            ;;
-        * )
-            PARAMS+=("$1")
-            ;;
-    esac
-    shift
-done
-
-[[ ${#PARAMS[@]} == 2 ]] || usage
-
-[[ "$AWS_INSTANCE_PROFILE" == "true" ]] && load_aws_instance_profile
-
-if [[ -z $AWS_SECRET_ACCESS_KEY || -z $AWS_ACCESS_KEY_ID || -z $WALE_S3_PREFIX ]]; then
-    echo bad environment
-    exit 1
-fi
-
-readonly SEGMENT=${PARAMS[-2]}
-readonly DESTINATION=${PARAMS[-1]}
-
-if [[ $WALE_S3_PREFIX =~ ^s3://([^\/]+)(.+) ]]; then
-    readonly BUCKET=${BASH_REMATCH[1]}
-    BUCKET_PATH=${BASH_REMATCH[2]}
-    readonly BUCKET_PATH=${BUCKET_PATH%/}
-else
-    echo bad WALE_S3_PREFIX
-    exit 1
-fi
-
-if [[ -n $WALE_S3_ENDPOINT && $WALE_S3_ENDPOINT =~ ^([a-z\+]{2,10}://)?([^:\/?]+) ]]; then
-    S3_HOST=${BASH_REMATCH[2]}
-fi
-
-if [[ -z $AWS_REGION ]]; then
-    if [[ -n $WALE_S3_ENDPOINT && $WALE_S3_ENDPOINT =~ ^([a-z\+]{2,10}://)?s3-([^\.]+) ]]; then
-        AWS_REGION=${BASH_REMATCH[2]}
-    elif [[ "$AWS_INSTANCE_PROFILE" == "true" ]]; then
-        load_region_from_aws_instance_profile
-    fi
-fi
-
-if [[ -z $AWS_REGION ]]; then
-    echo AWS_REGION is unknown
-    exit 1
-fi
-
-if [[ -z $S3_HOST ]]; then
-    S3_HOST=s3.$AWS_REGION.amazonaws.com
-fi
-
-readonly SERVICE=s3
-readonly REQUEST=aws4_request
-readonly HOST=$BUCKET.$S3_HOST
-TIME=$(date +%Y%m%dT%H%M%SZ)
-readonly TIME
-readonly DATE=${TIME%T*}
-readonly DRSR="$DATE/$AWS_REGION/$SERVICE/$REQUEST"
-readonly EMPTYHASH=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
-
-function hmac_sha256() {
-    echo -en "$2" | openssl dgst -sha256 -mac HMAC -macopt "$1" | sed 's/^.* //'
-}
-
-# Four-step signing key calculation
-DATE_KEY=$(hmac_sha256 key:"AWS4$AWS_SECRET_ACCESS_KEY" "$DATE")
-readonly DATE_KEY
-DATE_REGION_KEY=$(hmac_sha256 "hexkey:$DATE_KEY" "$AWS_REGION")
-readonly DATE_REGION_KEY
-DATE_REGION_SERVICE_KEY=$(hmac_sha256 "hexkey:$DATE_REGION_KEY" "$SERVICE")
-readonly DATE_REGION_SERVICE_KEY
-SIGNING_KEY=$(hmac_sha256 "hexkey:$DATE_REGION_SERVICE_KEY" "$REQUEST")
-readonly SIGNING_KEY
-
-if [[ -z $AWS_INSTANCE_PROFILE ]]; then
-    readonly SIGNED_HEADERS="host;x-amz-content-sha256;x-amz-date"
-    readonly REQUEST_TOKEN=""
-    readonly TOKEN_HEADER=()
-else
-    readonly SIGNED_HEADERS="host;x-amz-content-sha256;x-amz-date;x-amz-security-token"
-    readonly REQUEST_TOKEN="x-amz-security-token:$AWS_SECURITY_TOKEN\n"
-    readonly TOKEN_HEADER=(-H "x-amz-security-token: $AWS_SECURITY_TOKEN")
-fi
-
-function s3_get() {
-    local segment=$1
-    local destination=$2
-    local FILE=$BUCKET_PATH/wal_005/$segment.lzo
-    local CANONICAL_REQUEST="GET\n$FILE\n\nhost:$HOST\nx-amz-content-sha256:$EMPTYHASH\nx-amz-date:$TIME\n$REQUEST_TOKEN\n$SIGNED_HEADERS\n$EMPTYHASH"
-    local CANONICAL_REQUEST_HASH
-    CANONICAL_REQUEST_HASH=$(echo -en "$CANONICAL_REQUEST" | openssl dgst -sha256 | sed 's/^.* //')
-    local STRING_TO_SIGN="AWS4-HMAC-SHA256\n$TIME\n$DRSR\n$CANONICAL_REQUEST_HASH"
-    local SIGNATURE
-    SIGNATURE=$(hmac_sha256 "hexkey:$SIGNING_KEY" "$STRING_TO_SIGN")
-
-    if curl -s "https://$HOST$FILE" "${TOKEN_HEADER[@]}" -H "x-amz-content-sha256: $EMPTYHASH" -H "x-amz-date: $TIME" \
-            -H "Authorization: AWS4-HMAC-SHA256 Credential=$AWS_ACCESS_KEY_ID/$DRSR, SignedHeaders=$SIGNED_HEADERS, Signature=$SIGNATURE" \
-            | lzop -dc > "$destination" 2> /dev/null && [[ ${PIPESTATUS[0]} == 0 ]]; then
-        [[ -s $destination ]] && echo "$$ success $FILE" && return 0
-    fi
-    rm -f "$destination"
-    echo "$$ failed $FILE"
-    return 1
-}
-
-function generate_next_segments() {
-    local num=$1
-
-    local timeline=${SEGMENT:0:8}
-    local log=$((16#${SEGMENT:8:8}))
-    local seg=$((16#${SEGMENT:16:8}))
-
-    while [[ $((num--)) -gt 0 ]]; do
-        seg=$((seg+1))
-        printf "%s%08X%08X\n" "$timeline" $((log+seg/256)) $((seg%256))
-    done
-}
-
-function clear_except() {
-    set +e
-    for dir in "$PREFETCHDIR"/running/0*; do
-        item=$(basename "$dir")
-        if [[ $item =~ ^[0-9A-F]{24}$ ]]; then
-            [[ " ${PREFETCHES[*]} " =~ \ $item\  ]] || rm -fr "$dir"
-        fi
-    done
-
-    for file in "$PREFETCHDIR"/0*; do
-        item=$(basename "$file")
-        if [[ $item =~ ^[0-9A-F]{24}$ ]]; then
-            [[ " ${PREFETCHES[*]} " =~ \ $item\  ]] || rm -f "$file"
-        fi
-    done
-    set -e
-    return 0
-}
-
-function try_to_promote_prefetched() {
-    local prefetched=$PREFETCHDIR/$SEGMENT
-    [[ -f $prefetched ]] || return 1
-    echo "$$ promoting $prefetched"
-    mv "$prefetched" "$DESTINATION" && clear_except && exit 0
-}
-
-echo "$$ $SEGMENT"
-
-PREFETCHDIR=$(dirname "$DESTINATION")/.wal-e/prefetch
-readonly PREFETCHDIR
-if [[ $prefetch -gt 0 && $SEGMENT =~ ^[0-9A-F]{24}$ ]]; then
-    mapfile -t PREFETCHES < <(generate_next_segments "$prefetch")
-    readonly PREFETCHES
-    for segment in "${PREFETCHES[@]}"; do
-        running="$PREFETCHDIR/running/$segment"
-        [[ -d $running || -f $PREFETCHDIR/$segment ]] && continue
-
-        mkdir -p "$running"
-        (
-            trap 'rm -fr $running' QUIT TERM EXIT
-            TMPFILE=$(mktemp -p "$running")
-            echo "$$ prefetching $segment"
-            s3_get "$segment" "$TMPFILE" && mv "$TMPFILE" "$PREFETCHDIR/$segment"
-        ) &
-    done
-
-    last_size=0
-    while ! try_to_promote_prefetched; do
-        size=$(du -bs "$PREFETCHDIR/running/$SEGMENT" 2> /dev/null | cut -f1)
-        if [[ -z $size ]]; then
-            try_to_promote_prefetched || break
-        elif [[ $size > $last_size ]]; then
-            echo "($size > $last_size), sleeping 1"
-            last_size=$size
-            sleep 1
-        else
-            echo "size=$size, last_size=$last_size"
-            break
-        fi
-    done
-    clear_except
-fi
-
-s3_get "$SEGMENT" "$DESTINATION"
diff --git a/postgres-appliance/scripts/wale_restore.sh b/postgres-appliance/scripts/walg_restore.sh
similarity index 94%
rename from postgres-appliance/scripts/wale_restore.sh
rename to postgres-appliance/scripts/walg_restore.sh
index 4fbcedd01..70768188f 100755
--- a/postgres-appliance/scripts/wale_restore.sh
+++ b/postgres-appliance/scripts/walg_restore.sh
@@ -33,16 +33,11 @@ done
 [[ -z $DATA_DIR ]] && exit 1
 [[ -z $NO_MASTER && -z "$CONNSTR" ]] && exit 1
 
-if [[ "$USE_WALG_RESTORE" == "true" ]]; then
-    readonly WAL_E="wal-g"
-else
-    readonly WAL_E="wal-e"
-fi
 
 ATTEMPT=0
 server_version="-1"
 while true; do
-    [[ -z $wal_segment_backup_start ]] && wal_segment_backup_start=$($WAL_E backup-list 2> /dev/null \
+    [[ -z $wal_segment_backup_start ]] && wal_segment_backup_start=$(wal-g backup-list 2> /dev/null \
         | sed '0,/^\(backup_\)\?name\s*\(last_\)\?modified\s*/d' | sort -bk2 | tail -n1 | awk '{print $3;}' | sed 's/_.*$//')
 
     [[ -n "$CONNSTR" && $server_version == "-1" ]] && server_version=$(psql -d "$CONNSTR" -tAc 'show server_version_num' 2> /dev/null || echo "-1")
@@ -84,7 +79,7 @@ fi
 ATTEMPT=0
 while true; do
-    if $WAL_E backup-fetch "$DATA_DIR" LATEST; then
+    if wal-g backup-fetch "$DATA_DIR" LATEST; then
         version=$(<"$DATA_DIR/PG_VERSION")
         [[ "$version" =~ \. ]] && wal_name=xlog || wal_name=wal
         readonly wal_dir=$DATA_DIR/pg_$wal_name
diff --git a/postgres-appliance/tests/docker-compose.yml b/postgres-appliance/tests/docker-compose.yml
index f0399ecb2..3358f61a9 100644
--- a/postgres-appliance/tests/docker-compose.yml
+++ b/postgres-appliance/tests/docker-compose.yml
@@ -33,8 +33,7 @@ services:
             AWS_ENDPOINT: &aws_endpoint 'http://minio:9000'
             AWS_S3_FORCE_PATH_STYLE: &aws_s3_force_path_style 'true'
             WAL_S3_BUCKET: &bucket testbucket
-#           USE_WALG: 'true'  # wal-e is used and tested by default, wal-g is used automatically for restore in case of S3
-            WALE_DISABLE_S3_SSE: &wale_disable_s3_sse 'true'
+            WALG_DISABLE_S3_SSE: &walg_disable_s3_sse 'true'
             ETCDCTL_ENDPOINTS: http://etcd:2379
             ETCD3_HOST: "etcd:2379"
             SCOPE: demo
@@ -51,14 +50,14 @@
                 postgresql:
                   parameters:
                     shared_buffers: 32MB
-            PGVERSION: '13'
+            PGVERSION: '14'
             # Just to test upgrade with clone. Without CLONE_SCOPE they don't work
             CLONE_WAL_S3_BUCKET: *bucket
             CLONE_AWS_ACCESS_KEY_ID: *access_key
             CLONE_AWS_SECRET_ACCESS_KEY: *secret_key
             CLONE_AWS_ENDPOINT: *aws_endpoint
             CLONE_AWS_S3_FORCE_PATH_STYLE: *aws_s3_force_path_style
-            CLONE_WALE_DISABLE_S3_SSE: *wale_disable_s3_sse
+            CLONE_WALG_DISABLE_S3_SSE: *walg_disable_s3_sse
         hostname: spilo1
         container_name: demo-spilo1
diff --git a/postgres-appliance/tests/test_spilo.sh b/postgres-appliance/tests/test_spilo.sh
index 2570c0974..a68c86a7f 100755
--- a/postgres-appliance/tests/test_spilo.sh
+++ b/postgres-appliance/tests/test_spilo.sh
@@ -124,20 +124,19 @@ function drop_timescaledb() {
 }
 
 function test_inplace_upgrade_wrong_version() {
-    docker_exec "$1" "PGVERSION=13 $UPGRADE_SCRIPT 3" 2>&1 | grep 'Upgrade is not required'
+    docker_exec "$1" "PGVERSION=14 $UPGRADE_SCRIPT 3" 2>&1 | grep 'Upgrade is not required'
 }
 
 function test_inplace_upgrade_wrong_capacity() {
-    docker_exec "$1" "PGVERSION=14 $UPGRADE_SCRIPT 4" 2>&1 | grep 'number of replicas does not match'
+    docker_exec "$1" "PGVERSION=15 $UPGRADE_SCRIPT 4" 2>&1 | grep 'number of replicas does not match'
 }
 
-function test_successful_inplace_upgrade_to_14() {
-    docker_exec "$1" "PGVERSION=14 $UPGRADE_SCRIPT 3"
+function test_successful_inplace_upgrade_to_15() {
+    docker_exec "$1" "PGVERSION=15 $UPGRADE_SCRIPT 3"
 }
 
 function test_envdir_suffix() {
-    docker_exec "$1" "cat /run/etc/wal-e.d/env/WALG_S3_PREFIX" | grep -q "$2$" \
-        && docker_exec "$1" "cat /run/etc/wal-e.d/env/WALE_S3_PREFIX" | grep -q "$2$"
+    docker_exec "$1" "cat /run/etc/wal-e.d/env/WALG_S3_PREFIX" | grep -q "$2$"
 }
 
 function test_envdir_updated_to_x() {
@@ -147,11 +146,7 @@ function test_envdir_updated_to_x() {
 }
 
 function test_failed_inplace_upgrade_big_replication_lag() {
-    ! test_successful_inplace_upgrade_to_14 "$1"
-}
-
-function test_successful_inplace_upgrade_to_15() {
-    docker_exec "$1" "PGVERSION=15 $UPGRADE_SCRIPT 3"
+    ! test_successful_inplace_upgrade_to_15 "$1"
 }
 
 function test_successful_inplace_upgrade_to_16() {
@@ -162,49 +157,53 @@ function test_successful_inplace_upgrade_to_17() {
     docker_exec "$1" "PGVERSION=17 $UPGRADE_SCRIPT 3"
 }
 
-function test_pg_upgrade_to_17_check_failed() {
-    ! test_successful_inplace_upgrade_to_17 "$1"
+function test_successful_inplace_upgrade_to_18() {
+    docker_exec "$1" "PGVERSION=18 $UPGRADE_SCRIPT 3"
 }
 
-function start_clone_with_wale_upgrade_container() {
+function test_pg_upgrade_to_18_check_failed() {
+    ! test_successful_inplace_upgrade_to_18 "$1"
+}
+
+function start_clone_with_walg_upgrade_container() {
     local ID=${1:-1}
     docker-compose run \
         -e SCOPE=upgrade \
-        -e PGVERSION=14 \
+        -e PGVERSION=15 \
         -e CLONE_SCOPE=demo \
-        -e CLONE_METHOD=CLONE_WITH_WALE \
+        -e CLONE_METHOD=CLONE_WITH_WALG \
        -e CLONE_TARGET_TIME="$(next_minute)" \
-        -e WALE_BACKUP_THRESHOLD_PERCENTAGE=80 \
+        -e WALG_BACKUP_THRESHOLD_PERCENTAGE=80 \
        --name "${PREFIX}upgrade$ID" \
        -d "spilo$ID"
 }
 
-function start_clone_with_wale_upgrade_replica_container() {
-    start_clone_with_wale_upgrade_container 2
+function start_clone_with_walg_upgrade_replica_container() {
+    start_clone_with_walg_upgrade_container 2
 }
 
-function start_clone_with_wale_upgrade_to_17_container() {
+function start_clone_with_walg_upgrade_to_18_container() {
     docker-compose run \
         -e SCOPE=upgrade3 \
-        -e PGVERSION=17 \
+        -e PGVERSION=18 \
         -e CLONE_SCOPE=demo \
-        -e CLONE_PGVERSION=13 \
-        -e CLONE_METHOD=CLONE_WITH_WALE \
+        -e CLONE_PGVERSION=14 \
+        -e CLONE_METHOD=CLONE_WITH_WALG \
        -e CLONE_TARGET_TIME="$(next_minute)" \
        --name "${PREFIX}upgrade4" \
        -d "spilo3"
 }
 
-function start_clone_with_wale_17_container() {
+function start_clone_with_walg_18_container() {
     docker-compose run \
-        -e SCOPE=clone16 \
-        -e PGVERSION=17 \
+        -e SCOPE=clone17 \
+        -e PGVERSION=18 \
         -e CLONE_SCOPE=upgrade3 \
-        -e CLONE_PGVERSION=17 \
-        -e CLONE_METHOD=CLONE_WITH_WALE \
+        -e CLONE_PGVERSION=18 \
+        -e CLONE_METHOD=CLONE_WITH_WALG \
        -e CLONE_TARGET_TIME="$(next_hour)" \
-        --name "${PREFIX}clone16" \
+        --name "${PREFIX}clone17" \
        -d "spilo3"
 }
 
@@ -212,7 +211,7 @@ function start_clone_with_basebackup_upgrade_container() {
     local container=$1
     docker-compose run \
         -e SCOPE=upgrade2 \
-        -e PGVERSION=15 \
+        -e PGVERSION=16 \
         -e CLONE_SCOPE=upgrade \
         -e CLONE_METHOD=CLONE_WITH_BASEBACKUP \
         -e CLONE_HOST="$(docker_exec "$container" "hostname --ip-address")" \
@@ -226,11 +225,11 @@ function start_clone_with_hourly_log_rotation() {
     docker-compose run \
         -e SCOPE=hourlylogs \
-        -e PGVERSION=17 \
+        -e PGVERSION=18 \
         -e LOG_SHIP_HOURLY="true" \
         -e CLONE_SCOPE=upgrade2 \
-        -e CLONE_PGVERSION=15 \
-        -e CLONE_METHOD=CLONE_WITH_WALE \
+        -e CLONE_PGVERSION=16 \
+        -e CLONE_METHOD=CLONE_WITH_WALG \
        -e CLONE_TARGET_TIME="$(next_minute)" \
        --name "${PREFIX}hourlylogs" \
        -d "spilo3"
@@ -261,18 +260,18 @@ function verify_hourly_log_rotation() {
     [ "$log_rotation_age" = "1h" ] && [ "$log_filename" = "postgresql-%u-%H.log" ] && [ "$postgres_log_ftables" -eq 192 ] && [ "$postgres_log_views" -eq 8 ] && [ "$postgres_failed_auth_views" -eq 200 ]
 }
 
-# TEST SUITE 1 - In-place major upgrade 13->14->...->17
-# TEST SUITE 2 - Major upgrade 13->17 after wal-e clone (with CLONE_PGVERSION set)
-# TEST SUITE 3 - PITR (clone with wal-e) with unreachable target (14+)
-# TEST SUITE 4 - Major upgrade 13->14 after wal-e clone (no CLONE_PGVERSION)
-# TEST SUITE 5 - Replica bootstrap with wal-e
-# TEST SUITE 6 - Major upgrade 14->15 after clone with basebackup
+# TEST SUITE 1 - In-place major upgrade 14->15->...->18
+# TEST SUITE 2 - Major upgrade 14->18 after wal-g clone (with CLONE_PGVERSION set)
+# TEST SUITE 3 - PITR (clone with wal-g) with unreachable target (15+)
+# TEST SUITE 4 - Major upgrade 14->15 after wal-g clone (no CLONE_PGVERSION)
+# TEST SUITE 5 - Replica bootstrap with wal-g
+# TEST SUITE 6 - Major upgrade 15->16 after clone with basebackup
 # TEST SUITE 7 - Hourly log rotation
 
 function test_spilo() {
     # TEST SUITE 1
     local container=$1
-    run_test test_envdir_suffix "$container" 13
+    run_test test_envdir_suffix "$container" 14
 
     log_info "[TS1] Testing wrong upgrade setups"
     run_test test_inplace_upgrade_wrong_version "$container"
@@ -289,66 +288,66 @@ function test_spilo() {
 
     # TEST SUITE 2
     local upgrade3_container
-    upgrade3_container=$(start_clone_with_wale_upgrade_to_17_container)  # SCOPE=upgrade3 PGVERSION=17 CLONE: _SCOPE=demo _PGVERSION=13 _TARGET_TIME=
-    log_info "[TS2] Started $upgrade3_container for testing major upgrade 13->17 after clone with wal-e"
+    upgrade3_container=$(start_clone_with_walg_upgrade_to_18_container)  # SCOPE=upgrade3 PGVERSION=18 CLONE: _SCOPE=demo _PGVERSION=14 _TARGET_TIME=
+    log_info "[TS2] Started $upgrade3_container for testing major upgrade 14->18 after clone with wal-g"
 
     # TEST SUITE 4
     local upgrade_container
-    upgrade_container=$(start_clone_with_wale_upgrade_container)  # SCOPE=upgrade PGVERSION=14 CLONE: _SCOPE=demo _TARGET_TIME=
-    log_info "[TS4] Started $upgrade_container for testing major upgrade 13->14 after clone with wal-e"
+    upgrade_container=$(start_clone_with_walg_upgrade_container)  # SCOPE=upgrade PGVERSION=15 CLONE: _SCOPE=demo _TARGET_TIME=
+    log_info "[TS4] Started $upgrade_container for testing major upgrade 14->15 after clone with wal-g"
 
     # TEST SUITE 1
     # wait clone to finish and prevent timescale installation gets cloned
     find_leader "$upgrade3_container"
     find_leader "$upgrade_container"
-    create_timescaledb "$container"  # we don't install it at the beginning, as we do 13->17 in a clone
+    create_timescaledb "$container"  # we don't install it at the beginning, as we do 14->18 in a clone
 
-    log_info "[TS1] Testing in-place major upgrade 13->14"
+    log_info "[TS1] Testing in-place major upgrade 14->15"
     wait_zero_lag "$container"
-    run_test test_successful_inplace_upgrade_to_14 "$container"
+    run_test test_successful_inplace_upgrade_to_15 "$container"
     wait_all_streaming "$container"
-    run_test test_envdir_updated_to_x 14
+    run_test test_envdir_updated_to_x 15
 
     # TEST SUITE 2
-    log_info "[TS2] Testing in-place major upgrade 13->17 after wal-e clone"
-    run_test verify_clone_upgrade "$upgrade3_container" "wal-e" 13 17
+    log_info "[TS2] Testing in-place major upgrade 14->18 after wal-g clone"
+    run_test verify_clone_upgrade "$upgrade3_container" "wal-g" 14 18
     run_test verify_archive_mode_is_on "$upgrade3_container"
 
     wait_backup "$upgrade3_container"
 
     # TEST SUITE 3
-    local clone17_container
-    clone17_container=$(start_clone_with_wale_17_container)  # SCOPE=clone17 CLONE: _SCOPE=upgrade3 _PGVERSION=17 _TARGET_TIME=
-    log_info "[TS3] Started $clone17_container for testing point-in-time recovery (clone with wal-e) with unreachable target on 14+"
+    local clone18_container
+    clone18_container=$(start_clone_with_walg_18_container)  # SCOPE=clone18 CLONE: _SCOPE=upgrade3 _PGVERSION=18 _TARGET_TIME=
+    log_info "[TS3] Started $clone18_container for testing point-in-time recovery (clone with wal-g) with unreachable target on 15+"
 
     # TEST SUITE 1
-    log_info "[TS1] Testing in-place major upgrade 14->15"
-    run_test test_successful_inplace_upgrade_to_15 "$container"
+    log_info "[TS1] Testing in-place major upgrade 15->16"
+    run_test test_successful_inplace_upgrade_to_16 "$container"
     wait_all_streaming "$container"
-    run_test test_envdir_updated_to_x 15
+    run_test test_envdir_updated_to_x 16
 
     # TEST SUITE 3
-    find_leader "$clone17_container"
-    run_test verify_archive_mode_is_on "$clone17_container"
+    find_leader "$clone18_container"
+    run_test verify_archive_mode_is_on "$clone18_container"
 
     # TEST SUITE 1
     wait_backup "$container"
 
-    log_info "[TS1] Testing in-place major upgrade to 15->16"
-    run_test test_successful_inplace_upgrade_to_16 "$container"
+    log_info "[TS1] Testing in-place major upgrade to 16->17"
+    run_test test_successful_inplace_upgrade_to_17 "$container"
     wait_all_streaming "$container"
-    run_test test_envdir_updated_to_x 16
+    run_test test_envdir_updated_to_x 17
 
     # TEST SUITE 4
-    log_info "[TS4] Testing in-place major upgrade 13->14 after clone with wal-e"
-    run_test verify_clone_upgrade "$upgrade_container" "wal-e" 13 14
+    log_info "[TS4] Testing in-place major upgrade 14->15 after clone with wal-g"
+    run_test verify_clone_upgrade "$upgrade_container" "wal-g" 14 15
     run_test verify_archive_mode_is_on "$upgrade_container"
 
     wait_backup "$upgrade_container"
@@ -356,26 +355,26 @@ function test_spilo() {
 
     # TEST SUITE 5
     local upgrade_replica_container
-    upgrade_replica_container=$(start_clone_with_wale_upgrade_replica_container)  # SCOPE=upgrade
-    log_info "[TS5] Started $upgrade_replica_container for testing replica bootstrap with wal-e"
+    upgrade_replica_container=$(start_clone_with_walg_upgrade_replica_container)  # SCOPE=upgrade
+    log_info "[TS5] Started $upgrade_replica_container for testing replica bootstrap with wal-g"
 
     # TEST SUITE 6
     local basebackup_container
-    basebackup_container=$(start_clone_with_basebackup_upgrade_container "$upgrade_container")  # SCOPE=upgrade2 PGVERSION=15 CLONE: _SCOPE=upgrade
-    log_info "[TS6] Started $basebackup_container for testing major upgrade 14->15 after clone with basebackup"
+    basebackup_container=$(start_clone_with_basebackup_upgrade_container "$upgrade_container")  # SCOPE=upgrade2 PGVERSION=16 CLONE: _SCOPE=upgrade
+    log_info "[TS6] Started $basebackup_container for testing major upgrade 15->16 after clone with basebackup"
 
     wait_backup "$basebackup_container"
 
     # TEST SUITE 1
-    # run_test test_pg_upgrade_to_17_check_failed "$container"  # pg_upgrade --check complains about timescaledb
+    # run_test test_pg_upgrade_to_18_check_failed "$container"  # pg_upgrade --check complains about timescaledb
     wait_backup "$container"
-    # drop_timescaledb "$container"
-    log_info "[TS1] Testing in-place major upgrade 16->17"
-    run_test test_successful_inplace_upgrade_to_17 "$container"
+    drop_timescaledb "$container"
+    log_info "[TS1] Testing in-place major upgrade 17->18"
+    run_test test_successful_inplace_upgrade_to_18 "$container"
     wait_all_streaming "$container"
-    run_test test_envdir_updated_to_x 17
+    run_test test_envdir_updated_to_x 18
 
     # TEST SUITE 5
 
@@ -388,8 +387,8 @@ function test_spilo() {
     log_info "[TS7] Started $hourlylogs_container for testing hourly log rotation"
 
     # TEST SUITE 6
-    log_info "[TS6] Testing in-place major upgrade 14->15 after clone with basebackup"
-    run_test verify_clone_upgrade "$basebackup_container" "basebackup" 14 15
+    log_info "[TS6] Testing in-place major upgrade 15->16 after clone with basebackup"
+    run_test verify_clone_upgrade "$basebackup_container" "basebackup" 15 16
     run_test verify_archive_mode_is_on "$basebackup_container"
 
     # TEST SUITE 7
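The `postgres_backup.sh` hunk replaces the whole `USE_WALG_BACKUP` branch with a single parameter expansion that prefers the WAL-G setting and falls back to the legacy WAL-E one. A minimal standalone sketch of that fallback (the compression values here are made-up examples; the variable names come from the diff):

```shell
#!/bin/bash
# Fallback chain: prefer the WAL-G-specific setting, fall back to the
# legacy WAL-E one when the new-style variable is unset or empty.
WALE_BACKUP_COMPRESSION_METHOD="lzo"    # legacy value, assumed for the demo
unset WALG_BACKUP_COMPRESSION_METHOD    # new-style value not provided
export WALG_COMPRESSION_METHOD="${WALG_BACKUP_COMPRESSION_METHOD:-$WALE_BACKUP_COMPRESSION_METHOD}"
echo "$WALG_COMPRESSION_METHOD"         # -> lzo

WALG_BACKUP_COMPRESSION_METHOD="brotli" # new-style value wins when present
export WALG_COMPRESSION_METHOD="${WALG_BACKUP_COMPRESSION_METHOD:-$WALE_BACKUP_COMPRESSION_METHOD}"
echo "$WALG_COMPRESSION_METHOD"         # -> brotli
```

Because `${VAR:-fallback}` treats an empty string the same as unset, environments that still export the old `WALE_*` name keep working without the removed `if`/`else` branch.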
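The `ENABLE_WAL_PATH_COMPAT` loop in `restore_command.sh` derives the PostgreSQL major version from each `WALG_*_PREFIX` variable by splitting off the last path component. A standalone sketch of that split, using a hypothetical S3 prefix (the bucket path is invented for the demo):

```shell
#!/bin/bash
# Demonstrates how a WALG_*_PREFIX value like .../spilo/<scope>/wal/<version>
# is split: basename yields the version suffix, dirname the wal prefix.
export WALG_S3_PREFIX="s3://testbucket/spilo/demo/wal/14"

# Enumerate WALG_*_PREFIX variables NUL-safely, as the script does.
for walg_env in $(printenv -0 | tr '\n' ' ' | sed 's/\x00/\n/g' | sed -n 's/^\(WALG_[^=][^=]*_PREFIX\)=.*$/\1/p'); do
    suffix=$(basename "${!walg_env}")   # last path component: candidate PGVERSION
    prefix=$(dirname "${!walg_env}")    # everything up to .../wal
    echo "$walg_env: version=$suffix prefix=$prefix"
done
```

In the real script the `suffix` is then checked against `/usr/lib/postgresql/$suffix/bin/postgres`, and only prefixes matching `/spilo/.*/wal$` are rewritten and re-exported.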