/pgdata/pgbackrest/log/db-archive-get-async.log unbounded growth on Standby Cluster #4142

Open
@wmuldergov

Overview

If you set up a CrunchyDB cluster using an external repo on S3, the Standby Cluster creates the log /pgdata/pgbackrest/log/db-archive-get-async.log, which is appended to every time the standby syncs from S3 (this appears to happen every 5-10 seconds). Logrotate doesn't seem to be enabled by default for this folder (as it is for /pgdata/pg17/log), so the log keeps growing and will eventually exhaust the space in the pgdata PVC.

Environment

  • Platform: OpenShift
  • Platform Version: 4.16
  • PGO Image Tag: ubi8-17.0-3.4-0
  • Postgres Version: 17
  • Storage: s3

Steps to Reproduce

REPRO

  1. Set up a Primary and a Standby Cluster for Crunchy using the External S3 Repo method.
  2. On the Standby Cluster, monitor the size of /pgdata/pgbackrest/log/db-archive-get-async.log and watch it grow.
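One way to put a number on step 2 is to sample the file size a few times and compute an average growth rate. The sketch below is only an illustration: the helper names `growth_rate`, `sample_file`, and `report` are hypothetical, and it assumes Python is available somewhere the log file is readable (for example on a copy of the volume), which may not be the case inside the database container.

```python
import os
import time


def growth_rate(samples):
    """Average growth in bytes per second over (timestamp, size) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (s1 - s0) / (t1 - t0)


def sample_file(path, count=6, interval=10.0):
    """Collect `count` (timestamp, size) samples of `path`, `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.time(), os.path.getsize(path)))
        if i < count - 1:
            time.sleep(interval)
    return samples


def report(path):
    """Print the observed growth rate and a rough per-day projection."""
    rate = growth_rate(sample_file(path))
    per_day_mb = rate * 86400 / 1e6
    print(f"{path}: {rate:.1f} bytes/s, roughly {per_day_mb:.1f} MB/day")
```

Calling `report("/pgdata/pgbackrest/log/db-archive-get-async.log")` takes about a minute with the default settings. The per-day figure is a linear extrapolation from a short window, so treat it as a rough estimate only.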

EXPECTED

The log file is rotated, like the logs in /pgdata/pg17/log.

ACTUAL

The log file keeps growing until the space on the PVC is exhausted.

Logs

-------------------PROCESS START-------------------
2025-03-20 19:02:14.552 P00   INFO: archive-get:async command begin 2.53.1: [00000003000000C500000091, 00000003000000C500000092, 00000003000000C500000093, 00000003000000C500000094, 00000003000000C500000095, 00000003000000C500000096, 00000003000000C500000097, 00000003000000C500000098] --archive-async --exec-id=675601-15d23e31 --log-level-console=off --log-level-stderr=off --log-path=/pgdata/pgbackrest/log --pg1-path=/pgdata/pg17 --repo=2 --repo1-host=<REDACTED> --repo1-host-ca-file=/etc/pgbackrest/conf.d/~postgres-operator/tls-ca.crt --repo1-host-cert-file=/etc/pgbackrest/conf.d/~postgres-operator/client-tls.crt --repo1-host-key-file=/etc/pgbackrest/conf.d/~postgres-operator/client-tls.key --repo1-host-type=tls --repo1-host-user=postgres --repo1-path=/pgbackrest/repo1 --repo2-path=/db/dbbackup --repo2-s3-bucket=<REDACTED> --repo2-s3-endpoint=<REDACTED> --repo2-s3-key=<redacted> --repo2-s3-key-secret=<redacted> --repo2-s3-region=ca-central-1 --repo2-s3-uri-style=path --repo2-type=s3 --spool-path=/pgdata/pgbackrest-spool --stanza=db
2025-03-20 19:02:14.552 P00   INFO: get 8 WAL file(s) from archive: 00000003000000C500000091...00000003000000C500000098
2025-03-20 19:02:14.623 P00   INFO: archive-get:async command end: completed successfully (71ms)
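To check how often these entries are written, the timestamps on the "command begin" lines can be compared. The following is a minimal sketch that assumes every run starts with a line shaped like the one in the excerpt above (timestamp, process id such as P00, then `INFO: archive-get:async command begin`); the function names `begin_times` and `run_intervals` are hypothetical.

```python
import re
from datetime import datetime

# Matches the start of a run, based on the line format in the excerpt above.
BEGIN_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) P\d+\s+INFO: "
    r"archive-get:async command begin"
)


def begin_times(lines):
    """Return the timestamp of every 'command begin' line, in file order."""
    times = []
    for line in lines:
        match = BEGIN_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def run_intervals(times):
    """Return the seconds elapsed between consecutive runs."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def summarize(path):
    """Print the run count and mean interval for a log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        times = begin_times(handle)
    intervals = run_intervals(times)
    if intervals:
        mean = sum(intervals) / len(intervals)
        print(f"{len(times)} runs, mean interval {mean:.1f}s")
    else:
        print(f"{len(times)} runs, not enough data for an interval")
```

Running `summarize` on a copy of db-archive-get-async.log gives the number of runs and the average gap between them. Dividing the file size by the run count then gives an approximate number of bytes written per run.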

Additional Information

I see that #4108 was merged recently; however, it looks like the logrotate function was put behind the feature flag for Open Telemetry. This issue affects anyone who has a cluster set up in Standby mode, so I would suggest not putting log rotation behind the Open Telemetry feature flag.
