Description
Overview
If you set up a CrunchyDB cluster using an external repo with S3, the Standby Cluster creates the log /pgdata/pgbackrest/log/db-archive-get-async.log, which is updated every time the cluster syncs from S3 (this looks to be every 5-10 seconds). Since logrotate doesn't seem to be enabled by default on this folder (as it is for /pgdata/pg17/log), the log continues to grow and eventually exhausts the space in the pgdata PVC.
Environment
Please provide the following details:
- Platform: OpenShift
- Platform Version: 4.16
- PGO Image Tag: ubi8-17.0-3.4-0
- Postgres Version: 17
- Storage: s3
Steps to Reproduce
REPRO
- Set up a Primary and Standby Cluster for Crunchy using the External S3 Repo method.
- On the Standby Cluster, monitor the size of /pgdata/pgbackrest/log/db-archive-get-async.log and watch it grow.
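To put a number on "watch it grow", the second step can be sketched as a small script that samples the file size a few times and reports the average growth rate. This is only a rough sketch: the function names, the 10-second interval, and the sample count are my own choices for illustration, and it assumes a Python interpreter is available wherever the pgdata volume is mounted, which may not be the case in the database container.

```python
import os
import time

# Path taken from this report.
LOG_PATH = "/pgdata/pgbackrest/log/db-archive-get-async.log"


def growth_rate(samples):
    """Average growth in bytes/second between the first and last sample.

    `samples` is a list of (timestamp_seconds, size_bytes) tuples.
    """
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples taken at different times")
    return (s1 - s0) / (t1 - t0)


def sample_sizes(path, interval_s=10.0, count=6):
    """Record (monotonic time, file size) `count` times, `interval_s` apart."""
    samples = []
    for i in range(count):
        samples.append((time.monotonic(), os.path.getsize(path)))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Calling growth_rate(sample_sizes(LOG_PATH)) on the Standby Cluster gives a bytes-per-second figure that can be compared against the free space left on the PVC.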
EXPECTED
The log file is rotated, like the logs in /pgdata/pg17/log.
ACTUAL
The log file keeps growing until the space on the PVC is exhausted.
Logs
-------------------PROCESS START-------------------
2025-03-20 19:02:14.552 P00 INFO: archive-get:async command begin 2.53.1: [00000003000000C500000091, 00000003000000C500000092, 00000003000000C500000093, 00000003000000C500000094, 00000003000000C500000095, 00000003000000C500000096, 00000003000000C500000097, 00000003000000C500000098] --archive-async --exec-id=675601-15d23e31 --log-level-console=off --log-level-stderr=off --log-path=/pgdata/pgbackrest/log --pg1-path=/pgdata/pg17 --repo=2 --repo1-host=<REDACTED> --repo1-host-ca-file=/etc/pgbackrest/conf.d/~postgres-operator/tls-ca.crt --repo1-host-cert-file=/etc/pgbackrest/conf.d/~postgres-operator/client-tls.crt --repo1-host-key-file=/etc/pgbackrest/conf.d/~postgres-operator/client-tls.key --repo1-host-type=tls --repo1-host-user=postgres --repo1-path=/pgbackrest/repo1 --repo2-path=/db/dbbackup --repo2-s3-bucket=<REDACTED> --repo2-s3-endpoint=<REDACTED> --repo2-s3-key=<redacted> --repo2-s3-key-secret=<redacted> --repo2-s3-region=ca-central-1 --repo2-s3-uri-style=path --repo2-type=s3 --spool-path=/pgdata/pgbackrest-spool --stanza=db
2025-03-20 19:02:14.552 P00 INFO: get 8 WAL file(s) from archive: 00000003000000C500000091...00000003000000C500000098
2025-03-20 19:02:14.623 P00 INFO: archive-get:async command end: completed successfully (71ms)
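The "every 5-10 seconds" estimate can be checked against the log itself. Below is a minimal sketch that pulls the timestamps of the "archive-get:async command begin" lines and averages the gap between them. The regular expression is based only on the lines shown above, and the helper names are hypothetical; other pgBackRest versions or log levels may format lines differently.

```python
import re
from datetime import datetime

# Matches the timestamp on "command begin" lines as they appear in this report.
BEGIN_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) P00\s+INFO: "
    r"archive-get:async command begin"
)


def begin_times(lines):
    """Return the timestamps of all 'command begin' lines, in file order."""
    times = []
    for line in lines:
        match = BEGIN_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def mean_interval_seconds(times):
    """Average seconds between consecutive begins, or None with fewer than two."""
    if len(times) < 2:
        return None
    return (times[-1] - times[0]).total_seconds() / (len(times) - 1)
```

Feeding it the open log file, for example mean_interval_seconds(begin_times(open(path))), gives the average time between async fetches, which together with the size of each "command begin" line indicates how fast the file grows.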
Additional Information
I see that #4108 was merged recently; however, it looks like the logrotate function was put behind the feature flag for OpenTelemetry. This issue affects anyone who has a cluster set up in Standby mode, so I would suggest not putting it behind the OpenTelemetry feature flag.