Problem description
We're running a standard deployment of eoapi-k8s for IFRC Montandon with these configuration values:
ingress:
  host: "montandon-eoapi-stage.ifrc.org"
  tls:
    enabled: false
pgstacBootstrap:
  settings:
    envVars:
      LOAD_FIXTURES: "0"
      RUN_FOREVER: "1"
postgrescluster:
  instances:
    - name: eoapi
      replicas: 1
      dataVolumeClaimSpec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: "500Gi"
            cpu: "1024m"
            memory: "3048Mi"
We've hit a snag where the database service eventually runs out of disk space after ingesting data for a while. When this happens, it fails its health checks and becomes unresponsive, which brings down the raster service with it.
Looking into our staging instance, we found that the PostgreSQL WAL (Write-Ahead Log) directory is the culprit - it's eating up almost all of our storage. Here's what we're seeing:
bash-4.4$ df -h /pgdata/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdd        493G  493G     0 100% /pgdata
bash-4.4$ du -sh /pgdata/*
16K     /pgdata/lost+found
116M    /pgdata/pg16
492G    /pgdata/pg16_wal
12K     /pgdata/pgbackrest
The 492GB WAL directory is more than 4,000 times the size of the 116MB data directory, so almost none of the volume is holding actual table data.
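For anyone trying to narrow this down, here is a rough sketch of two checks that often explain WAL that never gets recycled: a failing archiver and a replication slot that is holding WAL back. Neither has been confirmed as the cause in this deployment. The function name `wal_retention_report` and the overridable `PSQL` variable are made up for illustration; `pg_stat_archiver` and `pg_replication_slots` are standard PostgreSQL views. The second query uses `pg_current_wal_lsn()`, so it needs to run against the primary.

```shell
#!/usr/bin/env bash
# Sketch: look for common reasons PostgreSQL keeps WAL segments around.
# PSQL is overridable (assumption) so the call can be wrapped, e.g. with
# a kubectl exec prefix; by default it is plain psql using PG* env vars.
PSQL="${PSQL:-psql}"

wal_retention_report() {
  # Archiver health: a growing failed_count means archive_command keeps
  # failing, and unarchived segments are not removed from the WAL directory.
  $PSQL -X -c "SELECT archived_count, failed_count, last_archived_wal, last_failed_wal, last_failed_time FROM pg_stat_archiver;"

  # Replication slots: an inactive slot pins WAL from its restart_lsn onward.
  $PSQL -X -c "SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal FROM pg_replication_slots;"
}
```

Calling `wal_retention_report` from a shell inside the database pod prints both result sets. A non-zero and increasing `failed_count`, or a slot with `active = f` and a large `retained_wal`, would each point toward a likely cause.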
Expected Output
Ideally, the WAL directory should stay under a bounded usage limit and should never fill the volume or cause downtime.
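As a stopgap while the root cause is open, a check along these lines could flag WAL growth before the volume fills. This is only a sketch: the function name `wal_over_limit` and any threshold passed to it are hypothetical, and it only reports usage, it does not delete or recycle anything.

```shell
#!/usr/bin/env bash
# Sketch: report whether a WAL directory has grown past a size limit.
# Usage: wal_over_limit WAL_DIR LIMIT_KB
# Returns 0 and prints a warning when usage exceeds the limit, 1 otherwise.
wal_over_limit() {
  local wal_dir="$1" limit_kb="$2" used_kb
  # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
  used_kb=$(du -sk "$wal_dir" | cut -f1)
  if [ "$used_kb" -gt "$limit_kb" ]; then
    echo "WARNING: $wal_dir uses ${used_kb} KiB, limit is ${limit_kb} KiB"
    return 0
  fi
  return 1
}
```

For example, `wal_over_limit /pgdata/pg16_wal $((50 * 1024 * 1024))` would warn once the directory shown above passes 50 GiB. That figure is arbitrary and would need tuning for the actual ingest rate.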
Environment Information
Crunchy Postgres Operator: v5.5.2
eoapi-k8s: v0.5.0