Description
Proposal
Use case. Why is this important?
The majority of PG backups and standby cluster creations are done with pg_basebackup: the backup or standby host initiates the backup, and the source server then streams the data to it.
These PostgreSQL clusters range in size from GBs to TBs and even PBs, so a single base backup can run for a long time. The pg_stat_progress_basebackup system view, introduced in PG 13, reports the progress of every running base backup (one row per WAL sender process that is streaming a backup). Its data should be exposed as Prometheus metrics so that DBAs and DevOps folks can monitor their standby cluster creation and backup processes.
We can create five main metrics:
- pg_basebackup_bytes_total: Estimated total number of bytes to be streamed (backup_total; NULL when pg_basebackup runs with --no-estimate-size)
- pg_basebackup_bytes_streamed: Number of bytes streamed so far (backup_streamed)
- pg_basebackup_tablespaces_total: Total number of tablespaces to be streamed (tablespaces_total)
- pg_basebackup_tablespaces_streamed: Number of tablespaces streamed so far (tablespaces_streamed)
- pg_basebackup_phase: Current phase of the base backup (phase), encoded as a number
and attach the following label:
- pid: PID of the WAL sender process, needed because there can be multiple concurrent backups
along with the other global labels used as cluster identifiers.
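Since Prometheus sample values are numeric, the phase (reported by the view as text, shown in parentheses) can be encoded as follows:
- INITIALIZING = 0 (initializing)
- WAITING_FOR_CHECKPOINT = 1 (waiting for checkpoint to finish)
- ESTIMATING_BACKUP_SIZE = 2 (estimating backup size)
- STREAMING_DB_FILES = 3 (streaming database files)
- WAITING_FOR_WAL_ARCHIVE = 4 (waiting for wal archiving to finish)
- TRANSFERRING_WAL = 5 (transferring wal files)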
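The encoding above could be implemented as a simple lookup from the view's phase text to the proposed code. This Go sketch assumes the exact phase strings listed in the PostgreSQL 13 documentation; phaseCodes and phaseCode are hypothetical names, and returning an error for unknown text is one possible choice, so that a phase added in a later PostgreSQL release is surfaced instead of being mapped to a wrong number.

```go
package main

import "fmt"

// phaseCodes maps the phase text reported by pg_stat_progress_basebackup
// (PostgreSQL 13+) to the numeric codes proposed in this issue.
var phaseCodes = map[string]int{
	"initializing":                        0,
	"waiting for checkpoint to finish":    1,
	"estimating backup size":              2,
	"streaming database files":            3,
	"waiting for wal archiving to finish": 4,
	"transferring wal files":              5,
}

// phaseCode returns the numeric code for a phase string, or an error for a
// value it does not recognise (e.g. a phase added in a later release).
func phaseCode(phase string) (int, error) {
	if code, ok := phaseCodes[phase]; ok {
		return code, nil
	}
	return -1, fmt.Errorf("unknown pg_basebackup phase %q", phase)
}
```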
Folks, let me know if this sounds like a useful proposal and I'll be happy to put together an implementation. Also, please point out any errors in this idea or anything I've missed. Thanks in advance!