Skip to content

Bug: Replication collector crashes on Aurora PostgreSQL #1273

@dannotripp

Description

@dannotripp

Summary

The replication collector (pg_replication) fails on every scrape cycle when
running against Amazon Aurora PostgreSQL. This causes the entire collection
cycle to abort with an error, producing no metrics for that scrape interval.

Root Cause

Aurora PostgreSQL does not support pg_last_xact_replay_timestamp(). When the
replication collector executes its primary query, which calls this function,
Aurora returns:

ERROR: pg_last_xact_replay_timestamp() is currently not supported for Aurora

Because the collector treats any scan error as fatal, this error propagates up
and terminates the scrape. No replication metrics (including is_replica, which
Aurora can answer) are emitted.

Impact

  • Any postgres_exporter deployment targeting Aurora PostgreSQL instances loses
    all replication metrics on every scrape.
  • The error appears in exporter logs on every scrape interval, producing noise.
  • pg_replication_is_replica, which is useful for distinguishing writer from
    reader instances, is never emitted even though Aurora supports
    pg_is_in_recovery().

Expected Behavior

The collector should degrade gracefully on Aurora: emit NaN for the
time-based metrics that rely on pg_last_xact_replay_timestamp(), and still
emit pg_replication_is_replica using the supported pg_is_in_recovery()
function.

Environment

  • postgres_exporter v0.19.x
  • Amazon Aurora PostgreSQL (any version)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions