fix(collector): handle Aurora's unsupported pg_last_xact_replay_times…#1274

Open
dannotripp wants to merge 1 commit into prometheus-community:master from dannotripp:fix/aurora-replication-collector
Conversation

@dannotripp
Contributor

fix(collector): handle Aurora's unsupported pg_last_xact_replay_timestamp

Fixes #1273

Aurora PostgreSQL does not support pg_last_xact_replay_timestamp(), causing
the replication collector to abort every scrape with a fatal error on Aurora
instances.

This change detects the Aurora-specific feature_not_supported error (SQLSTATE
0A000, error class 0A) and falls back gracefully: pg_replication_is_replica is
still reported via pg_is_in_recovery(), while the time-based metrics emit
NaN to signal that they are unavailable.

A new test TestPgReplicationCollectorAurora covers the fallback path.

…tamp

Aurora PostgreSQL does not support pg_last_xact_replay_timestamp() and
returns a feature_not_supported error (code 0A000) when the replication
collector queries it. This causes the collector to crash on every scrape
for Aurora instances.

When this error is detected, the collector now falls back to a simpler
query that only reads pg_is_in_recovery(), so is_replica is still
reported correctly. The time-based metrics (lag_seconds and
last_replay_seconds) are emitted as NaN to signal that the values are
unavailable, rather than crashing the collection cycle entirely.
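The fallback described above could look roughly like the following sketch. It is not the code from this PR: `replicationSample`, `collectReplication`, and the sentinel `errReplayTimestampUnsupported` are hypothetical names, the two queries are passed in as plain functions so the example runs without a database, and the sentinel stands in for the real Aurora error check.

```go
package main

import (
	"errors"
	"math"
)

// errReplayTimestampUnsupported is a hypothetical sentinel standing in for
// the Aurora feature_not_supported error; the real collector would inspect
// the driver error instead.
var errReplayTimestampUnsupported = errors.New("pg_last_xact_replay_timestamp() not supported")

// replicationSample holds the three values the replication collector reports.
// The field names are illustrative, not taken from the exporter.
type replicationSample struct {
	IsReplica         float64
	LagSeconds        float64
	LastReplaySeconds float64
}

// collectReplication runs the full query first. If that fails because the
// replay timestamp is unsupported, it retries with the recovery-only query
// and marks the time-based values as NaN. Any other error is returned as-is.
func collectReplication(
	fullQuery func() (replicationSample, error),
	recoveryOnly func() (bool, error),
) (replicationSample, error) {
	sample, err := fullQuery()
	if err == nil {
		return sample, nil
	}
	if !errors.Is(err, errReplayTimestampUnsupported) {
		return replicationSample{}, err
	}

	inRecovery, err := recoveryOnly()
	if err != nil {
		return replicationSample{}, err
	}

	// NaN signals "value unavailable" instead of a misleading zero.
	fallback := replicationSample{
		LagSeconds:        math.NaN(),
		LastReplaySeconds: math.NaN(),
	}
	if inRecovery {
		fallback.IsReplica = 1
	}
	return fallback, nil
}
```

Emitting NaN rather than 0 keeps dashboards and alerts from reading "no data" as "zero lag".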

The error is identified by checking for a *pq.Error with class "0A"
(feature_not_supported) and a message that contains "Aurora", which
avoids incorrectly suppressing the same error code on standard Postgres.
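A minimal sketch of that check, under stated assumptions: `pgError` is a hypothetical stand-in for `*pq.Error` that keeps only the SQLSTATE code and the message, so the example compiles without the `github.com/lib/pq` dependency, and `isAuroraUnsupported` is an illustrative name rather than the helper added in this PR.

```go
package main

import (
	"errors"
	"strings"
)

// pgError is a hypothetical stand-in for *pq.Error, reduced to the two
// fields this check needs.
type pgError struct {
	Code    string // five-character SQLSTATE, e.g. "0A000"
	Message string
}

func (e *pgError) Error() string { return "pq: " + e.Message }

// isAuroraUnsupported reports whether err (or an error it wraps) is a
// feature_not_supported error (SQLSTATE class "0A") whose message mentions
// Aurora. Both conditions must hold, so the same error class raised by
// standard Postgres is not suppressed.
func isAuroraUnsupported(err error) bool {
	var pgErr *pgError
	if !errors.As(err, &pgErr) {
		return false
	}
	if !strings.HasPrefix(pgErr.Code, "0A") {
		return false
	}
	return strings.Contains(pgErr.Message, "Aurora")
}
```

Matching on the message text is more fragile than matching on the code alone, but the code by itself cannot distinguish Aurora from other sources of the same error class.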

A new test TestPgReplicationCollectorAurora covers this fallback path.

Signed-off-by: Danno Tripp <danno.tripp@reddit.com>
@dannotripp force-pushed the fix/aurora-replication-collector branch from df20274 to b7c82e9 on March 5, 2026 00:35

Development

Successfully merging this pull request may close these issues.

Bug: Replication collector crashes on Aurora PostgreSQL
