fix(collector): handle Aurora's unsupported pg_last_xact_replay_times…#1274
Open
dannotripp wants to merge 1 commit into prometheus-community:master
…tamp

Aurora PostgreSQL does not support pg_last_xact_replay_timestamp() and returns a feature_not_supported error (code 0A000) when the replication collector queries it. This causes the collector to crash on every scrape for Aurora instances.

When this error is detected, the collector now falls back to a simpler query that only reads pg_is_in_recovery(), so is_replica is still reported correctly. The time-based metrics (lag_seconds and last_replay_seconds) are emitted as NaN to signal that the values are unavailable, rather than crashing the collection cycle entirely.

The error is identified by checking for a *pq.Error with class "0A" (feature_not_supported) and a message that contains "Aurora", which avoids incorrectly suppressing the same error code on standard Postgres.

A new test TestPgReplicationCollectorAurora covers this fallback path.

Signed-off-by: Danno Tripp <danno.tripp@reddit.com>
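The detection rule in the commit message can be sketched roughly as follows. This is not the PR's code: pgError is a hypothetical stand-in for lib/pq's *pq.Error (reduced to a SQLSTATE code and a message so the sketch compiles without the driver), and isAuroraUnsupported and the sample messages are illustrative names chosen here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pgError is a hypothetical stand-in for lib/pq's *pq.Error, keeping only
// the two fields this check needs.
type pgError struct {
	Code    string // five-character SQLSTATE, e.g. "0A000"
	Message string
}

func (e *pgError) Error() string { return "pq: " + e.Message }

// isAuroraUnsupported reports whether err is (or wraps) a
// feature_not_supported error, SQLSTATE class "0A", whose message
// mentions Aurora. Both conditions must hold, so the same error class
// raised by standard Postgres is not suppressed.
func isAuroraUnsupported(err error) bool {
	var pgErr *pgError
	if !errors.As(err, &pgErr) {
		return false
	}
	// The SQLSTATE class is the first two characters of the code.
	if len(pgErr.Code) < 2 || pgErr.Code[:2] != "0A" {
		return false
	}
	return strings.Contains(pgErr.Message, "Aurora")
}

func main() {
	// The message texts below are made up for illustration.
	aurora := &pgError{Code: "0A000", Message: "function is not supported on Aurora"}
	wrapped := fmt.Errorf("replication query failed: %w", aurora)
	vanilla := &pgError{Code: "0A000", Message: "feature not supported"}

	fmt.Println(isAuroraUnsupported(wrapped))
	fmt.Println(isAuroraUnsupported(vanilla))
	fmt.Println(isAuroraUnsupported(errors.New("connection refused")))
}
```

Matching on both the class and the message text is the trade-off the commit message describes: the class alone would also hide genuine feature_not_supported errors on non-Aurora servers, while the message check ties the fallback to Aurora at the cost of depending on the wording of the server's error.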
df20274 to b7c82e9
fix(collector): handle Aurora's unsupported pg_last_xact_replay_timestamp
Fixes #1273
Aurora PostgreSQL does not support pg_last_xact_replay_timestamp(), causing the replication collector to abort every scrape with a fatal error on Aurora instances.

This change detects the Aurora-specific feature_not_supported error (Postgres error class 0A) and falls back gracefully: pg_replication_is_replica is still reported via pg_is_in_recovery(), while the time-based metrics emit NaN to signal they are unavailable.

A new test TestPgReplicationCollectorAurora covers the fallback path.