fix: Repair PostgreSQL Patroni pg_hba and standby password drift #33

Merged: wallyxjh merged 3 commits into labring:fix/v0.9.3 from im0x0ing:fix-postgresql-patroni-pg-hba-dcs-repair on Jun 1, 2026

@im0x0ing

Adds PostgreSQL-specific repair logic to the Component reconciliation pipeline to recover from Patroni failover drift.

Improvements to PostgreSQL recovery:

  • Added componentPatroniDCSRepairTransformer to verify and repair Patroni dynamic postgresql.pg_hba config after failover. The transformer reads expected rules from the rendered ConfigMap when available, falls back to the minimal required remote access rules, patches Patroni /config, reloads Patroni, and verifies the repaired config.
  • Added componentPostgreSQLStandbyPasswordRepairTransformer to repair drift between the standby password used by PostgreSQL pods and the password stored in the leader database. It validates that all running pods agree on the standby credential, updates the leader only when the hash differs, and reports inconsistent credentials through Component conditions.
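The "all running pods agree" check in the standby password repair can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name, the error value, and the map-based input are assumptions chosen for the example.

```go
package main

import (
	"errors"
	"sort"
)

// ErrInconsistentStandbyPassword is a hypothetical sentinel error signalling
// that running pods disagree on the standby credential.
var ErrInconsistentStandbyPassword = errors.New("standby password differs between pods")

// consistentStandbyPassword returns the single password shared by all pods.
// podPasswords maps a pod name to the standby password read from that pod.
// It refuses to pick a winner when the pods disagree, so the caller can
// report the problem instead of repairing automatically.
func consistentStandbyPassword(podPasswords map[string]string) (string, error) {
	if len(podPasswords) == 0 {
		return "", errors.New("no running pods reported a standby password")
	}
	names := make([]string, 0, len(podPasswords))
	for name := range podPasswords {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order, independent of map iteration
	expected := podPasswords[names[0]]
	for _, name := range names[1:] {
		if podPasswords[name] != expected {
			return "", ErrInconsistentStandbyPassword
		}
	}
	return expected, nil
}
```

Returning an error rather than choosing a majority value keeps the repair conservative: the leader is only updated when there is exactly one candidate credential.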

Runtime safety and observability:

  • Both repairs only run for PostgreSQL components that are already Running with an available workload and a known leader pod.
  • The standby password repair skips standby-cluster mode, where standby credentials may refer to a remote primary.
  • Repair results are exposed through Component conditions and Kubernetes events.
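The preconditions above amount to a small gating predicate. The sketch below assumes a simplified, hypothetical `componentState` struct; the real transformers read this information from the Component and its workload, and their field names are not shown in this PR description.

```go
package main

// componentState is a hypothetical, simplified view of what the reconciler
// knows about a component when it decides whether to attempt a repair.
type componentState struct {
	Engine            string // e.g. "postgresql"
	Phase             string // e.g. "Running"
	WorkloadAvailable bool
	LeaderPod         string // empty when no leader is known
	StandbyCluster    bool   // true in standby-cluster mode
}

// canRepair reports whether any repair may run, plus a reason when it may not.
func canRepair(s componentState) (bool, string) {
	switch {
	case s.Engine != "postgresql":
		return false, "not a PostgreSQL component"
	case s.Phase != "Running":
		return false, "component is not Running"
	case !s.WorkloadAvailable:
		return false, "workload is not available"
	case s.LeaderPod == "":
		return false, "leader pod is unknown"
	}
	return true, ""
}

// canRepairStandbyPassword adds the standby-cluster exclusion on top of the
// common preconditions, since those credentials may belong to a remote primary.
func canRepairStandbyPassword(s componentState) (bool, string) {
	if ok, reason := canRepair(s); !ok {
		return false, reason
	}
	if s.StandbyCluster {
		return false, "standby-cluster mode uses remote primary credentials"
	}
	return true, ""
}
```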

Validation:

  • Added focused unit tests for pg_hba parsing/merging, Patroni REST repair flow, standby password parsing, env fallback, inconsistent password handling, standby-mode skip, and Component status condition persistence.
  • Manually verified the latest image on the test cluster by injecting Patroni pg_hba drift and standby password drift; both were repaired successfully and replication remained streaming with zero lag.

im0x0ing added 3 commits May 26, 2026 16:58
Add a Component reconcile transformer for PostgreSQL Patroni clusters that repairs missing pg_hba rules in Patroni dynamic config after failover.

The repair finds the current InstanceSet leader, reads the rendered pg_hba.conf ConfigMap when available, falls back to the required remote client/replication rules, patches Patroni /config, reloads Patroni, and verifies the resulting dynamic config.
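The parse and merge steps can be illustrated with a short sketch. The function names echo the test names listed below, but the bodies are assumptions: this version only skips blank lines and whole-line comments, normalizes whitespace, and appends missing rules at the end. Since PostgreSQL applies the first matching pg_hba rule, a real implementation may need to place rules more carefully.

```go
package main

import "strings"

// parsePgHBAContent returns the rule lines of a pg_hba.conf-style document,
// dropping blank lines and whole-line comments and collapsing whitespace so
// that equivalent rules compare equal.
func parsePgHBAContent(content string) []string {
	var rules []string
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || strings.HasPrefix(fields[0], "#") {
			continue
		}
		rules = append(rules, strings.Join(fields, " "))
	}
	return rules
}

// mergePgHBARules keeps every current rule in order and appends each expected
// rule that is missing. The boolean reports whether anything was added, which
// lets the caller skip the Patroni patch and reload when nothing changed.
func mergePgHBARules(current, expected []string) ([]string, bool) {
	seen := make(map[string]bool, len(current))
	merged := make([]string, 0, len(current)+len(expected))
	for _, rule := range current {
		seen[rule] = true
		merged = append(merged, rule)
	}
	changed := false
	for _, rule := range expected {
		if !seen[rule] {
			seen[rule] = true
			merged = append(merged, rule)
			changed = true
		}
	}
	return merged, changed
}
```

Merging a list that already contains every expected rule returns `changed == false`, which corresponds to the no-op path exercised by the tests.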

Also watch Pod changes so leader/failover state changes can retrigger Component reconciliation.

Tests:
- go test ./controllers/apps -run 'Test(ParsePgHBAContent|MergePgHBARules|RepairPatroniPgHBA|RepairPatroniPgHBANoop|RepairPatroniPgHBAReloadsAfterPreviousFailure|EnsurePgHBARemoteRules|HTTPPatroniConfigClient|ComponentPatroniDCSRepairTransformer|ComponentPatroniDCSRepairTransformerFallbackPgHBA)$' -count=1
- go test ./controllers/apps -run 'Test(ParsePgHBAContent|MergePgHBARules|RepairPatroniPgHBA|RepairPatroniPgHBANoop|RepairPatroniPgHBAReloadsAfterPreviousFailure|EnsurePgHBARemoteRules|HTTPPatroniConfigClient|ComponentPatroniDCSRepairTransformer|ComponentPatroniDCSRepairTransformerFallbackPgHBA)$' -race -count=1

Add a Component reconcile transformer for PostgreSQL clusters that repairs drift between the standby password stored in pod pgpass files and the password stored in the PostgreSQL leader after failover.

The repair lists running component Pods, reads each Pod's /run/postgresql/pgpass entry for the standby user, stops automatic repair when pod passwords differ, finds the current InstanceSet leader, compares the expected md5 password hash in pg_authid, updates the standby role password when needed, and verifies the result.
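The hash comparison relies on PostgreSQL's md5 password format: the literal prefix `md5` followed by the hex MD5 digest of the password concatenated with the role name. A minimal sketch of the drift check follows; the function names are illustrative, and roles stored with scram-sha-256 are outside its scope.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
)

// postgresMD5Hash computes the value PostgreSQL stores for an md5 password:
// "md5" + hex(md5(password + role name)).
func postgresMD5Hash(user, password string) string {
	sum := md5.Sum([]byte(password + user))
	return "md5" + hex.EncodeToString(sum[:])
}

// standbyPasswordDrifted reports whether the hash stored on the leader
// (for example, read from pg_authid) differs from the hash expected for
// the password the pods are using.
func standbyPasswordDrifted(user, password, storedHash string) bool {
	return postgresMD5Hash(user, password) != storedHash
}
```

Comparing hashes first means the leader's role password is only rewritten when the stored value actually differs.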

Tests:
- go test ./controllers/apps -run 'Test(ParseStandbyPasswordFromPgpass|ConsistentStandbyPassword|ConsistentStandbyPasswordInconsistent|EnsureLeaderStandbyPassword|EnsureLeaderStandbyPasswordNoop|EnsureLeaderStandbyPasswordRejectsNewline|ComponentPostgreSQLStandbyPasswordRepairTransformer|ComponentPostgreSQLStandbyPasswordRepairTransformerInconsistent)$' -count=1
- go test ./controllers/apps -run 'Test(ParseStandbyPasswordFromPgpass|ConsistentStandbyPassword|ConsistentStandbyPasswordInconsistent|EnsureLeaderStandbyPassword|EnsureLeaderStandbyPasswordNoop|EnsureLeaderStandbyPasswordRejectsNewline|ComponentPostgreSQLStandbyPasswordRepairTransformer|ComponentPostgreSQLStandbyPasswordRepairTransformerInconsistent)$' -race -count=1

Handle PostgreSQL leaders that expose the standby credential through PGPASSWORD_STANDBY instead of a standby entry in /run/postgresql/pgpass, and skip automatic password repair for standby-cluster mode where the addon may use remote primary credentials.
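The pgpass lookup with the environment fallback might look like the sketch below. It assumes the standard pgpass layout (`hostname:port:database:username:password`, with `\:` and `\\` as escapes) and takes the pod environment as a plain map; the function names and the map parameter are hypothetical, not the PR's API.

```go
package main

import "strings"

// splitPgpassLine splits a pgpass line on unescaped ':' characters and
// resolves the "\:" and "\\" escapes.
func splitPgpassLine(line string) []string {
	var fields []string
	var cur strings.Builder
	escaped := false
	for _, r := range line {
		switch {
		case escaped:
			cur.WriteRune(r)
			escaped = false
		case r == '\\':
			escaped = true
		case r == ':':
			fields = append(fields, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	return append(fields, cur.String())
}

// standbyPasswordFromPgpass returns the password of the first entry whose
// username field equals user.
func standbyPasswordFromPgpass(content, user string) (string, bool) {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimRight(line, "\r")
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := splitPgpassLine(line)
		if len(fields) == 5 && fields[3] == user {
			return fields[4], true
		}
	}
	return "", false
}

// resolveStandbyPassword prefers the pgpass entry and falls back to the
// PGPASSWORD_STANDBY value from the pod environment when no entry exists.
func resolveStandbyPassword(pgpass, user string, env map[string]string) (string, bool) {
	if pw, ok := standbyPasswordFromPgpass(pgpass, user); ok {
		return pw, true
	}
	pw, ok := env["PGPASSWORD_STANDBY"]
	return pw, ok && pw != ""
}
```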
@im0x0ing changed the title from "Repair PostgreSQL Patroni pg_hba and standby password drift" to "fix: Repair PostgreSQL Patroni pg_hba and standby password drift" on May 28, 2026
@wallyxjh merged commit d9c84b5 into labring:fix/v0.9.3 on Jun 1, 2026
4 of 8 checks passed