Skip to content

feat(resource,datasource): pg_config, password rotation, replicas, restore, data sources (Phases 3–5)#557

Draft
amogiska wants to merge 18 commits into
amogiska/managed-postgres-resourcefrom
amogiska/managed-postgres-phases-3to5
Draft

feat(resource,datasource): pg_config, password rotation, replicas, restore, data sources (Phases 3–5)#557
amogiska wants to merge 18 commits into
amogiska/managed-postgres-resourcefrom
amogiska/managed-postgres-phases-3to5

Conversation

@amogiska
Copy link
Copy Markdown

@amogiska amogiska commented Jun 1, 2026

Summary

Phases 3–5 of the alpha clickhouse_postgres_service resource, stacked on #553 (Phase 2) which is stacked on #544 (Phase 1). Merge #544 then #553 first — this PR's base is the Phase 2 branch so reviewers see only the Phase 3–5 diff. All work is behind //go:build alpha; stable builds exclude it.

Phase 3 — pg_config / pgbouncer_config

Terraform-owned, full-replacement runtime config via POST /config. Modeled as nested attributes (not blocks — matches the provider-wide convention; the provider has zero blocks) and Optional (not Optional+ComputedGET /config is sparse, and Computed would break clear-by-omission). Config updates are message-driven: the server's restart-required hint surfaces as a warning; no state polling.

Phase 4 — password rotation

password (Optional+Computed) plus write-only password_wo + password_wo_version. Rotation via PATCH /password (regular: on value change; write-only: on version bump). A config-aware ModifyPlan replaces UseStateForUnknown on password/connection_string — it pins the password to prior state only when the user didn't configure one (so server-generated values don't replan as unknown) while letting configured/interpolated passwords rotate, and marks connection_string unknown on rotation (it embeds the password). This fixes inconsistent-result/inconsistent-plan errors caught during live e2e.

Phase 5 — restore, read replicas, data sources, timeouts

  • read_replica_of (RequiresReplace; CreatePostgresReadReplica) and restore_to_point_in_time = { source_id, restore_target } (RequiresReplace; RestorePostgres), mutually exclusive; replica forbids password/password_wo.
  • timeouts {} block (create + update; delete uses the client's fixed budget).
  • Three alpha-gated data sources — clickhouse_postgres_service, clickhouse_postgres_services, clickhouse_postgres_service_ca_certificates — gated via pkg/datasource/register_{stable,debug}.go + provider.DataSources() delegation.
  • GetPostgresCaCertificates now routes through doRequest (gains retry/User-Agent/logging); the separately-planned doRawRequest proved unnecessary since doRequest returns bytes verbatim and formatLogBody tolerates non-JSON.

Notable plan deviations (full reasoning in phase-{three,four,five}-implementation.md)

  • pg_config/pgbouncer_config are nested attributes, Optional (not blocks, not Computed).
  • Phase 4 ModifyPlan is config-aware (the plan's "don't mark unrelated attrs unknown" understated the real need; e2e proved it essential).
  • doRawRequest dropped (route CA cert through doRequest).
  • Dependent-replica delete returns HTTP 200, not 409 (verified live) → the speculative fail-fast heuristic stays removed.

Test plan

  • go build -tags alpha ./... + go build ./... (stable) clean
  • go vet -tags alpha ./... clean
  • go test -tags alpha ./... + go test ./... green (helper-level unit tests: config diff, password-rotation decisions, request builders, validators)
  • make fmt clean; make docs-alpha renders the resource + 3 data-source pages (not committed, per the alpha-docs precedent)
  • Live e2e against the dev cluster (each phase; instances destroyed after):
    • Phase 3: create-with-config, update (restart warning surfaced), remove-param, clear-all, idempotency
    • Phase 4: regular / interpolated (random_password) / write-only password create + rotate (psql reconnects with the new password), version-as-trigger
    • Phase 5: read replica (is_primary=false, pg_is_in_recovery()=true, psql), all 3 data sources, CA cert PEM, timeouts block, dependent-replica delete behavior, and point-in-time restore (restored instance is a writable primary — pg_is_in_recovery()=falsepsql connects)
  • Phase 6 nightly CI e2e workflow (separate, upcoming)

amogiska added 18 commits May 28, 2026 11:22
Previously, TF_LOG=DEBUG logged the raw body of every HTTP request and
response sent through doRequest. Bodies for password rotation, service
create, and any future Postgres password / config flow contain plaintext
secrets (password, newPasswordHash, newDoubleSha1Hash, tokenSecret),
which then landed verbatim in the structured log output.

Add a redactSensitiveBody walker that JSON-decodes the body, replaces
values for a small set of sensitive keys with "REDACTED", and stamps
the entire subtree under secrets/credentials containers. Wrap it in
formatLogBody so the four log-call sites in doRequest collapse to two
single-line tflog.SetField calls. Malformed JSON yields a placeholder
rather than the raw bytes so the failure path can't leak either.
Hand-written API client for the ClickHouse Cloud Managed Postgres
endpoints under /v1/organizations/{orgId}/postgres. Follows the
existing service.go / clickpipe.go pattern: no codegen, ClientImpl
methods on a hand-written interface, minimock-generated mock.

Models (postgres_models.go) mirror the wire shapes in
control-plane/apps/openapi/src/protocol/v1/ManagedPostgresV1.ts.
storageSize is deliberately omitted (DEPRECATED server-side). State
constants match ManagedPostgresInstanceStatuses verbatim. PgConfigMap
has a custom UnmarshalJSON that accepts both string and numeric values
per the server's `[key: string]: string | number` shape.

Client methods (postgres.go):
- GET / LIST / CREATE / UPDATE / DELETE
- WaitForPostgresState (mirrors WaitForServiceState's 5s constant
  backoff) and WaitForPostgresLeaveAndReturn (new variant for
  post-PATCH transitions that may not have started yet)
- SetPostgresPassword
- GetPostgresConfig + ReplacePostgresConfig (POST, full replacement,
  matches Phase 0 curl-verified empty-map behavior) + UpdatePostgresConfig
  (PATCH, exposed for completeness)
- RestorePostgres + CreatePostgresReadReplica
- PostgresStateCommandSend (client-only; resource excludes operational
  commands in v1)
- GetPostgresCaCertificates (raw PEM, bypasses JSON envelope decoding)

DeletePostgres retries 409 conflicts for up to 15 min but fails fast
when the response indicates a dependent replica exists.

Tests (postgres_test.go + postgres_models_test.go): 42 cases covering
happy paths, 404 -> IsNotFound, 409 retry, 409-with-dependent fail
fast, rate-limit honoring, state transitions including unknown values,
password rotation idempotency, empty-map config POST, raw PEM round
trip, and the full state-command table.

interface.go extends Client with the 15 new method signatures;
client_mock.go is regenerated via `make mock`.
Tighten the dependent-replica heuristic and document known limitations
flagged by code review. Operational fixes (User-Agent on raw endpoint,
minimum observation window for transition detection, anchoring the
heuristic to a real server response) belong in the phases that have
real consumers and real wire data; FIXMEs in each location point to
the owning phase.

- errIndicatesDependentReplica: AND instead of OR, removed redundant
  "depend" arm. Was matching `"replica"` alone (e.g. "replication slot
  exists") and would have failed fast on transient conflicts. New
  negative test asserts a 409 containing "replica" but not "depend"
  is retried.

- WaitForPostgresLeaveAndReturn: doc note requiring callers to verify
  the mutating call returned 2xx before invoking. FIXME points Phase 2
  at the residual race (server hasn't started transitioning when
  leave-detection times out).

- GetPostgresCaCertificates: doc note enumerating the User-Agent / 429
  / 5xx / tflog gaps vs other methods. FIXME points Phase 5 at the
  doRawRequest refactor when the data source lands.
… clearing

Three issues surfaced in code review, all in scope for this PR
because the affected types and behaviors live here.

- common.go: add `connectionString` (and snake_case variant) to
  sensitiveBodyKeys. The Postgres create response embeds the generated
  password directly in the URI alongside the redacted password field,
  so TF_LOG=DEBUG was logging the credential through connectionString
  even with the field-level redaction in place. New test exercises
  the realistic create-response shape and asserts the password value
  never appears in formatted output via any path.

- postgres.go: WaitForPostgresLeaveAndReturn was swallowing real
  GetPostgres errors (404, 401/403, exhausted 5xx, context cancel)
  as "no-op success" because the phase-1 wait loop only checked
  whether the state had observed-ly left terminalState, not whether
  the polling itself failed. Introduce errPostgresStateUnchanged
  as a sentinel; only sentinel-caused budget exhaustion returns nil,
  every other error propagates. Two new tests: 404 throughout phase
  1 must return an IsNotFound-compatible error; the observed-stable
  case (server returns terminal cleanly) is still a no-op success.

- postgres_models.go: PostgresUpdate.Tags changes from []Tag to
  *[]Tag so callers can distinguish "leave existing tags alone"
  (nil) from "clear all tags" (&[]Tag{}). With omitempty on a plain
  slice, both intents marshal to the same `{}` body and an existing
  tagged Postgres service can't be cleared via PATCH. Two new tests
  cover the empty-list and populated-list shapes.
…as timeout

Mirror of the fix already applied to WaitForPostgresLeaveAndReturn.
Before: a 404 (or auth failure, exhausted 5xx, context cancel) during
the polling loop would exhaust the retry budget and the wrapper would
unconditionally rewrite the final error to
"postgres ... did not reach the expected state in the allocated time
(last seen state: )" — losing the real error and confusing callers
who needed IsNotFound or auth-error semantics.

Use a sentinel errPostgresStateNotYetTarget for the "polled OK,
stateChecker false" case. Only that sentinel triggers the
last-seen-state timeout rewrite; real errors from GetPostgres
propagate verbatim.

New test TestWaitForPostgresState_PropagatesGetErrors covers the
404-throughout scenario and asserts IsNotFound matches the returned
error and the error message does not contain "did not reach the
expected state".

Existing tests (TimesOutWithLastSeenState, UnknownStateDoesNotCrash,
TransitionsToRunning) continue to pass — the timeout-rewrite path is
unchanged when polling succeeded and the state simply didn't match.
YAGNI cleanup. Both removals were "shipped for completeness" with no
in-tree consumer planned in any later phase.

- UpdatePostgresConfig (PATCH-merge variant): the Phase 3 resource
  uses ReplacePostgresConfig (POST, full replacement) because the
  PATCH-merge semantics don't fit Terraform's declarative model.
  Nothing else calls the method. configMutate helper inlined into
  ReplacePostgresConfig.

- PostgresStateCommandSend + PostgresStateCommandRequest +
  PostgresCommand{Restart,Promote,Switchover} constants: the v1 plan
  explicitly excludes operational commands from the Terraform surface
  and the only "future tooling" justification is speculative. Easy
  to re-add (~15 lines) if a real consumer ever materializes.

Also trimmed verbose comments throughout to focus on intent/WHY rather
than restating WHAT. FIXMEs and load-bearing caller-precondition notes
are preserved verbatim.

Mock regenerated; 53 tests still pass; golangci-lint clean.
Implements Phase 2 of the Managed Postgres provider work: an alpha-tagged
clickhouse_postgres_service resource providing CRUD + Import on the
ClickHouse Cloud Managed Postgres API. Builds on the Phase 1 HTTP client.

Surface
- Schema attributes: id, name, cloud_provider, region, postgres_version,
  size, ha_type, tags (SetNestedAttribute of {key, value}), state,
  created_at, is_primary, hostname, port, username, connection_string
  (Sensitive), password (Sensitive, Computed-only in Phase 2).
- Lifecycle: Configure, Create (with explicit mid-Create partial-state
  write of id + server-generated password), Read (filters chc_-prefixed
  tags), Update (sparse PATCH; *[]Tag clears tags correctly; waits for
  size/ha_type transitions via WaitForPostgresLeaveAndReturn), Delete
  (thin wrapper — retry/poll lives in the client), ImportState by id.
- Validators: cloud_provider/postgres_version/ha_type/size OneOf, name
  length 1-50, chc_ tag prefix rejected at plan time.
- Constants snapshotted from cp-common protocol (82 VM_SPECS keys,
  PG_VERSIONS, POSTGRES_HA_TYPES, CLOUD_PROVIDERS).

Tests (helper-level, matching codebase convention)
- syncPostgresState round-trip with primary/HA/missing-IsPrimary cases
- apiTagsToSetValue chc_ filtering, empty/system-only/empty-value paths
- planTagsToAPI null/unknown/populated/null-value-attr paths
- buildPostgresUpdate diff matrix: no-op, size-only, ha-only, tags-only,
  tags-cleared (regression test for *[]Tag), combined
- notReservedTagPrefixValidator accept/reject/null/short-key
- isPostgresStateRunning forward-compat

Phase 2 deviations from the plan (documented in phase-two-implementation.md):
1. timeouts {} block deferred to Phase 5 (no existing resource uses it;
   restore/replica flows in Phase 5 will benefit from the same dependency).
   Phase 2 ships hardcoded 30m/30m/10m as package-level Go constants.
2. Lifecycle-level tests (Create/Update/Delete/Import) deferred to Phase 6
   integration tests (matches codebase convention — no resource_test.go in
   pkg/resource/ builds synthetic tfsdk.Plan/State).

Per the alpha-docs precedent (release.yaml:418-425 + cleanup-alpha-examples
job), docs/resources/postgres_service.md is generated by make docs-alpha
during release and is not committed to main. The embedded source lives at
pkg/resource/descriptions/postgres_service.md.

Refs: Phase 2 of clickhouse-postgres-service-tf-plan.md
Review-driven follow-up to 2af1b7c. Two real bugs plus dead code and
polish.

Bug fixes
- Reject empty-string tag values at plan time. The server normalizes
  "" to no-value, so apiTagsToSetValue would materialize that back as
  types.StringNull() — perpetual plan/state drift for any user who
  writes value = "". Add stringvalidator.LengthAtLeast(1) on the
  tags.value attribute and a TestTagValueValidator_RejectsEmptyString
  regression test. Description doc rewritten to direct users at null
  or omission instead.
- Drop booldefault.StaticBool(true) from is_primary. The Default
  field has no semantics on a Computed-only (non-Optional) attribute;
  syncPostgresState already supplies the fallback when the server
  omits the field.

Dead code removed
- int64validator.Between(1, 65535) on port (validators don't fire on
  Computed-only attributes).
- else-if final.Password != nil branch in Create — GetPostgres never
  echoes the password.
- plan.Password = state.Password in Update — UseStateForUnknown
  already carries the prior state value through to plan.

Polish
- strings.HasPrefix instead of hand-rolled slice compares in
  apiTagsToSetValue and notReservedTagPrefixValidator.
- Fix TestNotReservedTagPrefixValidator path construction: SetValue
  elements are Object values, not String values. Tests passed by
  accident because the validator doesn't inspect Path content.

Verification
- go build -tags alpha ./... clean
- go test -tags alpha ./... green (26 Postgres test cases, all pass)
- go test ./... (stable) green
- go vet -tags alpha ./... clean
- golangci-lint run clean
- make docs-alpha renders correctly
Bundles the open review concerns plus the data-loss bug caught live in
the Phase 2 e2e run against the dev cluster.

Bug: silent tag wipe on unrelated updates (e2e-caught, P0)
- Add setplanmodifier.UseStateForUnknown() to the tags SetNestedAttribute.
  Without it, the framework marks tags=(known after apply) on every plan,
  the resource layer receives Tags=Unknown in Update, planTagsToAPI
  returns nil, and buildPostgresUpdate treats nil as "clear all tags" —
  silently wiping any tags the user has set whenever they change any
  other attribute. Reproduced on the live resize test (r6gd.large →
  r6gd.xlarge); PATCH body included an unrequested "tags": [].
- Defense-in-depth: buildPostgresUpdate now explicitly skips the tags
  branch when plan.Tags.IsUnknown(). Even if a future regression drops
  the schema plan modifier, the diff logic refuses to clear tags while
  the plan is unresolved.
- Regression test: TestBuildPostgresUpdate/plan.Tags == Unknown is
  treated as 'leave server tags alone'.

Drift fix: explicit tags=[] is rejected at plan time
- Add setvalidator.SizeAtLeast(1). The server→state round-trip collapses
  empty arrays to SetNull (chc_-filtering produces an empty filtered
  list), so an explicit empty set in config would diff perpetually
  against null state. Users wanting no tags omit the attribute;
  UseStateForUnknown carries prior state forward.
- Tighten setvalidator.SizeAtMost to 50 (was 64) to match the server's
  MAX_TAGS_PER_RESOURCE constant.

Lifecycle test gap: extract buildPartialCreateState helper + unit test
- Move the mid-Create partial-state population from inline-in-Create
  into a standalone helper. Phase 2's "most novel piece" — the state
  write between CreatePostgres success and the post-wait re-read — is
  now testable without constructing synthetic tfsdk.Plan/State.
- TestBuildPartialCreateState covers: server-generated password
  branch, no-password branch (Phase 4 preview), user-set HaType /
  PostgresVersion / Tags preservation, and the critical "every other
  computed attr must be explicit Null (not Unknown)" contract.

Description: import-password gap, no-value PATCH constraint, no-empty-list
- Make the import gap impossible to miss: prominent warning callout
  with the connection_string workaround and Phase 4 plan.
- Document the server's PATCH-time "value is required" constraint
  (server-side 400 even though Create accepts no-value tags — Phase 2
  e2e finding).
- Document the new no-empty-list rule and explain why.

Polish (review nits)
- buildPostgresUpdate now returns a postgresUpdatePlan struct
  {Body, TransitionExpected} instead of three positional values. Read
  better at the call site. All tests updated.
- apiTagsToSetValue's internal-error message now names the concrete
  unexpected type instead of a generic "internal error".
- syncPostgresState empty-string-fallthrough behavior documented
  inline (deliberate: prefer prior state to silent overwrite).
- is_primary nil-defaulting-to-true behavior documented inline with
  a Phase 5 revisit pointer (mismarks a replica if server starts
  returning nil for replicas).

Verification
- go build -tags alpha ./... clean
- go test -tags alpha ./... green (37+ Postgres test cases pass)
- go test ./... (stable) green
- go vet -tags alpha ./... clean
- golangci-lint run ./... clean

E2E run captured in phase-two-e2e-notes.md (uncommitted, lives outside
the provider repo).
…round)

E2E v2 against the dev cluster (2026-05-28) surfaced a server-side PATCH
behavior that the original Phase 2 design did not account for:

  Sending PATCH /postgres/{id} with body {"size":"r6gd.xlarge"} (no
  tags field) returns 200 AND clears all server-side tags.

The provider's request body was correct — just {"size":...} — but the
server's downstream Ubicloud call treats a missing tags field as
clear-all rather than no-change. Confirmed against the dev cluster:
instance had two tags before the resize PATCH, server-side tags = []
immediately after, despite no tag-clear request from the provider.

This is asymmetric with size / ha_type, which the server DOES preserve
when omitted. The asymmetry isn't documented in the OpenAPI handler
validator. Worth a server-side ticket (FIXME placed inline); meanwhile
the provider works around it.

Fix
- buildPostgresUpdate now always re-asserts the current state's tags
  in the PATCH body whenever it's also mutating size or ha_type. If
  state has no tags, the field stays nil (safe — server has no tags
  to clear).
- diffTags helper extracted to make the three-way intent (no-change,
  clear, replace) explicit. Plan.Tags == Unknown still defends against
  a regression in UseStateForUnknown — when Unknown meets non-empty
  state, the workaround forces the state tags into the body.

Tests
- Existing TestBuildPostgresUpdate/plan.Tags == Unknown test was
  asserting the OLD (buggy) behavior; updated to assert the new
  server-clear-defense semantics.
- Two new regression subtests:
  - size-only diff with non-empty state tags → tags re-asserted
  - size-only diff with no state tags → tags stays nil

Description
- New "Server-side PUT-like tag semantics" callout in differences-from-
  clickhouse_service section. Users will see tags repeated in
  TF_LOG=DEBUG bodies on non-tag mutations; harmless and explains why.

Verification
- go build -tags alpha ./... clean
- go test -tags alpha ./... green (3 new test subtests pass; all
  previous tests still green)
- go test ./... (stable) green
- golangci-lint run ./... clean
Eight review concerns landed in one commit. Three were genuine blockers
that would have surfaced as real user pain post-merge; the rest improve
correctness / observability for the alpha.

High-severity (blockers)
- #1 state attribute UseStateForUnknown: without USFU the framework
  marked state as (known after apply) on every plan, forcing Update
  even on no-op applies — and the no-op branch wrote Unknown straight
  to final state, which the framework rejects with "Provider produced
  inconsistent result after apply." Every other persistent computed
  attribute already had USFU; this one was missed. Fix is the same
  one-liner used at clickpipe.go:179-185.
- #2 ha_type silent downgrade: Default("none") fires not just on
  Create but also when a user deletes the ha_type line on an existing
  resource — silently downgrading async/sync → none. A naive
  UseStateForUnknown won't help (Default beats USFU in plan), so the
  fix is to drop the Default entirely and add UseStateForUnknown.
  The server still defaults to "none" on Create when omitted, so
  fresh resources behave identically. An explicit ha_type = "none"
  still downgrades.
- #3 no-value tags break on PATCH: schema invited null values, but
  server returns 400 BAD_REQUEST on PATCH if any tag entry omits
  value (verified live in Phase 2 e2e v1 step 2). Create accepted
  them; an Update with a no-value tag in state would silently fail.
  Schema's value attribute is now Required to keep Create and Update
  behavior symmetric. Description updated.

Medium-severity
- #4 tag value mutation test: existing matrix never exercised
  changing the value on an existing key. Added the test plus a JSON
  wire-shape assertion for the tag-clear path ("tags": [] must reach
  the wire, not be omitted via marshalling).
- #5 size OneOf dropped: the 82-entry compile-time snapshot meant
  every new AWS instance family needed a provider patch release
  before users could adopt it. For the most-frequently-changed
  attribute the friction outweighed plan-time validation. Matches
  region's "server validates" pattern. The plan itself flagged this
  trade-off as worth dropping; the review made the case now to act.
- #6 context cancellation in waits: the wait helpers' outer
  backoff.Retry and doRequest's inner backoff both ignored ctx
  cancellation, so Ctrl-C / Terraform deadlines could spin up to
  the full retry budget. Wrapped both with backoff.WithContext.
  Regression test in pkg/internal/api/postgres_test.go asserts
  cancellation aborts within ~5s on a 5s-interval / 100-retry
  budget (would otherwise be 500s). Touches a Phase 1 file but
  the fix is strictly additive — all existing tests still pass.
- #7 log verbatim 409 body on dependent-replica fail-fast:
  errIndicatesDependentReplica is a speculative keyword heuristic
  (FIXME for Phase 6). Added a tflog.Warn capturing the verbatim
  error so Phase 6's dependent-replica integration test has real
  data to anchor on (or replace with a stable error code).

Nit
- Delete missing return after AddError: consistency with
  Create/Read/Update; framework already preserves state when
  diagnostics contain an error, but the missing return is a
  future-fall-through trap.

Documentation
- descriptions/postgres_service.md updated to reflect the schema
  changes: ha_type omission preserves prior state, value is now
  required on tags, size has no client-side enum, known-limitation
  text rewritten.

Verification
- go build -tags alpha ./... clean; go build ./... clean
- go test -tags alpha ./... green
- go test ./... (stable) green
- golangci-lint run ./... clean
The errIndicatesDependentReplica heuristic shipped without ever observing
the real server response. The Phase 2 resource doesn't support read
replicas (Phase 5 ships read_replica_of), so we couldn't trigger the
scenario from Terraform. The heuristic was guessing on three counts:

1. That the server returns HTTP 409 (might be 422, 400, or 500).
2. That the body contains the words "depend" AND "replica" (might be
   "child resource", "blocked by downstream", or structured errorCode).
3. That the scenario even produces an error at all (Ubicloud might
   cascade-delete or orphan the replica instead).

Three rounds of e2e tests against the dev cluster never exercised this
code path. The unit test asserted behavior against a synthetic 409 with
hand-written text matching the heuristic — proving only the matcher
works against its own anchor, not that the anchor reflects reality.

Removed:
- errIndicatesDependentReplica function
- The if-branch in deletePostgresWithBudget that called it
- tflog.Warn line in the same branch
- TestDeletePostgres_FailsFastOnDependentReplica (synthetic match)
- TestDeletePostgres_RetriesOn409WithoutDependentSignal (negative test)
- tflog + strings imports (now unused)
- Comment reference in pkg/resource/postgres_service.go Delete

Net behavior: 409s now uniformly retry for ~15 minutes. If Ubicloud
does refuse dependent-replica deletes with a 409, the user waits 15
min and then sees the error — annoying but bounded, and rare because
Terraform's dependency graph normally destroys replicas before primaries.

Reintroduction plan: Phase 5 ships read_replica_of, making the
primary+replica pair createable from Terraform. The first Phase 6
integration test that attempts a primary delete with a replica still
attached captures the real response. If it's a non-retryable 4xx with
a stable distinguishing field (errorCode, reason, structured JSON),
reintroduce the fail-fast anchored on that field — not a text match.

Verification
- go build -tags alpha ./... clean
- go test -tags alpha ./... green
- go test ./... (stable) green
- golangci-lint run ./... clean
Declarative Postgres + PgBouncer runtime configuration on
clickhouse_postgres_service, Terraform-owned with full-replacement
semantics via POST /config.

- Schema: pg_config / pgbouncer_config as SetNestedAttribute of
  {name, value} (Optional). Matches the provider-wide nested-attribute
  convention (the provider has 0 blocks); Optional (not Optional+Computed)
  because GET /config is sparse and Computed would break clear-by-omission.
- Create includes both maps in the body and fans out to GET /config;
  Read merges GET /config; Update diffs both maps and POSTs the full
  desired state. Config updates are message-driven (restart-required
  surfaced as a Warning), never state-polled.
- Helpers: planConfigToMap, apiConfigToSet, syncPostgresConfig,
  buildConfigUpdate; noDuplicateConfigNameValidator.
- Helper-level unit tests; lifecycle tests deferred to Phase 6 per
  established convention. Embedded description documents the behavior.
User-supplied passwords on clickhouse_postgres_service:
- password: Optional+Computed+Sensitive, complexity-validated, rotates via
  PATCH /password when changed.
- password_wo + password_wo_version: write-only password (never persisted to
  state); rotation triggered by bumping the version. ConflictsWith password.
- Create rotates to the user value (write-only preferred) after the instance
  is running; Update detects rotation via version bump or password diff.

ModifyPlan (new) replaces UseStateForUnknown on password and connection_string
with config-aware logic: pin password to prior state only when unconfigured
(so server-generated values don't replan as unknown, while interpolated user
passwords still rotate); mark connection_string unknown when a rotation is
planned (it embeds the password). This fixes inconsistent-result/inconsistent-
plan errors on rotation caught by the live e2e.

Helper-level unit tests (validators + decidePasswordOnCreate +
decidePasswordRotationOnUpdate + passwordRotationPlanned); lifecycle tests
deferred to Phase 6. Embedded description documents the three password modes.
…outs (Phase 5)

Resource:
- read_replica_of (RequiresReplace, ConflictsWith restore/password): Create
  calls CreatePostgresReadReplica.
- restore_to_point_in_time {source_id, restore_target} (RequiresReplace,
  ConflictsWith read_replica_of): Create calls RestorePostgres. Restored
  instance name = top-level name.
- Three-path Create switch (replica / restore / standard), shared wait+sync.
- timeouts {} block (create+update) wired into the create + update waits.

Data sources (alpha-gated via pkg/datasource/register_{stable,debug}.go;
provider.DataSources delegates to GetDataSourceFactories):
- clickhouse_postgres_service (by id), clickhouse_postgres_services (list),
  clickhouse_postgres_service_ca_certificates (PEM).

API: GetPostgresCaCertificates now routes through doRequest (gains
retry/User-Agent/logging); the separate doRawRequest is unnecessary since
doRequest returns bytes verbatim and formatLogBody tolerates non-JSON.

Dependent-replica: e2e confirmed the server returns 200 (not 409) when
deleting a primary with a replica, so the previously-removed fail-fast
heuristic correctly stays removed.

Helper-level unit tests for the request builders; live e2e validated read
replica (psql), all 3 data sources, CA cert, timeouts, and dependent-replica
delete behavior.
It is a SingleNestedAttribute, so HCL requires
restore_to_point_in_time = { ... }, not block syntax. The example showed
block syntax; corrected.
@amogiska amogiska force-pushed the amogiska/managed-postgres-resource branch from a11d326 to be6b706 Compare June 1, 2026 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant