Skip to content

feat(api): add Managed Postgres HTTP client#544

Open
amogiska wants to merge 8 commits into
mainfrom
amogiska/managed-postgres-api-client
Open

feat(api): add Managed Postgres HTTP client#544
amogiska wants to merge 8 commits into
mainfrom
amogiska/managed-postgres-api-client

Conversation

@amogiska
Copy link
Copy Markdown

Summary

  • Hand-written HTTP client for the ClickHouse Cloud Managed Postgres endpoints under /v1/organizations/{orgId}/postgres. Mirrors the existing service.go / clickpipe.go pattern: models in postgres_models.go, methods on *ClientImpl in postgres.go, interface extension in interface.go, regenerated client_mock.go.
  • 15 client methods covering GET / LIST / CREATE / UPDATE / DELETE plus restore, read replica, password rotation, config get/replace/update, state command, raw PEM CA certificates, and two wait helpers (WaitForPostgresState mirrors WaitForServiceState; WaitForPostgresLeaveAndReturn is a new variant for post-PATCH transitions that may not have started yet).
  • DeletePostgres retries 409 conflicts for ~15 min but fails fast when the response indicates a dependent replica.
  • PgConfigMap has a custom UnmarshalJSON that accepts both string and numeric values, matching the server's [key: string]: string \| number shape.
  • Lands a separate redaction patch to common.go:doRequest first: sensitive body fields (password, newPassword, newPasswordHash, newDoubleSha1Hash, password_wo, tokenSecret, plus secrets / credentials containers) are stamped to "REDACTED" before going to tflog. Fixes a pre-existing leak where TF_LOG=DEBUG during a clickhouse_service password rotation logged the plaintext hash.

Test plan

  • go test ./pkg/internal/api/... — 62 new cases (20 redaction + 42 Postgres) green
  • go test ./... — whole module green
  • go vet ./... clean
  • golangci-lint run ./... (pinned v1.64.8 from Makefile) clean
  • make mock regenerated client_mock.go cleanly; all 15 new methods present
  • Pre-commit hook (make fmt && make docs && make build) green on both commits

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this is generated

@amogiska amogiska force-pushed the amogiska/managed-postgres-api-client branch from c3c2cc9 to 16cf999 Compare May 28, 2026 18:24
@amogiska amogiska marked this pull request as ready for review June 1, 2026 15:27
@amogiska amogiska requested a review from a team as a code owner June 1, 2026 15:27
Copy link
Copy Markdown

@theory theory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it will work! Apologies for the panoply of comments; I assume you're new to Go, so made some suggestions, but also asked a bunch of questions where I found things confusing. I hope it's mostly helpful and not too annoying 😂

Comment thread pkg/internal/api/common.go
Comment thread pkg/internal/api/common.go Outdated
Comment thread pkg/internal/api/common.go Outdated
t[k] = redactJSONValue(child)
}
return t
case []interface{}:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer the newer and nicer-to-read any (unless the rest of the code base is littered with []inteface{}s.

Copy link
Copy Markdown
Author

@amogiska amogiska Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swapped

)

func TestRedactSensitiveBody(t *testing.T) {
cases := []struct {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests for slices that I can see. Also worthwhile throwing in bools, numbers and nulls to make sure they're all passed through properly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added bool/number/null pass-through cases. Slices were already covered in array of objects: each element's sensitive key redacted exercises arrays of objects, and the top-level ["a","b","c"] case covers arrays of scalars. Null on a sensitive key is in non-string sensitive value still redacted.

}

for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using t.Parallel() in all tests so accelerate testing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repo doesn't use t.Parallel() as a convention and the api suite runs in under 2s so skipping for now

Comment thread pkg/internal/api/postgres.go Outdated
Comment on lines +341 to +349
respBody, err := c.doRequest(ctx, req)
if err != nil {
return nil, err
}
resp := ResponseWithResult[PostgresConfig]{}
if err := json.Unmarshal(respBody, &resp); err != nil {
return nil, err
}
return &resp.Result, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern repeats a lot. Consider replacing it with a function

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing code has the same repeated pattern. will do a follow up to refactor to reduce repated code.

Comment thread pkg/internal/api/postgres_test.go
Comment thread pkg/internal/api/postgres_test.go Outdated
Comment thread pkg/internal/api/postgres_test.go Outdated
@theory
Copy link
Copy Markdown

theory commented Jun 1, 2026 via email

Comment thread pkg/internal/api/postgres_models_test.go Outdated
// Postgres instance state values. Mirrors ManagedPostgresInstanceStatuses
// in packages/cp-common/src/protocol/postgres/ManagedPostgres.ts:59-68.
const (
PostgresStateCreating = "creating"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should document which of these states are on primary only and which are on standby only

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked both ManagedPostgres.ts and the OpenAPI doc. It's a flat enum with a generic "Current state of the service" description. The primary-vs-standby breakdown isn't documented at the source, it's Ubicloud behavior. Rather not commit inferences as Go comments that could drift wrong. If we can get authoritative docs upstream, I'll backfill the annotations here.

Comment thread pkg/internal/api/common.go
Comment thread pkg/internal/api/postgres.go Outdated
Comment thread pkg/internal/api/postgres.go Outdated
Comment thread pkg/internal/api/common.go Outdated
Comment thread pkg/internal/api/postgres.go Outdated
Comment thread pkg/internal/api/postgres.go Outdated
Comment thread pkg/internal/api/postgres.go Outdated
Comment thread pkg/internal/api/postgres_test.go Outdated
@amogiska amogiska requested a review from heavycrystal June 1, 2026 20:30
amogiska added 8 commits June 1, 2026 14:02
Previously, TF_LOG=DEBUG logged the raw body of every HTTP request and
response sent through doRequest. Bodies for password rotation, service
create, and any future Postgres password / config flow contain plaintext
secrets (password, newPasswordHash, newDoubleSha1Hash, tokenSecret),
which then landed verbatim in the structured log output.

Add a redactSensitiveBody walker that JSON-decodes the body, replaces
values for a small set of sensitive keys with "REDACTED", and stamps
the entire subtree under secrets/credentials containers. Wrap it in
formatLogBody so the four log-call sites in doRequest collapse to two
single-line tflog.SetField calls. Malformed JSON yields a placeholder
rather than the raw bytes so the failure path can't leak either.
Hand-written API client for the ClickHouse Cloud Managed Postgres
endpoints under /v1/organizations/{orgId}/postgres. Follows the
existing service.go / clickpipe.go pattern: no codegen, ClientImpl
methods on a hand-written interface, minimock-generated mock.

Models (postgres_models.go) mirror the wire shapes in
control-plane/apps/openapi/src/protocol/v1/ManagedPostgresV1.ts.
storageSize is deliberately omitted (DEPRECATED server-side). State
constants match ManagedPostgresInstanceStatuses verbatim. PgConfigMap
has a custom UnmarshalJSON that accepts both string and numeric values
per the server's `[key: string]: string | number` shape.

Client methods (postgres.go):
- GET / LIST / CREATE / UPDATE / DELETE
- WaitForPostgresState (mirrors WaitForServiceState's 5s constant
  backoff) and WaitForPostgresLeaveAndReturn (new variant for
  post-PATCH transitions that may not have started yet)
- SetPostgresPassword
- GetPostgresConfig + ReplacePostgresConfig (POST, full replacement,
  matches Phase 0 curl-verified empty-map behavior) + UpdatePostgresConfig
  (PATCH, exposed for completeness)
- RestorePostgres + CreatePostgresReadReplica
- PostgresStateCommandSend (client-only; resource excludes operational
  commands in v1)
- GetPostgresCaCertificates (raw PEM, bypasses JSON envelope decoding)

DeletePostgres retries 409 conflicts for up to 15 min but fails fast
when the response indicates a dependent replica exists.

Tests (postgres_test.go + postgres_models_test.go): 42 cases covering
happy paths, 404 -> IsNotFound, 409 retry, 409-with-dependent fail
fast, rate-limit honoring, state transitions including unknown values,
password rotation idempotency, empty-map config POST, raw PEM round
trip, and the full state-command table.

interface.go extends Client with the 15 new method signatures;
client_mock.go is regenerated via `make mock`.
Tighten the dependent-replica heuristic and document known limitations
flagged by code review. Operational fixes (User-Agent on raw endpoint,
minimum observation window for transition detection, anchoring the
heuristic to a real server response) belong in the phases that have
real consumers and real wire data; FIXMEs in each location point to
the owning phase.

- errIndicatesDependentReplica: AND instead of OR, removed redundant
  "depend" arm. Was matching `"replica"` alone (e.g. "replication slot
  exists") and would have failed fast on transient conflicts. New
  negative test asserts a 409 containing "replica" but not "depend"
  is retried.

- WaitForPostgresLeaveAndReturn: doc note requiring callers to verify
  the mutating call returned 2xx before invoking. FIXME points Phase 2
  at the residual race (server hasn't started transitioning when
  leave-detection times out).

- GetPostgresCaCertificates: doc note enumerating the User-Agent / 429
  / 5xx / tflog gaps vs other methods. FIXME points Phase 5 at the
  doRawRequest refactor when the data source lands.
… clearing

Three issues surfaced in code review, all in scope for this PR
because the affected types and behaviors live here.

- common.go: add `connectionString` (and snake_case variant) to
  sensitiveBodyKeys. The Postgres create response embeds the generated
  password directly in the URI alongside the redacted password field,
  so TF_LOG=DEBUG was logging the credential through connectionString
  even with the field-level redaction in place. New test exercises
  the realistic create-response shape and asserts the password value
  never appears in formatted output via any path.

- postgres.go: WaitForPostgresLeaveAndReturn was swallowing real
  GetPostgres errors (404, 401/403, exhausted 5xx, context cancel)
  as "no-op success" because the phase-1 wait loop only checked
  whether the state had observed-ly left terminalState, not whether
  the polling itself failed. Introduce errPostgresStateUnchanged
  as a sentinel; only sentinel-caused budget exhaustion returns nil,
  every other error propagates. Two new tests: 404 throughout phase
  1 must return an IsNotFound-compatible error; the observed-stable
  case (server returns terminal cleanly) is still a no-op success.

- postgres_models.go: PostgresUpdate.Tags changes from []Tag to
  *[]Tag so callers can distinguish "leave existing tags alone"
  (nil) from "clear all tags" (&[]Tag{}). With omitempty on a plain
  slice, both intents marshal to the same `{}` body and an existing
  tagged Postgres service can't be cleared via PATCH. Two new tests
  cover the empty-list and populated-list shapes.
…as timeout

Mirror of the fix already applied to WaitForPostgresLeaveAndReturn.
Before: a 404 (or auth failure, exhausted 5xx, context cancel) during
the polling loop would exhaust the retry budget and the wrapper would
unconditionally rewrite the final error to
"postgres ... did not reach the expected state in the allocated time
(last seen state: )" — losing the real error and confusing callers
who needed IsNotFound or auth-error semantics.

Use a sentinel errPostgresStateNotYetTarget for the "polled OK,
stateChecker false" case. Only that sentinel triggers the
last-seen-state timeout rewrite; real errors from GetPostgres
propagate verbatim.

New test TestWaitForPostgresState_PropagatesGetErrors covers the
404-throughout scenario and asserts IsNotFound matches the returned
error and the error message does not contain "did not reach the
expected state".

Existing tests (TimesOutWithLastSeenState, UnknownStateDoesNotCrash,
TransitionsToRunning) continue to pass — the timeout-rewrite path is
unchanged when polling succeeded and the state simply didn't match.
YAGNI cleanup. Both removals were "shipped for completeness" with no
in-tree consumer planned in any later phase.

- UpdatePostgresConfig (PATCH-merge variant): the Phase 3 resource
  uses ReplacePostgresConfig (POST, full replacement) because the
  PATCH-merge semantics don't fit Terraform's declarative model.
  Nothing else calls the method. configMutate helper inlined into
  ReplacePostgresConfig.

- PostgresStateCommandSend + PostgresStateCommandRequest +
  PostgresCommand{Restart,Promote,Switchover} constants: the v1 plan
  explicitly excludes operational commands from the Terraform surface
  and the only "future tooling" justification is speculative. Easy
  to re-add (~15 lines) if a real consumer ever materializes.

Also trimmed verbose comments throughout to focus on intent/WHY rather
than restating WHAT. FIXMEs and load-bearing caller-precondition notes
are preserved verbatim.

Mock regenerated; 53 tests still pass; golangci-lint clean.
Single rollup commit for inline review feedback from @theory.

Style and idiom:
- interface{} → any in common.go (matches dominant repo style)
- Drop hand-rolled jsonStringContains; use strings.Contains
- "iff" → "if" in two doc comments
- Drop nonNegU64; inline uint64() with //nolint:gosec to match
  the convention used in service.go:125
- Trim verbose doc comments; rewrite phase-tagged FIXMEs to be
  self-contained
- "fail-fasts" → "fails fast" grammar fix

Type cleanup:
- Postgres.Username, Postgres.Password, PostgresPassword.Password
  flipped from *string to plain string. Matches ClickPipe.Password
  and ServicePasswordUpdateResult.Password. The pointer was solving
  a problem that doesn't exist — the server validator rejects empty
  passwords, so omitempty on string gives the two meaningful states
  (absent → server generates, set → server adopts) without ambiguity.
- CreatePostgres return type updated from *string to string for the
  password component. Defensive copy removed (Go strings are
  immutable, so the copy was no-op).

Error wrapping:
- Marshal/unmarshal errors wrapped with fmt.Errorf("…: %w", err) at
  10 sites to match the clickpipe.go pattern. doRequest errors stay
  raw because IsNotFound / IsConflict / is5xx use strings.HasPrefix.

Test improvements:
- Switch PgConfigMap.UnmarshalJSON to json.Decoder.UseNumber() + type
  switch instead of two-pass try-unmarshal-with-fallback
- Add bool/number/null pass-through cases to TestRedactSensitiveBody
- Add reject tests for null, object, array values on PgConfigMap
- Rename misleading test names (RoundTrips → Unmarshals, ClearsTags
  → MarshalsAsEmptyArray, OmitsAbsentableFields → OmitsNilPointerFields)
- Remove tests redundant with type definitions (DoesNotIncludeNameField,
  NilTagsOmitsField)
- Add t.Fatalf error checks on the four request-body decode sites in
  tests where ignored errors would mask the real assertion failure
- Add doc comment explaining why CreatePostgres returns empty password
  when the user supplied their own
- Soften docstring claim about GET endpoint not echoing password —
  per swagger, it's optional in the response

client_mock.go regenerated via make mock for the CreatePostgres
return-type change.
Rollup commit for @heavycrystal's inline comments.

Style and naming:
- Tighten the connectionString sensitive-key comment to a single line
- Rename postgresDeleteRetryBudgetSecs (int) to postgresDeleteRetryAttempts
  (uint64). Eliminates the seconds/interval arithmetic and the matching
  //nolint:gosec at the DeletePostgres call site
- Rename deletePostgresWithBudget to deletePostgresWithInterval so all
  three test seams share the *WithInterval naming and (interval,
  maxRetries) signature
- Rename WaitForPostgresLeaveAndReturn to
  WaitForPostgresStateTransitionAndReturn (and the matching test seam +
  3 tests) for clearer intent

Correctness:
- PgConfigMap.UnmarshalJSON now builds into a local map and assigns to
  the receiver only after the full pass succeeds. A mid-loop type
  mismatch no longer leaves the receiver in a half-populated state
- Remove dead branch in waitForPostgresStateTransitionAndReturnWithInterval:
  with maxRetries >= 2, halfBudget = maxRetries / 2 is always >= 1, so
  the if halfBudget < 1 guard is unreachable
- formatLogBody now takes a ctx and tflog.Warn-s if json.Indent fails on
  already-redacted output, instead of silently returning the placeholder.
  The branch is still unreachable by construction, but if anything ever
  breaks that invariant, an operator running TF_LOG=DEBUG will see the
  warning

Test changes:
- Remove TestPostgresState_ConstantsMatchWireValues — it only compared
  constants to literals I wrote myself; would not catch a server-side
  rename
- Convert all `var calls int32` + `atomic.AddInt32(&calls, 1)` to
  `var calls atomic.Int32` + `calls.Add(1)` / `calls.Load()` so `calls++`
  is a compile error

Mock regenerated for the WaitForPostgres*StateTransitionAndReturn rename.
@amogiska amogiska force-pushed the amogiska/managed-postgres-api-client branch from dc6f705 to 9795ee1 Compare June 1, 2026 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants