Skip to content

fix(server): rotate iceberg secret alongside DuckLake on STS expiry#563

Merged
bill-ph merged 1 commit into
mainfrom
claude/iceberg-refresh-on-sts-rotation
May 15, 2026
Merged

fix(server): rotate iceberg secret alongside DuckLake on STS expiry#563
bill-ph merged 1 commit into
mainfrom
claude/iceberg-refresh-on-sts-rotation

Conversation

@bill-ph
Copy link
Copy Markdown
Collaborator

@bill-ph bill-ph commented May 15, 2026

Summary

Follow-up to #562. That PR fixed the initial-attach path so iceberg works at session activation, but the hot-idle credential-refresh path in `duckdbservice/activation.go` only rotates the DuckLake S3 secret (`ducklake_s3`). The iceberg secret (`iceberg_sigv4`) shares the same STS-minted credentials but was never being re-emitted, so iceberg queries on a long-lived worker would 403 ~1h after activation while DuckLake stayed fresh.

This PR adds the missing rotation.

What changed

`server/server.go` — new `RefreshIcebergSecret`

Mirrors `RefreshS3Secret`:

  • No-op when iceberg is disabled.
  • Rejects empty credentials with the same error shape as `AttachIcebergCatalog` (no silent fallback to credential_chain — that path was removed in fix(server): use explicit STS credentials for iceberg secret #562).
  • Re-emits the `iceberg_sigv4` secret via the existing `BuildIcebergSecretStmt`.
  • Handles DuckDB's "Current transaction is aborted" recovery the same way the S3 refresh does.

Importantly, `RefreshIcebergSecret` does not short-circuit when the iceberg catalog is already attached (unlike `AttachIcebergCatalog`). The whole point of refresh is to overwrite the existing secret in place; DuckDB resolves secrets at request time, so the new credentials take effect for the next iceberg query without an explicit reattach.

`duckdbservice/service.go` — `refreshIcebergSecret` indirection

New field on `SessionPool`, defaulted to `server.RefreshIcebergSecret`. Same shape as the existing `refreshS3Secret` so tests can inject a stub.

`duckdbservice/activation.go` (`reuseExistingActivation`)

Two changes:

  1. `needsRefresh` predicate now triggers when `Iceberg.Enabled` or `DuckLake.ObjectStore` is set. Previously, an iceberg-only tenant (DuckLake metadata-only, S3 bucket empty, iceberg enabled — e.g. `test-org-smoke-1778167994` in dev) would skip refresh entirely because `ObjectStore==""` gated it out — even though `Iceberg.Enabled=true` and the credentials were in the payload. The activator populates `DuckLake.S3*` from STS regardless of whether DuckLake itself uses S3.

  2. Refresh block now invokes both refresh functions, each gated on its own enable check. They share the same `DuckLake.S3*` triple because the per-tenant IAM role grants both `s3:` and `s3tables:` and STS returns one credential bundle covering both.

`server/iceberg_refresh_test.go` — new tests

Three tests in the server package (so they can exercise the exported `RefreshIcebergSecret` while the `server/iceberg` subpackage stays free of `server`-package imports):

  • `TestRefreshIcebergSecretRotatesCredentials` uses an in-memory DuckDB with the iceberg extension to verify back-to-back `CREATE OR REPLACE` with different credentials both succeed — the exact behavior STS rotation depends on. Skips when the extension can't be installed (air-gapped CI), same posture as the existing integration test.
  • `TestRefreshIcebergSecretNoOpWhenDisabled` confirms the activation layer can call this unconditionally without harm when the tenant hasn't opted in to iceberg.
  • `TestRefreshIcebergSecretRejectsEmptyCredentials` locks in the "fail loud, no silent fallback" invariant for the refresh path, matching what fix(server): use explicit STS credentials for iceberg secret #562 established for the attach path.

Test plan

  • `go test ./server/ -run RefreshIceberg -v` — three new tests run (not skipped) and pass
  • `go test ./server/... ./duckdbservice/...` — full regression sweep including existing `TestReuseExistingActivationDoesNotBlockHealthChecks` (which exercises the refresh path with a stub)
  • On dev, after this lands: let a worker for an iceberg-enabled tenant sit hot-idle past STS expiry (~1h) without a query, then run an iceberg query and verify it doesn't 403. Before this PR: 403 after ~1h. After: should keep working indefinitely (until the worker is naturally retired).

🤖 Generated with Claude Code

Follow-up to #562. That PR fixed the initial-attach path so iceberg
works at session activation, but the hot-idle credential refresh path
in duckdbservice/activation.go only rotates the DuckLake S3 secret
(ducklake_s3). The iceberg secret (iceberg_sigv4) shares the same
STS-minted credentials but was never being re-emitted, so iceberg
queries on a long-lived worker would 403 ~1h after activation while
DuckLake stayed fresh.

Changes:

- server/server.go: new RefreshIcebergSecret. Mirrors RefreshS3Secret —
  no-op when iceberg is disabled, rejects empty credentials with the
  same error shape as AttachIcebergCatalog (no silent fallback to a
  credential_chain that we removed in #562), re-emits the iceberg_sigv4
  secret via the existing BuildIcebergSecretStmt, and handles DuckDB's
  "Current transaction is aborted" recovery the same way the S3 refresh
  does.

  Importantly, RefreshIcebergSecret does NOT short-circuit when the
  iceberg catalog is already attached (unlike AttachIcebergCatalog). The
  whole point of refresh is to overwrite the existing secret in place;
  DuckDB resolves the secret at request time so the new credentials
  take effect for the next iceberg query without an explicit reattach.

- duckdbservice/service.go: new refreshIcebergSecret indirection on
  SessionPool, defaulted to server.RefreshIcebergSecret. Same shape as
  refreshS3Secret so tests can inject a stub.

- duckdbservice/activation.go (reuseExistingActivation): two changes.

  1. The needsRefresh predicate now triggers when Iceberg.Enabled is
     true *or* DuckLake.ObjectStore is set. Previously, an iceberg-only
     tenant (DuckLake metadata-only, S3 bucket empty, iceberg enabled)
     would skip refresh entirely because ObjectStore=="" gated it out —
     even though Iceberg.Enabled=true and the credentials WERE in the
     payload (the activator populates DuckLake.S3* from STS regardless
     of whether DuckLake itself uses S3).

  2. The refresh block now invokes both refresh functions, each gated
     on its own enable check. They share the same DuckLake.S3* triple
     because the per-tenant IAM role grants both s3:* and s3tables:*
     and STS returns one credential bundle covering both.

- server/iceberg_refresh_test.go: three new regression tests run in the
  server package (so they can exercise the exported RefreshIcebergSecret
  while the iceberg subpackage stays free of server-package imports).

  * TestRefreshIcebergSecretRotatesCredentials uses an in-memory DuckDB
    with the iceberg extension to verify back-to-back CREATE OR REPLACE
    with different credentials both succeed — the exact behavior STS
    rotation depends on. Skips when the extension can't be installed
    (air-gapped CI), same posture as the existing integration test.
  * TestRefreshIcebergSecretNoOpWhenDisabled confirms the activation
    layer can call this unconditionally without harm when the tenant
    hasn't opted in to iceberg.
  * TestRefreshIcebergSecretRejectsEmptyCredentials locks in the
    "fail loud, no silent fallback" invariant for the refresh path,
    matching what #562 established for the attach path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bill-ph bill-ph merged commit 12a9304 into main May 15, 2026
22 checks passed
@bill-ph bill-ph deleted the claude/iceberg-refresh-on-sts-rotation branch May 15, 2026 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant