fix(server): use explicit STS credentials for iceberg secret#562
Merged
Conversation
PR #560 wired the iceberg S3 secret to PROVIDER credential_chain, but DuckDB's built-in credential chain does not support EKS Pod Identity (the comment on fetchAWSSDKCredentials, server.go:2060, calls this out explicitly). On the PostHog multitenant deploy the worker pod has no AWS identity of its own at all — the duckgres-worker SA has no IRSA annotation, no Pod Identity association, no AWS env vars. So CREATE SECRET fails at validation time with: Invalid Configuration Error: Secret Validation Failure: during `create` using the following: Credential Chain: 'config' DuckLake doesn't hit this because the control plane runs STS:AssumeRole on the per-tenant IAM role and ships the resulting short-lived credentials inline in the worker activation payload. The DuckLake S3 secret is then built with PROVIDER config and those explicit credentials (see buildConfigSecret at server.go:1979). The per-tenant IAM role rendered by the duckling Crossplane Composition already grants both s3:* (on the data bucket) and s3tables:* (on the iceberg table bucket) when spec.iceberg.enabled=true. The credentials in cfg.DuckLake.S3AccessKey/SecretKey/SessionToken are therefore already authorized for S3 Tables — they just need to be threaded into the iceberg secret too. Changes: - server/iceberg/migration.go: BuildIcebergSecretStmt now accepts keyID/secret/sessionToken. When keyID is non-empty, emit PROVIDER config with the explicit credentials. When empty, fall back to PROVIDER credential_chain (preserves the standalone/host-AWS-identity path where DuckDB's chain can find env or IMDS credentials). - server/server.go: AttachIcebergCatalog takes the same three strings and forwards them. All three call sites (initial connect, shared warm activation, passthrough) pass cfg.DuckLake.S3AccessKey/SecretKey/ SessionToken — the same credentials DuckLake uses. - server/iceberg/migration_test.go: cover the new SQL shape (with and without session token), the credential_chain fallback, region default, and quote-escaping on attacker-controlled credential values. Known follow-up not addressed here: RefreshS3Secret (server.go:1906) rotates the DuckLake secret when STS credentials expire (~1h), but the iceberg secret is not refreshed alongside. Iceberg queries will start failing with 403 after the first STS rotation on a long-lived hot-idle worker. Fixing this needs a sibling RefreshIcebergSecret invoked from duckdbservice/activation.go's refresh path; scoped out of this PR to keep the diff focused on the immediate "iceberg attach fails on every activation" bug. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds an in-memory DuckDB integration test that loads the iceberg extension and exercises the actual CREATE SECRET DDL emitted by BuildIcebergSecretStmt. No AWS network access needed — DuckDB just parses and validates the secret. This is the regression coverage that PR #560 was missing: the bug there was a CREATE SECRET binder/validation error at the DuckDB layer, not something the existing string-comparison unit tests could catch. The new test would have failed on #560's old SQL form and would fail on any future revert. Explicit-credential cases (PROVIDER config with inlined KEY_ID/SECRET/ SESSION_TOKEN) are strict — must succeed cleanly because they don't depend on any host-side AWS state. The credential_chain fallback case tolerates a "Secret Validation Failure" outcome (which is exactly what the original #560 bug produced on EKS workers and what reproduces locally on hosts without AWS identity) while still catching parser/ binder/provider-mismatch errors that would indicate the SQL form itself regressed. The test self-skips if INSTALL/LOAD iceberg fails — covers air-gapped sandbox environments without losing coverage where it can run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Removes the credential_chain branch introduced in the previous commit on this PR. The fallback was hedging — it dressed up the broken DuckDB chain path (the very thing this PR exists to fix) as forward-compat. In the multitenant deploy the worker pod has no AWS identity, so a silent fallback would just defer the same failure to attach time and obscure the real misconfiguration. Explicit STS credentials minted by the control plane are the only supported auth model. If iceberg.Enabled is true and the activation payload lacks credentials, that is a configuration bug (broken STS broker, missing per-tenant IAM role, race in the activator) and should surface as a clear error at attach time, not a credential_chain SQL that will fail in DuckDB with an opaque "Secret Validation Failure". Changes: - BuildIcebergSecretStmt: no more empty-keyID branch; always emit PROVIDER config. - AttachIcebergCatalog: validate keyID/secret are non-empty when iceberg is enabled; return an error naming the table_bucket so logs identify which tenant is affected. - Tests: drop the credential_chain-fallback case; keep the with/without session-token, region default, and quote-escaping coverage. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The integration test added in 8213934 had a "credential chain fallback" subtest that exercised BuildIcebergSecretStmt with empty credentials and asserted DuckDB's response wasn't one of a handful of "fatal" parser errors. That made sense when empty creds produced a `PROVIDER credential_chain` secret. After this PR's removal of that fallback, empty creds produce `PROVIDER config, KEY_ID '', SECRET ''` instead — DuckDB accepts that without raising any of the listed fatal signals (it either creates a useless empty-cred secret or fails with a non-matching error string), so the subtest logged-and-passed while no longer testing anything meaningful. The strict explicit-credential subtests still cover what matters: the SQL DuckDB actually sees on the only live code path. Removing the dead subtest also removes the `strings` import since no other place uses it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3 tasks
bill-ph
added a commit
that referenced
this pull request
May 15, 2026
…563) Follow-up to #562. That PR fixed the initial-attach path so iceberg works at session activation, but the hot-idle credential refresh path in duckdbservice/activation.go only rotates the DuckLake S3 secret (ducklake_s3). The iceberg secret (iceberg_sigv4) shares the same STS-minted credentials but was never being re-emitted, so iceberg queries on a long-lived worker would 403 ~1h after activation while DuckLake stayed fresh. Changes: - server/server.go: new RefreshIcebergSecret. Mirrors RefreshS3Secret — no-op when iceberg is disabled, rejects empty credentials with the same error shape as AttachIcebergCatalog (no silent fallback to a credential_chain that we removed in #562), re-emits the iceberg_sigv4 secret via the existing BuildIcebergSecretStmt, and handles DuckDB's "Current transaction is aborted" recovery the same way the S3 refresh does. Importantly, RefreshIcebergSecret does NOT short-circuit when the iceberg catalog is already attached (unlike AttachIcebergCatalog). The whole point of refresh is to overwrite the existing secret in place; DuckDB resolves the secret at request time so the new credentials take effect for the next iceberg query without an explicit reattach. - duckdbservice/service.go: new refreshIcebergSecret indirection on SessionPool, defaulted to server.RefreshIcebergSecret. Same shape as refreshS3Secret so tests can inject a stub. - duckdbservice/activation.go (reuseExistingActivation): two changes. 1. The needsRefresh predicate now triggers when Iceberg.Enabled is true *or* DuckLake.ObjectStore is set. Previously, an iceberg-only tenant (DuckLake metadata-only, S3 bucket empty, iceberg enabled) would skip refresh entirely because ObjectStore=="" gated it out — even though Iceberg.Enabled=true and the credentials WERE in the payload (the activator populates DuckLake.S3* from STS regardless of whether DuckLake itself uses S3). 2. The refresh block now invokes both refresh functions, each gated on its own enable check. They share the same DuckLake.S3* triple because the per-tenant IAM role grants both s3:* and s3tables:* and STS returns one credential bundle covering both. - server/iceberg_refresh_test.go: three new regression tests run in the server package (so they can exercise the exported RefreshIcebergSecret while the iceberg subpackage stays free of server-package imports). * TestRefreshIcebergSecretRotatesCredentials uses an in-memory DuckDB with the iceberg extension to verify back-to-back CREATE OR REPLACE with different credentials both succeed — the exact behavior STS rotation depends on. Skips when the extension can't be installed (air-gapped CI), same posture as the existing integration test. * TestRefreshIcebergSecretNoOpWhenDisabled confirms the activation layer can call this unconditionally without harm when the tenant hasn't opted in to iceberg. * TestRefreshIcebergSecretRejectsEmptyCredentials locks in the "fail loud, no silent fallback" invariant for the refresh path, matching what #562 established for the attach path. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR #560 wired the iceberg S3 secret to
PROVIDER credential_chain. DuckDB's built-in credential chain does not support EKS Pod Identity (the comment onfetchAWSSDKCredentialscalls this out explicitly), and on the PostHog multitenant deploy the worker pod has no AWS identity of its own at all — theduckgres-workerSA has no IRSA annotation, no Pod Identity association, no AWS env vars. So `CREATE SECRET` fails at validation time:DuckLake doesn't hit this because the control plane runs `STS:AssumeRole` on the per-tenant IAM role and ships the resulting short-lived credentials inline in the worker activation payload. The DuckLake S3 secret is then built with `PROVIDER config` and those explicit credentials (see `buildConfigSecret`).
The per-tenant IAM role rendered by the duckling Crossplane Composition already grants both `s3:` (data bucket) and `s3tables:` (iceberg table bucket) when `spec.iceberg.enabled=true`. The credentials in `cfg.DuckLake.S3AccessKey/SecretKey/SessionToken` are already authorized for S3 Tables — they just need to be threaded into the iceberg secret too.
What changed
Why no credential_chain fallback?
Earlier iterations of this PR included a fallback: when `keyID` was empty, emit a `PROVIDER credential_chain` secret as before. It was removed because:
`buildCredentialChainSecret` (for DuckLake, in `server.go`) is a separate code path used by the Ansible single-tenant EC2 deploy (`posthog-cloud-infra/ansible/roles/duckgres/templates/duckgres.yaml.j2:36`) and is not affected by this PR. Iceberg never had a live deployment using credential_chain.
Known follow-up (not in this PR)
`RefreshS3Secret` (server.go:1906) rotates the DuckLake secret when STS credentials expire (~1h), but no equivalent yet rotates the iceberg secret. After this PR lands, a long-lived hot-idle worker will start 403'ing iceberg queries after the first STS rotation. Fix is a sibling `RefreshIcebergSecret` invoked from `duckdbservice/activation.go`'s refresh path. Scoped out here for an immediate follow-up PR.
Other follow-ups identified during this review:
Test plan
🤖 Generated with Claude Code