Skip to content

Commit cedafa4

Browse files
fuziontechclaude
andauthored
fix(s3): survive STS token expiry — freshness floor + httpfs fork credential refresh proof (#778)
* fix(controlplane): raise STS freshness floor to 35min; surface expiry on config-store aws path DuckDB statements capture S3 credentials when they start executing (secret lookups resolve against the statement's MVCC snapshot), so the credential- refresh scheduler's CREATE OR REPLACE SECRET push never reaches an in-flight statement. With a 10min cache safety margin, a statement could begin on near-expiry cached STS creds and die with ExpiredToken after as little as 10 minutes — which is what killed a long-running events-backfill DELETE. - Raise stsCacheSafetyMargin 10min -> 35min: every credential capture point (worker activation, scheduler refresh push) now hands out creds with at least 35min of runway, and since the margin exceeds the scheduler's 30min lookahead, a refresh push always stamps the worker out of the "due" window — eliminating the 5s identical-creds re-push churn observed in prod (~700 push events / 15min). - Move credentialRefreshLookahead to package scope and add a compile-time guard that fails the build if the margin ever drops to/below the lookahead. - Fix a latent bug: the config-store "aws" provider path brokers STS creds but returned no expiration, leaving S3CredentialsExpiresAt nil — the refresh scheduler never refreshed those workers, so any session outliving the 1h token died. The path now returns the STS expiry. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(s3): pin no-mid-statement-credential-recovery; STS session-duration knob; always pin SESSION_TOKEN Investigation result (REFUTES the rotating-credential-file design explored on this branch): DuckDB v1.5.3's refresh-on-403 hook (TryRefreshS3Secret) lives only in S3FileHandle::Initialize and fires only when the file open performs a network request. It never does for scan workloads: the S3 glob (ListObjectsV2), DuckLake, and Iceberg file lists all pre-populate file_size/etag/last_modified in extended_info, which marks the handle initialized and skips the HEAD — the first auth failure surfaces on a range GET in the read path, which has no refresh handling, and the statement dies. Verified empirically against MinIO (an in-flight 48-file scan died on a newly-opened file's first GET despite the credential file being rewritten and the secret CREATE OR REPLACE'd first) and at source level in PostHog/duckdb-httpfs v1.5.3-stoi-fix, duckdb/ducklake, and duckdb/duckdb-iceberg v1.5. The credential_chain + rotating-file mechanism would therefore only guarantee freshness for NEW statements — which plain CREATE OR REPLACE SECRET rotation already achieves — so it is not included; the 35-min freshness floor (26fede7) remains the only (and honest) guarantee. What this commit ships instead: - tests/integration/credential_rotation_pin_test.go: pins the limitation the freshness floor is designed around — an in-flight scan does NOT survive credential rotation (MVCC secret snapshot + no HEAD at open), while a fresh statement on the rotated secret does. If a DuckDB upgrade ever makes the in-flight scan survive, the test fails loudly telling us to revisit the floor. - DUCKGRES_STS_SESSION_DURATION (env-only): overrides the 1h AssumeRole session duration (clamped to AWS's 900s floor) so a soak test can exercise real in-statement expiry. credentialRefreshLookahead (duration/2) and stsCacheSafetyMargin (lookahead+5m, the historical 35m at default) now derive from it; the margin>lookahead invariant holds by construction and is unit-tested. - buildConfigSecret always emits SESSION_TOKEN (empty pins "no token"): httpfs copies a host AWS_SESSION_TOKEN env var into the global s3_session_token setting at load and secret lookup falls back to settings for omitted keys, signing with a mismatched (key, token) pair. - tests/e2e-mw-dev/README.md: documents why the freshness floor cannot be asserted in-Job and where its coverage lives. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(s3): prove mid-statement credential recovery with patched httpfs fork The PostHog httpfs fork (PostHog/duckdb-httpfs branch cred-refresh-read-path) adds what DuckDB v1.5.3 lacks: on an auth failure (400/403) in any S3 request path it re-resolves the LATEST COMMITTED secret — bypassing the statement's MVCC snapshot, so the credential-refresh scheduler's CREATE OR REPLACE SECRET pushes are picked up mid-statement — and retries once with the new credentials. That removes the statement-length ceiling the STS freshness floor could only push to ~35-60min. Restructure the credential-rotation pin test into a shared scenario (rotate to user B on a side connection, then disable user A while a glob scan is in flight) driving two tests: - TestInFlightScanDiesOnCredentialRotation (unchanged semantics): the STOCK extension still dies on the first post-revocation range GET — keeps pinning the upstream limitation so we notice if DuckDB gains recovery natively. - TestInFlightScanSurvivesRotationWithPatchedHTTPFS (new): with DUCKGRES_TEST_PATCHED_HTTPFS pointing at a locally-built patched extension, the SAME in-flight scan survives rotation + revocation with an exact row count (no loss or duplication on retry). Verified locally against MinIO: stock dies at T+4.7s with 403; patched completes 1,920,000 rows at T+25s with the starting credentials revoked at T+4.2s. Also update the stsCacheSafetyMargin doc: once a fork release with the patch is pinned via HTTPFS_EXTENSION_TAG in Dockerfile.worker, the freshness floor becomes defense-in-depth rather than the only guarantee for long statements. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(controlplane): clamp DUCKGRES_STS_SESSION_DURATION to the 1h role-chaining ceiling Review finding (codex): values above 1h passed straight through to AssumeRole. STS does not truncate an oversized DurationSeconds — it REJECTS the call (role chaining via EKS Pod Identity caps at 3600s, and every duckling-* role has MaxSessionDuration=3600), so an accidentally high env value (e.g. "2h") would break activation for every org. Clamp down to maxSTSSessionDuration=1h with a loud warning, mirroring the existing 900s floor clamp; TestResolveSTSSessionDuration covers both bounds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 4573179 commit cedafa4

10 files changed

Lines changed: 701 additions & 30 deletions

controlplane/credential_refresh_scheduler.go

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,12 @@ import (
1010
"github.com/posthog/duckgres/controlplane/configstore"
1111
)
1212

13+
// credentialRefreshLookahead and the broker's cache safety margin now live in
14+
// sts_broker.go alongside stsSessionDuration: all three derive from the
15+
// env-overridable session duration (DUCKGRES_STS_SESSION_DURATION), and the
16+
// former compile-time guard (margin strictly greater than lookahead) holds by
17+
// construction — stsCacheSafetyMargin = credentialRefreshLookahead + 5m.
18+
1319
// credentialRefreshScheduleStore is the slice of the runtime store the
1420
// scheduler depends on. Defined narrowly so unit tests can fake it without
1521
// pulling in the full store.

controlplane/multitenant.go

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -360,12 +360,6 @@ func SetupMultiTenant(
360360
refreshActivator.resolveDucklingStatus = resolveDucklingStatus
361361
}
362362

363-
// Half the configured STS session duration: a worker due for refresh
364-
// gets picked up well before its current session token actually goes
365-
// stale, with a full half-life of slack to retry transient STS / RPC
366-
// failures on subsequent ticks.
367-
const credentialRefreshLookahead = stsSessionDuration / 2
368-
369363
// Per-CP scheduler — runs on every CP regardless of leader status, since
370364
// each CP refreshes only the workers it owns (filtered by cpInstanceID in
371365
// the SQL). Running this on the janitor leader only would leave workers

controlplane/shared_worker_activator.go

Lines changed: 19 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -371,9 +371,11 @@ func (a *SharedWorkerActivator) BuildActivationRequest(ctx context.Context, org
371371
}
372372
}
373373
if a.resolveDucklingStatus == nil || err != nil {
374-
// Config-store warehouses use static creds (no STS), so expiresAt
375-
// stays nil and the credential refresh scheduler skips them.
376-
dl, err = a.buildDuckLakeConfigFromConfigStore(ctx, org.Warehouse)
374+
// Static-cred config-store warehouses (secret-ref S3 credentials)
375+
// return a nil expiresAt and the credential refresh scheduler skips
376+
// them; the "aws" provider path brokers STS creds and returns their
377+
// expiration so the scheduler keeps those workers fresh.
378+
dl, expiresAt, err = a.buildDuckLakeConfigFromConfigStore(ctx, org.Warehouse)
377379
}
378380
if err != nil {
379381
return TenantActivationPayload{}, err
@@ -505,10 +507,10 @@ func ducklingMetadataStoreAddress(status *provisioner.DucklingStatus, orgID stri
505507

506508
// buildDuckLakeConfigFromConfigStore reads infrastructure details from the config store
507509
// and K8s Secrets. Used for non-Crossplane warehouses (manual seed, MinIO, etc.).
508-
func (a *SharedWorkerActivator) buildDuckLakeConfigFromConfigStore(ctx context.Context, warehouse *configstore.ManagedWarehouseConfig) (server.DuckLakeConfig, error) {
510+
func (a *SharedWorkerActivator) buildDuckLakeConfigFromConfigStore(ctx context.Context, warehouse *configstore.ManagedWarehouseConfig) (server.DuckLakeConfig, *time.Time, error) {
509511
metadataPassword, err := a.readSecretValue(ctx, warehouse.MetadataStoreCredentials)
510512
if err != nil {
511-
return server.DuckLakeConfig{}, fmt.Errorf("metadata store credentials: %w", err)
513+
return server.DuckLakeConfig{}, nil, fmt.Errorf("metadata store credentials: %w", err)
512514
}
513515

514516
dl := server.DuckLakeConfig{
@@ -532,7 +534,7 @@ func (a *SharedWorkerActivator) buildDuckLakeConfigFromConfigStore(ctx context.C
532534
case warehouse.S3Credentials.Name != "":
533535
accessKey, secretKey, sessionToken, err := a.readS3Credentials(ctx, warehouse.S3Credentials)
534536
if err != nil {
535-
return server.DuckLakeConfig{}, fmt.Errorf("s3 credentials: %w", err)
537+
return server.DuckLakeConfig{}, nil, fmt.Errorf("s3 credentials: %w", err)
536538
}
537539
dl.S3Provider = "config"
538540
dl.S3AccessKey = accessKey
@@ -548,22 +550,29 @@ func (a *SharedWorkerActivator) buildDuckLakeConfigFromConfigStore(ctx context.C
548550
case strings.EqualFold(warehouse.S3.Provider, "aws"):
549551
roleARN := warehouse.WorkerIdentity.IAMRoleARN
550552
if roleARN == "" {
551-
return server.DuckLakeConfig{}, fmt.Errorf("managed warehouse %q requires worker_identity.iam_role_arn for worker activation", warehouse.OrgID)
553+
return server.DuckLakeConfig{}, nil, fmt.Errorf("managed warehouse %q requires worker_identity.iam_role_arn for worker activation", warehouse.OrgID)
552554
}
553555
if a.stsBroker == nil {
554-
return server.DuckLakeConfig{}, fmt.Errorf("STS broker is required for worker activation for org %q", warehouse.OrgID)
556+
return server.DuckLakeConfig{}, nil, fmt.Errorf("STS broker is required for worker activation for org %q", warehouse.OrgID)
555557
}
556558
creds, err := a.stsBroker.AssumeRole(ctx, roleARN)
557559
if err != nil {
558-
return server.DuckLakeConfig{}, fmt.Errorf("STS AssumeRole for org %q: %w", warehouse.OrgID, err)
560+
return server.DuckLakeConfig{}, nil, fmt.Errorf("STS AssumeRole for org %q: %w", warehouse.OrgID, err)
559561
}
560562
dl.S3Provider = "config"
561563
dl.S3AccessKey = creds.AccessKeyID
562564
dl.S3SecretKey = creds.SecretAccessKey
563565
dl.S3SessionToken = creds.SessionToken
566+
// STS-vended creds expire: surface the expiration so the
567+
// credential-refresh scheduler keeps this worker's secret fresh.
568+
// Without it the worker record's expiry stays NULL, the scheduler
569+
// never lists the worker as due, and any session outliving the
570+
// 1h token dies with ExpiredToken.
571+
expiresAt := creds.Expiration
572+
return dl, &expiresAt, nil
564573
}
565574

566-
return dl, nil
575+
return dl, nil, nil
567576
}
568577

569578
func BuildTenantActivationPayload(ctx context.Context, clientset kubernetes.Interface, defaultNamespace string, org *configstore.OrgConfig, stsBroker *STSBroker) (TenantActivationPayload, error) {
Lines changed: 148 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,148 @@
1+
//go:build kubernetes
2+
3+
package controlplane
4+
5+
import (
6+
"context"
7+
"testing"
8+
"time"
9+
10+
"github.com/posthog/duckgres/controlplane/configstore"
11+
corev1 "k8s.io/api/core/v1"
12+
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
13+
"k8s.io/client-go/kubernetes/fake"
14+
)
15+
16+
// testWarehouseConfig returns a minimal config-store warehouse whose metadata
17+
// credentials resolve against the fake clientset's "metadata-creds" secret.
18+
func testWarehouseConfig() *configstore.ManagedWarehouseConfig {
19+
return &configstore.ManagedWarehouseConfig{
20+
OrgID: "org-test",
21+
MetadataStore: configstore.ManagedWarehouseMetadataStore{
22+
Endpoint: "pg.test.local",
23+
Port: 5432,
24+
DatabaseName: "ducklake",
25+
Username: "duck",
26+
},
27+
MetadataStoreCredentials: configstore.SecretRef{
28+
Namespace: "duckgres",
29+
Name: "metadata-creds",
30+
Key: "password",
31+
},
32+
S3: configstore.ManagedWarehouseS3{
33+
Region: "us-east-1",
34+
Bucket: "test-bucket",
35+
},
36+
}
37+
}
38+
39+
func testMetadataSecret() *corev1.Secret {
40+
return &corev1.Secret{
41+
ObjectMeta: metav1.ObjectMeta{Name: "metadata-creds", Namespace: "duckgres"},
42+
Data: map[string][]byte{"password": []byte("hunter2")},
43+
}
44+
}
45+
46+
// The "aws" provider path brokers STS credentials, and those expire: the
47+
// returned expiration is what the credential-refresh scheduler keys on to
48+
// keep the worker's DuckDB secret fresh. A nil expiry here means the
49+
// scheduler never refreshes the worker and any session outliving the STS
50+
// token dies with ExpiredToken mid-query.
51+
func TestBuildDuckLakeConfigFromConfigStoreAWSProviderReturnsExpiry(t *testing.T) {
52+
expiration := time.Date(2026, 6, 11, 13, 0, 0, 0, time.UTC)
53+
broker := newSTSBroker(func(ctx context.Context, roleARN string) (*AssumedCredentials, error) {
54+
return &AssumedCredentials{
55+
AccessKeyID: "ASIATEST",
56+
SecretAccessKey: "secret",
57+
SessionToken: "token",
58+
Expiration: expiration,
59+
}, nil
60+
})
61+
a := &SharedWorkerActivator{
62+
clientset: fake.NewSimpleClientset(testMetadataSecret()),
63+
defaultNamespace: "duckgres",
64+
stsBroker: broker,
65+
}
66+
warehouse := testWarehouseConfig()
67+
warehouse.S3.Provider = "aws"
68+
warehouse.WorkerIdentity = configstore.ManagedWarehouseWorkerIdentity{
69+
IAMRoleARN: "arn:aws:iam::123:role/duckling-test",
70+
}
71+
72+
dl, expiresAt, err := a.buildDuckLakeConfigFromConfigStore(context.Background(), warehouse)
73+
if err != nil {
74+
t.Fatalf("buildDuckLakeConfigFromConfigStore: %v", err)
75+
}
76+
if dl.S3AccessKey != "ASIATEST" || dl.S3SessionToken != "token" {
77+
t.Errorf("expected STS-brokered creds on config, got access_key=%q session_token=%q", dl.S3AccessKey, dl.S3SessionToken)
78+
}
79+
if expiresAt == nil {
80+
t.Fatal("expected non-nil credential expiry for STS-brokered aws provider; nil disables the credential-refresh scheduler for this worker")
81+
}
82+
if !expiresAt.Equal(expiration) {
83+
t.Errorf("expiresAt = %v, want %v", *expiresAt, expiration)
84+
}
85+
}
86+
87+
// Static secret-ref credentials have no known expiration — the scheduler
88+
// must skip them, so the path must return a nil expiry.
89+
func TestBuildDuckLakeConfigFromConfigStoreStaticCredsNilExpiry(t *testing.T) {
90+
s3Secret := &corev1.Secret{
91+
ObjectMeta: metav1.ObjectMeta{Name: "s3-creds", Namespace: "duckgres"},
92+
Data: map[string][]byte{
93+
"credentials": []byte(`{"access_key_id":"AKIATEST","secret_access_key":"secret"}`),
94+
},
95+
}
96+
a := &SharedWorkerActivator{
97+
clientset: fake.NewSimpleClientset(testMetadataSecret(), s3Secret),
98+
defaultNamespace: "duckgres",
99+
}
100+
warehouse := testWarehouseConfig()
101+
warehouse.S3Credentials = configstore.SecretRef{
102+
Namespace: "duckgres",
103+
Name: "s3-creds",
104+
Key: "credentials",
105+
}
106+
107+
dl, expiresAt, err := a.buildDuckLakeConfigFromConfigStore(context.Background(), warehouse)
108+
if err != nil {
109+
t.Fatalf("buildDuckLakeConfigFromConfigStore: %v", err)
110+
}
111+
if dl.S3AccessKey != "AKIATEST" {
112+
t.Errorf("expected static creds, got access_key=%q", dl.S3AccessKey)
113+
}
114+
if expiresAt != nil {
115+
t.Errorf("static creds must return nil expiry (scheduler skip), got %v", *expiresAt)
116+
}
117+
}
118+
119+
// Pin the capture-point freshness floor: any credentials the broker hands
120+
// out (cached or fresh) must retain more validity than the refresh
121+
// scheduler's lookahead. If served creds could be closer to expiry than the
122+
// lookahead, a refresh push would stamp an expiry already inside the "due"
123+
// window and the scheduler would re-push identical creds every tick; worse,
124+
// a statement starting on those creds would have less runway than the
125+
// scheduler is designed to guarantee.
126+
func TestSTSBrokerServedCredsAlwaysOutliveSchedulerLookahead(t *testing.T) {
127+
clock := &testClock{now: time.Date(2026, 6, 11, 12, 0, 0, 0, time.UTC)}
128+
b := newTestBroker(clock, func(ctx context.Context, roleARN string) (*AssumedCredentials, error) {
129+
return &AssumedCredentials{
130+
AccessKeyID: "AKID",
131+
Expiration: clock.Now().Add(stsSessionDuration),
132+
}, nil
133+
})
134+
135+
// Sweep a full session duration in 1-minute steps; at every point the
136+
// served creds must outlive the scheduler lookahead.
137+
for i := 0; i <= int(stsSessionDuration/time.Minute); i++ {
138+
creds, err := b.AssumeRole(context.Background(), "arn:aws:iam::123:role/org-a")
139+
if err != nil {
140+
t.Fatalf("AssumeRole at +%dm: %v", i, err)
141+
}
142+
if remaining := creds.Expiration.Sub(clock.Now()); remaining <= credentialRefreshLookahead {
143+
t.Fatalf("at +%dm served creds have %v remaining, must exceed scheduler lookahead %v",
144+
i, remaining, credentialRefreshLookahead)
145+
}
146+
clock.Advance(time.Minute)
147+
}
148+
}

controlplane/sts_broker.go

Lines changed: 109 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,9 @@ package controlplane
55
import (
66
"context"
77
"fmt"
8+
"log/slog"
9+
"os"
10+
"strings"
811
"sync"
912
"time"
1013

@@ -15,21 +18,120 @@ import (
1518
)
1619

1720
const (
18-
stsSessionDuration = 1 * time.Hour
19-
stsSessionName = "duckgres-cp"
21+
stsSessionName = "duckgres-cp"
2022

21-
// stsCacheSafetyMargin is how long before the credentials' true expiration
22-
// we stop serving them from cache and mint a fresh session instead. It must
23-
// leave enough room for a worker activation that receives cached creds to
24-
// complete its DuckLake/Iceberg attach before the creds lapse.
25-
stsCacheSafetyMargin = 10 * time.Minute
23+
// defaultSTSSessionDuration is the AssumeRole DurationSeconds used unless
24+
// overridden by the env-only DUCKGRES_STS_SESSION_DURATION knob.
25+
defaultSTSSessionDuration = 1 * time.Hour
26+
27+
// minSTSSessionDuration is AWS's hard AssumeRole minimum (900s). The env
28+
// knob is clamped up to this so a typo can't make AssumeRole reject every
29+
// activation.
30+
minSTSSessionDuration = 15 * time.Minute
31+
32+
// maxSTSSessionDuration is the effective AssumeRole ceiling in our
33+
// deployment: the CP's own credentials come from EKS Pod Identity (a role
34+
// session), so the per-org AssumeRole is role chaining, which STS
35+
// hard-caps at 1h — and every duckling-* role's MaxSessionDuration is
36+
// 3600s anyway. A DurationSeconds above that is not silently truncated:
37+
// STS REJECTS the AssumeRole call, which would break activation for every
38+
// org. The env knob is clamped down to this for the same reason it is
39+
// clamped up to the minimum. Raising it for real requires de-chaining the
40+
// assume AND raising MaxSessionDuration on every duckling role (12h
41+
// absolute AWS max).
42+
maxSTSSessionDuration = 1 * time.Hour
2643

2744
// stsAssumeRoleTimeout bounds the underlying AWS AssumeRole call. The call
2845
// is detached from the triggering caller's context (other callers may be
2946
// waiting on its result via singleflight), so it needs its own deadline.
3047
stsAssumeRoleTimeout = 1 * time.Minute
3148
)
3249

50+
// stsSessionDuration is the STS AssumeRole session duration. Env-overridable
51+
// (DUCKGRES_STS_SESSION_DURATION, e.g. "900s" / "15m" / "1h", clamped to
52+
// [minSTSSessionDuration, maxSTSSessionDuration]) so a soak test can shorten
53+
// tokens enough to exercise real in-statement expiry — with the production 1h
54+
// tokens, proving "a statement outlives its STS token" needs a >25min query
55+
// even with the refresh scheduler running. Only shortening is supported:
56+
// values above 1h would make AssumeRole itself fail (see
57+
// maxSTSSessionDuration).
58+
var stsSessionDuration = resolveSTSSessionDuration(os.Getenv("DUCKGRES_STS_SESSION_DURATION"))
59+
60+
// credentialRefreshLookahead is how far ahead of a worker's recorded
61+
// credential expiry the scheduler refreshes it. Half the STS session
62+
// duration: a worker due for refresh gets picked up well before its current
63+
// session token actually goes stale, with a full half-life of slack to retry
64+
// transient STS / RPC failures on subsequent ticks.
65+
var credentialRefreshLookahead = stsSessionDuration / 2
66+
67+
// stsCacheSafetyMargin is how long before the credentials' true expiration
68+
// we stop serving them from cache and mint a fresh session instead.
69+
//
70+
// This is the freshness floor for every credential capture point: all
71+
// AssumeRole callers bake the result into a worker's DuckDB secret, and a
72+
// DuckDB statement captures those credentials when it starts executing — a
73+
// later CREATE OR REPLACE SECRET (the credential-refresh scheduler's push)
74+
// does not re-credential an already-running statement (DuckDB resolves
75+
// secrets through the statement's MVCC snapshot). Stock httpfs has NO
76+
// mid-statement recovery path for scan workloads: its refresh-on-403 hook
77+
// only runs at file open AND only when the open performs a network request,
78+
// but DuckLake, Iceberg, and S3-glob scans all pre-populate file
79+
// size/etag/last-modified so opens skip the HEAD entirely and the first auth
80+
// failure surfaces on a range GET, which is not retried (verified against
81+
// httpfs v1.5.3 / ducklake / duckdb-iceberg sources; pinned by
82+
// TestInFlightScanDiesOnCredentialRotation in tests/integration). A
83+
// statement on stock httpfs therefore lives or dies on its starting runway:
84+
// this margin guarantees every statement at least lookahead+5m of token
85+
// validity at start; statements longer than that can still die with
86+
// ExpiredToken.
87+
//
88+
// The PostHog httpfs fork (PostHog/duckdb-httpfs, branch
89+
// cred-refresh-read-path) FIXES this: on an auth failure in any S3 request
90+
// path it re-resolves the latest committed secret — picking up this
91+
// scheduler's rotation pushes — and retries (proven by
92+
// TestInFlightScanSurvivesRotationWithPatchedHTTPFS in tests/integration).
93+
// Once a fork release with that patch is pinned via HTTPFS_EXTENSION_TAG in
94+
// Dockerfile.worker, this margin becomes defense-in-depth (a statement's
95+
// first credentials still want a healthy runway so recovery stays rare)
96+
// rather than the only thing standing between a long statement and
97+
// ExpiredToken.
98+
//
99+
// Defined as lookahead + 5m so it stays strictly greater than
100+
// credentialRefreshLookahead by construction (at the default 1h session:
101+
// 30m + 5m = 35m, the historical floor). If the margin were <= the
102+
// lookahead, a refresh push could stamp an expiry already inside the
103+
// scheduler's lookahead window, leaving the worker perpetually "due" and
104+
// re-pushing identical cached creds every tick until the margin finally
105+
// forces a fresh mint.
106+
var stsCacheSafetyMargin = credentialRefreshLookahead + 5*time.Minute
107+
108+
// resolveSTSSessionDuration parses the env override, falling back to the
109+
// default on empty/garbage and clamping to AWS's AssumeRole minimum.
110+
func resolveSTSSessionDuration(raw string) time.Duration {
111+
raw = strings.TrimSpace(raw)
112+
if raw == "" {
113+
return defaultSTSSessionDuration
114+
}
115+
d, err := time.ParseDuration(raw)
116+
if err != nil {
117+
slog.Warn("Invalid DUCKGRES_STS_SESSION_DURATION; using default.",
118+
"value", raw, "default", defaultSTSSessionDuration, "error", err)
119+
return defaultSTSSessionDuration
120+
}
121+
if d < minSTSSessionDuration {
122+
slog.Warn("DUCKGRES_STS_SESSION_DURATION below AWS AssumeRole minimum; clamping.",
123+
"value", d, "minimum", minSTSSessionDuration)
124+
return minSTSSessionDuration
125+
}
126+
if d > maxSTSSessionDuration {
127+
slog.Warn("DUCKGRES_STS_SESSION_DURATION above the role-chaining AssumeRole ceiling; clamping. "+
128+
"A larger DurationSeconds would make STS reject every per-org AssumeRole and break activation.",
129+
"value", d, "maximum", maxSTSSessionDuration)
130+
return maxSTSSessionDuration
131+
}
132+
return d
133+
}
134+
33135
// assumeRoleFunc mints fresh credentials for a role ARN. It is a field on
34136
// STSBroker so tests can substitute a fake without an AWS client.
35137
type assumeRoleFunc func(ctx context.Context, roleARN string) (*AssumedCredentials, error)

0 commit comments

Comments
 (0)