feat(worker-pool): one query session per worker in k8s/remote mode (#691)
* feat(worker-pool): one query session per worker in k8s/remote mode
Guarantee deterministic per-query resources in the control-plane remote (k8s)
backend: a worker pod serves exactly one client query session, so each query
owns the pod's full resources (workerDuckDBLimits already hands one session ~75%
of pod RAM + all cores) and a heavy query can never be starved or OOM'd by a
co-resident one. Previously this held only emergently (OrgReservedPool never
co-assigned two sessions to one worker), with nothing enforcing it even though
the resource math already assumed it: a single co-assignment would have put two
~75% sessions on one pod, a ~150% memory overcommit.
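The overcommit figure is plain arithmetic: every session is sized as if it owned the pod, so N sessions promise N × 75% of RAM. A minimal Go sketch, where the hypothetical `perSessionMemFraction` constant stands in for whatever workerDuckDBLimits actually computes:

```go
package main

// perSessionMemFraction is a hypothetical stand-in for the share of pod RAM
// that workerDuckDBLimits gives one session (~75% per this commit message).
const perSessionMemFraction = 0.75

// committedMemFraction returns the fraction of pod RAM promised to DuckDB when
// `sessions` sessions share one pod, each sized as if it owned the pod.
func committedMemFraction(sessions int) float64 {
	return float64(sessions) * perSessionMemFraction
}

// overcommitted reports whether those promises exceed the pod's RAM.
func overcommitted(sessions int) bool {
	return committedMemFraction(sessions) > 1.0
}
```

One session stays under the pod's RAM (0.75); a second co-assigned session pushes the promise to 1.5, i.e. the ~150% overcommit.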
Scope: remote/k8s (OrgReservedPool) only. Standalone and process backends
unchanged.
- Enforce the invariant: spawn remote worker pods with
DUCKGRES_DUCKDB_MAX_SESSIONS=1 (k8s_pool.go). A second concurrent CreateSession
is rejected, not silently overcommitted. Internal control/maintenance work runs
on the worker's controlDB/warmupDB side connections (not counted sessions), so
cap=1 doesn't starve it. Remove the dead leastLoadedAssignedWorkerLocked helper
so session sharing can't be re-wired into the org path.
- At org max workers + all busy: fail fast with the clear org-cap message
("your organization has reached its maximum number of concurrent Duckgres
workers...") instead of busy-waiting until the client deadline. Works without a
runtime store via an in-process cap check (atOrgWorkerCap).
- Under cap + all busy: hold up to warmAcquireTimeout (bounded by ctx) for a
worker to spawn; this now applies to default/exclusive requests too, not only
colocated ones.
- Anti-snatch: serialize the slow acquire path per org with a cancel-safe FIFO
turnstile (orgAcquireGate) so a worker the CP scaled up for an earlier waiter
cannot be snatched by a later connection. The fast idle-reuse path stays
ungated (only reuses already-org-owned Hot workers; neutral warm workers are
claimed only through the gate).
- Destroy-before-reuse ordering already holds (session_mgr awaits the worker
DestroySession RPC before ReleaseWorker); documented as load-bearing for cap=1.
Tests:
- duckdbservice: CreateSession rejects the 2nd session at MaxSessions=1.
- controlplane: org-cap fast-fail; FIFO gate ordering + cancelled-waiter skip
(race-clean); pod spec carries DUCKGRES_DUCKDB_MAX_SESSIONS=1.
- e2e harness: one_session_per_worker — two concurrent queries for one org land
on two distinct worker pods with correct results (regression net for sharing).
Docs: CLAUDE.md gains a "Worker Session Model" load-bearing contract section (and
a drain-protocol summary) so future changes don't reintroduce session sharing or
break the cap/hold/anti-snatch/destroy-before-reuse guarantees.
* feat(worker-pool): recover from worker session-cap drift instead of failing the query
If a worker rejects a control-plane-scheduled CreateSession because it already
holds its maximum number of sessions (the one-session-per-worker invariant
momentarily drifted between the CP's view and the worker's actual session
count), don't punish the client for our broken logic. CreateSessionWithProtocol
now:
- detects the worker's "max sessions reached" rejection (isWorkerSessionCapError),
- logs loudly at ERROR and bumps a new metric
(duckgres_control_plane_worker_session_cap_drift_total) so the drift is visible
and we can fix the root cause,
- retires/recycles the inconsistent worker (graceful via the drain protocol), and
- re-acquires a fresh worker and retries, bounded by maxWorkerSessionCapDriftRetries.
So a transient drift self-heals (the query lands on a healthy worker) while a
persistent drift surfaces a clear error after the bounded retries rather than
spinning. Recovery + loud logging: fix our bug without dropping the user's query.
Unit test covers the wrapped-error classifier; CLAUDE.md documents the behavior
in the worker-session-model contract.
The drift metric's help text reads: "Times a worker rejected a CP-scheduled CreateSession at its session cap (CP↔worker accounting drift; recovered by recycling the worker and retrying)."