test(e2e): land per-PR worker pods on the default nodepool (#746)

benben · web-flow · commit 314f8f22cb22 · 2026-06-10T11:26:10.000+02:00
* test(e2e): land per-PR worker pods on the default nodepool

The per-PR harness Job, control plane, and config store already run on the
default (untainted) nodepool; only the CP-spawned worker pods were pinned to
the real dev deployment's duckgres-workers pool. That made every PR's worker
churn (cold bursts, sized spawns, TTL reaps) fight the dev deployment's
headroom placeholders and Karpenter consolidation on the production-shaped
pool. Point the e2e CP's worker nodeSelector at the default pool instead and
drop the now-unneeded taint toleration.

The worker image is arm64-only and the default pool is mixed-arch, so the
selector pins kubernetes.io/arch=arm64 rather than going selector-less.
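
As a rough sketch of the intended effect on a CP-spawned worker pod: the
fragment below assumes the control plane copies the
DUCKGRES_K8S_WORKER_NODE_SELECTOR JSON and the toleration key/value into the
pod spec unchanged. That rendering is not part of this diff, so the pod-spec
shape here is illustrative only.

```yaml
# Before (illustrative): pinned to the tainted duckgres-workers pool.
spec:
  nodeSelector:
    posthog.com/nodepool: duckgres-workers
  tolerations:
    - key: posthog.com/nodepool
      value: duckgres-workers
---
# After (illustrative): any arm64 node on the untainted default pool.
# No tolerations entry is needed because the default pool carries no taint.
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
```

kubernetes.io/arch is a well-known label that the kubelet sets on every node,
so the selector works without any custom node labeling on the default pool.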

* test(e2e): align CP env knobs with prod-us logic

Same knobs as the prod-us chart render (values differ, the logic matches):

- add DUCKGRES_WORKER_QUEUE_TIMEOUT=5m (prod uses 5m; the binary default of
  60s is too tight for an on-demand cold spawn that needs a fresh node, which
  previously made the sized-worker assertions flake)
- add DUCKGRES_K8S_MAX_WORKERS=50 (prod sets an explicit cap; unset, the k8s
  backend is unbounded and the nodepool is the only ceiling)
- drop DUCKGRES_K8S_WORKER_MAX_TTL (prod does not set a TTL clamp)

Deliberate divergences documented in-file: WORKER_PRIORITY_CLASS (prod's
1000000 class would let per-PR worker bursts preempt unrelated default-pool
workloads), CACHE_ENABLED (cache-proxy DaemonSet only runs on duckgres-workers
nodes), wildcard TLS + TRINO_* (need cert-manager / a Trino cluster in the
per-PR namespace).

* test(e2e): correct maxWorkers comment (k8s unset = unbounded, not CP-memory-derived)
diff --git a/tests/e2e-mw-dev/harness.sh b/tests/e2e-mw-dev/harness.sh
@@ -736,7 +736,7 @@ graceful_drain() { # org password
 # whole pod's resources and a heavy query can't be starved by a co-resident one.
 # Regression net: if a worker were ever shared (pre-change least-loaded sharing),
 # the org would peak at a single active-org-labeled pod for both queries.
-# Assumes the worker nodepool is already warm (prior resilience steps spawned
+# Assumes the default nodepool already has warm capacity (prior resilience steps spawned
 # pods), so the second pod schedules within the queries' runtime.
 one_session_per_worker() { # org password
   log "one session per worker: concurrent queries land on distinct pods on $1"
diff --git a/tests/e2e-mw-dev/manifests.tmpl.yaml b/tests/e2e-mw-dev/manifests.tmpl.yaml
@@ -249,14 +249,35 @@ spec:
             - { name: DUCKGRES_K8S_WORKER_PROFILE_MAX_CPU, value: "8" }
             - { name: DUCKGRES_K8S_WORKER_PROFILE_MIN_MEMORY, value: "2Gi" }
             - { name: DUCKGRES_K8S_WORKER_PROFILE_MAX_MEMORY, value: "16Gi" }
-            - { name: DUCKGRES_K8S_WORKER_MAX_TTL, value: "24h" }
+            # Prod-logic parity (sizes differ, the knobs match prod-us):
+            # - workerQueueTimeout bounds how long a connection waits for its
+            #   on-demand cold spawn (binary default 60s is too tight for a
+            #   fresh Karpenter node — observed flaking the sized assertions).
+            # - an explicit worker cap, like prod's maxWorkers (unset = unbounded
+            #   for the k8s backend; the nodepool is the only ceiling).
+            # Deliberately NOT aligned with prod:
+            # - DUCKGRES_K8S_WORKER_PRIORITY_CLASS: e2e workers run on the
+            #   shared default nodepool — prod's 1000000 PriorityClass would let
+            #   per-PR worker bursts preempt unrelated workloads there.
+            # - DUCKGRES_CACHE_ENABLED: the cache-proxy DaemonSet only runs on
+            #   duckgres-workers nodes; default-pool workers would dial a
+            #   nonexistent hostPort proxy.
+            # - DUCKGRES_CERT/_KEY (wildcard TLS) and DUCKGRES_TRINO_*: need
+            #   cert-manager / a Trino cluster in the per-PR namespace; the CP
+            #   serves its self-signed cert and Trino is not under e2e.
+            - { name: DUCKGRES_WORKER_QUEUE_TIMEOUT, value: "5m" }
+            - { name: DUCKGRES_K8S_MAX_WORKERS, value: "50" }
             - name: DUCKGRES_INTERNAL_SECRET
               valueFrom: { secretKeyRef: { name: duckgres-tokens, key: internal-secret } }
             - { name: DUCKGRES_AWS_REGION, value: "us-east-1" }
-            # Land workers on the dedicated worker nodepool (mw-dev arm64).
-            - { name: DUCKGRES_K8S_WORKER_NODE_SELECTOR, value: '{"posthog.com/nodepool":"duckgres-workers"}' }
-            - { name: DUCKGRES_K8S_WORKER_TOLERATION_KEY, value: "posthog.com/nodepool" }
-            - { name: DUCKGRES_K8S_WORKER_TOLERATION_VALUE, value: "duckgres-workers" }
+            # Land e2e workers on the DEFAULT (untainted) nodepool, NOT the real
+            # deployment's duckgres-workers pool: per-PR worker churn must not
+            # fight the dev deployment's headroom placeholders / Karpenter
+            # consolidation on the production-shaped pool, and the harness Job /
+            # per-PR CP / config-store already run on the default pool. The
+            # worker image is arm64-only and the default pool is mixed-arch, so
+            # pin the arch (no toleration needed — the default pool is untainted).
+            - { name: DUCKGRES_K8S_WORKER_NODE_SELECTOR, value: '{"kubernetes.io/arch":"arm64"}' }
             # Provision real per-org Lakekeeper instances (iceberg path).
             - { name: DUCKGRES_LAKEKEEPER_PROVISIONER_ENABLED, value: "true" }
             # Org identity is resolved from the TLS SNI hostname