test(integration): bump worker_initialization_timeout to 120s in CI

nv-alicheng · claude · nv-alicheng · commit a95ec2282dae · 2026-05-12T14:53:37.000-07:00
`TestTemplateIntegration::test_template_runs[concurrency_template.yaml]`
consistently hits the 60s `worker_initialization_timeout` in CI on
cold-start. `concurrency_template.yaml` is alphabetically first in the
parametrized lane, so it pays the full first-time-this-CI-job cost:

- Python `multiprocessing` `spawn`-mode re-import of the entire
  `inference_endpoint` package per worker subprocess (transformers,
  msgspec, pyzmq, etc.)
- First-time ZMQ IPC bind + connect handshake for the worker pool
- Concurrent aggregator subprocess cold-start contending for the
  same small-CI-runner CPU

Subsequent templates in the same lane benefit from warm module
caches and don't approach the limit. Local Docker runs finish all 6
templates in ~40 s total (~6.5 s/template), but CI runners with less
headroom (and `spawn` vs `fork`) consistently push the first test
past 60 s.
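
To check which `multiprocessing` start method an environment actually uses, a quick diagnostic might look like the sketch below. `start_method_report` is a hypothetical helper, not part of `inference_endpoint`. It reports only the interpreter default, so it will not reflect a start method the package selects explicitly through `multiprocessing.get_context(...)`.

```python
import multiprocessing as mp


def start_method_report() -> dict:
    """Report the interpreter's default start method and the ones available.

    Hypothetical diagnostic helper. Note that mp.get_start_method() fixes the
    default start method as a side effect, so call it only where a later
    mp.set_start_method() is not expected.
    """
    return {
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


report = start_method_report()
print(f"default={report['default']} available={report['available']}")
```

Running this both in the local Docker image and on the CI runner would show whether the two environments differ in their default start method.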

Bump to 120 s in this test only: `_resolve_template` injects
`settings.client.worker_initialization_timeout: 120.0` into each
template before running. The production default (60 s) is unchanged.
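
The override can be sketched in isolation as follows. This is a minimal illustration, not the test's actual code: `with_worker_init_timeout` and `CI_WORKER_INIT_TIMEOUT_S` are hypothetical names, and the sketch returns a patched copy, whereas `_resolve_template` mutates the parsed template in place.

```python
from copy import deepcopy

# Value taken from this commit; the constant name is an assumption.
CI_WORKER_INIT_TIMEOUT_S = 120.0


def with_worker_init_timeout(
    data: dict, timeout_s: float = CI_WORKER_INIT_TIMEOUT_S
) -> dict:
    """Return a copy of a parsed template with the worker-init timeout set.

    Hypothetical helper: existing keys under settings.client are preserved
    and only worker_initialization_timeout is overwritten.
    """
    patched = deepcopy(data)
    settings = patched.setdefault("settings", {})
    settings.setdefault("client", {})["worker_initialization_timeout"] = timeout_s
    return patched


template = {"settings": {"runtime": {"n_samples_to_issue": 10}}}
patched = with_worker_init_timeout(template)
print(patched["settings"]["client"])
```

Using `setdefault` keeps the override safe for templates that already define a `client` section as well as those that do not.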

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/tests/integration/commands/test_benchmark_command.py b/tests/integration/commands/test_benchmark_command.py
@@ -223,6 +223,17 @@ def _resolve_template(template_path: Path, server_url: str) -> dict:
     data["settings"].setdefault("runtime", {})
     data["settings"]["runtime"]["n_samples_to_issue"] = 10
 
+    # Bump the worker-init timeout for CI. The production default (60 s) is
+    # tight on small CI runners where Python's `spawn`-mode multiprocessing
+    # pays a full re-import cost per worker on top of ZMQ IPC setup; cold-
+    # start of the *first* parametrized template (alphabetical, so
+    # `concurrency_template.yaml`) consistently exceeds the budget in CI.
+    # The other 5 templates benefit from warm module / IPC caches and don't
+    # need the headroom. 120 s is a generous safety margin that does not
+    # change the production default, only this integration test.
+    data["settings"].setdefault("client", {})
+    data["settings"]["client"]["worker_initialization_timeout"] = 120.0
+
     # Accuracy datasets can't run e2e against echo server (no scorer), so keep only performance datasets.
     data["datasets"] = [
         ds for ds in data.get("datasets", []) if ds.get("type") != "accuracy"