Commit 2913007

fix(basilica): raise startup_timeout floor when ML preload is on (closes #243)

When the resolved proxy env carries both LLMTRACE_ML_ENABLED and LLMTRACE_ML_PRELOAD truthy, the lifecycle library now floors ComponentSpec.startup_timeout_seconds at ML_PRELOAD_STARTUP_FLOOR_SECONDS (1500) before creating the deployment, emitting one WARN line per bump. Callers who set the field >= floor keep their value silently. Applied in provision, update(strategy="recreate"), and rotate_admin_key, i.e. every proxy-creating code path.

Live e2e on cpu=2 confirmed the original 600s default could not fit preload (proxy never reached ready in 600s with ML on; reached ready in 41s with ML off). 1500s covers the observed band with margin.

Both example configs now set the field explicitly to 1500 to silence the WARN and document the cost. README operational notes spell out the auto-bump, how to opt out via LLMTRACE_ML_PRELOAD=0 (lazy load), and how to silence the warning.
1 parent ff0e97f commit 2913007

5 files changed

Lines changed: 280 additions & 7 deletions

deployments/basilica/README.md

Lines changed: 26 additions & 5 deletions
@@ -973,11 +973,32 @@ client doesn't accept that media type.
 ### ML preload behaviour

 The proxy image has `LLMTRACE_ML_ENABLED=1` and `LLMTRACE_ML_PRELOAD=1` by
-default in `starter.yaml`. Cold-boot includes model warm-up; in live tests
-the proxy hit `phase=ready` in ~60s. If your tenant config sets a smaller
-`startup_timeout_seconds`, the deployment may legitimately time out; bump
-the timeout or set `LLMTRACE_ML_PRELOAD=0` to defer model load to
-first-request time.
+default in `starter.yaml` and `pro.yaml`. Preload loads the
+`prompt_injection`, `ner`, `injecguard`, and `piguard` weights before
+`/health` flips to ready.
+
+Cost on small CPU shapes: the cold-boot wall clock is dominated by model
+load. On `cpu=2 / memory=4Gi` (starter) this consistently lands in the
+600–1500s range; the same image without preload reaches ready in ~40s.
+Bigger shapes are proportionally faster but the same band of preload cost
+applies on first boot.
+
+**Auto-bump** (`deployments/basilica/lifecycle.py::_apply_ml_preload_startup_floor`):
+when the resolved proxy env carries both `LLMTRACE_ML_ENABLED ∈ {1,true,yes}`
+and `LLMTRACE_ML_PRELOAD ∈ {1,true,yes}` AND the caller's
+`startup_timeout_seconds` is below `ML_PRELOAD_STARTUP_FLOOR_SECONDS`
+(currently 1500), the lifecycle library raises the timeout to the floor
+and emits a single WARN. This applies to `provision`,
+`update(strategy="recreate")`, and `rotate_admin_key`. Callers who set
+`startup_timeout_seconds >= 1500` keep their value and see no warning.
+
+**Opt out of preload entirely**: set `LLMTRACE_ML_PRELOAD: "0"` in
+`proxy.env` for lazy load. The first request pays the model-load
+latency; every subsequent request is fast. Useful when readiness latency
+matters more than first-request latency.
+
+**Silence the WARN**: set `startup_timeout_seconds: 1500` (or higher)
+explicitly in your config. Both example configs already do this.

 ### Cleanup of orphans

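The truthy set quoted in the auto-bump note is easy to misread, so here is a minimal sketch of the flag check as the `lifecycle.py` helper in this commit applies it: trim, lowercase, then test membership in `{1,true,yes}`. The function name `is_ml_flag_on` is hypothetical; only the normalisation steps and the set members come from the diff. A source comment says the Rust proxy uses the same rules, but that code is not shown on this page.

```python
# Sketch only: `is_ml_flag_on` is an illustrative name, not part of lifecycle.py.
TRUTHY = frozenset({"1", "true", "yes"})  # same members as _ML_FLAG_TRUTHY in the diff


def is_ml_flag_on(env, name):
    """Return True when env[name] is truthy after trimming and lowercasing."""
    return env.get(name, "").strip().lower() in TRUTHY


if __name__ == "__main__":
    # Print how a handful of raw values are classified.
    for raw in ("1", " Yes ", "TRUE", "on", "enabled", "2", ""):
        env = {"LLMTRACE_ML_PRELOAD": raw}
        print(repr(raw), is_ml_flag_on(env, "LLMTRACE_ML_PRELOAD"))
```

Under these rules, values such as `on` or `enabled` do not count as truthy, so a config using them would not trigger the library's bump.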
deployments/basilica/configs/examples/pro.yaml

Lines changed: 7 additions & 1 deletion
@@ -16,7 +16,13 @@ proxy:
   memory: 8Gi
   replicas: 2
   health_check_path: /health
-  startup_timeout_seconds: 600
+  # ML preload (LLMTRACE_ML_ENABLED=1 + LLMTRACE_ML_PRELOAD=1) loads the
+  # prompt_injection / ner / injecguard / piguard weights before /health flips
+  # to ready. On a cpu=2 shape this regularly exceeds 600s; cpu=4 is faster
+  # but still well over the library's old 600s default. The lifecycle library
+  # auto-bumps to 1500s when both flags are on and this value is lower; we set
+  # 1500 explicitly to silence the WARN and document the cost.
+  startup_timeout_seconds: 1500
   env:
     LLMTRACE_UPSTREAM_URL: "${LLMTRACE_UPSTREAM_URL}"
     LLMTRACE_STORAGE_PROFILE: "${LLMTRACE_STORAGE_PROFILE:-sqlite}"

deployments/basilica/configs/examples/starter.yaml

Lines changed: 8 additions & 1 deletion
@@ -32,7 +32,14 @@ proxy:
   memory: 4Gi
   replicas: 1
   health_check_path: /health
-  startup_timeout_seconds: 600
+  # ML preload (LLMTRACE_ML_ENABLED=1 + LLMTRACE_ML_PRELOAD=1) loads the
+  # prompt_injection / ner / injecguard / piguard weights before /health flips
+  # to ready. On this cpu=2 shape that takes 600-1500s in practice. The
+  # lifecycle library auto-bumps to 1500s when both flags are on and this
+  # value is lower; we set 1500 explicitly to silence the WARN and document
+  # the cost. Set LLMTRACE_ML_PRELOAD=0 below for lazy load (first request
+  # pays the model-load latency, subsequent requests are fast).
+  startup_timeout_seconds: 1500
   env:
     LLMTRACE_UPSTREAM_URL: "${LLMTRACE_UPSTREAM_URL}"
     LLMTRACE_STORAGE_PROFILE: memory
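
The lazy-load trade-off described in the config comment can be illustrated with a toy model holder. This is a sketch of the general eager-versus-lazy pattern, not the proxy's implementation (the proxy lives in a separate Rust crate that this commit does not show). `ModelHolder`, `load_weights`, `timed`, and the simulated delay are all hypothetical.

```python
import time


def load_weights(delay_seconds):
    """Stand-in for an expensive model load (hypothetical)."""
    time.sleep(delay_seconds)
    return {"prompt_injection": "weights"}


class ModelHolder:
    """Toy service that loads weights either at startup or on first use."""

    def __init__(self, preload, delay_seconds=0.05):
        self._delay = delay_seconds
        self._weights = None
        if preload:
            # Eager: the cost is paid before the service can report ready.
            self._weights = load_weights(self._delay)

    def handle(self, prompt):
        if self._weights is None:
            # Lazy: the first request pays the load cost instead.
            self._weights = load_weights(self._delay)
        return f"scanned:{prompt}"


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for preload in (True, False):
        holder, startup = timed(ModelHolder, preload)
        _, first = timed(holder.handle, "a")
        _, second = timed(holder.handle, "b")
        print(f"preload={preload} startup={startup:.3f}s "
              f"first={first:.3f}s second={second:.3f}s")
```

With preload on, the delay shows up in the startup time, which is what the startup timeout has to cover; with preload off it moves to the first `handle` call.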

deployments/basilica/lifecycle.py

Lines changed: 51 additions & 0 deletions
@@ -54,6 +54,19 @@
 API_KEY_PREFIX = "llmt_"
 API_KEY_RANDOM_BYTES = 32

+# Minimum startup window the lifecycle library will apply when the resolved
+# proxy env requests ML preload (`LLMTRACE_ML_ENABLED=1`,
+# `LLMTRACE_ML_PRELOAD=1`). On a cpu=2 shape, loading the prompt_injection +
+# ner + injecguard + piguard weights regularly exceeds the
+# `ComponentSpec.startup_timeout_seconds` default of 600s, so callers who
+# leave it at the default would silently time out. 1500s covers the observed
+# worst case with margin. Callers who explicitly set a higher value win.
+ML_PRELOAD_STARTUP_FLOOR_SECONDS = 1500
+
+# Env values that count as "enabled" for the ML preload flags. Matches the
+# truthiness rules in `crates/llmtrace-proxy/src/config.rs::env_flag`.
+_ML_FLAG_TRUTHY: frozenset[str] = frozenset({"1", "true", "yes"})
+
 # Proxy admin API endpoints (see `crates/llmtrace-proxy/src/main.rs` routing).
 TENANTS_PATH = "/api/v1/tenants"
 AUTH_KEYS_PATH = "/api/v1/auth/keys"
@@ -426,6 +439,42 @@ def _apply_proxy_auth(
     )


+def _apply_ml_preload_startup_floor(
+    spec: ComponentSpec,
+    *,
+    tenant_id: Optional[str] = None,
+    floor: int = ML_PRELOAD_STARTUP_FLOOR_SECONDS,
+) -> ComponentSpec:
+    """Raise `startup_timeout_seconds` when the proxy env requests ML preload.
+
+    When both `LLMTRACE_ML_ENABLED` and `LLMTRACE_ML_PRELOAD` resolve to a
+    truthy value in `spec.env`, the proxy will block readiness on model
+    weight load before flipping `/health` to ready. On small CPU shapes that
+    can take well over the library's 600s default, so we floor the timeout
+    at `floor` (default `ML_PRELOAD_STARTUP_FLOOR_SECONDS`).
+
+    Callers who explicitly set `startup_timeout_seconds >= floor` keep their
+    value and no warning is emitted. The bump is a pure function on the spec
+    (no SDK calls), so it's safe to apply in any code path that creates the
+    proxy deployment.
+    """
+    ml_enabled = spec.env.get("LLMTRACE_ML_ENABLED", "").strip().lower() in _ML_FLAG_TRUTHY
+    ml_preload = spec.env.get("LLMTRACE_ML_PRELOAD", "").strip().lower() in _ML_FLAG_TRUTHY
+    if not (ml_enabled and ml_preload):
+        return spec
+    if spec.startup_timeout_seconds >= floor:
+        return spec
+    LOGGER.warning(
+        "ML preload detected (LLMTRACE_ML_ENABLED=1, LLMTRACE_ML_PRELOAD=1) "
+        "on tenant=%s; raising startup_timeout_seconds from %d to %d to "
+        "accommodate model load. Set explicitly to silence.",
+        tenant_id or "<unknown>",
+        spec.startup_timeout_seconds,
+        floor,
+    )
+    return dataclasses.replace(spec, startup_timeout_seconds=floor)
+
+
 def _apply_rate_limit(
     proxy_spec: ComponentSpec, rate_limit: RateLimitSpec
 ) -> ComponentSpec:
@@ -667,6 +716,7 @@ def rotate_admin_key(
     new_env = {**proxy_spec.env, "LLMTRACE_AUTH_ADMIN_KEY": rotated_key}
     new_env.setdefault("LLMTRACE_AUTH_ENABLED", "true")
     rotated_spec = dataclasses.replace(proxy_spec, env=new_env)
+    rotated_spec = _apply_ml_preload_startup_floor(rotated_spec, tenant_id=tenant_id)

     proxy_name = proxy_name_template.format(tenant_id=tenant_id)
     LOGGER.info(
@@ -721,6 +771,7 @@ def provision(
     )
     if spec.rate_limit is not None:
         proxy_spec = _apply_rate_limit(proxy_spec, spec.rate_limit)
+    proxy_spec = _apply_ml_preload_startup_floor(proxy_spec, tenant_id=tenant_id)

     proxy = _create_component(client, proxy_name, proxy_spec)

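The docstring calls the bump a pure function on the spec. A small sketch of what that means in practice: the input object is left untouched, a bumped copy is returned via `dataclasses.replace`, and a second pass over the result is a no-op. `MiniSpec`, `apply_floor`, and the logger name are hypothetical stand-ins; the real `ComponentSpec` has more fields, and whether it is frozen is an assumption made here only for the demonstration.

```python
import dataclasses
import logging

LOGGER = logging.getLogger("floor_sketch")  # hypothetical logger name
FLOOR = 1500  # value of ML_PRELOAD_STARTUP_FLOOR_SECONDS in the diff
TRUTHY = frozenset({"1", "true", "yes"})
FLAGS = ("LLMTRACE_ML_ENABLED", "LLMTRACE_ML_PRELOAD")


@dataclasses.dataclass(frozen=True)
class MiniSpec:
    """Hypothetical stand-in for ComponentSpec with only the fields used here."""

    env: dict = dataclasses.field(default_factory=dict)
    startup_timeout_seconds: int = 600


def apply_floor(spec, floor=FLOOR):
    """Same decision order as the helper in the diff, on the stand-in type."""
    preload_on = all(spec.env.get(f, "").strip().lower() in TRUTHY for f in FLAGS)
    if not preload_on or spec.startup_timeout_seconds >= floor:
        return spec  # unchanged: the very same object comes back
    LOGGER.warning(
        "raising startup_timeout_seconds from %d to %d",
        spec.startup_timeout_seconds,
        floor,
    )
    # replace() builds a new instance; the caller's spec is not mutated.
    return dataclasses.replace(spec, startup_timeout_seconds=floor)


if __name__ == "__main__":
    original = MiniSpec(env={f: "1" for f in FLAGS})
    once = apply_floor(original)
    twice = apply_floor(once)
    print(original.startup_timeout_seconds, once.startup_timeout_seconds, twice is once)
```

Because a spec that already meets the floor passes through unchanged and without a warning, running the helper at more than one call site does not stack bumps on an already-floored spec.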
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+"""Unit tests for the ML-preload startup-timeout floor.
+
+The bump is implemented as a pure function (`_apply_ml_preload_startup_floor`)
+on `ComponentSpec`, so the tests exercise it directly (no SDK mocking, no
+provision flow stubbing). This matches the production code path: every
+`_create_component` call site for the proxy runs the spec through the same
+helper before handing it to the Basilica SDK.
+"""
+
+from __future__ import annotations
+
+import logging
+import unittest
+from typing import Mapping
+
+from deployments.basilica import lifecycle
+
+
+def _proxy_spec(
+    *, env: Mapping[str, str], startup_timeout_seconds: int
+) -> lifecycle.ComponentSpec:
+    return lifecycle.ComponentSpec(
+        image="ghcr.io/techlab-innov/llmtrace-proxy:latest",
+        port=8080,
+        cpu="2",
+        memory="4Gi",
+        replicas=1,
+        env=env,
+        startup_timeout_seconds=startup_timeout_seconds,
+    )
+
+
+class MLPreloadStartupFloorTests(unittest.TestCase):
+    def test_bumps_when_below_floor_and_ml_preload_on(self) -> None:
+        spec = _proxy_spec(
+            env={"LLMTRACE_ML_ENABLED": "1", "LLMTRACE_ML_PRELOAD": "1"},
+            startup_timeout_seconds=600,
+        )
+        with self.assertLogs(lifecycle.LOGGER, level=logging.WARNING) as captured:
+            bumped = lifecycle._apply_ml_preload_startup_floor(
+                spec, tenant_id="acme"
+            )
+
+        self.assertEqual(
+            bumped.startup_timeout_seconds,
+            lifecycle.ML_PRELOAD_STARTUP_FLOOR_SECONDS,
+        )
+        # All other fields preserved.
+        self.assertEqual(bumped.image, spec.image)
+        self.assertEqual(bumped.env, spec.env)
+        self.assertEqual(bumped.port, spec.port)
+        # Exactly one warning, mentioning the tenant + the floor.
+        self.assertEqual(len(captured.records), 1)
+        message = captured.records[0].getMessage()
+        self.assertIn("acme", message)
+        self.assertIn("600", message)
+        self.assertIn(str(lifecycle.ML_PRELOAD_STARTUP_FLOOR_SECONDS), message)
+
+    def test_no_bump_when_caller_value_meets_floor(self) -> None:
+        explicit = lifecycle.ML_PRELOAD_STARTUP_FLOOR_SECONDS + 300
+        spec = _proxy_spec(
+            env={"LLMTRACE_ML_ENABLED": "true", "LLMTRACE_ML_PRELOAD": "yes"},
+            startup_timeout_seconds=explicit,
+        )
+        # `assertNoLogs` is Python 3.10+. Capture and assert empty as a portable
+        # alternative that still fails loudly if a warning leaks.
+        with self.assertLogs(lifecycle.LOGGER, level=logging.DEBUG) as captured:
+            lifecycle.LOGGER.debug("anchor")  # ensure the context has >=1 record
+            result = lifecycle._apply_ml_preload_startup_floor(
+                spec, tenant_id="acme"
+            )
+
+        self.assertIs(result, spec)
+        self.assertEqual(result.startup_timeout_seconds, explicit)
+        warnings = [r for r in captured.records if r.levelno >= logging.WARNING]
+        self.assertEqual(warnings, [])
+
+    def test_no_bump_when_caller_value_equals_floor(self) -> None:
+        spec = _proxy_spec(
+            env={"LLMTRACE_ML_ENABLED": "1", "LLMTRACE_ML_PRELOAD": "1"},
+            startup_timeout_seconds=lifecycle.ML_PRELOAD_STARTUP_FLOOR_SECONDS,
+        )
+        with self.assertLogs(lifecycle.LOGGER, level=logging.DEBUG) as captured:
+            lifecycle.LOGGER.debug("anchor")
+            result = lifecycle._apply_ml_preload_startup_floor(
+                spec, tenant_id="acme"
+            )
+
+        self.assertIs(result, spec)
+        warnings = [r for r in captured.records if r.levelno >= logging.WARNING]
+        self.assertEqual(warnings, [])
+
+    def test_no_bump_when_ml_preload_off(self) -> None:
+        spec = _proxy_spec(
+            env={"LLMTRACE_ML_ENABLED": "1", "LLMTRACE_ML_PRELOAD": "0"},
+            startup_timeout_seconds=300,
+        )
+        with self.assertLogs(lifecycle.LOGGER, level=logging.DEBUG) as captured:
+            lifecycle.LOGGER.debug("anchor")
+            result = lifecycle._apply_ml_preload_startup_floor(
+                spec, tenant_id="acme"
+            )
+
+        # Spec untouched, no warning, even though startup_timeout is well
+        # below the floor (caller's choice when preload is off).
+        self.assertIs(result, spec)
+        self.assertEqual(result.startup_timeout_seconds, 300)
+        warnings = [r for r in captured.records if r.levelno >= logging.WARNING]
+        self.assertEqual(warnings, [])
+
+    def test_no_bump_when_ml_disabled(self) -> None:
+        spec = _proxy_spec(
+            env={"LLMTRACE_ML_ENABLED": "0", "LLMTRACE_ML_PRELOAD": "1"},
+            startup_timeout_seconds=300,
+        )
+        with self.assertLogs(lifecycle.LOGGER, level=logging.DEBUG) as captured:
+            lifecycle.LOGGER.debug("anchor")
+            result = lifecycle._apply_ml_preload_startup_floor(
+                spec, tenant_id="acme"
+            )
+
+        self.assertIs(result, spec)
+        self.assertEqual(result.startup_timeout_seconds, 300)
+        warnings = [r for r in captured.records if r.levelno >= logging.WARNING]
+        self.assertEqual(warnings, [])
+
+    def test_no_bump_when_both_flags_absent(self) -> None:
+        spec = _proxy_spec(env={}, startup_timeout_seconds=120)
+        with self.assertLogs(lifecycle.LOGGER, level=logging.DEBUG) as captured:
+            lifecycle.LOGGER.debug("anchor")
+            result = lifecycle._apply_ml_preload_startup_floor(
+                spec, tenant_id="acme"
+            )
+
+        self.assertIs(result, spec)
+        self.assertEqual(result.startup_timeout_seconds, 120)
+        warnings = [r for r in captured.records if r.levelno >= logging.WARNING]
+        self.assertEqual(warnings, [])
+
+    def test_custom_floor_override(self) -> None:
+        spec = _proxy_spec(
+            env={"LLMTRACE_ML_ENABLED": "1", "LLMTRACE_ML_PRELOAD": "1"},
+            startup_timeout_seconds=100,
+        )
+        with self.assertLogs(lifecycle.LOGGER, level=logging.WARNING):
+            bumped = lifecycle._apply_ml_preload_startup_floor(
+                spec, tenant_id="acme", floor=2400
+            )
+
+        self.assertEqual(bumped.startup_timeout_seconds, 2400)
+
+    def test_truthy_variants_recognised(self) -> None:
+        # Any value in the truthy set on BOTH flags triggers the bump.
+        for value in ("1", "true", "TRUE", "True", "yes", "YES"):
+            with self.subTest(value=value):
+                spec = _proxy_spec(
+                    env={
+                        "LLMTRACE_ML_ENABLED": value,
+                        "LLMTRACE_ML_PRELOAD": value,
+                    },
+                    startup_timeout_seconds=600,
+                )
+                bumped = lifecycle._apply_ml_preload_startup_floor(
+                    spec, tenant_id="acme"
+                )
+                self.assertEqual(
+                    bumped.startup_timeout_seconds,
+                    lifecycle.ML_PRELOAD_STARTUP_FLOOR_SECONDS,
+                )
+
+    def test_falsy_variants_do_not_trigger(self) -> None:
+        for value in ("0", "false", "no", "", "off", "FALSE"):
+            with self.subTest(value=value):
+                spec = _proxy_spec(
+                    env={
+                        "LLMTRACE_ML_ENABLED": "1",
+                        "LLMTRACE_ML_PRELOAD": value,
+                    },
+                    startup_timeout_seconds=300,
+                )
+                result = lifecycle._apply_ml_preload_startup_floor(
+                    spec, tenant_id="acme"
+                )
+                self.assertIs(result, spec)
+
+
+if __name__ == "__main__":
+    unittest.main()
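
The "anchor" pattern repeated across the no-bump tests could be factored into one helper. The sketch below is a possible refactor under stated assumptions, not part of the commit: `assert_no_warnings`, `_Demo`, `run_demo`, and the logger name are hypothetical. It relies only on `unittest.TestCase.assertLogs`, which fails when nothing is logged, hence the anchor record. On Python 3.10+ `assertNoLogs(logger, level=logging.WARNING)` covers the same need directly.

```python
import contextlib
import logging
import unittest

LOGGER = logging.getLogger("no_warnings_sketch")  # hypothetical logger name


@contextlib.contextmanager
def assert_no_warnings(test_case, logger):
    """Fail the test if `logger` emits WARNING or above inside the block."""
    with test_case.assertLogs(logger, level=logging.DEBUG) as captured:
        logger.debug("anchor")  # assertLogs itself fails on zero records
        yield
    leaked = [r for r in captured.records if r.levelno >= logging.WARNING]
    test_case.assertEqual(leaked, [], "unexpected warning(s) logged")


class _Demo(unittest.TestCase):
    # Methods avoid the `test` prefix so test runners do not auto-collect
    # the deliberately failing case.
    def check_quiet(self):
        with assert_no_warnings(self, LOGGER):
            LOGGER.info("informational only")

    def check_noisy(self):
        with assert_no_warnings(self, LOGGER):
            LOGGER.warning("this should be reported as a failure")


def run_demo():
    """Run both demo cases and return the collected TestResult."""
    suite = unittest.TestSuite([_Demo("check_quiet"), _Demo("check_noisy")])
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_demo()
    print(outcome.testsRun, len(outcome.failures), len(outcome.errors))
```

The quiet case passes and the noisy case is recorded as a failure, which is the same guarantee the inline capture-and-filter blocks in the tests provide.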
