fix(preload): actually warm Kompress/Magika/Code-Aware models at startup #432

Open
chopratejas wants to merge 1 commit into main from fix/preload-cold-start-tax

@chopratejas
Owner

eager_load_compressors only instantiated wrapper classes; the heavy work (ONNX session build, model download, AST/JIT warmup) was deferred until the first real request. Logs confirmed: "Kompress model pre-loaded at startup" fired ~3s into startup, but the actual ONNX load happened ~2min later inside the first compress() call, costing opt_ms=9633 to save 67 tokens on the first request.

Add a tiny dummy forward pass for each preloaded component so the cold-start tax is paid once at startup, not on user traffic. Each warmup is guarded so a failure doesn't kill startup; the lazy path remains as fallback.
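The guarded warmup described above could look roughly like the sketch below. It is illustrative only and does not reproduce the PR's diff: `LazyCompressor`, `warm_up`, and the dummy input are hypothetical names, and `time.sleep` stands in for the real ONNX session build or model download.

```python
import logging
import time
from typing import Dict

logger = logging.getLogger("preload")


class LazyCompressor:
    """Hypothetical wrapper: cheap to construct, expensive on first compress()."""

    def __init__(self, name: str, load_cost_s: float = 0.0, broken: bool = False) -> None:
        self.name = name
        self._load_cost_s = load_cost_s
        self._broken = broken
        self.loaded = False

    def _ensure_loaded(self) -> None:
        # The heavy work happens here, not in __init__.
        if self.loaded:
            return
        if self._broken:
            raise RuntimeError(f"{self.name}: model unavailable")
        time.sleep(self._load_cost_s)  # stand-in for session build / download
        self.loaded = True

    def compress(self, text: str) -> str:
        self._ensure_loaded()
        return text.strip()


def warm_up(components: Dict[str, LazyCompressor], dummy_input: str = "warmup") -> Dict[str, bool]:
    """Run one dummy pass per component; never let a failure abort startup."""
    results: Dict[str, bool] = {}
    for name, component in components.items():
        try:
            component.compress(dummy_input)
            results[name] = True
        except Exception:
            # The component stays unloaded; the lazy path will retry on first use.
            logger.warning("warmup failed for %s; falling back to lazy load", name, exc_info=True)
            results[name] = False
    return results
```

Calling `warm_up` once at startup moves the load cost off the first user request, and a component whose warmup raises is simply left for the lazy path to handle later.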

Also fix the inverted Bash docstring in DEFAULT_EXCLUDE_TOOLS: Bash IS excluded by design (RTK handles Bash output compression upstream of headroom). The previous comment claimed the opposite, setting a trap for contributors who'd "fix" the exclusion.
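For illustration, a corrected comment on the exclusion set might read as in the sketch below. The set contents, the `should_compress` helper, and the `Read` tool name are assumptions made for the example; only the `DEFAULT_EXCLUDE_TOOLS` name and the Bash/RTK rationale come from the description.

```python
# Hypothetical contents; the real set is defined in the project and not shown here.
DEFAULT_EXCLUDE_TOOLS = frozenset(
    {
        # Bash IS excluded by design: RTK compresses Bash output upstream,
        # so compressing it again here would be redundant.
        "Bash",
    }
)


def should_compress(tool_name: str) -> bool:
    """Hypothetical helper: compress output only for tools not in the exclusion set."""
    return tool_name not in DEFAULT_EXCLUDE_TOOLS
```

Keeping the rationale next to the entry makes it harder for a later contributor to read the exclusion as a bug.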
