fix(preload): actually warm Kompress/Magika/Code-Aware models at startup #432
Open
chopratejas wants to merge 1 commit into
eager_load_compressors only instantiated wrapper classes; the heavy work (ONNX session build, model download, AST/JIT warmup) was deferred until the first real request. Logs confirmed: "Kompress model pre-loaded at startup" fired ~3s into startup, but the actual ONNX load happened ~2min later inside the first compress() call, costing opt_ms=9633 to save 67 tokens on the first request.
Add a tiny dummy forward pass for each preloaded component so the cold-start tax is paid once at startup, not on user traffic. Each warmup is guarded so a failure doesn't kill startup; the lazy path remains as fallback.
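A minimal sketch of the guarded-warmup pattern described above, not the code from this PR. `LazyCompressor`, `warm_component`, and `warm_all` are hypothetical names chosen for illustration; the real components (Kompress, Magika, Code-Aware) would each supply their own dummy call.

```python
import logging
import time
from typing import Callable, Dict

logger = logging.getLogger("preload_sketch")


class LazyCompressor:
    """Hypothetical stand-in for a wrapper whose heavy load is deferred."""

    def __init__(self) -> None:
        # Constructing the wrapper does no heavy work, which is why
        # instantiation alone does not warm anything.
        self.loaded = False

    def _ensure_loaded(self) -> None:
        if not self.loaded:
            # A real wrapper would build its inference session here.
            self.loaded = True

    def compress(self, text: str) -> str:
        self._ensure_loaded()
        return text  # placeholder: no real compression happens


def warm_component(name: str, warmup: Callable[[], object]) -> bool:
    """Run one dummy call; log and swallow failures so startup continues."""
    start = time.perf_counter()
    try:
        warmup()
    except Exception:
        # The lazy path stays available, so a failed warmup is not fatal.
        logger.warning("warmup failed for %s; lazy load remains", name, exc_info=True)
        return False
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("warmed %s in %.0f ms", name, elapsed_ms)
    return True


def warm_all(warmups: Dict[str, Callable[[], object]]) -> Dict[str, bool]:
    """Warm every component independently and report which ones succeeded."""
    return {name: warm_component(name, fn) for name, fn in warmups.items()}
```

Calling `warm_all({"kompress": lambda: compressor.compress("warmup")})` at startup forces the deferred load immediately. Because each component is wrapped separately, one failing warmup does not prevent the others from running.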
Also fix the inverted Bash docstring in DEFAULT_EXCLUDE_TOOLS — Bash IS excluded by design (RTK handles Bash output compression upstream of headroom). Previous comment claimed the opposite, setting a trap for contributors who'd "fix" the exclusion.
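One way to keep the comment and the behavior from drifting apart again is to pin the intent in code. The sketch below is an assumption about shape only: the actual contents of `DEFAULT_EXCLUDE_TOOLS` are not shown in this description, and `should_compress` is a hypothetical helper.

```python
# Hypothetical shape; the real set in headroom may hold other entries.
# Bash IS excluded by design: RTK compresses Bash output upstream of headroom.
DEFAULT_EXCLUDE_TOOLS = frozenset({"Bash"})


def should_compress(tool_name: str, exclude: frozenset = DEFAULT_EXCLUDE_TOOLS) -> bool:
    """Return True when output from this tool is eligible for compression."""
    return tool_name not in exclude
```

A regression test asserting that `should_compress("Bash")` is false would fail loudly if a contributor later removed the exclusion.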