fix(functions): avoid quadratic-time debug logging in CleanupLLMResult / ParseFunctionCall #10592
Open
pos-ei-don wants to merge 1 commit into
…t/ParseFunctionCall

The streaming chat path (core/http/endpoints/openai/chat_stream_workers.go) calls CleanupLLMResult / ParseFunctionCall once per delta chunk with the *full accumulated* LLM result so far. Both functions xlog.Debug the entire argument on entry and exit, so a single N-chunk stream emits roughly chunk_size * N^2 bytes of debug output.

Under LOG_LEVEL=debug this was observed in a recent SGLang-via-LocalAI session on a DGX Spark host (about 50K tokens, long streaming generation) to drive container logs to ~96 GiB, which interacted with the streaming hot loop on the same filesystem and contributed to a host-wide hard hang once disk pressure built up. The workaround was setting LOG_LEVEL=info, but the quadratic shape remains a foot-gun for anyone intentionally enabling debug.

Replace the four result-content debug arguments with len(...) plus a fixed-size head (200 bytes via a new truncForLog helper), bounding per-call output to a constant. The debug signal stays useful: the first 200 chars are enough to identify which generation is in flight, and the length lets you observe growth without paying for the payload itself.

No API change. No behaviour change for LOG_LEVEL != debug.

Signed-off-by: Poseidon <philipp.wacker@ibf-solutions.com>
What
pkg/functions/parse.go::CleanupLLMResult and ParseFunctionCall both xlog.Debug the full llmresult string twice per call (on entry and after the regex replace loop). The streaming chat path (core/http/endpoints/openai/chat_stream_workers.go:359) calls CleanupLLMResult once per streaming delta chunk with the full accumulated result so far. For an N-chunk generation this means roughly chunk_size * N^2 bytes of debug output in total, which is quadratic in the number of chunks.
Why this matters
Under LOG_LEVEL=debug I observed this drive a LocalAI container's log volume to about 96 GiB during a single ~50K-token streaming session (SGLang-via-LocalAI backend on a DGX Spark / GB10, sm_121). The resulting disk pressure interacted with the streaming hot loop on the same filesystem and contributed to a host-wide hard hang. The workaround was setting LOG_LEVEL=info, but the quadratic shape is a foot-gun for anyone enabling debug intentionally for field diagnostics: it's not obvious from the code that a single Debug-level field grows superlinearly with response length.
The fix
Replace the four result-content debug arguments with len(...) plus a fixed-size head (200 bytes via a new local truncForLog helper), bounding per-call output to a constant. The debug signal stays useful in practice: the first 200 characters are enough to identify which generation is in flight, and the length lets you observe growth without paying for the payload itself.
The "Replacing" debug entries inside the replacement loop are unchanged. They are linear in the number of configured ReplaceLLMResult entries, not in the result length, so they don't accumulate.
The same fix is applied to ParseFunctionCall (it mirrors the same pattern and is called from the same hot streaming path).
Compatibility
No API change. No behaviour change for LOG_LEVEL != debug (the default). Only the form of two log records changes when debug is enabled.
Verification
I don't have a Go toolchain on the host where I wrote this, so I haven't run go test ./pkg/functions/... locally. The change is intentionally small (4 logger arg-tuples + one helper, all in one file) and CI here will catch anything I missed. Happy to add a regression test that asserts the per-call log payload is bounded if a maintainer thinks that's worth it.
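For reviewers who want to see the shape of the problem, here is a minimal standalone sketch of the bounded-logging idea. It is not the code from this PR: the body of truncForLog is a guess at what a 200-byte head helper could look like, and loggedBytes is a hypothetical simulation that only counts payload bytes (once per chunk) instead of calling xlog.

```go
package main

import (
	"fmt"
	"strings"
)

// truncForLog returns at most max bytes of s.
// Assumption: this is an illustrative reconstruction, not the helper from the PR.
// It slices by bytes, so a multi-byte UTF-8 rune may be cut at the boundary,
// which is usually acceptable for a debug log line.
func truncForLog(s string, max int) string {
	if len(s) <= max {
		return s
	}
	return s[:max]
}

// loggedBytes simulates a stream of n chunks of chunkSize bytes each.
// After every chunk it "logs" the accumulated result once, and returns the
// total payload bytes for the unbounded form (full string) and the bounded
// form (head of at most `head` bytes).
func loggedBytes(n, chunkSize, head int) (full, bounded int) {
	chunk := strings.Repeat("x", chunkSize)
	acc := ""
	for i := 0; i < n; i++ {
		acc += chunk
		full += len(acc)                       // grows with every chunk: quadratic in total
		bounded += len(truncForLog(acc, head)) // capped at `head` bytes per call
	}
	return full, bounded
}

func main() {
	full, bounded := loggedBytes(1000, 4, 200)
	fmt.Printf("full=%d bounded=%d\n", full, bounded)
}
```

With 1000 chunks of 4 bytes and a 200-byte head, this prints full=2002000 bounded=195100. The unbounded total is chunk_size * N(N+1)/2, while the bounded total is at most 200 * N. A regression test along these lines could assert that the per-call payload never exceeds the head size regardless of the accumulated length.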