[Fix] Prevent atexit flush hangs and guard proxy_server_request header lookup#26470
yuneng-berri wants to merge 7 commits into `litellm_internal_staging`
Conversation
[Infra] Promote internal staging to main
* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro
Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:
- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
per 1M input/output/cached input
Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.
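The per-1M rates above imply straightforward per-token costs. A quick sanity-check sketch for the base gpt-5.5 tier (the function name and signature are illustrative, not LiteLLM's cost-calculator API; cached prompt tokens are assumed to bill at the cached-input rate):

```python
# Base gpt-5.5 rates from the entry above, USD per 1M tokens.
RATES_PER_1M = {"input": 5.00, "output": 30.00, "cached_input": 0.50}

def estimate_cost(prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0) -> float:
    """Rough cost estimate; cached prompt tokens bill at the discounted rate."""
    billable_prompt = prompt_tokens - cached_tokens
    usd = (
        billable_prompt * RATES_PER_1M["input"]
        + cached_tokens * RATES_PER_1M["cached_input"]
        + completion_tokens * RATES_PER_1M["output"]
    )
    return usd / 1_000_000
```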
No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.
Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields
* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants
gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.
Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.
Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.
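A parity check along those lines might look like the following sketch. The real test reads LiteLLM's model-cost JSON; here a stub registry dict stands in, and the helper names are hypothetical:

```python
# Flags named in the commit; an absent flag counts as False (opt-in semantics).
FLAGS = (
    "supports_none_reasoning_effort",
    "supports_minimal_reasoning_effort",
    "supports_xhigh_reasoning_effort",
)

def reasoning_effort_profile(model_info: dict) -> tuple:
    """Normalize a model entry to a (none, minimal, xhigh) capability tuple."""
    return tuple(bool(model_info.get(flag)) for flag in FLAGS)

def check_dated_variant_parity(registry: dict, dated: str, base: str) -> bool:
    """True when a dated snapshot matches its base alias flag-for-flag."""
    return reasoning_effort_profile(registry[dated]) == reasoning_effort_profile(registry[base])
```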
- get_proxy_server_request_headers raised AttributeError when
litellm_params["proxy_server_request"] was set to None (the default
for SDK callers). Guard the helper so it returns {} for None or
non-dict values.
- LoggingWorker._flush_on_exit now wraps each queued coroutine in
asyncio.wait_for(..., timeout=2.0) so a single hung coroutine cannot
block the atexit handler past the outer 5s budget.
- Drop _turn_on_debug() from test_vertex_schema_test to reduce noise on
a network-dependent test.
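The guard in the first bullet can be sketched roughly as follows. This is a simplified stand-in for the helper in `litellm/litellm_core_utils/llm_request_utils.py`; the real implementation may differ in detail:

```python
def get_proxy_server_request_headers(litellm_params: dict) -> dict:
    """Return request headers, or {} when proxy_server_request is None or not a dict."""
    proxy_server_request = litellm_params.get("proxy_server_request")
    if not isinstance(proxy_server_request, dict):
        return {}  # SDK callers set this key to literal None
    headers = proxy_server_request.get("headers")
    return headers if isinstance(headers, dict) else {}
```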
Low: No security issues found. This PR adds defensive type-checking to proxy server request header lookup, adds a per-task timeout to the atexit flush to prevent hangs, and adjusts CI test retry settings. None of the changes introduce new attack surface or weaken existing security controls. Status: 0 open. Posted by Veria AI · 2026-04-25T06:00:22.256Z
Greptile Summary: This PR addresses three independent issues contributing to CI flakiness: an `AttributeError` in the proxy-header lookup, an unbounded atexit flush, and missing xdist restart/retry settings. Confidence Score: 5/5. Safe to merge; all P-level findings are P2 (misleading PR description, no runtime regressions introduced). Both code-level bug fixes are correct and narrowly scoped. The CI config change uses an already-declared dependency. The only concern is that the PR description overstates test hermeticity — a P2 documentation gap that does not affect runtime behavior. `tests/local_testing/test_amazing_vertex_completion.py` still makes real Vertex AI calls contrary to the PR description.
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/llm_request_utils.py | Correctly fixes AttributeError when proxy_server_request is None by guarding the chained .get() with a dict check before accessing headers. |
| litellm/litellm_core_utils/logging_worker.py | Wraps each queued coroutine in asyncio.wait_for(..., timeout=2.0) to prevent a single hung task from blocking the entire atexit flush past the outer budget; hardcoded 2.0s was flagged in a prior comment. |
| .circleci/config.yml | Adds --retries 2 --retry-delay 5 --max-worker-restart=5 to both local_testing_part1 and local_testing_part2 jobs; pytest-retry==1.7.0 is a declared dependency so the new flags are valid. |
| tests/local_testing/test_amazing_vertex_completion.py | Only removes litellm._turn_on_debug() from test_vertex_schema_test; despite the PR description claiming tests are mocked, both target tests still make real Vertex AI network calls. |
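The `llm_request_utils.py` failure mode is easy to reproduce: the chained `.get` default only applies when the key is absent, not when its value is `None`. A minimal repro:

```python
# SDK callers set proxy_server_request to literal None in litellm_params.
litellm_params = {"proxy_server_request": None}

try:
    # Old pattern: the {} default never applies because the key exists,
    # so .get("headers") is called on None.
    litellm_params.get("proxy_server_request", {}).get("headers", {})
    raised = False
except AttributeError:
    raised = True  # 'NoneType' object has no attribute 'get'
```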
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[atexit triggered] --> B[_flush_on_exit]
    B --> C{Queue empty?}
    C -- Yes --> Z[return]
    C -- No --> D[new event loop]
    D --> E{time < MAX_TIME_TO_CLEAR_QUEUE AND iterations < MAX_ITERATIONS}
    E -- No --> F[log warning & break]
    E -- Yes --> G[queue.get_nowait]
    G --> H[asyncio.wait_for coroutine timeout=2.0]
    H -- success --> I[processed += 1]
    H -- TimeoutError/Exception --> J[silent failure]
    I --> E
    J --> E
    subgraph llm_request_utils fix
        K[litellm_params.get proxy_server_request] --> L{is None or not dict?}
        L -- Yes --> M[return empty dict]
        L -- No --> N[.get headers or empty]
    end
```
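The flush path in the flowchart can be sketched as below. This is a simplified stand-in for `LoggingWorker._flush_on_exit`, assuming a thread-safe queue of coroutine objects; the constant names mirror the PR, but the structure is illustrative, not LiteLLM's actual code:

```python
import asyncio
import queue
import time

MAX_TIME_TO_CLEAR_QUEUE = 5.0  # outer budget; env-overridable in the real worker
PER_TASK_TIMEOUT = 2.0         # new per-coroutine cap added by this PR

def flush_on_exit(task_queue: "queue.Queue", per_task_timeout: float = PER_TASK_TIMEOUT) -> int:
    """Best-effort drain at atexit: bound each coroutine individually."""
    loop = asyncio.new_event_loop()
    start = time.monotonic()
    processed = 0
    try:
        while time.monotonic() - start < MAX_TIME_TO_CLEAR_QUEUE:
            try:
                coro = task_queue.get_nowait()
            except queue.Empty:
                break
            try:
                # A hung coroutine is cancelled after per_task_timeout instead
                # of eating the whole outer budget.
                loop.run_until_complete(asyncio.wait_for(coro, timeout=per_task_timeout))
                processed += 1
            except Exception:
                pass  # timeouts and task errors are swallowed; flush is best-effort
    finally:
        loop.close()
    return processed
```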
Reviews (4): Last reviewed commit: "[Infra] add worker restart and retry fla..."
```diff
 try:
-    loop.run_until_complete(task["coroutine"])
+    loop.run_until_complete(
+        asyncio.wait_for(task["coroutine"], timeout=2.0)
+    )
```
Hardcoded per-task timeout may exceed configurable outer budget
The per-task timeout of 2.0 s is a magic number that doesn't track the configurable MAX_TIME_TO_CLEAR_QUEUE constant (defaults to 5.0 s, overridable via env). If an operator sets MAX_TIME_TO_CLEAR_QUEUE=1.0, every single task would consume more than the entire outer budget before the loop-level time check could fire, effectively turning the outer limit into dead code for any queue with at least one slow coroutine.
Consider deriving the per-task limit from the outer budget or at least capping it:
```python
_per_task_timeout = min(2.0, MAX_TIME_TO_CLEAR_QUEUE)
loop.run_until_complete(
    asyncio.wait_for(task["coroutine"], timeout=_per_task_timeout)
)
```
Codecov Report❌ Patch coverage is
…itellm_fix_atexit_flush_attribute_error
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 31539530 | Triggered | Generic Password | 88faae2 | .github/workflows/_test-unit-services-base.yml | View secret |
test_vertex_schema_test and test_gemini_nullable_object_tool_schema_httpx were calling vertex_ai/gemini-2.5-flash for real. When Vertex was slow, the calls hung past pytest's 300s timeout, pytest-timeout fired os._exit(1), and the xdist worker disappeared mid-run as "Not properly terminated". The actual purpose of these tests is to exercise the schema-conversion code path (anyOf with null types, nullable object tool params). Mocking the HTTP layer and the auth token retrieval makes them hermetic and removes the worker-crash failure mode entirely.
…rk calls" This reverts commit eddae66.
Match the resilience config that the llm_translation job already uses: --max-worker-restart=5 lets xdist replace workers killed by pytest-timeout's os._exit(1) instead of failing the whole job, and --retries 2 --retry-delay 5 self-heals transient network flakes. Preserves real integration test coverage; no test code changes needed.
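In `.circleci/config.yml` terms, the change amounts to appending the flags to the pytest invocation of both jobs. A hedged sketch only; the surrounding job structure is abbreviated and does not match the repo's exact config:

```yaml
# Illustrative fragment; real job layout and test command differ.
local_testing_part1:
  steps:
    - run:
        name: Run tests
        command: |
          python -m pytest tests/local_testing \
            --timeout=300 \
            --max-worker-restart=5 \
            --retries 2 --retry-delay 5
```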
Relevant issues
Summary
Reduce flakiness in `local_testing_part1`/`part2`. Workers were intermittently dying with `[gw3] node down: Not properly terminated` while running `tests/local_testing/test_amazing_vertex_completion.py`, causing the whole 8-minute job to fail and need re-running.
Failure path (before fix)
- `vertex_ai/gemini-2.5-flash` calls in the test suite occasionally hung past CI's `--timeout=300`. `pytest-timeout` in `thread` mode then calls `os._exit(1)` (see `pytest_timeout.py:542`), bypassing every atexit handler and removing the xdist worker mid-run. xdist had no restart budget, so the job failed.
- `get_proxy_server_request_headers` did `litellm_params.get("proxy_server_request", {}).get("headers", {})`. The chained `.get` only falls back to `{}` when the key is missing; every SDK call sets `proxy_server_request` to literal `None`, so the second `.get` raised `AttributeError: 'NoneType' object has no attribute 'get'`. The exception was caught and silently logged on every non-proxy completion.
- `LoggingWorker._flush_on_exit` ran each queued coroutine with `loop.run_until_complete(task["coroutine"])` and no per-coroutine timeout. The outer 5s budget was only checked between iterations, so one hung coroutine could block the whole atexit handler.
Fix
- Added `--max-worker-restart=5 --retries 2 --retry-delay 5` to `local_testing_part1` and `local_testing_part2` in `.circleci/config.yml`. xdist now replaces crashed workers instead of failing the job, and the affected tests get re-run automatically. Matches the resilience config the `llm_translation` job already uses. Preserves all real integration test coverage.
- `get_proxy_server_request_headers` now treats `None` and non-dict values as empty and always returns a dict.
- `_flush_on_exit` wraps each task in `asyncio.wait_for(coro, timeout=2.0)` so no single coroutine can stall the flush past the outer budget.
Testing
- Ran `tests/test_litellm/litellm_core_utils/test_logging_worker.py` and `tests/test_litellm/enterprise/enterprise_callbacks/test_callback_controls.py`: 30 passed.
- Exercised `get_proxy_server_request_headers` across 6 input shapes (None, missing key, explicit None, headers None, non-dict, real headers): all return sensible dicts, no exceptions.
- Queued a `sleep(60)` coroutine plus two fast ones: flush returned in 2.0s instead of blocking.
Type
🐛 Bug Fix
🚄 Infrastructure
Screenshots