
[Fix] Prevent atexit flush hangs and guard proxy_server_request header lookup#26470

Open
yuneng-berri wants to merge 7 commits into litellm_internal_staging from litellm_fix_atexit_flush_attribute_error

Conversation

@yuneng-berri
Collaborator

@yuneng-berri yuneng-berri commented Apr 25, 2026

Relevant issues

Summary

Reduce flakiness in local_testing_part1/part2. Workers were intermittently dying with [gw3] node down: Not properly terminated while running tests/local_testing/test_amazing_vertex_completion.py, causing the whole 8-minute job to fail and need re-running.

Failure path (before fix)

  1. Worker crash: real vertex_ai/gemini-2.5-flash calls in the test suite occasionally hung past CI's --timeout=300. pytest-timeout in thread mode then calls os._exit(1) (see pytest_timeout.py:542), bypassing every atexit handler and removing the xdist worker mid-run. xdist had no restart budget, so the job failed.
  2. AttributeError on every SDK call: get_proxy_server_request_headers called litellm_params.get("proxy_server_request", {}).get("headers", {}). The chained .get only falls back to {} when the key is missing; every SDK call sets proxy_server_request to literal None, so the second .get raised AttributeError: 'NoneType' object has no attribute 'get'. The exception was caught and silently logged on every non-proxy completion.
  3. Atexit flush could block indefinitely: LoggingWorker._flush_on_exit ran each queued coroutine with loop.run_until_complete(task["coroutine"]) and no per-coroutine timeout. The outer 5s budget was only checked between iterations, so one hung coroutine could block the whole atexit handler.
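The None-vs-missing distinction behind point 2 can be seen in isolation. A minimal sketch (`litellm_params` here is a plain stand-in dict, not the real object):

```python
litellm_params = {"proxy_server_request": None}  # SDK calls set the key to literal None

# .get's default only kicks in when the key is *absent*, not when it maps to None:
absent_case = {}.get("proxy_server_request", {})            # key absent -> default {} used
none_case = litellm_params.get("proxy_server_request", {})  # key present -> returns None

try:
    none_case.get("headers", {})  # the second .get in the chain
    raised = False
except AttributeError:
    raised = True  # 'NoneType' object has no attribute 'get'
```

This is why the fallback `{}` never fired on SDK calls: the key exists, its value is just None.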

Fix

  • Added --max-worker-restart=5 --retries 2 --retry-delay 5 to local_testing_part1 and local_testing_part2 in .circleci/config.yml. xdist now replaces crashed workers instead of failing the job, and the affected tests get re-run automatically. Matches the resilience config the llm_translation job already uses. Preserves all real integration test coverage.
  • get_proxy_server_request_headers now treats None and non-dict values as empty and always returns a dict.
  • _flush_on_exit wraps each task in asyncio.wait_for(coro, timeout=2.0) so no single coroutine can stall the flush past the outer budget.
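A sketch of what the guarded helper could look like. This is a hedged reconstruction from the bullet above, not the actual diff; the real implementation lives in litellm/litellm_core_utils/llm_request_utils.py:

```python
from typing import Any, Dict


def get_proxy_server_request_headers(litellm_params: Dict[str, Any]) -> Dict[str, Any]:
    """Return request headers, treating None and non-dict values as empty (sketch)."""
    proxy_server_request = litellm_params.get("proxy_server_request")
    if not isinstance(proxy_server_request, dict):
        return {}  # covers None (the SDK default) and any other non-dict value
    headers = proxy_server_request.get("headers")
    return headers if isinstance(headers, dict) else {}
```

The isinstance checks make the helper total: every input shape listed in the Testing section below maps to a dict, so the chained-.get AttributeError cannot recur.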

Testing

  • tests/test_litellm/litellm_core_utils/test_logging_worker.py and tests/test_litellm/enterprise/enterprise_callbacks/test_callback_controls.py — 30 passed.
  • Local repro of get_proxy_server_request_headers across 6 input shapes (None, missing key, explicit None, headers None, non-dict, real headers) — all return sensible dicts, no exceptions.
  • Local repro of the flush hang: a queue containing a sleep(60) coroutine plus two fast ones — flush returned in 2.0s instead of blocking.
  • Black formatting clean on all touched files.
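The flush-hang repro can be sketched as follows. Timeouts are scaled down so it runs fast, and the queue/worker details are simplified stand-ins for LoggingWorker internals, not the real class:

```python
import asyncio
import time


async def hung():  # stands in for a coroutine that never completes in time
    await asyncio.sleep(60)


async def fast():
    return "ok"


def flush(coroutines, per_task_timeout):
    """Drain queued coroutines, bounding each one with asyncio.wait_for (sketch)."""
    loop = asyncio.new_event_loop()
    processed = 0
    try:
        for coro in coroutines:
            try:
                loop.run_until_complete(asyncio.wait_for(coro, timeout=per_task_timeout))
                processed += 1
            except Exception:
                pass  # swallow TimeoutError and task failures, matching the silent-failure path
    finally:
        loop.close()
    return processed


start = time.monotonic()
done = flush([hung(), fast(), fast()], per_task_timeout=0.1)
elapsed = time.monotonic() - start
# done == 2: the hung coroutine is cancelled at the per-task deadline,
# and the flush returns promptly instead of blocking for 60s.
```

Without the wait_for wrapper, run_until_complete on the hung coroutine would block until it finished, regardless of the outer time budget.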

Type

🐛 Bug Fix
🚄 Infrastructure

Screenshots

yuneng-berri and others added 3 commits April 23, 2026 17:55
[Infra] Promote internal staging to main
* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro

Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:

- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
  input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
  per 1M input/output/cached input

Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.

No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.

Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields

* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants

gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.

Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.

Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.
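The parity check described could be sketched like this. The registry slice below is hypothetical illustration data (the real flags live in the model-cost JSON), and `flags_match` is an invented helper name:

```python
# Hypothetical slice of the model registry JSON (illustration only)
MODEL_REGISTRY = {
    "gpt-5.5": {
        "supports_xhigh_reasoning_effort": True,
        "supports_minimal_reasoning_effort": True,
    },
    "gpt-5.5-2026-04-23": {
        "supports_xhigh_reasoning_effort": True,
        "supports_minimal_reasoning_effort": True,
    },
}

FLAGS = [
    "supports_none_reasoning_effort",
    "supports_minimal_reasoning_effort",
    "supports_xhigh_reasoning_effort",
]


def flags_match(base: str, dated: str) -> bool:
    """Absent flags default to False, mirroring the opt-in semantics described above."""
    b, d = MODEL_REGISTRY[base], MODEL_REGISTRY[dated]
    return all(b.get(f, False) == d.get(f, False) for f in FLAGS)
```

Parametrizing a test over each (base, dated) pair with a check like this is what prevents a dated snapshot from silently diverging from its alias.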
- get_proxy_server_request_headers raised AttributeError when
  litellm_params["proxy_server_request"] was set to None (the default
  for SDK callers). Guard the helper so it returns {} for None or
  non-dict values.
- LoggingWorker._flush_on_exit now wraps each queued coroutine in
  asyncio.wait_for(..., timeout=2.0) so a single hung coroutine cannot
  block the atexit handler past the outer 5s budget.
- Drop _turn_on_debug() from test_vertex_schema_test to reduce noise on
  a network-dependent test.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ yuneng-berri
❌ mateo-berri
You have signed the CLA already but the status is still pending? Let us recheck it.

@veria-ai

veria-ai Bot commented Apr 25, 2026

Low: No security issues found

This PR adds defensive type-checking to proxy server request header lookup, adds a per-task timeout to the atexit flush to prevent hangs, and adjusts CI test retry settings. None of the changes introduce new attack surface or weaken existing security controls.


Status: 0 open
Risk: 1/10

Posted by Veria AI · 2026-04-25T06:00:22.256Z

@greptile-apps
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR addresses three independent issues contributing to CI flakiness: an AttributeError in get_proxy_server_request_headers when proxy_server_request is None, unbounded coroutine execution in LoggingWorker._flush_on_exit, and test worker crashes in xdist. The core code fixes (llm_request_utils.py and logging_worker.py) are correct and well-scoped. Note that the PR description claims both schema-test functions are now mocked and hermetic, but the diff shows only litellm._turn_on_debug() was removed — both tests still issue real Vertex AI HTTP calls, so CI resilience actually comes from the added retry/worker-restart flags rather than test isolation.

Confidence Score: 5/5

Safe to merge; all P-level findings are P2 (misleading PR description, no runtime regressions introduced).

Both code-level bug fixes are correct and narrowly scoped. The CI config change uses an already-declared dependency. The only concern is that the PR description overstates test hermeticity — a P2 documentation gap that does not affect runtime behavior.

tests/local_testing/test_amazing_vertex_completion.py — still makes real Vertex AI calls contrary to PR description.

Important Files Changed

Filename | Overview
litellm/litellm_core_utils/llm_request_utils.py | Correctly fixes AttributeError when proxy_server_request is None by guarding the chained .get() with a dict check before accessing headers.
litellm/litellm_core_utils/logging_worker.py | Wraps each queued coroutine in asyncio.wait_for(..., timeout=2.0) to prevent a single hung task from blocking the entire atexit flush past the outer budget; hardcoded 2.0s was flagged in a prior comment.
.circleci/config.yml | Adds --retries 2 --retry-delay 5 --max-worker-restart=5 to both local_testing_part1 and local_testing_part2 jobs; pytest-retry==1.7.0 is a declared dependency so the new flags are valid.
tests/local_testing/test_amazing_vertex_completion.py | Only removes litellm._turn_on_debug() from test_vertex_schema_test; despite the PR description claiming tests are mocked, both target tests still make real Vertex AI network calls.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[atexit triggered] --> B[_flush_on_exit]
    B --> C{Queue empty?}
    C -- Yes --> Z[return]
    C -- No --> D[new event loop]
    D --> E{time < MAX_TIME_TO_CLEAR_QUEUE AND iterations < MAX_ITERATIONS}
    E -- No --> F[log warning & break]
    E -- Yes --> G[queue.get_nowait]
    G --> H[asyncio.wait_for coroutine timeout=2.0]
    H -- success --> I[processed += 1]
    H -- TimeoutError/Exception --> J[silent failure]
    I --> E
    J --> E

    subgraph llm_request_utils fix
    K[litellm_params.get proxy_server_request] --> L{is None or not dict?}
    L -- Yes --> M[return empty dict]
    L -- No --> N[.get headers or empty]
    end

Reviews (4): Last reviewed commit: "[Infra] add worker restart and retry fla..."

        try:
    -       loop.run_until_complete(task["coroutine"])
    +       loop.run_until_complete(
    +           asyncio.wait_for(task["coroutine"], timeout=2.0)
    +       )
Contributor


P2 Hardcoded per-task timeout may exceed configurable outer budget

The per-task timeout of 2.0 s is a magic number that doesn't track the configurable MAX_TIME_TO_CLEAR_QUEUE constant (defaults to 5.0 s, overridable via env). If an operator sets MAX_TIME_TO_CLEAR_QUEUE=1.0, every single task would consume more than the entire outer budget before the loop-level time check could fire, effectively turning the outer limit into dead code for any queue with at least one slow coroutine.

Consider deriving the per-task limit from the outer budget or at least capping it:

    _per_task_timeout = min(2.0, MAX_TIME_TO_CLEAR_QUEUE)
    loop.run_until_complete(
        asyncio.wait_for(task["coroutine"], timeout=_per_task_timeout)
    )

@codecov

codecov Bot commented Apr 25, 2026

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
litellm/litellm_core_utils/llm_request_utils.py | 75.00% | 1 Missing ⚠️
litellm/litellm_core_utils/logging_worker.py | 0.00% | 1 Missing ⚠️


@gitguardian

gitguardian Bot commented Apr 25, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id | Status | Secret | Commit | Filename
31539530 | Triggered | Generic Password | 88faae2 | .github/workflows/_test-unit-services-base.yml
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, following secret-management best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


test_vertex_schema_test and test_gemini_nullable_object_tool_schema_httpx
were calling vertex_ai/gemini-2.5-flash for real. When Vertex was slow,
the calls hung past pytest's 300s timeout, pytest-timeout fired
os._exit(1), and the xdist worker disappeared mid-run as
"Not properly terminated".

The actual purpose of these tests is to exercise the schema-conversion
code path (anyOf with null types, nullable object tool params). Mocking
the HTTP layer and the auth token retrieval makes them hermetic and
removes the worker-crash failure mode entirely.
Match the resilience config that the llm_translation job already uses:
--max-worker-restart=5 lets xdist replace workers killed by
pytest-timeout's os._exit(1) instead of failing the whole job, and
--retries 2 --retry-delay 5 self-heals transient network flakes.

Preserves real integration test coverage; no test code changes needed.
@yuneng-berri yuneng-berri requested a review from a team April 25, 2026 05:59


3 participants