# BerriAI/litellm PR #26845 — chore(proxy): tighten budget spend admission

- PR: https://github.com/BerriAI/litellm/pull/26845
- Head SHA: `926de696a11bf60fce682c3e68933f7e81418855`
- Files touched: 10 files, including `litellm/proxy/_types.py`, `litellm/proxy/auth/user_api_key_auth.py`, `litellm/proxy/db/spend_counter_reseed.py` (+core changes), `litellm/proxy/hooks/proxy_track_cost_callback.py`, `litellm/proxy/proxy_server.py`, `litellm/proxy/spend_tracking/budget_reservation.py` (new module), and four parameterized regression test files. +2012/-110.

## Specific citations

- New schema field `budget_reservation: Optional[Dict[str, Any]] = None` added to `UserAPIKeyAuth` at `_types.py:2570` — carries the in-flight budget reservation handle through the request lifecycle so the post-call accounting hook can release/finalize it against the same row that admission used.
- New gating helper `_should_skip_budget_checks(request_data, route, llm_router)` at `user_api_key_auth.py:1898-1907` — extracts the previously-inline `_is_model_cost_zero(model, llm_router)` check into a named function. Same logic, cleaner call site at `:1843-1847`.
- New reservation step `_reserve_budget_after_common_checks(...)` at `user_api_key_auth.py:1885-1895` — runs *after* `common_checks`, calls into the new `litellm/proxy/spend_tracking/budget_reservation.py` module, and stores the reservation handle on the auth object: `user_api_key_auth_obj.budget_reservation = await reserve_budget_for_request(...)`. Skipped when `skip_budget_checks` is true (zero-cost models).
- `SpendCounterReseed.coalesced` at `spend_counter_reseed.py:118-175` gains a `require_cache_warm: bool = False` flag. When true and Redis is configured, the reseed uses `redis_cache.async_increment(key, value=db_spend)` and mirrors the resulting value into the in-memory cache via `in_memory_cache.set_cache(key=counter_key, value=current_value)`. This is the critical change: it ensures the cache reflects the *post-increment* Redis state atomically, rather than the prior pattern of `async_increment_cache(key, value=db_spend)`, which races with concurrent reservers. The `require_cache_warm` branch propagates failure (via the `if require_cache_warm:` re-raise at `:150`) so callers can fail closed on reseed failure.
- New module `budget_reservation.py` (referenced via `from litellm.proxy.spend_tracking.budget_reservation import reserve_budget_for_request` at `user_api_key_auth.py:1893`) — owns the reserve-on-admission, release-on-failure, finalize-on-success state machine. The `proxy_track_cost_callback.py` change at `:288-289` adds `await _release_budget_reservation(budget_reservation=user_api_key_dict.budget_reservation)` in the post-call hook so failed/aborted requests don't leak reserved budget.
- Regression coverage spans four surfaces: `tests/test_litellm/proxy/auth/test_user_api_key_auth.py` (admission), `tests/test_litellm/proxy/hooks/test_proxy_track_cost_callback.py` (release), `tests/test_litellm/proxy/test_budget_reservation.py` (the new module's unit suite), and `tests/test_litellm/proxy/test_proxy_server.py` (end-to-end happy and failure paths).
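To make the reserve/release/finalize lifecycle concrete, here is a minimal in-memory sketch of the state machine the new module owns. The class and method names (`SpendCounter`, `reserve`, `release`, `finalize`) and the handle's field names are illustrative, inferred from the review's description rather than copied from the diff:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class BudgetReservation:
    # Field names inferred from the review's description of the handle shape.
    counter_key: str
    reserved_amount: float
    reservation_id: str

@dataclass
class SpendCounter:
    """In-memory stand-in for the Redis-backed per-key spend counter."""
    max_budget: float
    spent: float = 0.0
    in_flight: Dict[str, float] = field(default_factory=dict)  # reservation_id -> amount

    def reserve(self, counter_key: str, estimated_cost: float) -> BudgetReservation:
        # Admission: committed spend plus all in-flight reservations must fit
        # under the budget, so two concurrent requests cannot both slip through.
        if self.spent + sum(self.in_flight.values()) + estimated_cost > self.max_budget:
            raise RuntimeError("budget exceeded")
        rid = uuid.uuid4().hex
        self.in_flight[rid] = estimated_cost
        return BudgetReservation(counter_key, estimated_cost, rid)

    def release(self, r: BudgetReservation) -> None:
        # Failed/aborted request: return the reserved amount to the pool.
        self.in_flight.pop(r.reservation_id, None)

    def finalize(self, r: BudgetReservation, actual_cost: float) -> None:
        # Success: replace the estimate with the actual cost.
        self.in_flight.pop(r.reservation_id, None)
        self.spent += actual_cost
```

With a 1.0 budget, a reserved 0.6 estimate correctly blocks a second concurrent 0.6 admission until the first request finalizes or releases — the exact overshoot the PR targets.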
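The cache-warm reseed pattern can be sketched the same way. `async_increment` and `set_cache` are the cache-API names the review cites; `FakeRedis`, `InMemoryCache`, and `reseed_counter` are toy stand-ins, not the PR's actual classes:

```python
import asyncio
from typing import Dict, Optional

class FakeRedis:
    """Toy async Redis; async_increment returns the post-increment value."""
    def __init__(self) -> None:
        self._data: Dict[str, float] = {}

    async def async_increment(self, key: str, value: float) -> float:
        self._data[key] = self._data.get(key, 0.0) + value
        return self._data[key]

class InMemoryCache:
    def __init__(self) -> None:
        self._data: Dict[str, float] = {}

    def set_cache(self, key: str, value: float) -> None:
        self._data[key] = value

    def get_cache(self, key: str) -> Optional[float]:
        return self._data.get(key)

async def reseed_counter(
    redis_cache: FakeRedis,
    in_memory_cache: InMemoryCache,
    counter_key: str,
    db_spend: float,
    require_cache_warm: bool = False,
) -> Optional[float]:
    # Increment Redis first, then mirror the *post-increment* value into the
    # local cache, so a concurrent reserver's increment is never overwritten
    # by a stale read.
    try:
        current_value = await redis_cache.async_increment(counter_key, value=db_spend)
        in_memory_cache.set_cache(key=counter_key, value=current_value)
        return current_value
    except Exception:
        if require_cache_warm:
            raise  # fail closed: the caller rejects admission on a cold cache
        return None
```

The point of the ordering is that the local cache only ever stores a value Redis has already committed, which is what makes the reseed safe against concurrent reservers.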

## Verdict: merge-after-nits

## Assessment / nits

1. **Real fix for a real bug class**: without a reservation step, two concurrent in-flight requests both pass admission against the same cached spend counter, both run, and the budget is overshot by up to the per-call cost of the slower of the two. The new reserve-on-admission/release-on-failure/finalize-on-success state machine is the correct shape, and the `require_cache_warm + redis.async_increment(...) → in_memory_cache.set_cache(post_increment_value)` pattern at `:130-138` is the right primitive — it makes the reseed atomic w.r.t. the increment Redis itself sees, eliminating the read-then-write race between concurrent admissions that the previous `async_increment_cache(key, value=db_spend)` pattern had.
2. **Reservation handle stored as `Optional[Dict[str, Any]]` at `_types.py:2570`** — this is a typing regression. The handle has a known shape (`{"counter_key": str, "reserved_amount": float, "reservation_id": str, "ttl_seconds": int, ...}`); making it `Dict[str, Any]` defeats type-checking at every consumer (`proxy_track_cost_callback.py:288` has to trust the dict shape). Define a `BudgetReservation` Pydantic model alongside `UserAPIKeyAuth` and type the field as `Optional[BudgetReservation]`. Cheap fix, structural protection.
3. **Reservation TTL / orphan recovery**: if the proxy crashes between admission and the post-call hook, the reservation in Redis leaks. The new module presumably has a TTL on the reservation key, but that TTL must be ≥ the maximum streaming-response latency (which can be minutes for long generations). A reservation with a 60s TTL would be released back into the pool while the request was still streaming — a silent budget overrun. Confirm the TTL math and document it.
4. **Coalescing semantics**: `SpendCounterReseed.coalesced` already coalesces concurrent reseeds for the same key; the `require_cache_warm` path adds a fast-path Redis increment but does not appear to coalesce concurrent *reservations* against the same counter. Two concurrent admissions against a cold counter could both trigger `redis_cache.async_increment(key, value=db_spend)`, resulting in a 2× overcount until the next reseed window. Worth confirming the increment applies `db_spend` *only on the first* coalesced reseed (the existing coalescer should handle this, but the behavior under the new branch needs an explicit test).
5. **`require_cache_warm: bool = False` default at `:120`** is conservative (preserves existing callers' behavior) but means the new fail-closed semantics only activate when callers explicitly opt in. `_run_centralized_common_checks` is presumably wired to pass `True` — confirm in the diff slice that no admission path is left in the `False`-default fail-open mode.
6. **A +2012 LOC PR with a new module plus a 4-file test sweep** is at the upper edge of reviewable-in-one-pass. The module split (`budget_reservation.py` standalone) is the right call; consider whether the `spend_counter_reseed.py` `require_cache_warm` change is independently mergeable as a stand-alone preparatory PR so the diff for the actual reservation logic is smaller.
7. **Failure-mode log volume**: the `verbose_proxy_logger.exception(...)` at `:155` will fire on every reseed failure under the new fail-closed mode. If Redis flaps, this becomes a flood. Rate-limit the log line or aggregate it at the warm path.
8. **No CHANGELOG / migration entry visible in the diff.** Operators upgrading to this build will see budget-rejection behavior they didn't see before (concurrent requests previously slipped through; now admissions are serialized against the reservation). Worth a clear release note plus a config flag (`disable_budget_reservation: bool = False`) for operators who need the old behavior during rollout.