# BerriAI/litellm PR #26845 — chore(proxy): tighten budget spend admission

- PR: https://github.com/BerriAI/litellm/pull/26845
- Head SHA: `926de696a11bf60fce682c3e68933f7e81418855`
- Files touched: 10 files, including `litellm/proxy/_types.py`, `litellm/proxy/auth/user_api_key_auth.py`, `litellm/proxy/db/spend_counter_reseed.py` (+core changes), `litellm/proxy/hooks/proxy_track_cost_callback.py`, `litellm/proxy/proxy_server.py`, `litellm/proxy/spend_tracking/budget_reservation.py` (new module), and four parameterized regression test files. +2012/-110.

## Specific citations

- New schema field `budget_reservation: Optional[Dict[str, Any]] = None` added to `UserAPIKeyAuth` at `_types.py:2570` — carries the in-flight budget reservation handle through the request lifecycle so the post-call accounting hook can release/finalize it against the same row that admission used.
- New gating helper `_should_skip_budget_checks(request_data, route, llm_router)` at `user_api_key_auth.py:1898-1907` — extracts the previously-inline `_is_model_cost_zero(model, llm_router)` check into a named function. Same logic, cleaner call site at `:1843-1847`.
- New reservation step `_reserve_budget_after_common_checks(...)` at `user_api_key_auth.py:1885-1895` — runs *after* `common_checks`, calls into the new `litellm/proxy/spend_tracking/budget_reservation.py` module, and stores the reservation handle on the auth object: `user_api_key_auth_obj.budget_reservation = await reserve_budget_for_request(...)`. Skipped when `skip_budget_checks` is true (zero-cost models).
- `SpendCounterReseed.coalesced` at `spend_counter_reseed.py:118-175` gains a `require_cache_warm: bool = False` flag. When true and Redis is configured, the reseed uses `redis_cache.async_increment(key, value=db_spend)` and mirrors the resulting value into the in-memory cache via `in_memory_cache.set_cache(key=counter_key, value=current_value)`. This is the critical change: it ensures the cache reflects the *post-increment* Redis state atomically, rather than the prior pattern of `async_increment_cache(key, value=db_spend)`, which races with concurrent reservers. The `require_cache_warm` branch propagates failure (via the `if require_cache_warm:` re-raise at `:150`) so callers can fail closed on reseed failure.
- New module `budget_reservation.py` (referenced via `from litellm.proxy.spend_tracking.budget_reservation import reserve_budget_for_request` at `user_api_key_auth.py:1893`) — owns the reserve-on-admission, release-on-failure, finalize-on-success state machine. The `proxy_track_cost_callback.py` change at `:288-289` adds `await _release_budget_reservation(budget_reservation=user_api_key_dict.budget_reservation)` in the post-call hook so failed/aborted requests don't leak reserved budget.
- Regression coverage spans four surfaces: `tests/test_litellm/proxy/auth/test_user_api_key_auth.py` (admission), `tests/test_litellm/proxy/hooks/test_proxy_track_cost_callback.py` (release), `tests/test_litellm/proxy/test_budget_reservation.py` (the new module's unit suite), and `tests/test_litellm/proxy/test_proxy_server.py` (end-to-end happy and failure paths).
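To make the reserve/release/finalize lifecycle concrete, here is a minimal in-memory sketch of the state machine the new module owns. The class and method names (`SpendCounter`, `reserve`, `release`, `finalize`) and the handle's field names are illustrative, inferred from the review's description rather than copied from the diff:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class BudgetReservation:
    # Field names inferred from the review's description of the handle shape.
    counter_key: str
    reserved_amount: float
    reservation_id: str

@dataclass
class SpendCounter:
    """In-memory stand-in for the Redis-backed per-key spend counter."""
    max_budget: float
    spent: float = 0.0
    in_flight: Dict[str, float] = field(default_factory=dict)  # reservation_id -> amount

    def reserve(self, counter_key: str, estimated_cost: float) -> BudgetReservation:
        # Admission: committed spend plus all in-flight reservations must fit
        # under the budget, so two concurrent requests cannot both slip through.
        if self.spent + sum(self.in_flight.values()) + estimated_cost > self.max_budget:
            raise RuntimeError("budget exceeded")
        rid = uuid.uuid4().hex
        self.in_flight[rid] = estimated_cost
        return BudgetReservation(counter_key, estimated_cost, rid)

    def release(self, r: BudgetReservation) -> None:
        # Failed/aborted request: return the reserved amount to the pool.
        self.in_flight.pop(r.reservation_id, None)

    def finalize(self, r: BudgetReservation, actual_cost: float) -> None:
        # Success: replace the estimate with the actual cost.
        self.in_flight.pop(r.reservation_id, None)
        self.spent += actual_cost
```

With a 1.0 budget, a reserved 0.6 estimate correctly blocks a second concurrent 0.6 admission until the first request finalizes or releases — the exact overshoot the PR targets.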
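The cache-warm reseed pattern can be sketched the same way. `async_increment` and `set_cache` are the cache-API names the review cites; `FakeRedis`, `InMemoryCache`, and `reseed_counter` are toy stand-ins, not the PR's actual classes:

```python
import asyncio
from typing import Dict, Optional

class FakeRedis:
    """Toy async Redis; async_increment returns the post-increment value."""
    def __init__(self) -> None:
        self._data: Dict[str, float] = {}

    async def async_increment(self, key: str, value: float) -> float:
        self._data[key] = self._data.get(key, 0.0) + value
        return self._data[key]

class InMemoryCache:
    def __init__(self) -> None:
        self._data: Dict[str, float] = {}

    def set_cache(self, key: str, value: float) -> None:
        self._data[key] = value

    def get_cache(self, key: str) -> Optional[float]:
        return self._data.get(key)

async def reseed_counter(
    redis_cache: FakeRedis,
    in_memory_cache: InMemoryCache,
    counter_key: str,
    db_spend: float,
    require_cache_warm: bool = False,
) -> Optional[float]:
    # Increment Redis first, then mirror the *post-increment* value into the
    # local cache, so a concurrent reserver's increment is never overwritten
    # by a stale read.
    try:
        current_value = await redis_cache.async_increment(counter_key, value=db_spend)
        in_memory_cache.set_cache(key=counter_key, value=current_value)
        return current_value
    except Exception:
        if require_cache_warm:
            raise  # fail closed: the caller rejects admission on a cold cache
        return None
```

The point of the ordering is that the local cache only ever stores a value Redis has already committed, which is what makes the reseed safe against concurrent reservers.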

## Verdict: merge-after-nits

## Assessment / nits

1. **Real fix for a real bug class**: without a reservation step, two concurrent in-flight requests both pass admission against the same cached spend counter, both run, and the budget is overshot by up to the per-call cost of the slower of the two. The new reserve-on-admission/release-on-failure/finalize-on-success state machine is the correct shape, and the `require_cache_warm + redis.async_increment(...) → in_memory_cache.set_cache(post_increment_value)` pattern at `:130-138` is the right primitive — it makes the reseed atomic w.r.t. the increment Redis itself sees, eliminating the read-then-write race between concurrent admissions that the previous `async_increment_cache(key, value=db_spend)` pattern had.
2. **Reservation handle stored as `Optional[Dict[str, Any]]` at `_types.py:2570`** — this is a typing regression. The handle has a known shape (`{"counter_key": str, "reserved_amount": float, "reservation_id": str, "ttl_seconds": int, ...}`); making it `Dict[str, Any]` defeats type-checking at every consumer (`proxy_track_cost_callback.py:288` has to trust the dict shape). Define a `BudgetReservation` Pydantic model alongside `UserAPIKeyAuth` and type the field as `Optional[BudgetReservation]`. Cheap fix, structural protection.
3. **Reservation TTL / orphan recovery**: if the proxy crashes between admission and the post-call hook, the reservation in Redis leaks. The new module presumably has a TTL on the reservation key, but that TTL must be ≥ the maximum streaming-response latency (which can be minutes for long generations). A reservation with a 60s TTL would be released back into the pool while the request was still streaming — a silent budget overrun. Confirm the TTL math and document it.
4. **Coalescing semantics**: `SpendCounterReseed.coalesced` already coalesces concurrent reseeds for the same key; the `require_cache_warm` path adds a fast-path Redis increment but does not appear to coalesce concurrent *reservations* against the same counter. Two concurrent admissions against a cold counter could both trigger `redis_cache.async_increment(key, value=db_spend)`, resulting in a 2× overcount until the next reseed window. Worth confirming the increment applies `db_spend` *only on the first* coalesced reseed (the existing coalescer should handle this, but the behavior under the new branch needs an explicit test).
5. **`require_cache_warm: bool = False` default at `:120`** is conservative (preserves existing callers' behavior) but means the new fail-closed semantics only activate when callers explicitly opt in. `_run_centralized_common_checks` is presumably wired to pass `True` — confirm in the diff slice that no admission path is left in the `False`-default fail-open mode.
6. **A +2012 LOC PR with a new module plus a 4-file test sweep** is at the upper edge of reviewable-in-one-pass. The module split (`budget_reservation.py` standalone) is the right call; consider whether the `spend_counter_reseed.py` `require_cache_warm` change is independently mergeable as a stand-alone preparatory PR so the diff for the actual reservation logic is smaller.
7. **Failure-mode log volume**: the `verbose_proxy_logger.exception(...)` at `:155` will fire on every reseed failure under the new fail-closed mode. If Redis flaps, this becomes a flood. Rate-limit the log line or aggregate it at the warm path.
8. **No CHANGELOG / migration entry visible in the diff.** Operators upgrading to this build will see budget-rejection behavior they didn't see before (concurrent requests previously slipped through; now admissions are serialized against the reservation). Worth a clear release note plus a config flag (`disable_budget_reservation: bool = False`) for operators who need the old behavior during rollout.