Greptile Summary

This PR adds team-scoped default models and per-member model overrides behind a feature flag. Key changes:
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/proxy/auth/auth_checks.py | Adds compute_effective_models and get_effective_team_models to the hot auth path; can_team_access_model now computes per-member effective models, but the access_group_ids fallback bypasses those restrictions silently and redundant os.getenv() call occurs on every auth check when flag is false. |
| litellm/proxy/management_endpoints/team_endpoints.py | Adds default_models subset validation, auto-pruning on team update, and member model override validation; team_member_update now resolves stored_models but returns raw tpm_limit/rpm_limit in response inconsistently; guaranteed DB hit after cache invalidation in several paths. |
| litellm/proxy/management_endpoints/key_management_endpoints.py | Enforces effective-model subset validation at key-generation time using the existing get_team_membership helper; all-team-models correctly resolves to effective set; double env-var check and service-key all-team-models scope are pre-existing concerns noted in prior threads. |
| litellm/proxy/management_endpoints/common_utils.py | Extends _upsert_budget_and_membership to accept a models parameter; early-return correctly guards against no-op calls; the new upsert correctly handles models-only, budget-only, and combined updates; each budget update still creates a new row rather than updating existing. |
| litellm/proxy/utils.py | Adds t.default_models and tm.models to the raw hot-path SQL join; these columns don't exist pre-migration and will cause the proxy to hard-fail on every authenticated request until the migration is applied — no column-guard or graceful fallback. |
| litellm/proxy/management_helpers/utils.py | Extends add_new_member to persist member model overrides and validate them against team_models; every budget update creates a new orphaned row rather than updating existing; cache invalidation is correctly added after membership creation. |
| litellm/proxy/_types.py | Adds models, tpm_limit, rpm_limit to Member; default_models to TeamBase and UpdateTeamRequest; team_member_models and team_default_models to LiteLLM_VerificationTokenView; models to LiteLLM_TeamMembership and TeamMemberUpdateRequest/Response; all cleanly typed with Optional/List defaults. |
| litellm/__init__.py | Adds team_model_overrides_enabled module-level flag initialised from env var at import time; straightforward and correct. |
| litellm/proxy/schema.prisma | Adds default_models String[] @default([]) to LiteLLM_TeamTable and models String[] @default([]) to LiteLLM_TeamMembership; schema changes are consistent across all three schema.prisma files. |
| tests/test_litellm/proxy/auth/test_team_model_overrides.py | New test file with comprehensive unit and integration coverage of effective-model computation, access denial, flag-off backward compat, cross-user isolation, stale-override fallback, and service-key behavior; all tests are pure mocks (no network calls), satisfying the no-network-call rule. |
| tests/test_litellm/proxy/common_utils/test_upsert_budget_membership.py | Updated existing test to reflect new no-op early-return behavior (now correct: only disconnects when budget exists); adds new tests for models-only, models+budget, and models=[] clearing paths; all mock-only. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request / Auth Check] --> B{team_model_overrides_enabled?}
    B -- No --> C[Use team.models as-is]
    B -- Yes --> D{valid_token.user_id set?}
    D -- No service key --> C
    D -- Yes --> E[get_effective_team_models]
    E --> F[team_defaults = team_object.default_models]
    E --> G[member_models = valid_token.team_member_models from SQL JOIN]
    F & G --> H[compute_effective_models]
    H --> I{effective empty?}
    I -- Yes --> J[fallback to team_pool]
    I -- No --> K{team_pool empty?}
    K -- Yes allow-all --> L[return effective as-is]
    K -- No --> M{effective ∩ team_pool empty?}
    M -- Yes stale overrides --> J
    M -- No --> N[return capped effective]
    J & L & N --> O[_can_object_call_model with effective_models]
    O -- Denied --> P{team has access_group_ids?}
    P -- No --> Q[Raise ProxyException 401]
    P -- Yes --> R[_get_models_from_access_groups bypass]
    R --> S{model in group?}
    S -- No --> Q
    S -- Yes --> T[Allow - bypasses per-member restrictions]
    O -- Allowed --> T
```
Comments Outside Diff (1)
- `litellm/proxy/management_endpoints/team_endpoints.py`, lines 2610-2618 (link)

`tpm_limit`/`rpm_limit` in response inconsistent with `models`

The response for `models` was correctly updated to use `stored_models` (the authoritative resolved value), but `tpm_limit` and `rpm_limit` still return the raw request values (`data.tpm_limit`, `data.rpm_limit`), which may be `None` when only models were updated. This creates a misleading response: if a caller updates only `models`, the response shows `tpm_limit: null` even if the member has an existing limit of, say, 500. Callers that inspect the response to build a reconciliation view (e.g., dashboards, audit logging) will see incorrect zero/null limits. `resolved_tpm` and `resolved_rpm` are already computed above for writing into `members_with_roles` — the response should use them too.
Last reviewed commit: "fix: req changes by ..."
```python
_membership = await prisma_client.db.litellm_teammembership.find_unique(
    where={
        "user_id_team_id": {
            "user_id": data.user_id,
            "team_id": team_table.team_id,
        }
    }
)
if _membership is not None:
    member_models = _membership.models or []
```
Direct DB query bypasses established helper-function pattern
A raw prisma_client.db.litellm_teammembership.find_unique(...) call is made here to resolve member models at key-generation time. The codebase convention (enforced in the db-rules) is that all DB reads must go through the existing get_team/get_user/get_key helper functions rather than direct Prisma calls.
Even though the code comment notes this is a management endpoint (not the hot auth path), making a direct DB read here:
- Bypasses any caching that the helper functions may apply.
- Creates an inconsistent access pattern that can be harder to audit for regressions.
- Adds an extra round-trip that isn't guarded by retry/error-handling wrappers used in the helpers.
Consider wrapping this lookup in a get_team_membership (or equivalent) helper, similar to how get_team_object is used elsewhere in this file.
Rule Used: What: In critical path of request, there should be... (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
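A minimal sketch of the helper-based shape this comment suggests. The stub below stands in for the real `get_team_membership` (its signature mirrors the call site quoted later in this review); the real helper's caching and error handling are assumed, not shown:

```python
import asyncio
from typing import List

# Stub for the real get_team_membership helper in auth_checks.py; the actual
# helper also consults user_api_key_cache and wraps DB errors/retries.
async def get_team_membership(user_id, team_id, prisma_client, user_api_key_cache):
    return await prisma_client.find_membership(user_id, team_id)

async def resolve_member_models(user_id, team_id, prisma_client, cache) -> List[str]:
    # Route the read through the helper instead of a raw
    # prisma_client.db.litellm_teammembership.find_unique(...) call.
    _membership = await get_team_membership(user_id, team_id, prisma_client, cache)
    return (_membership.models or []) if _membership is not None else []
```

The key-generation call site would then look like the other `get_team_object`-style lookups in the file.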
```python
if not (
    litellm.team_model_overrides_enabled
    or os.getenv("TEAM_MODEL_OVERRIDES", "").lower() == "true"
):
```
os.getenv() called on every auth check when feature flag is False
`litellm.team_model_overrides_enabled` is already initialised at module load time from the same environment variable. When the flag is False, the `or` cannot short-circuit, so `os.getenv("TEAM_MODEL_OVERRIDES", "")` runs on every request that reaches `can_team_access_model`.
This is the hot auth path — even a cheap os.getenv call multiplied across thousands of requests per second adds up. Consider reading the env-var only once and storing it on litellm.team_model_overrides_enabled (which already exists), then checking only litellm.team_model_overrides_enabled here:
```diff
-if not (
-    litellm.team_model_overrides_enabled
-    or os.getenv("TEAM_MODEL_OVERRIDES", "").lower() == "true"
-):
+if not litellm.team_model_overrides_enabled:
```
If dynamic env-var toggling at runtime is intentionally supported, document it explicitly so the cost is clearly understood.
```python
if (
    litellm.team_model_overrides_enabled
    or os.getenv("TEAM_MODEL_OVERRIDES", "").lower() == "true"
```
Same double env-var check also present in key generation
This mirrors the same pattern in get_effective_team_models: litellm.team_model_overrides_enabled is checked first, but if False the code still reads os.getenv("TEAM_MODEL_OVERRIDES", "") on every /key/generate call. Consolidating the flag check to litellm.team_model_overrides_enabled (updated at import time and settable programmatically) keeps the logic consistent with auth_checks.py and avoids the redundant env read.
litellm/proxy/auth/auth_checks.py
Outdated
```python
    - If cap empties the list (all stale), falls back to team_pool (NOT [] which = allow-all).
    - team_pool=[] means "allow all" — cap is skipped.
    """
    effective = list(set(team_defaults + member_models))
```
Non-deterministic model order from `set()`
list(set(team_defaults + member_models)) does not preserve insertion order. Across different Python invocations or even across calls in the same process, set iteration order is not guaranteed. This means two calls with identical inputs may produce differently-ordered model lists, so keys generated for the same effective model set can look different when inspected via the API or in the DB.
Consider using dict.fromkeys() which preserves first-occurrence order and de-duplicates:
```diff
-effective = list(set(team_defaults + member_models))
+effective = list(dict.fromkeys(team_defaults + member_models))
```
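To illustrate the difference, `dict.fromkeys` de-duplicates while keeping first-occurrence order, whereas `set` iteration order is unspecified across processes (model names below are just sample values):

```python
team_defaults = ["gpt-4", "gpt-4o-mini"]
member_models = ["gpt-4o", "gpt-4"]  # "gpt-4" duplicated on purpose

# Order-preserving de-duplication: first occurrence wins.
effective = list(dict.fromkeys(team_defaults + member_models))
print(effective)  # ['gpt-4', 'gpt-4o-mini', 'gpt-4o']
```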
```python
import sys
import os
import pytest

# Add the parent directory to the system path to import litellm
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../..")))

import litellm
from litellm.proxy._types import UserAPIKeyAuth, LiteLLM_TeamTable
from litellm.proxy.auth.auth_checks import (
```
Duplicate test coverage across two files
tests/proxy_unit_tests/test_team_model_overrides.py (2 tests) and tests/test_litellm/proxy/auth/test_team_model_overrides.py (12+ tests) cover largely the same logic. The proxy_unit_tests file is a strict subset — every scenario it covers is also tested by the more comprehensive test_litellm file. Having duplicate, partially-overlapping test files causes maintenance overhead and makes it unclear which is canonical.
Consider removing this file and relying solely on tests/test_litellm/proxy/auth/test_team_model_overrides.py for coverage.
```diff
@@ -2435,18 +2525,39 @@ async def team_member_update(
     user_api_key_dict=user_api_key_dict,
     tpm_limit=data.tpm_limit,
     rpm_limit=data.rpm_limit,
```
get_team_membership cache not invalidated after model write
_upsert_budget_and_membership writes the new models value into LiteLLM_TeamMembership via an upsert, but the get_team_membership helper in auth_checks.py caches results in user_api_key_cache and this cache entry is never invalidated after the membership update.
Any /key/generate call that fires for the same user before the cache TTL expires will invoke get_team_membership, receive the old membership row from cache, and embed the stale effective model list into the newly-generated key (see key_management_endpoints.py lines 652–666). The key's model scope will not reflect the update until the cache naturally expires.
Consider adding an explicit cache eviction for the membership entry immediately after the transaction commits — auth_checks.get_team_membership uses a well-known cache key format, so it can be targeted for deletion. Alternatively, expose a helper function alongside get_team_membership that clears its cache entry and call it from team_member_update after the write.
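A hedged sketch of such an eviction helper, assuming the `team_membership:<user_id>:<team_id>` cache-key format quoted later in this review and `DualCache`'s `async_delete_cache` method:

```python
# Hypothetical helper, colocated with get_team_membership so the cache-key
# format stays in one place; called from team_member_update after the
# transaction commits.
async def invalidate_team_membership_cache(
    user_id: str, team_id: str, user_api_key_cache
) -> None:
    _cache_key = f"team_membership:{user_id}:{team_id}"
    # async variant so a Redis round-trip doesn't block the event loop
    await user_api_key_cache.async_delete_cache(key=_cache_key)
```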
```python
stored_models = data.models
if stored_models is None:
    from litellm.proxy.auth.auth_checks import get_team_membership
    from litellm.proxy.proxy_server import user_api_key_cache

    _tm_row = await get_team_membership(
        user_id=received_user_id,
        team_id=data.team_id,
        prisma_client=prisma_client,
        user_api_key_cache=user_api_key_cache,
    )
    stored_models = (_tm_row.models or []) if _tm_row is not None else []
```
Extra DB round-trip when `data.models` not updated
When a caller updates only role, max_budget_in_team, tpm_limit, or rpm_limit — without touching models — stored_models is None and the code makes a fresh get_team_membership DB call (bypassing cache since it was just invalidated) purely to populate the models field in the response and to conditionally sync members_with_roles.
However, members_with_roles is only written when data.role is not None or data.models is not None (line 2556). So when neither is set, this extra DB call feeds only the models field of the return value. This is unnecessary overhead on a management endpoint for every budget/tpm/rpm-only update.
Consider returning stored_models = None (or omitting it) when models weren't part of the update, or lazily fetching only when data.role is not None or data.models is not None.
```python
if not effective:
    return team_pool

if team_pool:
    effective = [m for m in effective if m in set(team_pool)]
    if not effective:
        return team_pool
```
Stale member overrides silently escalate to full team access
When a member has per-user model overrides set (e.g. ["gpt-4o"]), but ALL of those models are subsequently removed from team.models, compute_effective_models falls back to team_pool — the full team model list. This means the member quietly gains access to every model the team has, which is broader than their originally intended restricted set.
Concretely:
- `team.models = ["gpt-4", "gpt-3.5"]`
- member override: `["gpt-4o"]` (now stale — `gpt-4o` removed from team)
- `effective = ["gpt-4o"]` → capped by team_pool → `[]` → falls back to `["gpt-4", "gpt-3.5"]`
If this fallback is intentional, add a clear doc comment explaining the privilege trade-off and consider whether team_pool fallback should be configurable (e.g. a stricter mode that returns [] = deny vs fall-through-to-team-pool).
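A self-contained sketch of the semantics described above, with an illustrative `strict` flag showing what a deny-on-stale mode could look like (the flag and function name are hypothetical; the fallback branches mirror the snippet quoted from `auth_checks.py`):

```python
from typing import List

def compute_effective_models_sketch(
    team_defaults: List[str],
    member_models: List[str],
    team_pool: List[str],
    strict: bool = False,  # hypothetical: deny instead of falling back
) -> List[str]:
    # Union of team defaults and per-member overrides, order-preserving.
    effective = list(dict.fromkeys(team_defaults + member_models))
    if not effective:
        # No defaults and no overrides: use the full team pool.
        return team_pool
    if team_pool:  # team_pool=[] means "allow all" — cap is skipped
        pool = set(team_pool)
        effective = [m for m in effective if m in pool]
        if not effective:
            # All overrides stale: current behavior falls back to the full
            # team pool — the silent escalation described above.
            return [] if strict else team_pool
    return effective
```

With `strict=True`, a member whose only override was removed from the team pool would lose access until reassigned, rather than inheriting every team model.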
```python
_cache_key = f"team_membership:{received_user_id}:{data.team_id}"
user_api_key_cache.delete_cache(key=_cache_key)
```
Sync `delete_cache` blocks event loop in async endpoint
user_api_key_cache.delete_cache calls DualCache.delete_cache which (when Redis is configured) invokes the synchronous Redis client (redis_client.delete()). Running a sync network I/O call inside an async FastAPI endpoint blocks the event loop for the duration of the Redis round-trip.
DualCache exposes async_delete_cache which properly awaits the async Redis client. The async variant should be used here instead of the sync one.
```python
if effective_models:
    # if 'all-team-models' was requested, restrict it to the effective models
    if "all-team-models" in (data.models or []):
        data_json["models"] = effective_models
    # if explicit models were requested, validate they're a subset of effective set
    elif data.models:
        disallowed = set(data.models) - set(effective_models)
        if disallowed:
            raise HTTPException(
                status_code=403,
                detail={
                    "error": f"Requested models not in user's effective team models. "
                    f"Disallowed: {sorted(disallowed)}. "
                    f"Effective models: {sorted(effective_models)}"
                },
            )
    # if NO models was requested, runtime auth will compute effective models
    # from the SQL view join (tm.models + t.default_models), so nothing to store here
```
all-team-models request restricted to team_defaults when user_id is absent
When a caller generates a team-level key without supplying user_id (e.g. a service/bot key), member_models = [] and effective_models reduces to just team_default_models. If an admin has set default_models on the team, a key generated with "all-team-models" is silently stored with only those defaults rather than the full team.models pool.
Before this feature was introduced, "all-team-models" always resolved to the full team model pool. With the feature enabled and default_models configured, the semantics change for service keys.
Consider documenting this explicitly in the code comment and in model_access.md, or — if full team access is desired for keyless (service-account) generation — short-circuiting the restriction when data.user_id is None:
```python
# If no user_id, skip per-user restriction (service key gets full team access)
if data.user_id and prisma_client is not None:
    _membership = await get_team_membership(...)
    ...
```

```diff
 if data.role is not None or data.models is not None:
     team_members: List[Member] = []
     for member in team_table.members_with_roles:
         if member.user_id == received_user_id:
             team_members.append(
                 Member(
                     user_id=member.user_id,
-                    role=data.role,
+                    role=data.role or member.role,
                     user_email=data.user_email or member.user_email,
                     models=stored_models,
                     tpm_limit=data.tpm_limit if data.tpm_limit is not None else getattr(member, "tpm_limit", None),
                     rpm_limit=data.rpm_limit if data.rpm_limit is not None else getattr(member, "rpm_limit", None),
```
tpm_limit/rpm_limit in members_with_roles can silently become stale
The PR adds tpm_limit and rpm_limit to the Member type and now serialises them into members_with_roles. However, this JSON blob is only rewritten when data.role is not None or data.models is not None. This means a tpm_limit-only update (data.role=None, data.models=None) never writes to members_with_roles.
The problematic sequence:
1. `/team/member_update` with `{"user_id": "alice", "tpm_limit": 500}` → budget table gets `tpm_limit=500`; `members_with_roles` is not touched (condition is false)
2. `/team/member_update` with `{"user_id": "alice", "models": ["gpt-4o"]}` → enters the block; `getattr(member, "tpm_limit", None)` reads from the still-stale `members_with_roles` → writes `Member(tpm_limit=None)` back to JSON
After step 2, members_with_roles reports tpm_limit: null, while the budget table correctly enforces 500. Any downstream consumer of /team/info (e.g., a dashboard) would show wrong rate limits.
Enforcement comes from the budget table so there is no correctness issue, but the display inconsistency could confuse operators.
Consider widening the condition to also trigger when tpm_limit or rpm_limit changes:
```python
if data.role is not None or data.models is not None or data.tpm_limit is not None or data.rpm_limit is not None:
```

Or, since `stored_models` is already resolved above regardless of the condition, you could always rewrite `members_with_roles` when any field changes — the `team_table.update` call is cheap relative to the surrounding DB work already done.
@greptile can you please review this PR?
```sql
m.aliases AS team_model_aliases,
-- Added comma to separate b.* columns
```
Raw SQL query adds columns that don't exist on un-migrated databases
t.default_models AS team_default_models and tm.models AS team_member_models are added directly to the raw SQL join query. This query is executed on every authenticated request (the hot auth path).
If a deployment upgrades the application code but has not yet run the Prisma migration (which adds the default_models and models columns), every single request will fail with a PostgreSQL column "default_models" does not exist error, taking down the proxy entirely.
Unlike Prisma model files, raw SQL strings have no migration-awareness. The migration uses ADD COLUMN IF NOT EXISTS which is safe, but the application SQL does not.
A safeguard pattern used elsewhere in litellm is to guard new column reads behind a feature flag or to catch the PrismaError/psycopg2.errors.UndefinedColumn and fall back to None. At minimum, the migration notes in the PR description should make it explicit that the migration must be applied before deploying this code version.
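An illustrative column-guard sketch: retry with the pre-migration column list when the new columns are missing, instead of failing every authenticated request. The query strings, function name, and the client's `query_raw` method are stand-ins, not litellm's actual identifiers:

```python
# Hypothetical queries: one with the new columns, one without.
SQL_WITH_OVERRIDES = (
    "SELECT t.default_models AS team_default_models, "
    "tm.models AS team_member_models, v.* FROM ..."
)
SQL_WITHOUT_OVERRIDES = "SELECT v.* FROM ..."

async def fetch_auth_row(db):
    try:
        return await db.query_raw(SQL_WITH_OVERRIDES)
    except Exception as e:
        # On an un-migrated database, Postgres raises UndefinedColumn
        # ('column "default_models" does not exist'); fall back gracefully.
        if "default_models" in str(e) and "does not exist" in str(e):
            return await db.query_raw(SQL_WITHOUT_OVERRIDES)
        raise
```

A variant could cache the fallback decision so the failing query is only attempted once per process.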
```python
stored_models = data.models
if stored_models is None:
    from litellm.proxy.auth.auth_checks import get_team_membership
    from litellm.proxy.proxy_server import user_api_key_cache

    _tm_row = await get_team_membership(
        user_id=received_user_id,
        team_id=data.team_id,
        prisma_client=prisma_client,
        user_api_key_cache=user_api_key_cache,
    )
    stored_models = (_tm_row.models or []) if _tm_row is not None else []
```
Guaranteed DB hit immediately after cache invalidation
The membership cache entry is deleted at line 2536, and then get_team_membership is called at line 2548. Because the cache was just invalidated, this call always hits the DB — the cache miss is 100% guaranteed on this code path.
This creates an unconditional extra DB round-trip on every /team/member_update call where data.models is None, even when only role/tpm_limit/rpm_limit were updated and the model list didn't change. The fresh read is only used to populate stored_models for the members_with_roles JSON update (line 2556), which only runs when data.role is not None or data.models is not None.
Consider scoping the DB re-fetch to only the cases where it is actually needed (i.e., inside the if data.role is not None or data.models is not None: block), and only invalidating the cache when a models write actually occurred. This avoids the unnecessary round-trip for budget-only or tpm/rpm-only updates.
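A sketch of the suggested scoping as a small pure function (the name, parameters, and the injected fetch callable are hypothetical; the branch conditions mirror the quoted endpoint logic):

```python
from typing import List, Optional

async def resolve_stored_models(
    data_models: Optional[List[str]],
    data_role: Optional[str],
    fetch_membership,  # async callable standing in for get_team_membership
) -> Optional[List[str]]:
    # Caller updated models explicitly — use them as-is.
    if data_models is not None:
        return data_models
    # members_with_roles is only rewritten when role or models changed;
    # for budget/tpm/rpm-only updates, skip the guaranteed DB hit.
    if data_role is None:
        return None
    row = await fetch_membership()
    return (row.models or []) if row is not None else []
```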
```python
def get_effective_team_models(
    team_object: Optional[LiteLLM_TeamTable],
    valid_token: Optional[UserAPIKeyAuth] = None,
) -> List[str]:
    """
    Returns the effective list of models for a team member.
    The union of:
    - team_object.default_models (OR valid_token.team_default_models if available)
    - team_membership.models (OR valid_token.team_member_models if available)

    Capped by team_object.models. Falls back to team_object.models when empty.
    """
    if not (
        litellm.team_model_overrides_enabled
        or os.getenv("TEAM_MODEL_OVERRIDES", "").lower() == "true"
    ):
        return team_object.models if team_object else []

    # Get from team defaults — prefer team_object (authoritative, fresh from DB/cache)
    # over valid_token (snapshot from key creation time, may be stale).
    # Use `is not None` instead of truthiness so that an explicit empty list []
    # (meaning "no defaults") is not confused with "field missing".
    team_defaults: List[str] = []
    if team_object and team_object.default_models is not None:
        team_defaults = team_object.default_models
    elif valid_token and valid_token.team_default_models is not None:
        team_defaults = valid_token.team_default_models

    # Get from member specific overrides
    member_models: List[str] = []
    if valid_token and valid_token.team_member_models is not None:
        member_models = valid_token.team_member_models

    team_pool = team_object.models if team_object else []

    return compute_effective_models(team_defaults, member_models, team_pool)
```
Service keys (no `user_id`) silently restricted by `default_models` at runtime
get_effective_team_models has no exemption for service/bot keys (team keys without a user_id). When team_model_overrides_enabled=True and a team has default_models configured (e.g. ["gpt-4o-mini"]), the auth path computes:
- `team_defaults = team_object.default_models` → `["gpt-4o-mini"]`
- `member_models = []` — because the SQL JOIN on `tm.user_id = v.user_id` doesn't match for keys where `v.user_id IS NULL`, so `valid_token.team_member_models = None`
- `effective = compute_effective_models(["gpt-4o-mini"], [], full_team_pool) = ["gpt-4o-mini"]`
This means service keys are silently restricted to default_models only, even though they were intended to have full team.models access. The key-generation path correctly exempts service keys (and data.user_id guard), but the runtime auth path in can_team_access_model calls get_effective_team_models(team_object, valid_token) unconditionally.
Contrast with the comment at the key-generation call-site:
> service/bot keys (no `user_id`) use the full `team.models` pool, preserving pre-feature behavior
The same logic needs to be applied in get_effective_team_models. When valid_token has no user_id and no membership row, fall back to team_pool:
```python
# In get_effective_team_models, after the feature-flag check:
# Service keys (no user_id) have no membership row, so skip per-member logic
# and use team_pool directly for backward compatibility.
if valid_token is not None and valid_token.user_id is None:
    return team_object.models if team_object else []
```

This is a backward-incompatible regression (per repo rules) that activates the moment an admin enables the feature flag and sets default_models on any team that also uses service keys.
Rule Used: What: avoid backwards-incompatible changes without... (source)
```shell
  -H 'Authorization: Bearer <your-master-key>' \
  -H 'Content-Type: application/json' \
  -d '{
    "team_alias": "engineering",
    "models": ["gpt-4", "gpt-4o-mini", "gpt-4o"],
    "default_models": ["gpt-4o-mini"]
  }'
```
Docs omit service-key behaviour and capping semantics
The docs state:
> A member's effective models = `default_models` ∪ `member.models`. If neither is set, falls back to `team.models`.

Two behaviours are not documented:

- Capping: the union is further intersected with `team.models` (the team pool). If `default_models = ["gpt-4"]` but `team.models = ["gpt-4o-mini"]`, the effective set is `[]` → falls back to `["gpt-4o-mini"]`.
- Service/bot keys (team-level keys with no `user_id`): current code restricts them to `default_models` at runtime (see inline comment on `get_effective_team_models`). The docs claim "Zero extra database queries on the auth hot path" and "full backward compatibility", which is only accurate for keys that are user-scoped. Service key users enabling this feature will experience a silent regression (see related comment on `auth_checks.py`).
```python
updated_kv = data.json(exclude_unset=True)

# When team.models is being changed, prune existing default_models
# to prevent stale over-permissive defaults (privilege escalation).
# Must inject into updated_kv directly (not data) because
# data.json(exclude_unset=True) skips fields not set during __init__.
if "models" in updated_kv and "default_models" not in updated_kv:
    existing_defaults = existing_team_row.default_models or []
    if existing_defaults:
        new_models = updated_kv["models"] or []
        if not new_models:
            # team.models=[] means "allow all" — clear default_models
            # so get_effective_team_models falls back to [] (allow all)
            updated_kv["default_models"] = []
        else:
            pruned = [m for m in existing_defaults if m in set(new_models)]
            if pruned != existing_defaults:
                updated_kv["default_models"] = pruned
```
Auto-pruning always writes to DB even when no models were removed

The current pruning guard is:

```python
if pruned != existing_defaults:
    updated_kv["default_models"] = pruned
```

This correctly skips the write when models are only added to `team.models` (i.e., nothing was pruned from `default_models`). However, when `new_models` is empty (`team.models=[]` → allow-all), `default_models` is unconditionally set to `[]`:

```python
if not new_models:
    updated_kv["default_models"] = []  # Always writes, even if existing_defaults was already []
```

If `existing_defaults` is already `[]`, this writes an unnecessary `default_models=[]` into the update dict, causing a no-op DB write on every `team.models=[]` update. While harmless, add the same short-circuit:
```diff
 updated_kv = data.json(exclude_unset=True)

 # When team.models is being changed, prune existing default_models
 # to prevent stale over-permissive defaults (privilege escalation).
 # Must inject into updated_kv directly (not data) because
 # data.json(exclude_unset=True) skips fields not set during __init__.
 if "models" in updated_kv and "default_models" not in updated_kv:
     existing_defaults = existing_team_row.default_models or []
     if existing_defaults:
         new_models = updated_kv["models"] or []
         if not new_models:
             # team.models=[] means "allow all" — clear default_models
             # so get_effective_team_models falls back to [] (allow all)
-            updated_kv["default_models"] = []
+            if existing_defaults:
+                updated_kv["default_models"] = []
         else:
             pruned = [m for m in existing_defaults if m in set(new_models)]
             if pruned != existing_defaults:
                 updated_kv["default_models"] = pruned
```
```diff
 team_members.append(
     Member(
         user_id=member.user_id,
-        role=data.role,
+        role=data.role or member.role,
         user_email=data.user_email or member.user_email,
         models=stored_models,
         tpm_limit=data.tpm_limit if data.tpm_limit is not None else getattr(member, "tpm_limit", None),
         rpm_limit=data.rpm_limit if data.rpm_limit is not None else getattr(member, "rpm_limit", None),
     )
```
tpm_limit/rpm_limit written to members_with_roles from stale JSON source
getattr(member, "tpm_limit", None) reads from the Member object deserialized from the existing members_with_roles JSON blob. Since Member.tpm_limit and Member.rpm_limit are new fields added by this PR, all existing members_with_roles entries were serialized without them. On deserialization, member.tpm_limit will be None (Pydantic default) even if the budget table has a real value.
The consequence: the first models-only (or role-only) update after deployment will write tpm_limit: null and rpm_limit: null back into members_with_roles, overriding any budget values that had been set previously. Consumers of /team/info (e.g. dashboards) will then show incorrect limits.
The authoritative source for tpm_limit/rpm_limit is LiteLLM_BudgetTable, so the fix is to read from the DB (via the already-fetched _tm_row or budget join) rather than getattr(member, ...):
```python
# Read tpm/rpm from the budget row (authoritative) if available,
# falling back to the request value and then to None.
_membership_tpm = None
_membership_rpm = None
if data.models is None and _tm_row is not None and _tm_row.litellm_budget_table is not None:
    _membership_tpm = _tm_row.litellm_budget_table.tpm_limit
    _membership_rpm = _tm_row.litellm_budget_table.rpm_limit
team_members.append(
    Member(
        ...
        tpm_limit=data.tpm_limit if data.tpm_limit is not None else _membership_tpm,
        rpm_limit=data.rpm_limit if data.rpm_limit is not None else _membership_rpm,
    )
)
```

```python
if effective_models:
    # if 'all-team-models' was requested, restrict it to the effective models
    if "all-team-models" in (data.models or []):
        data_json["models"] = effective_models
    # if explicit models were requested, validate they're a subset of effective set
    elif data.models:
        disallowed = set(data.models) - set(effective_models)
        if disallowed:
            raise HTTPException(
                status_code=403,
                detail={
                    "error": f"Requested models not in user's effective team models. "
                    f"Disallowed: {sorted(disallowed)}. "
                    f"Effective models: {sorted(effective_models)}"
                },
            )
    # if NO models was requested, runtime auth will compute effective models
    # from the SQL view join (tm.models + t.default_models), so nothing to store here
```
**`all-team-models` and explicit models silently pass through when `effective_models=[]`**

When `team.models=[]` (allow-all) and `default_models=[]` and `member_models=[]`:

- `compute_effective_models([], [], [])` → `[]` (falls back to `team_pool=[]`)
- `effective_models = []`

The `if effective_models:` guard is then False, so the entire validation block is skipped. For `"all-team-models"` requests, `data_json["models"]` remains `["all-team-models"]` and the runtime resolves it to the full team pool, which is the correct allow-all behavior.

However, the `if effective_models:` check would also silently skip the case where `effective_models=[]` because all member overrides/defaults were stale and were capped out. For example:

- `team.models=["gpt-4"]`, `default_models=["stale-model"]`, `member_models=[]`
- `compute_effective_models(["stale-model"], [], ["gpt-4"])` caps to `[]`, then falls back to `["gpt-4"]`
- Returns `["gpt-4"]` (non-empty), so this case is actually caught.

The only genuinely unchecked case is the `team_pool=[]` (allow-all) path, which is intentionally permit-all. The current check is correct, but consider adding an inline comment to document why `effective_models=[]` is safely skipped:
```diff
 if effective_models:
+    # effective_models=[] means the team allows all models (team.models=[]),
+    # so no restriction to enforce at key-gen time; runtime auth handles it.
     # if 'all-team-models' was requested, restrict it to the effective models
     if "all-team-models" in (data.models or []):
         data_json["models"] = effective_models
     # if explicit models were requested, validate they're a subset of effective set
     elif data.models:
         disallowed = set(data.models) - set(effective_models)
         if disallowed:
             raise HTTPException(
                 status_code=403,
                 detail={
                     "error": f"Requested models not in user's effective team models. "
                     f"Disallowed: {sorted(disallowed)}. "
                     f"Effective models: {sorted(effective_models)}"
                 },
             )
     # if NO models was requested, runtime auth will compute effective models
     # from the SQL view join (tm.models + t.default_models), so nothing to store here
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
```python
else:
    # Models were explicitly changed — invalidate cache and fetch fresh membership
    from litellm.proxy.auth.auth_checks import get_team_membership
    from litellm.proxy.proxy_server import user_api_key_cache

    _cache_key = f"team_membership:{received_user_id}:{data.team_id}"
    await user_api_key_cache.async_delete_cache(key=_cache_key)
    _tm_row = await get_team_membership(
        user_id=received_user_id,
        team_id=data.team_id,
        prisma_client=prisma_client,
        user_api_key_cache=user_api_key_cache,
    )
```
**Duplicate imports and unconditional DB read in `else` branch**
Both the `if data.models is None` and `else` branches contain identical `from litellm.proxy.auth.auth_checks import get_team_membership` / `from litellm.proxy.proxy_server import user_api_key_cache` import statements, as well as identical cache-invalidation and `get_team_membership` calls. These could be hoisted outside the `if`/`else` to avoid repetition.

Additionally, when all three of `data.models`, `data.tpm_limit`, and `data.rpm_limit` are explicitly provided by the caller, `_tm_row` is fetched in the `else` branch solely to populate `_budget` for `resolved_tpm`/`resolved_rpm`. Since `resolved_tpm = data.tpm_limit` and `resolved_rpm = data.rpm_limit` in that scenario, the budget lookup from `_tm_row` is never used, making the DB round-trip entirely unnecessary for that request path.
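The suggested skip can be sketched with a hypothetical helper (the name and placement are illustrative, not from the codebase):

```python
def needs_membership_lookup(models, tpm_limit, rpm_limit):
    """Return True only when some value must be resolved from the DB.

    When the caller explicitly supplies models, tpm_limit, and rpm_limit,
    the resolved values come straight from the request, so the
    get_team_membership round-trip can be skipped entirely.
    """
    return models is None or tpm_limit is None or rpm_limit is None

# All three supplied by the caller: no DB fetch needed.
assert needs_membership_lookup(["gpt-4"], 100, 10) is False
# Any missing value still requires the membership row.
assert needs_membership_lookup(None, 100, 10) is True
```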
@greptileai All inline comments reference code from previous commits that has been fixed. The line numbers
Can you re-review against the latest commit and clear stale inline comments?
```shell
  -H 'Authorization: Bearer <your-master-key>' \
  -H 'Content-Type: application/json' \
  -d '{
    "team_alias": "engineering",
    "models": ["gpt-4", "gpt-4o-mini", "gpt-4o"],
    "default_models": ["gpt-4o-mini"]
  }'
```
**`access_group_ids` silently bypasses per-member model restrictions**

The docs describe the feature as providing fine-grained per-member model access control. However, `can_team_access_model` in `auth_checks.py` falls back to the team's `access_group_ids` after the per-member effective-models check fails, without any restriction from the user's `effective_models` set. The code comment acknowledges this:

> access groups are a team-level concept and are NOT restricted by per-member model overrides. If a team has `access_group_ids` configured, any member can access models from those groups regardless of their effective_models set.

This means:

- An admin grants User A only `["gpt-4o-mini"]` via per-member overrides
- The team also has `access_group_ids = ["premium-models"]`, which includes `["gpt-4", "claude-3"]`
- User A can access `gpt-4` and `claude-3` through the access-group bypass even though they're not in their effective model set

The docs should explicitly call this out so admins understand that `access_group_ids` is an additive team-wide grant that overrides per-member restrictions. Without this note, an admin who configures per-member restrictions for compliance or cost-control may unknowingly leave a bypass path open.
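A simplified sketch of that fallback order (not the actual `auth_checks.py` code; the signature and the pre-resolved `team_access_group_models` list are assumptions made for illustration):

```python
def can_team_access_model(model, effective_models, team_access_group_models):
    # 1) Per-member effective-models check.
    if model in effective_models:
        return True
    # 2) Team-level access-group fallback: NOT filtered by the member's
    #    effective set, so it grants access even to models an admin
    #    deliberately excluded via per-member overrides.
    if model in team_access_group_models:
        return True
    return False

# User A is restricted to gpt-4o-mini, but the team's access group
# includes gpt-4: the bypass grants access anyway.
assert can_team_access_model("gpt-4", ["gpt-4o-mini"], ["gpt-4", "claude-3"]) is True
```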
```diff
 # Check if trying to set a budget for team member
-if max_budget_in_team is not None:
+if (
+    max_budget_in_team is not None
+    or new_member.tpm_limit is not None
+    or new_member.rpm_limit is not None
+):
     # create a new budget item for this member
+    _budget_create_data = {
+        "created_by": user_api_key_dict.user_id or litellm_proxy_admin_name,
+        "updated_by": user_api_key_dict.user_id or litellm_proxy_admin_name,
+    }
+    if max_budget_in_team is not None:
+        _budget_create_data["max_budget"] = max_budget_in_team
+    if new_member.tpm_limit is not None:
+        _budget_create_data["tpm_limit"] = new_member.tpm_limit
+    if new_member.rpm_limit is not None:
+        _budget_create_data["rpm_limit"] = new_member.rpm_limit
+
     response = await prisma_client.db.litellm_budgettable.create(
-        data={
-            "max_budget": max_budget_in_team,
-            "created_by": user_api_key_dict.user_id or litellm_proxy_admin_name,
-            "updated_by": user_api_key_dict.user_id or litellm_proxy_admin_name,
-        }
+        data=_budget_create_data  # type: ignore
     )

     _budget_id = response.budget_id
 else:
     _budget_id = default_team_budget_id
```
**New budget created on every `tpm_limit`/`rpm_limit` update, leaving old rows orphaned**

The PR extends budget creation to cover `tpm_limit` and `rpm_limit` updates (not just `max_budget`), but it always creates a new `LiteLLM_BudgetTable` row and links it, without deleting the previous budget. The `existing_budget_id` parameter is accepted but is only used in the early-return disconnect path; it is never used to update the existing budget in place.

This means every `/team/member_update` call that touches `tpm_limit` or `rpm_limit` orphans the previous budget row. In teams where member limits are adjusted frequently, this silently accumulates unreferenced rows in `LiteLLM_BudgetTable`.

This was already the pre-existing behavior for `max_budget`, but the PR widens the impact to two more fields. Consider either:

- Updating the existing budget via `litellm_budgettable.update(where={"budget_id": existing_budget_id}, ...)` when `existing_budget_id` is set, or
- Deleting the old budget row after unlinking it, to prevent unbounded table growth.
@greptileai How about you give an implementation plan to fix all these issues right away
**Implementation Plan to Fix All Issues**

**Phase 1: Critical (P0) - Deploy Immediately 🚨**

1. Migration Safety Issue (

**Phase 2: High Priority (P1) - Critical Functionality**

2. Service Key Restriction Regression (
3. Cache Invalidation (
4. Async/Sync Blocking (
5. Stale Override Escalation (
6. Direct DB Query (
7. Stale
8.
9. Auto-Pruning No-Op (
10. Access Groups Bypass Docs (
11. Service Key

**Phase 3: Optimizations (P2)**

12. Double Env-Var Checks (
13. Non-Deterministic Order (
14. Duplicate Tests (
15. Orphaned Budget Rows (
16. Duplicate Imports (
17. Documentation Gaps (
18. Unnecessary DB Calls (

Execution Order:

Total Time Estimate: ~6-8 hours focused work + testing = 1 day
feat: team-scoped default models + per-user model overrides
Summary
- Adds a `default_models` field to teams and a `models` field to team memberships, enabling fine-grained per-member model access control within a single team.
- Effective models are computed as `union(team.default_models, membership.models)`, falling back to `team.models` when empty for full backward compatibility.
- Gated behind the `TEAM_MODEL_OVERRIDES=true` env var / `litellm.team_model_overrides_enabled` flag.
- Validation: `default_models` must be a subset of `team.models`; member overrides must be a subset of `team.models`; key-gen models must be a subset of the effective set.
- Auto-prunes stale `default_models` when `team.models` is updated.

Verified by running curl commands.