
feat: team override model #24330

Open
Harshit28j wants to merge 9 commits into BerriAI:main from Harshit28j:litellm_feat_team-model-override

Conversation

Harshit28j (Collaborator) commented Mar 21, 2026

feat: team-scoped default models + per-user model overrides

Summary

  • Adds default_models field to teams and models field to team memberships, enabling fine-grained per-member model access control within a single team.
  • Effective models for a member = union(team.default_models, membership.models), falling back to team.models when empty for full backward compatibility.
  • Gated behind TEAM_MODEL_OVERRIDES=true env var / litellm.team_model_overrides_enabled flag.
  • Enforces subset validations at every write path: default_models must be a subset of team.models; member overrides must be a subset of team.models; key-gen models must be a subset of the effective set.
  • Auto-prunes stale default_models when team.models is updated.

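The rules in the bullets above can be sketched as one pure function. This is a simplified sketch of the PR's effective-model logic, not the exact implementation; the order-preserving de-duplication here is illustrative:

```python
from typing import List

def compute_effective_models(
    team_defaults: List[str],
    member_models: List[str],
    team_pool: List[str],
) -> List[str]:
    """union(team.default_models, membership.models), capped by team.models.

    Falls back to team_pool when the union is empty (backward compatibility)
    or when capping removes every entry. team_pool == [] means "allow all",
    so the cap is skipped.
    """
    # dict.fromkeys de-duplicates while preserving first-seen order
    effective = list(dict.fromkeys(team_defaults + member_models))
    if not effective:
        return team_pool
    if team_pool:
        allowed = set(team_pool)
        effective = [m for m in effective if m in allowed]
        if not effective:
            return team_pool
    return effective
```

With defaults=[gpt-5-mini] and a member override of [gpt-4o], a member resolves to both models; a member with no override resolves to the defaults only.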
Verified by running a verification script against a live proxy

harshitjain@Harshits-MacBook-Air  % python3 verify_team_model_overrides.py

============================================================
  PREFLIGHT
============================================================

  Proxy is healthy.


============================================================
  SETUP
============================================================

  Creating team with models=[gpt-4, gpt-5-mini, gpt-4o], default_models=[gpt-5-mini]
  Team created: 58a47361-b14d-4285-9b34-930e56ac4d3c
    models:         ['gpt-4', 'gpt-5-mini', 'gpt-4o']
    default_models: ['gpt-5-mini']

  Creating users...
  Created: verify-user-A, verify-user-B

  Adding User A to team with models=[gpt-4o] override
    Membership models: ['gpt-4o']
  Adding User B to team with no override

  Generating keys...
    Key A: sk-loZJGJDE7ZR9ySOk2ZiZMw
    Key B: sk-ZUmwubtilnSiPL9zBjLsgg

  Expected effective models:
    User A: default_models(gpt-5-mini) + override(gpt-4o) = {gpt-5-mini, gpt-4o}
    User B: default_models(gpt-5-mini) only               = {gpt-5-mini}


============================================================
  CORE ACCESS TESTS
============================================================

  [PASS] User A -> gpt-5-mini (default model) -> HTTP 200
  [PASS] User A -> gpt-4o (override model) -> HTTP 200
  [PASS] User A -> gpt-4 (NOT in effective set) -> HTTP 401  |  team not allowed to access model. This team can only access models=['gpt-5-mini'
  [PASS] User B -> gpt-5-mini (default model) -> HTTP 200
  [PASS] User B -> gpt-4o (NOT in B's effective set) -> HTTP 401  |  team not allowed to access model. This team can only access models=['gpt-5-mini'
  [PASS] User B -> gpt-4 (NOT in B's effective set) -> HTTP 401  |  team not allowed to access model. This team can only access models=['gpt-5-mini'

============================================================
  CROSS-USER ISOLATION
============================================================

  [PASS] User A can access gpt-4o (their override) -> HTTP 200
  [PASS] User B cannot access gpt-4o (A's override doesn't leak) -> HTTP 401

============================================================
  KEY CREATION VALIDATION
============================================================

  [PASS] Key with gpt-4 (outside effective set) -> HTTP 403  |  {'error': "Requested models not in user's effective team models. Disallowed: ['gpt-4']. Effective mo
  [PASS] Key with gpt-5-mini (valid subset) -> HTTP 200
  [PASS] all-team-models resolved to ['gpt-4o', 'gpt-5-mini'] (expected ['gpt-4o', 'gpt-5-mini'])

============================================================
  MEMBER MODEL VALIDATION
============================================================

  [PASS] Member override with model not in team.models -> HTTP 400  |  Member model overrides must be a subset of team models. Disallowed: ['claude-sonnet-4']. Team models
  [PASS] Member override with valid model (gpt-4) -> HTTP 200

============================================================
  TEAM-LEVEL VALIDATION
============================================================

  [PASS] default_models not subset of team.models in /team/new -> HTTP 400
  [PASS] default_models valid subset in /team/new -> HTTP 200
  [PASS] default_models not subset in /team/update -> HTTP 400

============================================================
  AUTO-PRUNING
============================================================

  Created prune-test team: b6b4cc97-d7e2-4a51-994c-e715577fdc12
    models:         ['gpt-4', 'gpt-5-mini', 'gpt-4o']
    default_models: ['gpt-5-mini', 'gpt-4o']
  [PASS] After narrowing to [gpt-4], default_models pruned to []

============================================================
  REVOCATION
============================================================

  Removing User A's override (models=[])...
  [PASS] Remove override via models=[] -> HTTP 200
  [PASS] Response models: [] (override cleared)
  New key for User A: sk-s08M7dFaERwPp-HB-ZcBSA
  [PASS] After revocation, User A -> gpt-4o (was override) -> HTTP 401  |  team not allowed to access model. This team can only access models=['gpt-5-mini'
  [PASS] After revocation, User A -> gpt-5-mini (still a default) -> HTTP 200

============================================================
  ROLE-ONLY UPDATE PRESERVES MODELS
============================================================

  Set User A override to [gpt-4o]
  [PASS] After role-only update, response models: ['gpt-4o'] (expected ['gpt-4o'])
  [PASS] After role-only update, gpt-4o still accessible -> HTTP 200

============================================================
  BACKWARD COMPATIBILITY (feature flag behavior)
============================================================

  Note: Cannot toggle feature flag at runtime via API.
  This is tested in unit tests (test_team_model_overrides.py).
  When flag is OFF: get_effective_team_models returns team.models unchanged.

============================================================
  RESULTS
============================================================

  24 passed, 0 failed out of 24 checks

  ALL TESTS PASSED!

@vercel

vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 22, 2026 2:01am


@Harshit28j Harshit28j marked this pull request as ready for review March 21, 2026 22:19
@codspeed-hq
Contributor

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Harshit28j:litellm_feat_team-model-override (a2ecbd9) with main (f5194b5)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR adds team-scoped default models (default_models on LiteLLM_TeamTable) and per-member model overrides (models on LiteLLM_TeamMembership), gated behind a TEAM_MODEL_OVERRIDES feature flag. Effective model sets are computed as union(default_models, member.models) capped by team.models, with auto-pruning, subset validation at every write path, and cache invalidation after membership changes.

Key changes:

  • compute_effective_models / get_effective_team_models added to auth hot path in auth_checks.py; can_team_access_model now receives the per-member effective set
  • Raw SQL hot-path query extended with t.default_models and tm.models columns — requires migration to be applied before code deployment or every auth request will fail
  • _upsert_budget_and_membership refactored to support models-only updates with correct None vs [] semantics
  • TeamMemberUpdateResponse correctly returns resolved models but still returns raw request values for tpm_limit/rpm_limit, creating an inconsistency
  • access_group_ids on a team silently bypasses per-member effective model restrictions (documented in code comment, but not in user-facing docs)
  • Every tpm_limit/rpm_limit update creates a new LiteLLM_BudgetTable row rather than updating the existing one, orphaning previous budget rows — this extends a pre-existing pattern to two new fields
  • Comprehensive mock-only test coverage added in tests/test_litellm/proxy/auth/test_team_model_overrides.py

Confidence Score: 2/5

  • Not safe to merge without confirming migration ordering — the raw SQL change will hard-crash every authenticated request if code ships before the DB migration runs.
  • Several critical concerns were identified in prior review threads (raw SQL column guards, event-loop blocking sync cache delete, service-key silent restriction, stale member override privilege escalation, cache not invalidated after write). This round adds the access_group_ids bypass being undocumented, the tpm_limit/rpm_limit response inconsistency, and the orphaned budget row growth. The raw SQL risk is the most immediate hard-blocker for deployment safety.
  • litellm/proxy/utils.py (raw SQL with new columns — migration ordering hard-blocker), litellm/proxy/auth/auth_checks.py (hot-path env-var read, access_group_ids bypass), litellm/proxy/management_endpoints/team_endpoints.py (response inconsistency, guaranteed DB hit after cache eviction)

Important Files Changed

  • litellm/proxy/auth/auth_checks.py: Adds compute_effective_models and get_effective_team_models to the hot auth path; can_team_access_model now computes per-member effective models, but the access_group_ids fallback bypasses those restrictions silently and a redundant os.getenv() call occurs on every auth check when the flag is false.
  • litellm/proxy/management_endpoints/team_endpoints.py: Adds default_models subset validation, auto-pruning on team update, and member model override validation; team_member_update now resolves stored_models but returns raw tpm_limit/rpm_limit in the response inconsistently; guaranteed DB hit after cache invalidation in several paths.
  • litellm/proxy/management_endpoints/key_management_endpoints.py: Enforces effective-model subset validation at key-generation time using the existing get_team_membership helper; all-team-models correctly resolves to the effective set; double env-var check and service-key all-team-models scope are pre-existing concerns noted in prior threads.
  • litellm/proxy/management_endpoints/common_utils.py: Extends _upsert_budget_and_membership to accept a models parameter; early-return correctly guards against no-op calls; the new upsert correctly handles models-only, budget-only, and combined updates; each budget update still creates a new row rather than updating the existing one.
  • litellm/proxy/utils.py: Adds t.default_models and tm.models to the raw hot-path SQL join; these columns don't exist pre-migration and will cause the proxy to hard-fail on every authenticated request until the migration is applied; no column-guard or graceful fallback.
  • litellm/proxy/management_helpers/utils.py: Extends add_new_member to persist member model overrides and validate them against team_models; every budget update creates a new orphaned row rather than updating the existing one; cache invalidation is correctly added after membership creation.
  • litellm/proxy/_types.py: Adds models, tpm_limit, rpm_limit to Member; default_models to TeamBase and UpdateTeamRequest; team_member_models and team_default_models to LiteLLM_VerificationTokenView; models to LiteLLM_TeamMembership and TeamMemberUpdateRequest/Response; all cleanly typed with Optional/List defaults.
  • litellm/init.py: Adds team_model_overrides_enabled module-level flag initialised from the env var at import time; straightforward and correct.
  • litellm/proxy/schema.prisma: Adds default_models String[] @default([]) to LiteLLM_TeamTable and models String[] @default([]) to LiteLLM_TeamMembership; schema changes are consistent across all three schema.prisma files.
  • tests/test_litellm/proxy/auth/test_team_model_overrides.py: New test file with comprehensive unit and integration coverage of effective-model computation, access denial, flag-off backward compat, cross-user isolation, stale-override fallback, and service-key behavior; all tests are pure mocks (no network calls), satisfying the no-network-call rule.
  • tests/test_litellm/proxy/common_utils/test_upsert_budget_membership.py: Updated existing test to reflect the new no-op early-return behavior (now correct: only disconnects when a budget exists); adds new tests for models-only, models+budget, and models=[] clearing paths; all mock-only.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request / Auth Check] --> B{team_model_overrides_enabled?}
    B -- No --> C[Use team.models as-is]
    B -- Yes --> D{valid_token.user_id set?}
    D -- No service key --> C
    D -- Yes --> E[get_effective_team_models]

    E --> F[team_defaults = team_object.default_models]
    E --> G[member_models = valid_token.team_member_models from SQL JOIN]
    F & G --> H[compute_effective_models]

    H --> I{effective empty?}
    I -- Yes --> J[fallback to team_pool]
    I -- No --> K{team_pool empty?}
    K -- Yes allow-all --> L[return effective as-is]
    K -- No --> M{effective ∩ team_pool empty?}
    M -- Yes stale overrides --> J
    M -- No --> N[return capped effective]

    J & L & N --> O[_can_object_call_model with effective_models]
    O -- Denied --> P{team has access_group_ids?}
    P -- No --> Q[Raise ProxyException 401]
    P -- Yes --> R[_get_models_from_access_groups bypass]
    R --> S{model in group?}
    S -- No --> Q
    S -- Yes --> T[Allow - bypasses per-member restrictions]
    O -- Allowed --> T

Comments Outside Diff (1)

  1. litellm/proxy/management_endpoints/team_endpoints.py, lines 2610-2618

    P2 tpm_limit/rpm_limit in response inconsistent with models

    The response for models was correctly updated to use stored_models (the authoritative resolved value), but tpm_limit and rpm_limit still return the raw request values (data.tpm_limit, data.rpm_limit), which may be None when only models were updated.

    This creates a misleading response: if a caller updates only models, the response shows tpm_limit: null even if the member has an existing limit of, say, 500. Callers that inspect the response to build a reconciliation view (e.g., dashboards, audit logging) will see incorrect zero/null limits.

    resolved_tpm and resolved_rpm are already computed above for writing into members_with_roles — the response should use them too:

Last reviewed commit: "fix: req changes by ..."

Comment on lines +651 to +660
_membership = await prisma_client.db.litellm_teammembership.find_unique(
    where={
        "user_id_team_id": {
            "user_id": data.user_id,
            "team_id": team_table.team_id,
        }
    }
)
if _membership is not None:
    member_models = _membership.models or []

P1 Direct DB query bypasses established helper-function pattern

A raw prisma_client.db.litellm_teammembership.find_unique(...) call is made here to resolve member models at key-generation time. The codebase convention (enforced in the db-rules) is that all DB reads must go through the existing get_team/get_user/get_key helper functions rather than direct Prisma calls.

Even though the code comment notes this is a management endpoint (not the hot auth path), making a direct DB read here:

  1. Bypasses any caching that the helper functions may apply.
  2. Creates an inconsistent access pattern that can be harder to audit for regressions.
  3. Adds an extra round-trip that isn't guarded by retry/error-handling wrappers used in the helpers.

Consider wrapping this lookup in a get_team_membership (or equivalent) helper, similar to how get_team_object is used elsewhere in this file.

Rule Used: What: In critical path of request, there should be... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +2728 to +2731
if not (
    litellm.team_model_overrides_enabled
    or os.getenv("TEAM_MODEL_OVERRIDES", "").lower() == "true"
):

P2 os.getenv() called on every auth check when feature flag is False

litellm.team_model_overrides_enabled is already initialised at module load time from the same environment variable. When the flag is False, the short-circuit or still falls through and calls os.getenv("TEAM_MODEL_OVERRIDES", "") on every request that reaches can_team_access_model.

This is the hot auth path — even a cheap os.getenv call multiplied across thousands of requests per second adds up. Consider reading the env-var only once and storing it on litellm.team_model_overrides_enabled (which already exists), then checking only litellm.team_model_overrides_enabled here:

Suggested change
-if not (
-    litellm.team_model_overrides_enabled
-    or os.getenv("TEAM_MODEL_OVERRIDES", "").lower() == "true"
-):
+if not litellm.team_model_overrides_enabled:

If dynamic env-var toggling at runtime is intentionally supported, document it explicitly so the cost is clearly understood.

Comment on lines +641 to +643
if (
    litellm.team_model_overrides_enabled
    or os.getenv("TEAM_MODEL_OVERRIDES", "").lower() == "true"

P2 Same double env-var check also present in key generation

This mirrors the same pattern in get_effective_team_models: litellm.team_model_overrides_enabled is checked first, but if False the code still reads os.getenv("TEAM_MODEL_OVERRIDES", "") on every /key/generate call. Consolidating the flag check to litellm.team_model_overrides_enabled (updated at import time and settable programmatically) keeps the logic consistent with auth_checks.py and avoids the redundant env read.

    - If cap empties the list (all stale), falls back to team_pool (NOT [] which = allow-all).
    - team_pool=[] means "allow all" — cap is skipped.
    """
    effective = list(set(team_defaults + member_models))

P2 Non-deterministic model order from set()

list(set(team_defaults + member_models)) does not preserve insertion order. Across different Python invocations or even across calls in the same process, set iteration order is not guaranteed. This means two calls with identical inputs may produce differently-ordered model lists, so keys generated for the same effective model set can look different when inspected via the API or in the DB.

Consider using dict.fromkeys() which preserves first-occurrence order and de-duplicates:

Suggested change
-effective = list(set(team_defaults + member_models))
+effective = list(dict.fromkeys(team_defaults + member_models))

Comment on lines +1 to +10
import sys
import os
import pytest

# Add the parent directory to the system path to import litellm
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../..")))

import litellm
from litellm.proxy._types import UserAPIKeyAuth, LiteLLM_TeamTable
from litellm.proxy.auth.auth_checks import (

P2 Duplicate test coverage across two files

tests/proxy_unit_tests/test_team_model_overrides.py (2 tests) and tests/test_litellm/proxy/auth/test_team_model_overrides.py (12+ tests) cover largely the same logic. The proxy_unit_tests file is a strict subset — every scenario it covers is also tested by the more comprehensive test_litellm file. Having duplicate, partially-overlapping test files causes maintenance overhead and makes it unclear which is canonical.

Consider removing this file and relying solely on tests/test_litellm/proxy/auth/test_team_model_overrides.py for coverage.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@@ -2435,18 +2525,39 @@ async def team_member_update(
user_api_key_dict=user_api_key_dict,
tpm_limit=data.tpm_limit,
rpm_limit=data.rpm_limit,

P1 get_team_membership cache not invalidated after model write

_upsert_budget_and_membership writes the new models value into LiteLLM_TeamMembership via an upsert, but the get_team_membership helper in auth_checks.py caches results in user_api_key_cache and this cache entry is never invalidated after the membership update.

Any /key/generate call that fires for the same user before the cache TTL expires will invoke get_team_membership, receive the old membership row from cache, and embed the stale effective model list into the newly-generated key (see key_management_endpoints.py lines 652–666). The key's model scope will not reflect the update until the cache naturally expires.

Consider adding an explicit cache eviction for the membership entry immediately after the transaction commits — auth_checks.get_team_membership uses a well-known cache key format, so it can be targeted for deletion. Alternatively, expose a helper function alongside get_team_membership that clears its cache entry and call it from team_member_update after the write.
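One possible shape for that eviction helper, purely illustrative: the cache-key format is taken from the team_membership:{user_id}:{team_id} pattern used in this PR, and a dict stands in for the real cache:

```python
from typing import Any, Dict

def invalidate_team_membership_cache(
    cache: Dict[str, Any], user_id: str, team_id: str
) -> None:
    """Evict the cached membership row right after the upsert commits, so the
    next /key/generate recomputes the effective model list from the DB."""
    # same key format the membership lookup uses; pop with a default so the
    # call is idempotent when no entry is cached
    cache.pop(f"team_membership:{user_id}:{team_id}", None)
```

Called from team_member_update immediately after the transaction commits, this closes the stale-key window without waiting for the TTL.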

Comment on lines +2543 to +2554
stored_models = data.models
if stored_models is None:
    from litellm.proxy.auth.auth_checks import get_team_membership
    from litellm.proxy.proxy_server import user_api_key_cache

    _tm_row = await get_team_membership(
        user_id=received_user_id,
        team_id=data.team_id,
        prisma_client=prisma_client,
        user_api_key_cache=user_api_key_cache,
    )
    stored_models = (_tm_row.models or []) if _tm_row is not None else []

P2 Extra DB round-trip when data.models not updated

When a caller updates only role, max_budget_in_team, tpm_limit, or rpm_limit, without touching models, stored_models is None and the code makes a fresh get_team_membership DB call (bypassing the cache since it was just invalidated) purely to populate the models field in the response and to conditionally sync members_with_roles.

However, members_with_roles is only written when data.role is not None or data.models is not None (line 2556). So when neither is set, this extra DB call feeds only the models field of the return value. This is unnecessary overhead on a management endpoint for every budget/tpm/rpm-only update.

Consider returning stored_models = None (or omitting it) when models weren't part of the update, or lazily fetching only when data.role is not None or data.models is not None.

Comment on lines +2706 to +2713
if not effective:
    return team_pool

if team_pool:
    effective = [m for m in effective if m in set(team_pool)]
    if not effective:
        return team_pool


P1 Stale member overrides silently escalate to full team access

When a member has per-user model overrides set (e.g. ["gpt-4o"]), but ALL of those models are subsequently removed from team.models, compute_effective_models falls back to team_pool — the full team model list. This means the member quietly gains access to every model the team has, which is broader than their originally intended restricted set.

Concretely:

  • team.models = ["gpt-4", "gpt-3.5"]
  • member override: ["gpt-4o"] (now stale — gpt-4o removed from team)
  • effective = ["gpt-4o"] → capped by team_pool → [] → falls back to ["gpt-4", "gpt-3.5"]

The comment justifies this as "NOT [] which = allow-all", but team_pool can itself be a large set. An admin revoking a specific model override may expect the member to lose all team access until explicitly reassigned, not to receive a broad promotion.

If this fallback is intentional, add a clear doc comment explaining the privilege trade-off and consider whether team_pool fallback should be configurable (e.g. a stricter mode that returns [] = deny vs fall-through-to-team-pool).
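A sketch of what that configurable stricter mode could look like; the strict_stale_overrides flag is hypothetical, not part of the PR:

```python
from typing import List

def cap_effective_models(
    effective: List[str],
    team_pool: List[str],
    strict_stale_overrides: bool = False,
) -> List[str]:
    """Cap the effective set by team_pool. When capping empties the set,
    either deny everything (strict mode) or fall back to the full team
    pool (the PR's current behavior)."""
    if not team_pool:  # empty pool means allow-all, so the cap is skipped
        return effective
    capped = [m for m in effective if m in set(team_pool)]
    if capped:
        return capped
    return [] if strict_stale_overrides else team_pool
```

In strict mode the stale-override member from the scenario above loses all access until explicitly reassigned, instead of being promoted to the full team pool.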

Comment on lines +2535 to +2536
_cache_key = f"team_membership:{received_user_id}:{data.team_id}"
user_api_key_cache.delete_cache(key=_cache_key)

P1 Sync delete_cache blocks event loop in async endpoint

user_api_key_cache.delete_cache calls DualCache.delete_cache which (when Redis is configured) invokes the synchronous Redis client (redis_client.delete()). Running a sync network I/O call inside an async FastAPI endpoint blocks the event loop for the duration of the Redis round-trip.

DualCache exposes async_delete_cache which properly awaits the async Redis client. The async variant should be used here instead of the sync one.
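The difference can be demonstrated with a toy stand-in for DualCache; the real class and its Redis wiring live in litellm's caching module, and the sleeps below merely simulate the Redis round-trip:

```python
import asyncio
import time

class ToyDualCache:
    """Stand-in for DualCache: the sync delete blocks the event loop for the
    whole round-trip; the async variant yields control while waiting."""

    def __init__(self) -> None:
        self.store = {"team_membership:u1:t1": "row"}

    def delete_cache(self, key: str) -> None:
        time.sleep(0.05)  # simulated sync Redis call: blocks the loop
        self.store.pop(key, None)

    async def async_delete_cache(self, key: str) -> None:
        await asyncio.sleep(0.05)  # yields; other coroutines keep running
        self.store.pop(key, None)

async def member_update_endpoint(cache: ToyDualCache) -> None:
    # inside an async FastAPI endpoint, prefer the awaitable variant
    await cache.async_delete_cache("team_membership:u1:t1")
```

Both variants remove the entry; the async one just does so without stalling every other in-flight request on the same event loop.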

Comment on lines +668 to +685
if effective_models:
    # if 'all-team-models' was requested, restrict it to the effective models
    if "all-team-models" in (data.models or []):
        data_json["models"] = effective_models
    # if explicit models were requested, validate they're a subset of effective set
    elif data.models:
        disallowed = set(data.models) - set(effective_models)
        if disallowed:
            raise HTTPException(
                status_code=403,
                detail={
                    "error": f"Requested models not in user's effective team models. "
                    f"Disallowed: {sorted(disallowed)}. "
                    f"Effective models: {sorted(effective_models)}"
                },
            )
    # if NO models was requested, runtime auth will compute effective models
    # from the SQL view join (tm.models + t.default_models), so nothing to store here

P2 all-team-models request restricted to team_defaults when user_id is absent

When a caller generates a team-level key without supplying user_id (e.g. a service/bot key), member_models = [] and effective_models reduces to just team_default_models. If an admin has set default_models on the team, a key generated with "all-team-models" is silently stored with only those defaults rather than the full team.models pool.

Before this feature was introduced, "all-team-models" always resolved to the full team model pool. With the feature enabled and default_models configured, the semantics change for service keys.

Consider documenting this explicitly in the code comment and in model_access.md, or — if full team access is desired for keyless (service-account) generation — short-circuiting the restriction when data.user_id is None:

# If no user_id, skip per-user restriction (service key gets full team access)
if data.user_id and prisma_client is not None:
    _membership = await get_team_membership(...)
    ...

Comment on lines +2556 to +2567
if data.role is not None or data.models is not None:
    team_members: List[Member] = []
    for member in team_table.members_with_roles:
        if member.user_id == received_user_id:
            team_members.append(
                Member(
                    user_id=member.user_id,
                    role=data.role or member.role,
                    user_email=data.user_email or member.user_email,
                    models=stored_models,
                    tpm_limit=data.tpm_limit if data.tpm_limit is not None else getattr(member, "tpm_limit", None),
                    rpm_limit=data.rpm_limit if data.rpm_limit is not None else getattr(member, "rpm_limit", None),

P2 tpm_limit/rpm_limit in members_with_roles can silently become stale

The PR adds tpm_limit and rpm_limit to the Member type and now serialises them into members_with_roles. However, this JSON blob is only rewritten when data.role is not None or data.models is not None. This means a tpm_limit-only update (data.role=None, data.models=None) never writes to members_with_roles.

The problematic sequence:

  1. /team/member_update with {"user_id": "alice", "tpm_limit": 500} → budget table gets tpm_limit=500; members_with_roles is not touched (condition is false)
  2. /team/member_update with {"user_id": "alice", "models": ["gpt-4o"]} → enters the block; getattr(member, "tpm_limit", None) reads from the still-stale members_with_roles → writes Member(tpm_limit=None) back to JSON

After step 2, members_with_roles reports tpm_limit: null, while the budget table correctly enforces 500. Any downstream consumer of /team/info (e.g., a dashboard) would show wrong rate limits.

Enforcement comes from the budget table so there is no correctness issue, but the display inconsistency could confuse operators.

Consider widening the condition to also trigger when tpm_limit or rpm_limit changes:

if data.role is not None or data.models is not None or data.tpm_limit is not None or data.rpm_limit is not None:

Or, since stored_models is already resolved above regardless of the condition, you could always rewrite members_with_roles when any field changes — the team_table.update call is cheap relative to the surrounding DB work already done.

Harshit28j (Collaborator, Author)

@greptile can you please review this PR?

Comment on lines 2974 to 2975
m.aliases AS team_model_aliases,
-- Added comma to separate b.* columns

P0 Raw SQL query adds columns that don't exist on un-migrated databases

t.default_models AS team_default_models and tm.models AS team_member_models are added directly to the raw SQL join query. This query is executed on every authenticated request (the hot auth path).

If a deployment upgrades the application code but has not yet run the Prisma migration (which adds the default_models and models columns), every single request will fail with a PostgreSQL column "default_models" does not exist error, taking down the proxy entirely.

Unlike Prisma model files, raw SQL strings have no migration-awareness. The migration uses ADD COLUMN IF NOT EXISTS which is safe, but the application SQL does not.

A safeguard pattern used elsewhere in litellm is to guard new column reads behind a feature flag or to catch the PrismaError/psycopg2.errors.UndefinedColumn and fall back to None. At minimum, the migration notes in the PR description should make it explicit that the migration must be applied before deploying this code version.
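A hedged sketch of that catch-and-fall-back pattern; the exception class, the query strings, and the runner below are stand-ins, not litellm code:

```python
import asyncio

class UndefinedColumn(Exception):
    """Stand-in for the DB driver's 'column does not exist' error."""

# post-migration query with the two new columns, and the legacy fallback
NEW_SQL = "SELECT ..., t.default_models, tm.models FROM ..."
LEGACY_SQL = "SELECT ... FROM ..."

async def fetch_auth_row(run_query):
    """Try the new columns first; degrade gracefully on un-migrated DBs."""
    try:
        return await run_query(NEW_SQL)
    except UndefinedColumn:
        row = await run_query(LEGACY_SQL)
        # treat the missing columns as "feature not available yet"
        row.setdefault("team_default_models", None)
        row.setdefault("team_member_models", None)
        return row
```

Against an un-migrated database the first query raises, the legacy query succeeds, and the two new fields come back as None, so the auth path keeps working instead of hard-failing on every request.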

Comment on lines +2543 to +2554
stored_models = data.models
if stored_models is None:
    from litellm.proxy.auth.auth_checks import get_team_membership
    from litellm.proxy.proxy_server import user_api_key_cache

    _tm_row = await get_team_membership(
        user_id=received_user_id,
        team_id=data.team_id,
        prisma_client=prisma_client,
        user_api_key_cache=user_api_key_cache,
    )
    stored_models = (_tm_row.models or []) if _tm_row is not None else []

P1 Guaranteed DB hit immediately after cache invalidation

The membership cache entry is deleted at line 2536, and then get_team_membership is called at line 2548. Because the cache was just invalidated, this call always hits the DB — the cache miss is 100% guaranteed on this code path.

This creates an unconditional extra DB round-trip on every /team/member_update call where data.models is None, even when only role/tpm_limit/rpm_limit were updated and the model list didn't change. The fresh read is only used to populate stored_models for the members_with_roles JSON update (line 2556), which only runs when data.role is not None or data.models is not None.

Consider scoping the DB re-fetch to only the cases where it is actually needed (i.e., inside the if data.role is not None or data.models is not None: block), and only invalidating the cache when a models write actually occurred. This avoids the unnecessary round-trip for budget-only or tpm/rpm-only updates.
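That scoping could look like the following decision helper (a sketch; the names are illustrative and the real endpoint tracks more fields):

```python
def plan_membership_refresh(models, role, tpm_limit, rpm_limit):
    """Decide (invalidate_cache, fetch_from_db) for a member_update request.

    Sketch of the suggested scoping: the membership cache is only stale when
    the models list was written, and the DB re-read is only needed to fill
    stored_models for the members_with_roles JSON update.
    """
    models_changed = models is not None
    updates_members_json = role is not None or models_changed
    invalidate_cache = models_changed  # only a models write stales the cache
    # When the caller supplied the models list, no re-read is required;
    # budget-only (tpm/rpm) updates skip both steps entirely.
    fetch_from_db = updates_members_json and not models_changed
    return invalidate_cache, fetch_from_db
```

A tpm-only update then performs neither the cache delete nor the DB round-trip, while a role-only update fetches once without invalidating.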

Comment on lines +2721 to +2756
```python
def get_effective_team_models(
    team_object: Optional[LiteLLM_TeamTable],
    valid_token: Optional[UserAPIKeyAuth] = None,
) -> List[str]:
    """
    Returns the effective list of models for a team member.
    The union of:
      - team_object.default_models (OR valid_token.team_default_models if available)
      - team_membership.models (OR valid_token.team_member_models if available)

    Capped by team_object.models. Falls back to team_object.models when empty.
    """
    if not (
        litellm.team_model_overrides_enabled
        or os.getenv("TEAM_MODEL_OVERRIDES", "").lower() == "true"
    ):
        return team_object.models if team_object else []

    # Get from team defaults — prefer team_object (authoritative, fresh from DB/cache)
    # over valid_token (snapshot from key creation time, may be stale).
    # Use `is not None` instead of truthiness so that an explicit empty list []
    # (meaning "no defaults") is not confused with "field missing".
    team_defaults: List[str] = []
    if team_object and team_object.default_models is not None:
        team_defaults = team_object.default_models
    elif valid_token and valid_token.team_default_models is not None:
        team_defaults = valid_token.team_default_models

    # Get from member specific overrides
    member_models: List[str] = []
    if valid_token and valid_token.team_member_models is not None:
        member_models = valid_token.team_member_models

    team_pool = team_object.models if team_object else []

    return compute_effective_models(team_defaults, member_models, team_pool)
```

P1 Service keys (no user_id) silently restricted by default_models at runtime

get_effective_team_models has no exemption for service/bot keys (team keys without a user_id). When team_model_overrides_enabled=True and a team has default_models configured (e.g. ["gpt-4o-mini"]), the auth path computes:

  • team_defaults = team_object.default_models = ["gpt-4o-mini"]
  • member_models = [] — because the SQL JOIN on tm.user_id = v.user_id doesn't match for keys where v.user_id IS NULL, so valid_token.team_member_models = None
  • effective = compute_effective_models(["gpt-4o-mini"], [], full_team_pool) = ["gpt-4o-mini"]

This means service keys are silently restricted to default_models only, even though they were intended to have full team.models access. The key-generation path correctly exempts service keys (and data.user_id guard), but the runtime auth path in can_team_access_model calls get_effective_team_models(team_object, valid_token) unconditionally.

Contrast with the comment at the key-generation call-site:

service/bot keys (no user_id) use the full team.models pool, preserving pre-feature behavior

The same logic needs to be applied in get_effective_team_models. When valid_token has no user_id and no membership row, fall back to team_pool:

```python
# In get_effective_team_models, after the feature-flag check:
# Service keys (no user_id) have no membership row, so skip per-member logic
# and use team_pool directly for backward compatibility.
if valid_token is not None and valid_token.user_id is None:
    return team_object.models if team_object else []
```

This is a backward-incompatible regression (per repo rules) that activates the moment an admin enables the feature flag and sets default_models on any team that also uses service keys.

Rule Used: What: avoid backwards-incompatible changes without... (source)

Comment on lines +144 to +152
```shell
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{
  "team_alias": "engineering",
  "models": ["gpt-4", "gpt-4o-mini", "gpt-4o"],
  "default_models": ["gpt-4o-mini"]
}'
```


P2 Docs omit service-key behaviour and capping semantics

The docs state:

A member's effective models = default_models ∪ member.models. If neither is set, falls back to team.models.

Two behaviours are not documented:

  1. Capping: the union is further intersected with team.models (the team pool). If default_models = ["gpt-4"] but team.models = ["gpt-4o-mini"], the effective set is [] → falls back to ["gpt-4o-mini"].
  2. Service/bot keys (team-level keys with no user_id): current code restricts them to default_models at runtime (see inline comment on get_effective_team_models). The docs claim "Zero extra database queries on the auth hot path" and "full backward compatibility", which is only accurate for keys that are user-scoped. Service key users enabling this feature will experience a silent regression (see related comment on auth_checks.py).
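The capping behaviour in (1) can be sketched as follows; this mirrors the semantics described in the review, and the PR's actual compute_effective_models may differ in detail:

```python
def compute_effective_models(team_defaults, member_models, team_pool):
    # Order-preserving union of team defaults and per-member overrides.
    union = list(dict.fromkeys(team_defaults + member_models))
    # Cap by the team pool (team_pool=[] means the team allows all models).
    if team_pool:
        union = [m for m in union if m in set(team_pool)]
    # Empty result falls back to the full team pool for backward compatibility.
    return union if union else list(team_pool)
```

With default_models=["gpt-4"] and team.models=["gpt-4o-mini"], the cap empties the union and the fallback yields ["gpt-4o-mini"], as described above.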

Comment on lines 1529 to +1547

```python
updated_kv = data.json(exclude_unset=True)

# When team.models is being changed, prune existing default_models
# to prevent stale over-permissive defaults (privilege escalation).
# Must inject into updated_kv directly (not data) because
# data.json(exclude_unset=True) skips fields not set during __init__.
if "models" in updated_kv and "default_models" not in updated_kv:
    existing_defaults = existing_team_row.default_models or []
    if existing_defaults:
        new_models = updated_kv["models"] or []
        if not new_models:
            # team.models=[] means "allow all" — clear default_models
            # so get_effective_team_models falls back to [] (allow all)
            updated_kv["default_models"] = []
        else:
            pruned = [m for m in existing_defaults if m in set(new_models)]
            if pruned != existing_defaults:
                updated_kv["default_models"] = pruned
```

P1 Auto-pruning always writes to DB even when no models were removed

The current pruning guard is:

```python
if pruned != existing_defaults:
    updated_kv["default_models"] = pruned
```

This correctly skips the write when models are only added to team.models (i.e., nothing was pruned from default_models). However, when new_models is empty (team.models=[] → allow-all), default_models is unconditionally set to []:

```python
if not new_models:
    updated_kv["default_models"] = []   # Always writes, even if existing_defaults was already []
```

If existing_defaults is already [], this writes an unnecessary default_models=[] into the update dict, causing a no-op DB write on every team.models=[] update. While harmless, add the same short-circuit:

Suggested change:

```diff
         if not new_models:
             # team.models=[] means "allow all" — clear default_models
             # so get_effective_team_models falls back to [] (allow all)
-            updated_kv["default_models"] = []
+            if existing_defaults:
+                updated_kv["default_models"] = []
```

Comment on lines 2570 to 2578
```diff
 team_members.append(
     Member(
         user_id=member.user_id,
-        role=data.role,
+        role=data.role or member.role,
+        user_email=data.user_email or member.user_email,
+        models=stored_models,
+        tpm_limit=data.tpm_limit if data.tpm_limit is not None else getattr(member, "tpm_limit", None),
+        rpm_limit=data.rpm_limit if data.rpm_limit is not None else getattr(member, "rpm_limit", None),
     )
```

P1 tpm_limit/rpm_limit written to members_with_roles from stale JSON source

getattr(member, "tpm_limit", None) reads from the Member object deserialized from the existing members_with_roles JSON blob. Since Member.tpm_limit and Member.rpm_limit are new fields added by this PR, all existing members_with_roles entries were serialized without them. On deserialization, member.tpm_limit will be None (Pydantic default) even if the budget table has a real value.

The consequence: the first models-only (or role-only) update after deployment will write tpm_limit: null and rpm_limit: null back into members_with_roles, overriding any budget values that had been set previously. Consumers of /team/info (e.g. dashboards) will then show incorrect limits.

The authoritative source for tpm_limit/rpm_limit is LiteLLM_BudgetTable, so the fix is to read from the DB (via the already-fetched _tm_row or budget join) rather than getattr(member, ...):

```python
# Read tpm/rpm from the budget row (authoritative) if available,
# falling back to the request value and then to None.
_membership_tpm = None
_membership_rpm = None
if data.models is None and _tm_row is not None and _tm_row.litellm_budget_table is not None:
    _membership_tpm = _tm_row.litellm_budget_table.tpm_limit
    _membership_rpm = _tm_row.litellm_budget_table.rpm_limit

team_members.append(
    Member(
        ...
        tpm_limit=data.tpm_limit if data.tpm_limit is not None else _membership_tpm,
        rpm_limit=data.rpm_limit if data.rpm_limit is not None else _membership_rpm,
    )
)
```

Comment on lines +670 to +688
```python
if effective_models:
    # if 'all-team-models' was requested, restrict it to the effective models
    if "all-team-models" in (data.models or []):
        data_json["models"] = effective_models
    # if explicit models were requested, validate they're a subset of effective set
    elif data.models:
        disallowed = set(data.models) - set(effective_models)
        if disallowed:
            raise HTTPException(
                status_code=403,
                detail={
                    "error": f"Requested models not in user's effective team models. "
                    f"Disallowed: {sorted(disallowed)}. "
                    f"Effective models: {sorted(effective_models)}"
                },
            )
    # if NO models was requested, runtime auth will compute effective models
    # from the SQL view join (tm.models + t.default_models), so nothing to store here
```


P1 all-team-models and explicit models silently pass through when effective_models=[]

When team.models=[] (allow-all) and default_models=[] and member_models=[]:

```python
compute_effective_models([], [], [])  # → []  (falls back to team_pool=[])
effective_models = []
```

The if effective_models: guard is then False, so the entire validation block is skipped. For "all-team-models" requests, data_json["models"] remains ["all-team-models"] and the runtime resolves it to the full team pool — this is the correct allow-all behavior.

However, the if effective_models: check also silently skips the case where effective_models=[] because all member overrides/defaults were stale and were capped out — for example:

  • team.models=["gpt-4"], default_models=["stale-model"], member_models=[]
  • compute_effective_models(["stale-model"], [], ["gpt-4"]) caps to [] → falls back to ["gpt-4"]
  • Returns ["gpt-4"] (non-empty), so this case is actually caught.

The only genuinely unchecked case is the team_pool=[] (allow-all) path, which is intentionally permit-all. The current check is correct but consider adding an inline comment to document why effective_models=[] is safely skipped:

Suggested change:

```diff
 if effective_models:
+    # effective_models=[] means the team allows all models (team.models=[]),
+    # so no restriction to enforce at key-gen time; runtime auth handles it.
     # if 'all-team-models' was requested, restrict it to the effective models
     if "all-team-models" in (data.models or []):
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +2560 to +2572
```python
else:
    # Models were explicitly changed — invalidate cache and fetch fresh membership
    from litellm.proxy.auth.auth_checks import get_team_membership
    from litellm.proxy.proxy_server import user_api_key_cache

    _cache_key = f"team_membership:{received_user_id}:{data.team_id}"
    await user_api_key_cache.async_delete_cache(key=_cache_key)
    _tm_row = await get_team_membership(
        user_id=received_user_id,
        team_id=data.team_id,
        prisma_client=prisma_client,
        user_api_key_cache=user_api_key_cache,
    )
```

P2 Duplicate imports and unconditional DB read in else branch

Both the if data.models is None and else branches contain identical from litellm.proxy.auth.auth_checks import get_team_membership / from litellm.proxy.proxy_server import user_api_key_cache import statements, as well as identical cache-invalidation and get_team_membership calls. These could be hoisted outside the if/else to avoid repetition.

Additionally, when all three of data.models, data.tpm_limit, and data.rpm_limit are explicitly provided by the caller, _tm_row is fetched in the else branch solely to populate _budget for resolved_tpm/resolved_rpm. Since resolved_tpm = data.tpm_limit and resolved_rpm = data.rpm_limit in that scenario, the budget lookup from _tm_row is never used — making the DB round-trip entirely unnecessary for that request path.

@Harshit28j (Collaborator, Author):

@greptileai All inline comments reference code from previous commits that has been fixed. The line numbers
no longer match because the code changed. Specifically:

  1. Direct DB query (line 660) — Now uses get_team_membership cached helper
  2. Cache not invalidated — async_delete_cache called after every membership write
  3. Stale overrides escalate — compute_effective_models caps by team_pool, falls back on empty
  4. Sync delete_cache — Changed to async_delete_cache
  5. Service key restricted — Early return at line 2742 when not valid_token.user_id
  6. Guaranteed DB hit — _needs_db_fetch flag skips DB when all values in request
  7. tpm/rpm stale — Reads from _tm_row.litellm_budget_table (authoritative), not JSON
  8. Pruning no-op — if existing_defaults: guard already prevents it

Can you re-review against the latest commit and clear stale inline comments?

Comment on lines +144 to +150
```shell
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{
  "team_alias": "engineering",
  "models": ["gpt-4", "gpt-4o-mini", "gpt-4o"],
  "default_models": ["gpt-4o-mini"]
}'
```

P1 access_group_ids silently bypasses per-member model restrictions

The docs describe the feature as providing fine-grained per-member model access control. However, can_team_access_model in auth_checks.py falls back to the team's access_group_ids after the per-member effective-models check fails — without any restriction from the user's effective_models set. The code comment acknowledges this:

access groups are a team-level concept and are NOT restricted by per-member model overrides. If a team has access_group_ids configured, any member can access models from those groups regardless of their effective_models set.

This means:

  • An admin grants User A only ["gpt-4o-mini"] via per-member overrides
  • The team also has access_group_ids = ["premium-models"] which includes ["gpt-4", "claude-3"]
  • User A can access gpt-4 and claude-3 through the access-group bypass even though they're not in their effective model set

The docs should explicitly call this out so admins understand that access_group_ids is an additive team-wide grant that overrides per-member restrictions. Without this note, an admin who configures per-member restrictions for compliance or cost-control may unknowingly leave a bypass path open.

Comment on lines 209 to 235

```diff
 # Check if trying to set a budget for team member

-if max_budget_in_team is not None:
+if (
+    max_budget_in_team is not None
+    or new_member.tpm_limit is not None
+    or new_member.rpm_limit is not None
+):
     # create a new budget item for this member
+    _budget_create_data = {
+        "created_by": user_api_key_dict.user_id or litellm_proxy_admin_name,
+        "updated_by": user_api_key_dict.user_id or litellm_proxy_admin_name,
+    }
+    if max_budget_in_team is not None:
+        _budget_create_data["max_budget"] = max_budget_in_team
+    if new_member.tpm_limit is not None:
+        _budget_create_data["tpm_limit"] = new_member.tpm_limit
+    if new_member.rpm_limit is not None:
+        _budget_create_data["rpm_limit"] = new_member.rpm_limit
+
     response = await prisma_client.db.litellm_budgettable.create(
-        data={
-            "max_budget": max_budget_in_team,
-            "created_by": user_api_key_dict.user_id or litellm_proxy_admin_name,
-            "updated_by": user_api_key_dict.user_id or litellm_proxy_admin_name,
-        }
+        data=_budget_create_data  # type: ignore
     )

     _budget_id = response.budget_id
 else:
     _budget_id = default_team_budget_id
```

P2 New budget created on every tpm_limit/rpm_limit update, leaving old rows orphaned

The PR extends budget creation to cover tpm_limit and rpm_limit updates (not just max_budget), but it always creates a new LiteLLM_BudgetTable row and links it, without deleting the previous budget. The existing_budget_id parameter is accepted but is only used in the early-return disconnect path — it is never used to update the existing budget in-place.

This means every /team/member_update call that touches tpm_limit or rpm_limit orphans the previous budget row. In teams where member limits are adjusted frequently, this silently accumulates unreferenced rows in LiteLLM_BudgetTable.

This was already the pre-existing behavior for max_budget, but the PR widens the impact to two more fields. Consider either:

  1. Updating the existing budget via litellm_budgettable.update(where={"budget_id": existing_budget_id}, ...) when existing_budget_id is set, or
  2. Deleting the old budget row after unlinking it, to prevent unbounded table growth.

@Harshit28j (Collaborator, Author):

@greptileai How about you give an implementation plan to fix all these issues right away?


greptile-apps bot commented Mar 22, 2026

Implementation Plan to Fix All Issues

Phase 1: Critical (P0) - Deploy Immediately 🚨

1. Migration Safety Issue (litellm/proxy/utils.py:2975)

  • Problem: Raw SQL references default_models/models columns that don't exist pre-migration → crashes all requests
  • Fix: Wrap SQL query in try/catch, fall back to None for new columns on UndefinedColumn error
  • Time: 30 min

Phase 2: High Priority (P1) - Critical Functionality

2. Service Key Restriction Regression (auth_checks.py:2762)

  • Fix: Add guard in get_effective_team_models: if valid_token.user_id is None, return team_pool directly
  • Time: 20 min

3. Cache Invalidation (team_endpoints.py:2536, team_endpoints.py:2548)

  • Fix: Only invalidate cache when models change; fetch before invalidation OR skip when not needed
  • Time: 30 min

4. Async/Sync Blocking (team_endpoints.py)

  • Fix: Use await user_api_key_cache.async_delete_cache() instead of sync version
  • Time: 10 min

5. Stale Override Escalation (auth_checks.py:2717)

  • Fix: Document this behavior OR add configurable strict mode that returns [] instead of falling back
  • Time: 30 min

6. Direct DB Query (key_management_endpoints.py)

  • Fix: Use get_team_membership() helper wrapper instead of direct Prisma call
  • Time: 20 min

7. Stale tpm_limit/rpm_limit (team_endpoints.py:2597)

  • Fix: Read from _tm_row.litellm_budget_table instead of getattr(member, ...)
  • Time: 30 min

8. tpm_limit/rpm_limit Never Sync (team_endpoints.py)

  • Fix: Widen condition to if data.role is not None or data.models is not None or data.tpm_limit is not None or data.rpm_limit is not None:
  • Time: 15 min

9. Auto-Pruning No-Op (team_endpoints.py:1547)

  • Fix: Add guard: if not new_models and existing_defaults:
  • Time: 10 min

10. Access Groups Bypass Docs (model_access.md:150)

  • Fix: Add explicit warning that access groups override member restrictions
  • Time: 15 min

11. Service Key all-team-models (key_management_endpoints.py:690)

  • Fix: Document OR exempt service keys: if data.user_id and prisma_client is not None:
  • Time: 20 min

Phase 3: Optimizations (P2)

12. Double Env-Var Checks (auth_checks.py:2736, key_management_endpoints.py:646)

  • Fix: Check only litellm.team_model_overrides_enabled
  • Time: 15 min

13. Non-Deterministic Order (auth_checks.py)

  • Fix: Use list(dict.fromkeys(team_defaults + member_models))
  • Time: 10 min
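For illustration, dict.fromkeys deduplicates while preserving first-seen order, so the union is stable across calls (the model names here are arbitrary examples):

```python
team_defaults = ["gpt-4o-mini", "gpt-4"]
member_models = ["gpt-4", "claude-3"]
# dict keys keep insertion order (Python 3.7+), and duplicates collapse to
# their first occurrence, giving a deterministic effective-models union.
effective = list(dict.fromkeys(team_defaults + member_models))
# effective == ["gpt-4o-mini", "gpt-4", "claude-3"]
```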

14. Duplicate Tests (tests/proxy_unit_tests/test_team_model_overrides.py)

  • Fix: Delete duplicate test file
  • Time: 5 min

15. Orphaned Budget Rows (management_helpers/utils.py:235)

  • Fix: Update existing budget via existing_budget_id instead of creating new
  • Time: 45 min

16. Duplicate Imports (team_endpoints.py)

  • Fix: Hoist imports; skip unnecessary DB reads
  • Time: 20 min

17. Documentation Gaps (model_access.md:152)

  • Fix: Add sections for capping semantics and service-key behavior
  • Time: 20 min

18. Unnecessary DB Calls (team_endpoints.py)

  • Fix: Only fetch membership when data.role or data.models is set
  • Time: 25 min

Execution Order:

  1. P0 migration safety (blocks deployment)
  2. Service key regressions (#2, #11) - backward compatibility
  3. Cache issues (#3, #4) - performance + correctness
  4. Stale data (#7, #8) - data integrity
  5. Privilege issues (#5, #10) - security
  6. Remaining P1 (#6, #9)
  7. P2 optimizations (#12-18)

Total Time Estimate: ~6-8 hours focused work + testing = 1 day
