fix: batch-limit stale object cleanup + bump litellm-enterprise to 0.1.37#25264
fix: batch-limit stale object cleanup + bump litellm-enterprise to 0.1.37#25264ishaan-berri wants to merge 4 commits intomainfrom
Conversation
Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.
Two fixes to _cleanup_stale_managed_objects: 1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma). 2. Extract _expire_stale_rows as a separate method for testability. Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
|
This reverts commit 5a76335.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR fixes an unbounded stale-object cleanup query that could lock or overwhelm the database when a large backlog of
Confidence Score: 4/5Safe to merge with minor concerns — SQL logic is correct and the performance fix is valid, but raw SQL regresses coding standards and no tests were added All findings are P2, but execute_raw directly regresses a rule this same file previously followed correctly, and the missing tests violate a stated hard requirement in CLAUDE.md — warranting a 4 rather than 5 enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py
|
| Filename | Overview |
|---|---|
| enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py | Replaces update_many with bounded execute_raw SQL to add LIMIT support; regresses no-raw-SQL rule and lacks unit tests |
| litellm/constants.py | Adds configurable STALE_OBJECT_CLEANUP_BATCH_SIZE constant with env-var override and lower bound of 1 |
| pyproject.toml | Bumps litellm-enterprise optional dependency from 0.1.36 to 0.1.37 |
| requirements.txt | Bumps litellm-enterprise pin from 0.1.36 to 0.1.37 |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[check_responses_cost] --> B[_cleanup_stale_managed_objects]
B --> C[cutoff = now - STALENESS_CUTOFF_DAYS]
C --> D[_expire_stale_rows\ncutoff, BATCH_SIZE]
D --> E[execute_raw\nUPDATE ... WHERE id IN\nSELECT id ... LIMIT N]
E --> F{rows updated > 0?}
F -- yes --> G[Log warning with count]
F -- no --> H[Return]
G --> I
H --> I[find_many\nLIMIT MAX_OBJECTS_PER_POLL_CYCLE]
I --> J[For each job:\naget_responses]
J --> K{Terminal status?}
K -- completed/failed/cancelled --> L[Append to completed_jobs]
K -- still active --> M[Skip]
L --> N[update_many\ncompleted_jobs as 'completed']
Comments Outside Diff (1)
-
enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py, line 36-83 (link)Missing unit tests for the new cleanup logic
CLAUDE.md states "Adding at least 1 test is a hard requirement," and the PR's pre-submission checklist has this item unchecked. No tests for
_expire_stale_rowsor the modified_cleanup_stale_managed_objectsappear intests/test_litellm/.The
_expire_stale_rowsisolation boundary was designed exactly for mocking — a minimal test suite should cover:_expire_stale_rowsis called with the correctcutoffandSTALE_OBJECT_CLEANUP_BATCH_SIZE.- When the return value is
> 0, the warning is logged. _cleanup_stale_managed_objectsdoes not raise when_expire_stale_rowsreturns0.
Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)
Reviews (1): Last reviewed commit: "Revert "bump litellm-enterprise to 0.1.3..." | Re-trigger Greptile
| return await self.prisma_client.db.execute_raw( | ||
| """ | ||
| UPDATE "LiteLLM_ManagedObjectTable" | ||
| SET "status" = 'stale_expired' | ||
| WHERE "id" IN ( | ||
| SELECT "id" FROM "LiteLLM_ManagedObjectTable" | ||
| WHERE "file_purpose" = 'response' | ||
| AND "status" NOT IN ('completed', 'complete', 'failed', 'expired', 'cancelled', 'stale_expired') | ||
| AND "created_at" < $1::timestamptz | ||
| ORDER BY "created_at" ASC | ||
| LIMIT $2 | ||
| ) | ||
| """, | ||
| cutoff, | ||
| batch_size, | ||
| ) |
There was a problem hiding this comment.
execute_raw regresses the no-raw-SQL coding standard
CLAUDE.md is explicit: "Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of execute_raw / query_raw." The previous code in this same file correctly used Prisma's update_many model method — this PR regresses a file that was already compliant.
The PR cites spend_log_cleanup.py as precedent, but that file predates the rule and is not a justification for new code to follow the same pattern.
Conforming two-query approach — fetch a bounded set of IDs with find_many, then update_many on that set:
stale_rows = await self.prisma_client.db.litellm_managedobjecttable.find_many(
where={
"file_purpose": "response",
"status": {"not_in": ["completed", "complete", "failed", "expired", "cancelled", "stale_expired"]},
"created_at": {"lt": cutoff},
},
take=batch_size,
select={"id": True},
order={"created_at": "asc"},
)
stale_ids = [row.id for row in stale_rows]
if not stale_ids:
return 0
result = await self.prisma_client.db.litellm_managedobjecttable.update_many(
where={"id": {"in": stale_ids}},
data={"status": "stale_expired"},
)
return resultThis preserves the batch-size bound, avoids raw SQL, and is mockable with simple Prisma stubs.
Context Used: CLAUDE.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Relevant issues
Pre-Submission checklist
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewCI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
🚄 Infrastructure
Changes
STALE_OBJECT_CLEANUP_BATCH_SIZEconstant (default 1000) tolitellm/constants.pycheck_responses_cost.pyto use a single bounded SQL query (DELETE ... WHERE id IN (SELECT id ... LIMIT batch_size)) instead of fetching all rows then deleting one-by-one — prevents unbounded memory usage and N+1 deletes on large tableslitellm-enterpriseto0.1.37inpyproject.tomlandrequirements.txt