
fix: batch-limit stale object cleanup + bump litellm-enterprise to 0.1.37#25264

Open
ishaan-berri wants to merge 4 commits into main from fix/stale-object-cleanup-batch-limit

Conversation

@ishaan-berri
Contributor

Relevant issues

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix
🚄 Infrastructure

Changes

  • Adds STALE_OBJECT_CLEANUP_BATCH_SIZE constant (default 1000) to litellm/constants.py
  • Rewrites the stale managed object cleanup in check_responses_cost.py to use a single bounded SQL query (UPDATE ... SET status WHERE id IN (SELECT id ... LIMIT batch_size)) instead of an unbounded update_many across every stale row, preventing a single poll cycle's query from touching hundreds of thousands of rows on large tables
  • Bumps litellm-enterprise to 0.1.37 in pyproject.toml and requirements.txt
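Based on the description above (default 1000, env-var override, floored at 1 per the review notes), the new constant presumably looks something like this sketch; the env-var name is an assumption:

```python
import os

# Hypothetical sketch of the constant added to litellm/constants.py:
# default 1000, overridable via environment variable, never allowed below 1.
STALE_OBJECT_CLEANUP_BATCH_SIZE = max(
    1, int(os.getenv("STALE_OBJECT_CLEANUP_BATCH_SIZE", "1000"))
)
```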

Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.
Two fixes to _cleanup_stale_managed_objects:

1. Replace unbounded update_many with a single execute_raw using a
   subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
   rows. Zero rows loaded into Python memory — everything stays in Postgres.
   Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
   (the proxy requires PostgreSQL per schema.prisma).

2. Extract _expire_stale_rows as a separate method for testability.

Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.
@vercel

vercel bot commented Apr 7, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Apr 7, 2026 4:02am


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@codspeed-hq
Contributor

codspeed-hq bot commented Apr 7, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing fix/stale-object-cleanup-batch-limit (caa4b96) with main (7a9a9f0)


@codecov

codecov bot commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@greptile-apps
Contributor

greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR fixes an unbounded stale-object cleanup query that could lock or overwhelm the database when a large backlog of LiteLLM_ManagedObjectTable rows exists. It replaces the previous update_many (no row limit) with a bounded cleanup capped at STALE_OBJECT_CLEANUP_BATCH_SIZE rows per invocation, adds a configurable constant (default 1,000), and bumps litellm-enterprise to 0.1.37.

  • The performance motivation is valid: the old update_many had no LIMIT and could touch hundreds of thousands of rows in a single query
  • The new _expire_stale_rows() method is correctly isolated for testability, and STALE_OBJECT_CLEANUP_BATCH_SIZE is properly bounded with max(1, ...) and configurable via env var
  • The implementation uses execute_raw with hand-written SQL, which regresses from the Prisma model-methods rule that this file previously followed — a conforming two-query approach (find_many(..., take=batch_size, select={"id": True}) + update_many) is viable and avoids raw SQL
  • No unit tests were added for the new method despite this being a stated hard requirement in CLAUDE.md

Confidence Score: 4/5

Safe to merge with minor concerns — SQL logic is correct and the performance fix is valid, but raw SQL regresses coding standards and no tests were added

All findings are P2, but execute_raw directly regresses a rule this same file previously followed correctly, and the missing tests violate a stated hard requirement in CLAUDE.md — warranting a 4 rather than 5

enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py

Important Files Changed

Filename Overview
enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py Replaces update_many with bounded execute_raw SQL to add LIMIT support; regresses no-raw-SQL rule and lacks unit tests
litellm/constants.py Adds configurable STALE_OBJECT_CLEANUP_BATCH_SIZE constant with env-var override and lower bound of 1
pyproject.toml Bumps litellm-enterprise optional dependency from 0.1.36 to 0.1.37
requirements.txt Bumps litellm-enterprise pin from 0.1.36 to 0.1.37

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[check_responses_cost] --> B[_cleanup_stale_managed_objects]
    B --> C[cutoff = now - STALENESS_CUTOFF_DAYS]
    C --> D[_expire_stale_rows\ncutoff, BATCH_SIZE]
    D --> E[execute_raw\nUPDATE ... WHERE id IN\nSELECT id ... LIMIT N]
    E --> F{rows updated > 0?}
    F -- yes --> G[Log warning with count]
    F -- no --> H[Return]
    G --> I
    H --> I[find_many\nLIMIT MAX_OBJECTS_PER_POLL_CYCLE]
    I --> J[For each job:\naget_responses]
    J --> K{Terminal status?}
    K -- completed/failed/cancelled --> L[Append to completed_jobs]
    K -- still active --> M[Skip]
    L --> N[update_many\ncompleted_jobs as 'completed']

Comments Outside Diff (1)

  1. enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py, lines 36-83

    P2 Missing unit tests for the new cleanup logic

    CLAUDE.md states "Adding at least 1 test is a hard requirement," and the PR's pre-submission checklist has this item unchecked. No tests for _expire_stale_rows or the modified _cleanup_stale_managed_objects appear in tests/test_litellm/.

    The _expire_stale_rows isolation boundary was designed exactly for mocking — a minimal test suite should cover:

    1. _expire_stale_rows is called with the correct cutoff and STALE_OBJECT_CLEANUP_BATCH_SIZE.
    2. When the return value is > 0, the warning is logged.
    3. _cleanup_stale_managed_objects does not raise when _expire_stale_rows returns 0.
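    A self-contained sketch of such a suite, using a stand-in for the real class (the `FakeCleaner` stub, the 7-day cutoff, and the logger shape are illustrative assumptions, not the project's actual code):

    ```python
    import asyncio
    import datetime
    from unittest.mock import AsyncMock, MagicMock

    STALE_OBJECT_CLEANUP_BATCH_SIZE = 1000  # assumed to mirror litellm/constants.py


    class FakeCleaner:
        """Stand-in mirroring the _cleanup/_expire split described in this PR."""

        def __init__(self, logger):
            self.logger = logger
            self._expire_stale_rows = AsyncMock(return_value=0)

        async def _cleanup_stale_managed_objects(self):
            # The 7-day value is illustrative; the real code derives the cutoff
            # from STALENESS_CUTOFF_DAYS.
            cutoff = datetime.datetime.now(
                datetime.timezone.utc
            ) - datetime.timedelta(days=7)
            rows = await self._expire_stale_rows(cutoff, STALE_OBJECT_CLEANUP_BATCH_SIZE)
            if rows > 0:
                self.logger.warning("expired %d stale managed objects", rows)


    def test_expire_called_with_cutoff_and_batch_size():
        cleaner = FakeCleaner(logger=MagicMock())
        asyncio.run(cleaner._cleanup_stale_managed_objects())
        cutoff, batch_size = cleaner._expire_stale_rows.call_args.args
        assert isinstance(cutoff, datetime.datetime)
        assert batch_size == STALE_OBJECT_CLEANUP_BATCH_SIZE


    def test_warning_logged_when_rows_expired():
        cleaner = FakeCleaner(logger=MagicMock())
        cleaner._expire_stale_rows.return_value = 5
        asyncio.run(cleaner._cleanup_stale_managed_objects())
        cleaner.logger.warning.assert_called_once()


    def test_no_raise_when_zero_rows_expired():
        cleaner = FakeCleaner(logger=MagicMock())
        asyncio.run(cleaner._cleanup_stale_managed_objects())  # returns without error
        cleaner.logger.warning.assert_not_called()
    ```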

    Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

Reviews (1): Last reviewed commit: "Revert "bump litellm-enterprise to 0.1.3..."

Comment on lines +49 to +64
return await self.prisma_client.db.execute_raw(
    """
    UPDATE "LiteLLM_ManagedObjectTable"
    SET "status" = 'stale_expired'
    WHERE "id" IN (
        SELECT "id" FROM "LiteLLM_ManagedObjectTable"
        WHERE "file_purpose" = 'response'
          AND "status" NOT IN ('completed', 'complete', 'failed', 'expired', 'cancelled', 'stale_expired')
          AND "created_at" < $1::timestamptz
        ORDER BY "created_at" ASC
        LIMIT $2
    )
    """,
    cutoff,
    batch_size,
)

P2 execute_raw regresses the no-raw-SQL coding standard

CLAUDE.md is explicit: "Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of execute_raw / query_raw." The previous code in this same file correctly used Prisma's update_many model method — this PR regresses a file that was already compliant.

The PR cites spend_log_cleanup.py as precedent, but that file predates the rule and is not a justification for new code to follow the same pattern.

Conforming two-query approach — fetch a bounded set of IDs with find_many, then update_many on that set:

stale_rows = await self.prisma_client.db.litellm_managedobjecttable.find_many(
    where={
        "file_purpose": "response",
        "status": {"not_in": ["completed", "complete", "failed", "expired", "cancelled", "stale_expired"]},
        "created_at": {"lt": cutoff},
    },
    take=batch_size,
    select={"id": True},
    order={"created_at": "asc"},
)
stale_ids = [row.id for row in stale_rows]
if not stale_ids:
    return 0
result = await self.prisma_client.db.litellm_managedobjecttable.update_many(
    where={"id": {"in": stale_ids}},
    data={"status": "stale_expired"},
)
return result

This preserves the batch-size bound, avoids raw SQL, and is mockable with simple Prisma stubs.

Context Used: CLAUDE.md (source)

