fix: guard llm_mini cost with goal check, dedup, and rate limiting #5621
Greptile Summary
This PR adds cost-control guards to prevent the LLM spending spike seen on Mar 9 (~5x OpenAI spend). It introduces a per-user Redis rate limit (5 min cooldown) on chat-triggered goal extraction and fixes
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant User
    participant ChatRouter as chat.py (send_message)
    participant Redis as redis_db
    participant Thread as Background Thread
    participant Goals as goals.py
    User->>ChatRouter: POST /v2/messages
    ChatRouter->>ChatRouter: Save message to DB
    ChatRouter->>Redis: try_acquire_goal_extraction_lock(uid)
    alt Lock acquired (first call in 5 min)
        Redis-->>ChatRouter: True
        ChatRouter->>Thread: Start extract_and_update_goal_progress
        Thread->>Goals: extract_and_update_goal_progress(uid, text)
        Goals->>Goals: Check user goals, invoke LLM
    else Rate-limited (called within 5 min)
        Redis-->>ChatRouter: False
        Note over ChatRouter: Goal extraction skipped
    end
    ChatRouter->>ChatRouter: Continue with chat response
    ChatRouter-->>User: Response
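The flow in the diagram can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `InMemoryLockStore` is a hypothetical stand-in for the Redis `SET ... NX EX` call, the `now` parameter exists only to make the example deterministic, and the bodies of `send_message` and `extract_and_update_goal_progress` are placeholders.

```python
import threading
import time


class InMemoryLockStore:
    """Hypothetical stand-in for Redis `SET key value NX EX ttl`."""

    def __init__(self):
        self._expires_at = {}
        self._mutex = threading.Lock()

    def set_if_absent(self, key, ttl, now=None):
        # Succeeds only if the key is missing or its TTL has elapsed.
        now = time.monotonic() if now is None else now
        with self._mutex:
            expiry = self._expires_at.get(key)
            if expiry is not None and expiry > now:
                return False
            self._expires_at[key] = now + ttl
            return True


store = InMemoryLockStore()
extraction_calls = []  # records what the placeholder extractor was asked to do


def try_acquire_goal_extraction_lock(uid, ttl=300, now=None):
    # Same key layout as the function under review; the store is the assumption.
    return store.set_if_absent(f'users:{uid}:goal_extraction_lock', ttl, now)


def extract_and_update_goal_progress(uid, text):
    # Placeholder for the real goal check and LLM call.
    extraction_calls.append((uid, text))


def send_message(uid, text, now=None):
    # Saving the message and building the chat response are omitted here.
    worker = None
    if try_acquire_goal_extraction_lock(uid, now=now):
        worker = threading.Thread(
            target=extract_and_update_goal_progress, args=(uid, text)
        )
        worker.start()
    return worker  # None means goal extraction was skipped
```

With this sketch, two messages from the same user inside the 300-second window start only one background extraction; a third message after the window starts another.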
Last reviewed commit: 4f8942c

def try_acquire_goal_extraction_lock(uid: str, ttl: int = 300) -> bool:
    """Rate-limit goal progress extraction to once per TTL seconds per user.
    Returns True if acquired (caller should proceed), False if rate-limited."""
    result = r.set(f'users:{uid}:goal_extraction_lock', '1', ex=ttl, nx=True)
    return result is not None
Missing @try_catch_decorator on new Redis functions
Every other Redis function in this file is decorated with @try_catch_decorator (see lines 32, 41, 831, 838, 847, etc.), which catches Redis connection errors and returns None gracefully. Without it, if Redis is temporarily unreachable, try_acquire_goal_extraction_lock will raise an unhandled exception that propagates up to send_message in chat.py, causing the entire chat endpoint to return a 500 error to the user.
The same issue applies to try_acquire_conversation_goal_lock below.
Suggested change:

@try_catch_decorator
def try_acquire_goal_extraction_lock(uid: str, ttl: int = 300) -> bool:
    """Rate-limit goal progress extraction to once per TTL seconds per user.
    Returns True if acquired (caller should proceed), False if rate-limited."""
    result = r.set(f'users:{uid}:goal_extraction_lock', '1', ex=ttl, nx=True)
    return result is not None
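To illustrate what the suggested decorator buys, here is a minimal sketch of a decorator with the behavior described in the comment above (catch the error, return None). The repository's actual `try_catch_decorator` is not shown on this page, so `return_none_on_error` and `UnreachableRedis` below are hypothetical names used only for this example.

```python
import functools
import logging


def return_none_on_error(func):
    """Hypothetical decorator: log any exception and return None instead."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            logging.warning('%s failed: %s', func.__name__, exc)
            return None

    return wrapper


class UnreachableRedis:
    """Hypothetical client that simulates a Redis outage."""

    def set(self, key, value, ex=None, nx=False):
        raise ConnectionError('redis unreachable')


r = UnreachableRedis()


@return_none_on_error
def try_acquire_goal_extraction_lock(uid, ttl=300):
    result = r.set(f'users:{uid}:goal_extraction_lock', '1', ex=ttl, nx=True)
    return result is not None
```

Under these assumptions, a caller written as `if try_acquire_goal_extraction_lock(uid):` treats the None result as "not acquired", so goal extraction is skipped during an outage and the chat endpoint keeps responding. Note that the function can then return None as well as True or False, which the `-> bool` annotation does not reflect.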
def try_acquire_conversation_goal_lock(uid: str, conversation_id: str, ttl: int = 3600) -> bool:
    """Idempotency gate: prevent duplicate goal extraction for the same conversation.
    Returns True if this is the first attempt (caller should proceed), False if already processed."""
    result = r.set(f'users:{uid}:conv_goal:{conversation_id}', '1', ex=ttl, nx=True)
    return result is not None
try_acquire_conversation_goal_lock is defined but never called
This function is added but has no callers anywhere in the codebase — neither in this PR nor elsewhere. The PR description mentions it as an "idempotency gate preventing duplicate goal extraction for the same conversation," but _update_goal_progress in process_conversation.py:361 still calls extract_and_update_goal_progress without this guard. Was this intended to be wired in somewhere?
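If the guard was meant to be wired in, one possible shape is sketched below. This is an assumption about intent, not code from the PR: `update_goal_progress` is a hypothetical stand-in for `_update_goal_progress`, the in-memory set replaces Redis (and ignores the TTL), and the extractor is a placeholder.

```python
_processed_keys = set()  # stand-in for Redis; the real key would expire after ttl
llm_calls = []           # records placeholder extractor invocations


def try_acquire_conversation_goal_lock(uid, conversation_id, ttl=3600):
    # Same key layout as the function under review; ttl is unused in this stand-in.
    key = f'users:{uid}:conv_goal:{conversation_id}'
    if key in _processed_keys:
        return False
    _processed_keys.add(key)
    return True


def extract_and_update_goal_progress(uid, text):
    # Placeholder for the real goal check and LLM call.
    llm_calls.append((uid, text))


def update_goal_progress(uid, conversation_id, text):
    """Hypothetical caller: run extraction at most once per conversation."""
    if not try_acquire_conversation_goal_lock(uid, conversation_id):
        return False  # already processed, skip the LLM call
    extract_and_update_goal_progress(uid, text)
    return True
```

Reprocessing the same conversation would then skip the extraction, while a different conversation for the same user would still go through.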
4f8942c to 948efac
Summary
Adds structural guards to prevent the LLM cost spike that occurred on Mar 9 (~5x OpenAI spend).
Changes
- backend/database/redis_db.py: Added two Redis guard functions:
  - try_acquire_goal_extraction_lock(uid): per-user rate limit (5 min cooldown) for chat-triggered goal extraction
  - try_acquire_conversation_goal_lock(uid, conversation_id): idempotency gate preventing duplicate goal extraction for the same conversation
- backend/routers/chat.py: Goal extraction now gated behind rate limit check; only fires once per 5 min per user instead of every message
- backend/utils/llm/chat.py: Fixed extract_question_from_conversation sending the full message history twice; previous_messages now excludes already-included user_last_messages

Fixes #5530
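The history fix in backend/utils/llm/chat.py can be pictured with a small sketch. The real function body is not shown on this page, so everything here is an assumption: messages are modeled as dicts with a `sender` field, `user_last_messages` is taken to be the trailing run of user messages, and `split_history` is a hypothetical helper name.

```python
def split_history(messages):
    """Return (previous_messages, user_last_messages) with no overlap.

    Assumes user_last_messages is the trailing run of 'human' messages.
    """
    split = len(messages)
    while split > 0 and messages[split - 1]['sender'] == 'human':
        split -= 1
    # Slicing at one index guarantees each message lands in exactly one part.
    return messages[:split], messages[split:]
```

Because both parts come from one split point, concatenating them reproduces the original history exactly once, which is the property the fix is described as restoring.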