Skip to content

fix: guard llm_mini cost with goal check, dedup, and rate limiting#5621

Merged
kodjima33 merged 1 commit intomainfrom
fix/issue-5530
Mar 14, 2026
Merged

fix: guard llm_mini cost with goal check, dedup, and rate limiting#5621
kodjima33 merged 1 commit intomainfrom
fix/issue-5530

Conversation

@kodjima33
Copy link
Collaborator

Summary

Adds structural guards to prevent the LLM cost spike that occurred on Mar 9 (~5x OpenAI spend).

Changes

  • backend/database/redis_db.py — Added two Redis guard functions:
    • try_acquire_goal_extraction_lock(uid) — per-user rate limit (5 min cooldown) for chat-triggered goal extraction
    • try_acquire_conversation_goal_lock(uid, conversation_id) — idempotency gate preventing duplicate goal extraction for the same conversation
  • backend/routers/chat.py — Goal extraction now gated behind rate limit check; only fires once per 5 min per user instead of every message
  • backend/utils/llm/chat.py — Fixed extract_question_from_conversation sending the full message history twice; previous_messages now excludes already-included user_last_messages

Fixes #5530

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR adds cost-control guards to prevent the LLM spending spike seen on Mar 9 (~5x OpenAI spend). It introduces a per-user Redis rate limit (5 min cooldown) on chat-triggered goal extraction and fixes extract_question_from_conversation which was sending the full message history twice by correcting the previous_messages slice.

  • Rate limiting: Goal extraction in send_message is now gated by try_acquire_goal_extraction_lock, reducing calls from every message to at most once per 5 minutes per user.
  • Dedup fix: extract_question_from_conversation now uses messages[:user_message_idx] for <previous_messages>, avoiding duplicating the user's last messages that were already in <user_last_messages>.
  • Missing error handling: The new Redis guard functions lack the @try_catch_decorator used by all other functions in redis_db.py. Without it, a Redis outage would crash the chat endpoint instead of gracefully degrading.
  • Dead code: try_acquire_conversation_goal_lock is defined but has no callers — it may be intended for _update_goal_progress in process_conversation.py but was not wired in.

Confidence Score: 3/5

  • The core rate-limiting logic is sound, but the missing try_catch_decorator could turn a Redis blip into a user-facing 500 error.
  • Score of 3 reflects that the main changes (rate limiting and dedup fix) are correct and well-targeted, but two issues lower confidence: (1) the new Redis functions lack the error-handling decorator that all neighboring functions use, creating a reliability risk, and (2) one of the two new functions is dead code with no callers.
  • Pay close attention to backend/database/redis_db.py — the new functions need the @try_catch_decorator to match the module's error-handling pattern.

Important Files Changed

Filename Overview
backend/database/redis_db.py Adds two Redis guard functions for goal extraction rate-limiting and idempotency, but both are missing the @try_catch_decorator used by all other functions in this file, and try_acquire_conversation_goal_lock is unused.
backend/routers/chat.py Correctly gates goal extraction behind a per-user Redis rate limit (5 min cooldown). Clean integration with the new redis_db import.
backend/utils/llm/chat.py Fixes duplicate message history in extract_question_from_conversation by slicing messages[:user_message_idx] for previous_messages. Remaining changes are cosmetic .replace() reformatting.

Sequence Diagram

sequenceDiagram
    participant User
    participant ChatRouter as chat.py (send_message)
    participant Redis as redis_db
    participant Thread as Background Thread
    participant Goals as goals.py

    User->>ChatRouter: POST /v2/messages
    ChatRouter->>ChatRouter: Save message to DB
    ChatRouter->>Redis: try_acquire_goal_extraction_lock(uid)
    alt Lock acquired (first call in 5 min)
        Redis-->>ChatRouter: True
        ChatRouter->>Thread: Start extract_and_update_goal_progress
        Thread->>Goals: extract_and_update_goal_progress(uid, text)
        Goals->>Goals: Check user goals, invoke LLM
    else Rate-limited (called within 5 min)
        Redis-->>ChatRouter: False
        Note over ChatRouter: Goal extraction skipped
    end
    ChatRouter->>ChatRouter: Continue with chat response
    ChatRouter-->>User: Response
Loading

Last reviewed commit: 4f8942c

Comment on lines +892 to +896
def try_acquire_goal_extraction_lock(uid: str, ttl: int = 300) -> bool:
"""Rate-limit goal progress extraction to once per TTL seconds per user.
Returns True if acquired (caller should proceed), False if rate-limited."""
result = r.set(f'users:{uid}:goal_extraction_lock', '1', ex=ttl, nx=True)
return result is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing @try_catch_decorator on new Redis functions

Every other Redis function in this file is decorated with @try_catch_decorator (see lines 32, 41, 831, 838, 847, etc.), which catches Redis connection errors and returns None gracefully. Without it, if Redis is temporarily unreachable, try_acquire_goal_extraction_lock will raise an unhandled exception that propagates up to send_message in chat.py, causing the entire chat endpoint to return a 500 error to the user.

The same issue applies to try_acquire_conversation_goal_lock below.

Suggested change
def try_acquire_goal_extraction_lock(uid: str, ttl: int = 300) -> bool:
"""Rate-limit goal progress extraction to once per TTL seconds per user.
Returns True if acquired (caller should proceed), False if rate-limited."""
result = r.set(f'users:{uid}:goal_extraction_lock', '1', ex=ttl, nx=True)
return result is not None
@try_catch_decorator
def try_acquire_goal_extraction_lock(uid: str, ttl: int = 300) -> bool:
"""Rate-limit goal progress extraction to once per TTL seconds per user.
Returns True if acquired (caller should proceed), False if rate-limited."""
result = r.set(f'users:{uid}:goal_extraction_lock', '1', ex=ttl, nx=True)
return result is not None

Comment on lines +899 to +903
def try_acquire_conversation_goal_lock(uid: str, conversation_id: str, ttl: int = 3600) -> bool:
"""Idempotency gate — prevent duplicate goal extraction for the same conversation.
Returns True if this is the first attempt (caller should proceed), False if already processed."""
result = r.set(f'users:{uid}:conv_goal:{conversation_id}', '1', ex=ttl, nx=True)
return result is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try_acquire_conversation_goal_lock is defined but never called

This function is added but has no callers anywhere in the codebase — neither in this PR nor elsewhere. The PR description mentions it as an "idempotency gate preventing duplicate goal extraction for the same conversation," but _update_goal_progress in process_conversation.py:361 still calls extract_and_update_goal_progress without this guard. Was this intended to be wired in somewhere?

@kodjima33 kodjima33 merged commit 1ca13e8 into main Mar 14, 2026
1 check passed
@kodjima33 kodjima33 deleted the fix/issue-5530 branch March 14, 2026 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: guard llm_mini cost — goal progress rate-limit + deduplicate chat history

1 participant