Add TPM Controller to Respect LLM API Token Rate Limits #2841

Status: Open · wants to merge 2 commits into main
Conversation

Ahmed22347

Overview

This PR addresses the issue of exceeding the token-per-minute (TPM) limits imposed by LLM API providers (e.g., Groq). Such overuse could result in throttling or errors, which can affect application reliability.

Changes Introduced

  • Enhanced Agent Class

    • Introduced a new variable: max_tpm (maximum tokens per minute).
    • Integrated a TPM controller mechanism, modeled after the existing RPM (requests per minute) controller.
  • Created TPMController

    • Designed similarly to the RPMController.
    • Tracks and limits token usage on a per-minute basis.
    • Prevents sending more tokens than allowed in a rolling 60-second window.
  • Improved Error Handling

    • Captures exceptions caused by TPM limit violations.
    • Implements a 60-second wait before retrying requests that exceed the token rate limit.
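For readers unfamiliar with the pattern, here is a minimal self-contained sketch of a rolling-window token limiter. This is not the PR's code: the class and method names merely mirror those discussed in this thread, and the window length is parameterized so the behavior is easy to exercise.

```python
import threading
import time
from collections import deque


class TPMController:
    """Illustrative rolling-window token limiter (a sketch, not the PR's code)."""

    def __init__(self, max_tpm: int, window: float = 60.0):
        self.max_tpm = max_tpm
        self.window = window  # window length in seconds (60 for real TPM limits)
        self._events = deque()  # (timestamp, tokens) for recent spends
        self._lock = threading.Lock()

    def _used(self, now: float) -> int:
        # Evict spends older than the window, then sum what remains.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()
        return sum(tokens for _, tokens in self._events)

    def check_or_wait(self, tokens: int) -> None:
        """Block until `tokens` can be spent without exceeding max_tpm."""
        if tokens > self.max_tpm:
            raise ValueError("a single request larger than max_tpm can never proceed")
        while True:
            with self._lock:
                now = time.monotonic()
                if self._used(now) + tokens <= self.max_tpm:
                    self._events.append((now, tokens))
                    return
                # Sleep until the oldest spend ages out of the window.
                wait = self.window - (now - self._events[0][0])
            time.sleep(max(wait, 0.01))
```

With `max_tpm=100` and a short test window, a second 60-token request blocks until the first one ages out of the window, which is exactly the throttling behavior the PR describes.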

Benefits

  • Ensures compliance with API usage policies.
  • Reduces the likelihood of service disruption.
  • Improves overall stability when interacting with high-throughput LLM APIs.

Notes

  • The TPM controller is opt-in via the max_tpm parameter and integrates seamlessly with existing rate-limiting logic.
  • Future improvements may include supporting max_tpm at the crew level.

@joaomdmoura
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Added Tokens Per Minute Counter PR

Overview

This pull request introduces a Tokens Per Minute (TPM) rate-limiting feature alongside the existing Requests Per Minute (RPM) controller. The changes span six files, enhancing the system's capacity to manage token consumption effectively.

Detailed Insights

1. New TPM Controller Implementation (tpm_controller.py)

Strengths:

  • Clean and efficient implementation using threading for TPM tracking.
  • Proper integration with the TokenProcess functionality.
  • Thread-safe operations using locks enhance robustness.

Issues and Recommendations:

  • Docstrings: Add method documentation to improve maintainability.

    def check_or_wait(self, wait: int = 0):
        """Check that token usage is within limits, or wait if exceeded.

        Args:
            wait (int): Wait time in seconds if the limit is exceeded.

        Returns:
            bool: True if the operation can proceed, else False.
        """
  • Logging: Replace debug print statements with a logger for consistent output control.

    self.logger.debug(f"Tokens increased: {self._current_tokens}")
  • Destructor Cleanup: Ensure resources are released correctly.

    def __del__(self):
        """Cleanup timer resources."""
        self.stop_tpm_counter()

2. Modifications in Agent Class (agent.py)

Notable Issues:

  • Documentation Typo: Ensure descriptions for max_tpm are precise.

    description="Maximum number of tokens per minute that can be processed."
  • Validation: Implement validation to check max_tpm for positive values only.

    @validator('max_tpm')
    def validate_max_tpm(cls, v):
        if v is not None and v <= 0:
            raise ValueError("max_tpm must be positive.")
        return v

3. Base Agent Implementation (base_agent.py)

Key Issue:

  • Parameter Reference: Correct the reference from max_rpm to max_tpm to avoid confusion and ensure functionality.
    if self.max_tpm and not self._tpm_controller:
        self._tpm_controller = TPMController(max_tpm=self.max_tpm)

4. Agent Utilities (agent_utils.py)

Enhancements Suggested:

  • Error Handling: Add robust error handling to manage failures related to token limits.
    def handle_exceeded_token_limits(tpm_controller: TPMController):
        ...

General Recommendations

  1. Unit Testing: Comprehensive tests for TPM functionality are crucial.
  2. Metrics Collection: Implement monitoring for token usage.
  3. Configuration Options: Enable TPM limits to be set via environment variables.
  4. Logging Warnings: Introduce warnings when approaching limits.
  5. Graceful Degradation: Ensure seamless user experience under limit conditions.
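
Recommendation 3 could look something like the following; note that CREWAI_MAX_TPM is a hypothetical environment-variable name used for illustration, not an established setting.

```python
import os
from typing import Optional


def tpm_limit_from_env(default: Optional[int] = None) -> Optional[int]:
    """Read an optional TPM limit from the environment.

    CREWAI_MAX_TPM is a hypothetical variable name, used here only
    to illustrate the configuration pattern.
    """
    raw = os.environ.get("CREWAI_MAX_TPM")
    if raw is None:
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError("CREWAI_MAX_TPM must be a positive integer")
    return value
```

Combined with the positive-value validator suggested above, this keeps the environment and constructor paths consistent.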

Security Considerations

  1. Add sanitization for rate limiting information in logs.
  2. Secure methods for counting tokens to mitigate manipulation risks.
  3. Protection against potential timer exploitation.

Suggested Testing Scenarios

  1. Test concurrent enforcement of token limits.
  2. Verify resource cleanup under various scenarios.
  3. Validate integration performance across multiple LLM providers.
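
Scenario 1 could be exercised with a test along these lines. The limiter here is a deliberately simple fixed-window stand-in so the example runs on its own; in the actual suite it would be replaced by the PR's controller.

```python
import threading
import time


class _StubLimiter:
    """Stand-in fixed-window limiter so the test is self-contained;
    in the real suite this would be the PR's TPMController."""

    def __init__(self, max_tpm: int, window: float):
        self.max_tpm, self.window = max_tpm, window
        self._spent, self._start = 0, time.monotonic()
        self._lock = threading.Lock()

    def check_or_wait(self, tokens: int) -> None:
        while True:
            with self._lock:
                if time.monotonic() - self._start >= self.window:
                    self._spent, self._start = 0, time.monotonic()
                if self._spent + tokens <= self.max_tpm:
                    self._spent += tokens
                    return
            time.sleep(0.01)


def test_concurrent_enforcement():
    limiter = _StubLimiter(max_tpm=100, window=0.2)
    start = time.monotonic()
    threads = [
        threading.Thread(target=limiter.check_or_wait, args=(25,))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    # 8 x 25 = 200 tokens against a 100-token cap: the second half of
    # the requests must have waited for at least one window to elapse.
    assert elapsed >= 0.15, elapsed
```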

Documentation Enhancements

  1. Update the README with TPM configuration instructions.
  2. Document best practices for setting TPM limits.
  3. Add troubleshooting guidance for common token-limit issues.

This PR substantially improves token management, but the issues and recommendations outlined above should be addressed before it is production-ready. The foundation is promising; with thorough testing and the suggested improvements, it will robustly extend the existing rate-limiting functionality.
