Add TPM Controller to Respect LLM API Token Rate Limits #2841
Conversation
Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Added Tokens Per Minute Counter PR

Overview

This pull request introduces a Tokens Per Minute (TPM) rate-limiting feature alongside the existing Requests Per Minute (RPM) controller. The changes span six files, enhancing the system's capacity to manage token consumption effectively.

Detailed Insights

1. New TPM Controller Implementation (
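For readers unfamiliar with the pattern, a TPM controller modeled on the existing RPMController might look roughly like the sketch below. The class name and max_tpm come from the PR; the method name and the fixed-window accounting here are assumptions, not the PR's actual implementation.

```python
import threading
import time


class TPMController:
    """Fixed-window tokens-per-minute limiter (illustrative sketch)."""

    def __init__(self, max_tpm: int):
        self.max_tpm = max_tpm
        self._tokens_used = 0
        self._window_start = time.monotonic()
        self._lock = threading.Lock()

    def check_or_wait(self, tokens: int) -> None:
        """Block until `tokens` can be spent without exceeding max_tpm this minute."""
        with self._lock:
            now = time.monotonic()
            # Start a fresh window once a full minute has elapsed.
            if now - self._window_start >= 60:
                self._tokens_used = 0
                self._window_start = now
            # If this spend would blow the budget, sleep out the rest of the window.
            if self._tokens_used + tokens > self.max_tpm:
                time.sleep(max(0.0, 60 - (now - self._window_start)))
                self._tokens_used = 0
                self._window_start = time.monotonic()
            self._tokens_used += tokens
```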
@@ -43,6 +43,7 @@ class BaseAgent(ABC, BaseModel):
    config (Optional[Dict[str, Any]]): Configuration for the agent.
    verbose (bool): Verbose mode for the Agent Execution.
    max_rpm (Optional[int]): Maximum number of requests per minute for the agent execution.
    max_tpm (Optional[int]): Maximum number of tokens to ne used per minute for the agent execution.
typo here
max_tpm (Optional[int]): Maximum number of tokens to be used per minute for the agent execution
@@ -237,6 +251,12 @@ def set_private_attrs(self):
    )
    if not self._token_process:
        self._token_process = TokenProcess()

    if self.max_tpm and not self._tpm_controller:
Any reason to re-initialize `_tpm_controller` here?
if is_token_limit_exceeded(e):
    handle_exceeded_token_limits(self.request_within_tpm_limit)
    continue
Should we handle this error after litellm's own error checking?
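For reference, litellm maps provider rate-limit responses to litellm.exceptions.RateLimitError, so the handling could be ordered after that classification. A minimal sketch assuming a simple fixed back-off (the helper name and retry policy are illustrative, not this PR's code):

```python
import time

import litellm
from litellm.exceptions import RateLimitError


def completion_with_tpm_backoff(model: str, messages: list, wait_seconds: float = 60.0):
    """Call the LLM and, on a rate-limit error, wait out the window and retry."""
    while True:
        try:
            return litellm.completion(model=model, messages=messages)
        except RateLimitError:
            # litellm has already classified the provider error by this point,
            # so we only back off for genuine rate-limit failures.
            time.sleep(wait_seconds)
```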
"""Handle token limit error by waiting. | ||
|
||
Args: | ||
token_counter: Class with Sleep function |
The arg is actually `tpm_controller`, isn't it? Can you fix it?
Args:
    token_counter: Class with Sleep function
"""
tpm_controller(1)
I’m assuming 1 is meant to represent max_tpm. Should we instead use the value provided by the agent? Also, consider using named parameters to make the code clearer.
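For example, a hypothetical rewrite (the signature and the parameter name requested_tokens are invented for illustration):

```python
def handle_exceeded_token_limits(tpm_controller, requested_tokens: int) -> None:
    """Wait until the controller can grant `requested_tokens` more tokens (sketch)."""
    # Calling the controller mirrors the original `tpm_controller(1)` call style.
    tpm_controller(requested_tokens)


# Call site: named arguments make the magic number's meaning explicit, e.g.
# handle_exceeded_token_limits(tpm_controller=self._tpm_controller,
#                              requested_tokens=self.max_tpm)
```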
@Ahmed22347 great work here! I dropped some comments. I'm also missing tests to cover the max_tpm feature; we have some max_rpm examples you can use as a reference.
Overview
This PR addresses the issue of exceeding the tokens-per-minute (TPM) limits imposed by LLM API providers (e.g., Groq). Such overuse could result in throttling or errors, which can affect application reliability.
Changes Introduced
- Enhanced the Agent class with a new max_tpm parameter (maximum tokens per minute).
- Created a TPMController, modeled after the existing RPMController.
- Improved error handling.
Benefits
Notes
The feature adds a max_tpm parameter and integrates seamlessly with existing rate-limiting logic.
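Assuming the parameter lands as described, configuring an agent would look something like this sketch (the role, goal, and numeric limits are illustrative; max_tpm is the field this PR adds):

```python
from crewai import Agent

agent = Agent(
    role="Researcher",
    goal="Summarize recent papers",
    backstory="An analyst who digests long documents.",
    max_rpm=10,    # existing limit: at most 10 LLM requests per minute
    max_tpm=6000,  # new limit: at most 6,000 tokens consumed per minute
)
```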