
Add TPM Controller to Respect LLM API Token Rate Limits #2841


Open · Ahmed22347 wants to merge 2 commits into main

Conversation

Ahmed22347

Overview

This PR addresses the issue of exceeding the tokens-per-minute (TPM) limits imposed by LLM API providers (e.g., Groq). Exceeding these limits can trigger throttling or errors, reducing application reliability.

Changes Introduced

  • Enhanced Agent Class

    • Introduced a new variable: max_tpm (maximum tokens per minute).
    • Integrated a TPM controller mechanism, modeled after the existing RPM (requests per minute) controller.
  • Created TPMController

    • Designed similarly to the RPMController.
    • Tracks and limits token usage on a per-minute basis.
    • Prevents sending more tokens than allowed in a rolling 60-second window (a minimal sketch follows this list).
  • Improved Error Handling

    • Captures exceptions caused by TPM limit violations.
    • Implements a 60-second wait before retrying requests that exceed the token rate limit.
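As a rough illustration of the mechanism described above (a shared counter reset by a background timer), the sketch below uses a fixed 60-second reset window, a simplification of a true rolling window. The class name and the record_tokens helper are hypothetical; check_or_wait and stop_tpm_counter only mirror method names that appear in the review below, and none of this is the PR's actual code:

    import threading
    import time

    class TPMControllerSketch:
        """Illustrative tokens-per-minute limiter; not the PR's implementation."""

        def __init__(self, max_tpm: int):
            self.max_tpm = max_tpm
            self._current_tokens = 0
            self._lock = threading.Lock()
            self._shutdown = threading.Event()
            self._timer = None
            self._reset_counter()  # start the per-minute reset cycle

        def _reset_counter(self) -> None:
            # Zero the counter, then schedule the next reset in 60 seconds.
            with self._lock:
                self._current_tokens = 0
            if not self._shutdown.is_set():
                self._timer = threading.Timer(60.0, self._reset_counter)
                self._timer.daemon = True
                self._timer.start()

        def record_tokens(self, tokens: int) -> None:
            # Hypothetical helper: account for tokens consumed by a call.
            with self._lock:
                self._current_tokens += tokens

        def check_or_wait(self, wait: int = 0) -> bool:
            # Return True once usage is under the cap; with wait=0, fail
            # fast instead of sleeping between checks.
            while True:
                with self._lock:
                    if self._current_tokens < self.max_tpm:
                        return True
                if wait <= 0:
                    return False
                time.sleep(wait)

        def stop_tpm_counter(self) -> None:
            # Cancel the background timer so the object can be collected.
            self._shutdown.set()
            if self._timer is not None:
                self._timer.cancel()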

Benefits

  • Ensures compliance with API usage policies.
  • Reduces the likelihood of service disruption.
  • Improves overall stability when interacting with high-throughput LLM APIs.

Notes

  • The TPM controller is opt-in via the max_tpm parameter and integrates seamlessly with the existing rate-limiting logic (see the usage sketch below).
  • Future improvements may include adding max_tpm at the crew level.
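For illustration, enabling the limiter might look like the following. This assumes the Agent constructor exposes max_tpm alongside the existing max_rpm; the field values are arbitrary:

    from crewai import Agent

    agent = Agent(
        role="Researcher",
        goal="Summarize recent LLM papers",
        backstory="An analyst working under a strict provider quota.",
        max_rpm=30,    # existing requests-per-minute limit
        max_tpm=6000,  # new opt-in tokens-per-minute limit from this PR
    )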

@joaomdmoura
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Added Tokens Per Minute Counter PR

Overview

This pull request introduces a Tokens Per Minute (TPM) rate-limiting feature alongside the existing Requests Per Minute (RPM) controller. The changes span six files, enhancing the system's capacity to manage token consumption effectively.

Detailed Insights

1. New TPM Controller Implementation (tpm_controller.py)

Strengths:

  • Clean and efficient implementation using threading for TPM tracking.
  • Proper integration with the TokenProcess functionality.
  • Thread-safe operations using locks enhance robustness.

Issues and Recommendations:

  • Docstrings: Add method documentation to improve maintainability.

    def check_or_wait(self, wait: int = 0):
        """Check that token usage is within limits, or wait if exceeded.

        Args:
            wait (int): Wait time in seconds if the limit is exceeded.

        Returns:
            bool: True if the operation can proceed, else False.
        """
  • Logging: Replace debug print statements with a logger for consistent output control.

    self.logger.debug(f"Tokens increased: {self._current_tokens}")
  • Destructor Cleanup: Ensure resources are released correctly.

    def __del__(self):
        """Cleanup timer resources."""
        self.stop_tpm_counter()

2. Modifications in Agent Class (agent.py)

Notable Issues:

  • Documentation Typo: Ensure descriptions for max_tpm are precise.

    description="Maximum number of tokens per minute that can be processed."
  • Validation: Implement validation to check max_tpm for positive values only.

    @validator('max_tpm')
    def validate_max_tpm(cls, v):
        if v is not None and v <= 0:
            raise ValueError("max_tpm must be positive.")
        return v  # pydantic validators must return the validated value

3. Base Agent Implementation (base_agent.py)

Key Issue:

  • Parameter Reference: Correct the reference from max_rpm to max_tpm to avoid confusion and ensure functionality.
    if self.max_tpm and not self._tpm_controller:
        self._tpm_controller = TPMController(max_tpm=self.max_tpm)

4. Agent Utilities (agent_utils.py)

Enhancements Suggested:

  • Error Handling: Add robust error handling to manage failures related to token limits.
    def handle_exceeded_token_limits(tpm_controller: TPMController):
        ...

General Recommendations

  1. Unit Testing: Comprehensive tests for TPM functionality are crucial.
  2. Metrics Collection: Implement monitoring for token usage.
  3. Configuration Options: Enable TPM limits to be set via environment variables (a sketch follows this list).
  4. Logging Warnings: Introduce warnings when approaching limits.
  5. Graceful Degradation: Ensure seamless user experience under limit conditions.
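For recommendation 3, a minimal sketch of the idea follows; the variable name CREWAI_MAX_TPM is hypothetical, since the PR does not define an environment-variable convention:

    import os

    # Hypothetical variable name; fall back to None (no limit) when unset.
    _env_tpm = os.environ.get("CREWAI_MAX_TPM")
    default_max_tpm = int(_env_tpm) if _env_tpm else None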

Security Considerations

  1. Add sanitization for rate limiting information in logs.
  2. Secure methods for counting tokens to mitigate manipulation risks.
  3. Protection against potential timer exploitation.

Suggested Testing Scenarios

  1. Test concurrent enforcement of token limits (a sketch follows this list).
  2. Verify resource cleanup under various scenarios.
  3. Validate integration performance across multiple LLM providers.
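As a starting point for scenario 1, a concurrency test could look like the sketch below, reusing the illustrative TPMControllerSketch from earlier (sizes and names are arbitrary):

    import threading

    def test_concurrent_token_accounting():
        # Ten workers recording 10 tokens each exactly fill a 100-token
        # window; none should block, and the counter must land on 100.
        controller = TPMControllerSketch(max_tpm=100)

        def worker():
            assert controller.check_or_wait(wait=1)
            controller.record_tokens(10)

        threads = [threading.Thread(target=worker) for _ in range(10)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        assert controller._current_tokens == 100
        controller.stop_tpm_counter()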

Documentation Enhancements

  1. Update the README with TPM configuration instructions.
  2. Document best practices for setting TPM limits.
  3. Add troubleshooting guides for common token-limit issues.

This PR is a substantial improvement to token management, but the issues and recommendations outlined above should be addressed before it is production-ready. The foundation is promising; with thorough testing and the suggested improvements, it will robustly enhance the existing functionality.

@@ -43,6 +43,7 @@ class BaseAgent(ABC, BaseModel):
config (Optional[Dict[str, Any]]): Configuration for the agent.
verbose (bool): Verbose mode for the Agent Execution.
max_rpm (Optional[int]): Maximum number of requests per minute for the agent execution.
max_tpm (Optional[int]): Maximum number of tokens to ne used per minute for the agent execution.
Contributor

typo here

max_tpm (Optional[int]): Maximum number of tokens to be used per minute for the agent execution

@@ -237,6 +251,12 @@ def set_private_attrs(self):
)
if not self._token_process:
self._token_process = TokenProcess()

if self.max_tpm and not self._tpm_controller:
Contributor

Any reason to try to re-initialize `_tpm_controller` here?

Comment on lines +213 to +215
if is_token_limit_exceeded(e):
handle_exceeded_token_limits(self.request_within_tpm_limit)
continue
Contributor

Should we handle this error after the litellm check?

"""Handle token limit error by waiting.

Args:
token_counter: Class with Sleep function
Contributor

The arg is actually tpm_controller, isn't it? Can you fix it?

Args:
token_counter: Class with Sleep function
"""
tpm_controller(1)
Contributor

I’m assuming 1 is meant to represent max_tpm. Should we instead use the value provided by the agent? Also, consider using named parameters to make the code clearer.

@lucasgomide (Contributor) left a comment

@Ahmed22347 great work here!

I dropped some comments. I'm also missing tests covering the max_tpm feature; we have some max_rpm examples you can use as a reference.
