
Conversation


@kimsama kimsama commented Jun 7, 2025

The MCP server should account for the number of tokens in each request and handle the case where the rate limit is exceeded.

I set MODEL to gpt-4o-mini in the .env file. With this setting, the following error occurs at runtime:

Error generating contextual embedding: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4o-mini in organization org-xxxxxxxxxxxxxxxx on tokens per min (TPM): Limit 200000, Used 195663, Requested 7592. Please try again in 976ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}. Using original chunk instead.
INFO     HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 429 Too Many Requests"    _client.py:1025

Changing the code as described below resolved the error (a sketch of the retry logic follows the list):

  • Add explicit detection of 429 (Too Many Requests) exceptions
  • Implement retry logic based on Retry-After header or message delay
  • Integrate retry count and wait time with existing retry logic
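
For reference, here is a minimal sketch of that retry behavior, assuming the official `openai` Python client (v1+). The function name, signature, and backoff parameters are illustrative and not the exact code in the commits below:

```python
import re
import time

import openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_with_retry(prompt: str, model: str = "gpt-4o-mini", max_retries: int = 5) -> str:
    """Call the chat completions API, retrying on 429 (Too Many Requests).

    Wait time comes from the Retry-After header when present, otherwise from
    the "Please try again in Nms" hint in the error message, with exponential
    backoff as a last resort.
    """
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except openai.RateLimitError as e:
            # Explicit detection of 429 (Too Many Requests)
            wait = None
            headers = getattr(getattr(e, "response", None), "headers", None) or {}
            retry_after = headers.get("retry-after")
            if retry_after:
                wait = float(retry_after)
            else:
                # Fall back to the delay hint embedded in the error message,
                # e.g. "Please try again in 976ms"
                match = re.search(r"try again in (\d+(?:\.\d+)?)(ms|s)", str(e))
                if match:
                    value, unit = match.groups()
                    wait = float(value) / 1000.0 if unit == "ms" else float(value)
            if wait is None:
                wait = 2 ** attempt  # exponential backoff fallback
            time.sleep(wait)
    raise RuntimeError(f"Rate limit retries exhausted after {max_retries} attempts")
```

Preferring the Retry-After header over the message hint keeps the wait aligned with what the API actually requests; exponential backoff only kicks in when neither is available.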

kimsama added 2 commits June 7, 2025 13:29
- Add explicit detection of 429 (Too Many Requests) exceptions
- Implement retry logic based on Retry-After header or message delay
- Integrate retry count and wait time with existing retry logic