-
You'll need to use two limiters: one for your tokens-per-minute limit and the other for the requests-per-minute limit. I'm assuming you can determine the number of tokens in an API request before you send it. Then first acquire enough capacity from the token limiter, followed by the request limiter:

```python
# shared across tasks
tpm_limit = AsyncLimiter(TOKENS_PER_MINUTE)
rpm_limit = AsyncLimiter(REQUESTS_PER_MINUTE)

# for any task making an OpenAI request;
# token_count is the number of tokens in this specific request
await tpm_limit.acquire(token_count)
async with rpm_limit:
    ...  # this block is rate limited
```
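To make the pattern above runnable end to end, here is a self-contained sketch using a minimal stdlib-only stand-in for `aiolimiter.AsyncLimiter` (the real class takes a `max_rate` and a `time_period` defaulting to 60 seconds, and supports `acquire(amount)`); the quota numbers and `send_request` helper are hypothetical:

```python
import asyncio

class MiniLimiter:
    """Simplified stand-in for aiolimiter.AsyncLimiter (leaky bucket)."""
    def __init__(self, max_rate, time_period=60):
        self.max_rate = max_rate
        self.time_period = time_period
        self._level = 0.0
        self._last = None

    def _leak(self):
        now = asyncio.get_running_loop().time()
        if self._last is not None:
            # capacity drains back at max_rate per time_period
            self._level = max(
                0.0,
                self._level - (now - self._last) * self.max_rate / self.time_period,
            )
        self._last = now

    async def acquire(self, amount=1):
        while True:
            self._leak()
            if self._level + amount <= self.max_rate:
                self._level += amount
                return
            await asyncio.sleep(self.time_period / self.max_rate)

    async def __aenter__(self):
        await self.acquire()

    async def __aexit__(self, exc_type, exc, tb):
        pass

async def send_request(tpm_limit, rpm_limit, token_count):
    # acquire token capacity first, then a request slot
    await tpm_limit.acquire(token_count)
    async with rpm_limit:
        return f"sent {token_count} tokens"

async def main():
    tpm_limit = MiniLimiter(90_000)  # hypothetical tokens-per-minute quota
    rpm_limit = MiniLimiter(3_500)   # hypothetical requests-per-minute quota
    return await asyncio.gather(
        *(send_request(tpm_limit, rpm_limit, n) for n in (120, 80, 250))
    )

print(asyncio.run(main()))
```

With `aiolimiter` installed you would drop `MiniLimiter` and import `AsyncLimiter` instead; the calling code stays the same.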
-
I wrote my own below. Initialize with:
and use with:
-
I think I have a similar case. What we need is to compose two rate limiters, kind of like this:

```python
class CombinedLimiter:
    def __init__(self, *limiters):
        self.limiters = limiters

    async def __aenter__(self):
        for limiter in self.limiters:
            await limiter.__aenter__()

    async def __aexit__(self, *exc_info):
        # release in reverse order of acquisition
        for limiter in reversed(self.limiters):
            await limiter.__aexit__(*exc_info)
```

This is just a rough idea, perhaps someone's tried it and can share thoughts.
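If you'd rather not hand-roll the dunder methods, the stdlib `contextlib.AsyncExitStack` already provides this composition (entering in order, exiting in reverse). A small sketch with dummy limiters that just record the ordering (the `DummyLimiter` and `limited_call` names are illustrative, not from any library):

```python
import asyncio
from contextlib import AsyncExitStack

class DummyLimiter:
    """Stand-in async context manager that records enter/exit order."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    async def __aenter__(self):
        self.log.append(f"enter {self.name}")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.log.append(f"exit {self.name}")

async def limited_call(limiters, log):
    # AsyncExitStack enters the limiters in order
    # and exits them in reverse, even on error
    async with AsyncExitStack() as stack:
        for limiter in limiters:
            await stack.enter_async_context(limiter)
        log.append("request")

def demo():
    log = []
    limiters = [DummyLimiter("tpm", log), DummyLimiter("rpm", log)]
    asyncio.run(limited_call(limiters, log))
    return log

print(demo())
```

An advantage over the hand-written class is that `AsyncExitStack` unwinds already-entered limiters correctly if a later `__aenter__` raises.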
-
I am using an API key that has multiple rate limits associated with it (requests per minute and total request length per minute). I was initially thinking that using multiple limiters might work, but I'm not quite sure about it. Is there any recommendation?
For reference, I am using OpenAI API which has these rate limits (https://platform.openai.com/account/rate-limits)