-
You'll need to use two limiters: one for your tokens-per-minute limit and the other for the requests-per-minute limit. I'm assuming you can determine the number of tokens in an API request before you send it. Then first acquire enough capacity from the token limiter, followed by the request limiter:

```python
# shared across tasks
tpm_limit = AsyncLimiter(TOKENS_PER_MINUTE)
rpm_limit = AsyncLimiter(REQUESTS_PER_MINUTE)

# for any task making an OpenAI request;
# token_count is the number of tokens in this specific request
await tpm_limit.acquire(token_count)
async with rpm_limit:
    ...  # this block is rate limited
```
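To make the pattern above runnable end to end, here is a self-contained sketch using a minimal stdlib-only stand-in for `aiolimiter.AsyncLimiter` (the real class takes a `max_rate` and a `time_period` defaulting to 60 seconds, and supports `acquire(amount)`); the quota numbers and `send_request` helper are hypothetical:

```python
import asyncio

class MiniLimiter:
    """Simplified stand-in for aiolimiter.AsyncLimiter (leaky bucket)."""
    def __init__(self, max_rate, time_period=60):
        self.max_rate = max_rate
        self.time_period = time_period
        self._level = 0.0
        self._last = None

    def _leak(self):
        now = asyncio.get_running_loop().time()
        if self._last is not None:
            # capacity drains back at max_rate per time_period
            self._level = max(
                0.0,
                self._level - (now - self._last) * self.max_rate / self.time_period,
            )
        self._last = now

    async def acquire(self, amount=1):
        while True:
            self._leak()
            if self._level + amount <= self.max_rate:
                self._level += amount
                return
            await asyncio.sleep(self.time_period / self.max_rate)

    async def __aenter__(self):
        await self.acquire()

    async def __aexit__(self, exc_type, exc, tb):
        pass

async def send_request(tpm_limit, rpm_limit, token_count):
    # acquire token capacity first, then a request slot
    await tpm_limit.acquire(token_count)
    async with rpm_limit:
        return f"sent {token_count} tokens"

async def main():
    tpm_limit = MiniLimiter(90_000)  # hypothetical tokens-per-minute quota
    rpm_limit = MiniLimiter(3_500)   # hypothetical requests-per-minute quota
    return await asyncio.gather(
        *(send_request(tpm_limit, rpm_limit, n) for n in (120, 80, 250))
    )

print(asyncio.run(main()))
```

With `aiolimiter` installed you would drop `MiniLimiter` and import `AsyncLimiter` instead; the calling code stays the same.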
-
I wrote my own below. Initialize with:
and use with:
-
I think I have a similar case. What we need is to compose two rate limiters, kind of like this:

```python
class CombinedLimiter:
    def __init__(self, *limiters):
        self.limiters = limiters

    async def __aenter__(self):
        for limiter in self.limiters:
            await limiter.__aenter__()

    async def __aexit__(self, *exc_info):
        # release in reverse order of acquisition
        for limiter in reversed(self.limiters):
            await limiter.__aexit__(*exc_info)
```

This is just a rough idea, perhaps someone's tried it and can share thoughts.
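If you'd rather not hand-roll the dunder methods, the stdlib `contextlib.AsyncExitStack` already provides this composition (entering in order, exiting in reverse). A small sketch with dummy limiters that just record the ordering (the `DummyLimiter` and `limited_call` names are illustrative, not from any library):

```python
import asyncio
from contextlib import AsyncExitStack

class DummyLimiter:
    """Stand-in async context manager that records enter/exit order."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    async def __aenter__(self):
        self.log.append(f"enter {self.name}")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.log.append(f"exit {self.name}")

async def limited_call(limiters, log):
    # AsyncExitStack enters the limiters in order
    # and exits them in reverse, even on error
    async with AsyncExitStack() as stack:
        for limiter in limiters:
            await stack.enter_async_context(limiter)
        log.append("request")

def demo():
    log = []
    limiters = [DummyLimiter("tpm", log), DummyLimiter("rpm", log)]
    asyncio.run(limited_call(limiters, log))
    return log

print(demo())
```

An advantage over the hand-written class is that `AsyncExitStack` unwinds already-entered limiters correctly if a later `__aenter__` raises.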
-
I am using an API key that has multiple rate limits associated with it (requests per minute and total request length per minute). I was initially thinking that using multiple limiters might work, but I'm not quite sure about it. Is there any recommendation?
For reference, I am using OpenAI API which has these rate limits (https://platform.openai.com/account/rate-limits)