feat(quota): add server‑side per‑client request quotas (requires auth) #2096

Merged
merged 1 commit into from
May 21, 2025

Conversation

liangwen12year
Contributor

@liangwen12year liangwen12year commented May 2, 2025

What does this PR do?

feat(quota): add server‑side per‑client request quotas (requires auth)

Unrestricted usage can lead to runaway costs and fragmented client-side
workarounds. This commit introduces a native quota mechanism to the
server, giving operators a unified, centrally managed throttle for
per-client requests—without needing extra proxies or custom client
logic.  This helps contain cloud-compute expenses, enables fine-grained
usage control, and simplifies deployment and monitoring of Llama Stack
services.  Quotas are fully opt-in and have no effect unless explicitly
configured.

Note that quotas require authentication to be enabled. `sqlite` is the
only supported quota `type` at this time; any other `type` is rejected.
The only supported `period` is `day`.

Highlights:

- Adds `QuotaMiddleware` to enforce per-client request quotas:
  - Uses `Authorization: Bearer <client_id>` (from
    AuthenticationMiddleware)
  - Tracks usage via a SQLite-based KV store
  - Returns 429 when the quota is exceeded

- Extends `ServerConfig` with a `quota` section (type + config)

- Enforces strict coupling: quotas require authentication or the server
  will fail to start

Behavior changes:
- Quotas are disabled by default unless explicitly configured
- SQLite defaults to `./quotas.db` if no DB path is set
- The server requires authentication when quotas are enabled
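
The coupling between quotas and authentication described above can be sketched as a startup check. This is a minimal illustration, not the PR's implementation: `QuotaSettings` and `validate_server_settings` are hypothetical names, and the only facts taken from the description are the supported `type` (`sqlite`), the supported `period` (`day`), the `./quotas.db` default, and the rule that quotas without auth must fail at startup.

```python
from dataclasses import dataclass
from typing import Optional

# Values the PR description names as the only supported ones.
SUPPORTED_QUOTA_TYPES = {"sqlite"}
SUPPORTED_PERIODS = {"day"}


@dataclass
class QuotaSettings:
    """Hypothetical stand-in for the `quota` section of `run.yaml`."""

    type: str
    max_requests: int
    period: str = "day"
    db_path: str = "./quotas.db"  # default stated in the PR description


def validate_server_settings(auth: Optional[dict], quota: Optional[QuotaSettings]) -> None:
    """Raise ValueError at startup if the quota section cannot be honored."""
    if quota is None:
        return  # quotas are opt-in; nothing to check
    if auth is None:
        raise ValueError("quota is configured but authentication is not enabled")
    if quota.type not in SUPPORTED_QUOTA_TYPES:
        raise ValueError(f"unsupported quota type: {quota.type!r}")
    if quota.period not in SUPPORTED_PERIODS:
        raise ValueError(f"unsupported quota period: {quota.period!r}")
```

Failing at startup rather than on the first request means a misconfigured quota never runs silently unenforced.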

To enable per-client request quotas in `run.yaml`, add:
```yaml
server:
  port: 8321
  auth:
    provider_type: "custom"
    config:
      endpoint: "https://auth.example.com/validate"
  quota:
    type: sqlite
    config:
      db_path: ./quotas.db
      limit:
        max_requests: 1000
        period: day
```

Closes #2093

Test Plan

[Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.]

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 2, 2025
@liangwen12year liangwen12year marked this pull request as draft May 2, 2025 22:20
@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 2 times, most recently from 572571a to cdfd3dd Compare May 2, 2025 23:35
@liangwen12year liangwen12year marked this pull request as ready for review May 2, 2025 23:39
@leseb
Collaborator

leseb commented May 5, 2025

@liangwen12year please resolve conflicts

Collaborator

@leseb leseb left a comment

I expected this to work alongside the authentication middleware, but that doesn’t seem to be the case currently. As it stands, rate-limiting accepts any Bearer token without validation, so someone could bypass it by simply altering the token. Could we enforce the presence of a valid auth configuration as a prerequisite for enabling rate-limiting?

Am I missing something?

@liangwen12year
Contributor Author

> I expected this to work alongside the authentication middleware, but that doesn’t seem to be the case currently. As it stands, rate-limiting accepts any Bearer token without validation, so someone could bypass it by simply altering the token. Could we enforce the presence of a valid auth configuration as a prerequisite for enabling rate-limiting?
>
> Am I missing something?

Thanks for the hint, we definitely need to have the authentication first.

@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 2 times, most recently from 64a4fd2 to ebbebef Compare May 6, 2025 01:54
@liangwen12year liangwen12year changed the title feat(quota): add server‑side per‑client request quotas feat(quota): add server‑side per‑client request quotas (requires auth) May 6, 2025
@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 2 times, most recently from 4a6b42d to e863b81 Compare May 6, 2025 21:53
Collaborator

@leseb leseb left a comment

can you remove the uv.lock from this PR?

Collaborator

@leseb leseb left a comment

I think there is a case to be made for authenticated users having a higher amount of requests than anonymous. Now I'm starting to wonder if we shouldn't support quota with auth disabled and give a low quota. What do you think?

Sorry for the back and forth on auth required versus not required 😅

@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 2 times, most recently from 28d2969 to 8efb5b4 Compare May 7, 2025 14:44
@liangwen12year
Contributor Author

> can you remove the uv.lock from this PR?

Removed, thanks!

@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 3 times, most recently from 7515871 to c98fae0 Compare May 7, 2025 16:40
@liangwen12year
Contributor Author

liangwen12year commented May 7, 2025

> I think there is a case to be made for authenticated users having a higher amount of requests than anonymous. Now I'm starting to wonder if we shouldn't support quota with auth disabled and give a low quota. What do you think?

I completely agree, there’s a strong case for supporting quotas even when authentication is disabled. This would allow the system to gracefully throttle anonymous users rather than blocking them entirely, encourage users to sign up by offering a higher quota for authenticated accounts, and provide a safeguard against abuse even in the absence of authentication. It would also make the middleware more flexible and applicable to a wider range of real-world API use cases.

A practical approach could be to use the authenticated_client_id as the key when authentication is enabled, and fall back to an alternative like the client’s IP address (from scope["client"]) when it’s not. It might be valuable to introduce two configurable limits: one for anonymous users (with a low default) and a higher one for authenticated users.

Overall, I think this is a meaningful improvement that would make the feature much more versatile.
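
The key fallback proposed above could look roughly like the sketch below. It assumes the authentication middleware stores the validated ID in the ASGI scope under `authenticated_client_id`; the function name `quota_key_and_limit` and the `authenticated:` / `anonymous:` key prefixes are illustrative only. Per the ASGI specification, `scope["client"]` is a `(host, port)` pair or `None`.

```python
from typing import Any, Mapping, Tuple


def quota_key_and_limit(
    scope: Mapping[str, Any],
    anonymous_limit: int,
    authenticated_limit: int,
) -> Tuple[str, int]:
    """Pick the quota key and the limit that applies to one request."""
    # Assumption: the auth middleware has already validated the token and
    # written the resulting ID into the scope under this key.
    client_id = scope.get("authenticated_client_id")
    if client_id:
        return f"authenticated:{client_id}", authenticated_limit

    # No authenticated identity: fall back to the peer address.
    client = scope.get("client")  # ASGI: (host, port) or None
    host = client[0] if client else "unknown"
    return f"anonymous:{host}", anonymous_limit
```

Prefixing the key keeps an anonymous IP address from ever colliding with a client ID that happens to look like one.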

@liangwen12year liangwen12year force-pushed the per_client_quota_support branch from c98fae0 to 57cc16a Compare May 9, 2025 02:47
@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 2 times, most recently from ebd0134 to 66774be Compare May 14, 2025 04:38
@leseb leseb mentioned this pull request May 15, 2025
27 tasks
@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 5 times, most recently from d7da44e to 46c6c48 Compare May 16, 2025 15:31
Collaborator

@leseb leseb left a comment

One nit and we should be good to go, thanks for your patience!

@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 3 times, most recently from b2b3813 to 69c3844 Compare May 20, 2025 11:38
@liangwen12year liangwen12year requested a review from bbrowning as a code owner May 20, 2025 11:38
@liangwen12year liangwen12year force-pushed the per_client_quota_support branch 2 times, most recently from 1074c13 to d640d7c Compare May 20, 2025 13:15
@liangwen12year liangwen12year force-pushed the per_client_quota_support branch from d640d7c to 198fcb2 Compare May 20, 2025 13:28
Unrestricted API usage can lead to runaway costs and fragmented client-side
throttling logic. This commit introduces a built-in quota mechanism at the
server level, enabling operators to centrally enforce per-client and anonymous
rate limits—without needing external proxies or client changes.

This helps contain compute costs, enforces fair usage, and simplifies deployment
and monitoring of Llama Stack services. Quotas are fully opt-in and have no
effect unless explicitly configured.

Currently, SQLite is the only supported KV store. If quotas are
configured but authentication is disabled, every client is treated as
anonymous and the anonymous limit applies.

Highlights:
- Adds `QuotaMiddleware` to enforce request quotas:
  - Uses bearer token as client ID if present; otherwise falls back to IP address
  - Tracks requests in KV store with per-key TTL expiration
  - Returns HTTP 429 if a client exceeds their quota

- Extends `ServerConfig` with a `quota` section:
  - `kvstore`: configuration for the backend (currently only SQLite)
  - `anonymous_max_requests`: per-period cap for unauthenticated clients
  - `authenticated_max_requests`: per-period cap for authenticated clients
  - `period`: duration of the quota window (currently only `day` is supported)

- Adds full test coverage with FastAPI `TestClient` and custom middleware injection
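
The per-key counting with TTL expiration listed above can be illustrated with a small in-memory sketch. This is not the middleware from this commit: `InMemoryQuotaCounter` is a hypothetical class, a plain dict stands in for the SQLite-backed KV store, and it assumes the window opens on a key's first request and that rejected requests are not counted.

```python
import time
from typing import Callable, Dict, Tuple

# Only `day` is supported, per the commit message.
PERIOD_SECONDS = {"day": 24 * 60 * 60}


class InMemoryQuotaCounter:
    """Counts requests per key; each key's count expires after one period."""

    def __init__(self, period: str = "day", clock: Callable[[], float] = time.time):
        self._ttl = PERIOD_SECONDS[period]
        self._clock = clock
        # key -> (requests seen in the current window, window expiry time)
        self._entries: Dict[str, Tuple[int, float]] = {}

    def allow(self, key: str, max_requests: int) -> bool:
        """Return True if the request fits the quota, False if it should get a 429."""
        now = self._clock()
        count, expires_at = self._entries.get(key, (0, now + self._ttl))
        if now >= expires_at:
            # The previous window has expired; start a fresh one.
            count, expires_at = 0, now + self._ttl
        if count >= max_requests:
            return False
        self._entries[key] = (count + 1, expires_at)
        return True
```

A middleware built on this would call `allow()` once per request with the client's key and limit, and respond with HTTP 429 whenever it returns `False`.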

Behavior changes:
- Quotas are disabled by default unless explicitly configured
- Anonymous users get a conservative default quota; authenticated clients can be given more generous limits

To enable per-client request quotas in `run.yaml`, add:
```yaml
server:
  port: 8321
  auth:
    provider_type: custom
    config:
      endpoint: https://auth.example.com/validate
  quota:
    kvstore:
      type: sqlite
      db_path: ./quotas.db
    anonymous_max_requests: 100
    authenticated_max_requests: 1000
    period: day
```

Signed-off-by: Wen Liang <[email protected]>
@liangwen12year liangwen12year force-pushed the per_client_quota_support branch from 198fcb2 to dacd522 Compare May 20, 2025 13:32
@leseb leseb merged commit 2890243 into meta-llama:main May 21, 2025
25 checks passed
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFE: user or client quota
3 participants