feat(quota): add server‑side per‑client request quotas (requires auth) #2096
@liangwen12year please resolve conflicts
I expected this to work alongside the authentication middleware, but that doesn’t seem to be the case currently. As it stands, rate-limiting accepts any Bearer token without validation, so someone could bypass it by simply altering the token. Could we enforce the presence of a valid auth configuration as a prerequisite for enabling rate-limiting?
Am I missing something?
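The bypass described above can be illustrated with a minimal sketch. It is not the PR's code: `make_limiter`, `allow`, and the cap of 3 are hypothetical names and values, and the limiter deliberately keys its counter on the raw, unvalidated bearer token to show the problem.

```python
from collections import Counter

MAX_REQUESTS = 3  # hypothetical per-period cap, for illustration only


def make_limiter(max_requests):
    """Build a naive limiter that trusts whatever bearer token it is given."""
    counts = Counter()

    def allow(authorization_header, client_ip):
        # Key on the raw token without validating it (the flaw under discussion).
        if authorization_header and authorization_header.startswith("Bearer "):
            key = "token:" + authorization_header[len("Bearer "):]
        else:
            key = "ip:" + client_ip
        counts[key] += 1
        return counts[key] <= max_requests

    return allow


allow = make_limiter(MAX_REQUESTS)

# A client that keeps the same token is throttled after three requests.
honest = [allow("Bearer abc", "10.0.0.1") for _ in range(5)]

# A client that changes its token on every request gets a fresh counter each time.
evasive = [allow(f"Bearer fake-{i}", "10.0.0.1") for i in range(5)]
```

Here `honest` ends with two rejected requests, while every request in `evasive` is allowed, which is why the token needs to be validated before it is used as a quota key.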
Thanks for the hint, we definitely need to have the authentication first.
Can you remove `uv.lock` from this PR?
I think there is a case to be made for authenticated users having a higher request quota than anonymous users. Now I'm starting to wonder whether we should support quotas with auth disabled and give anonymous clients a low quota. What do you think?
Sorry for the back and forth on auth required versus not required 😅
Removed, thanks!
I completely agree, there’s a strong case for supporting quotas even when authentication is disabled. This would allow the system to gracefully throttle anonymous users rather than blocking them entirely, encourage users to sign up by offering a higher quota for authenticated accounts, and provide a safeguard against abuse even in the absence of authentication. It would also make the middleware more flexible and applicable to a wider range of real-world API use cases. A practical approach could be to use the Overall, I think this is a meaningful improvement that would make the feature much more versatile.
One nit and we should be good to go, thanks for your patience!
Unrestricted API usage can lead to runaway costs and fragmented client-side throttling logic. This commit introduces a built-in quota mechanism at the server level, enabling operators to centrally enforce per-client and anonymous rate limits, without needing external proxies or client changes. This helps contain compute costs, enforces fair usage, and simplifies deployment and monitoring of Llama Stack services.

Quotas are fully opt-in and have no effect unless explicitly configured. Currently, SQLite is the only supported KV store. If quotas are configured but authentication is disabled, authenticated limits will gracefully fall back to anonymous limits.

Highlights:
- Adds `QuotaMiddleware` to enforce request quotas:
  - Uses bearer token as client ID if present; otherwise falls back to IP address
  - Tracks requests in KV store with per-key TTL expiration
  - Returns HTTP 429 if a client exceeds their quota
- Extends `ServerConfig` with a `quota` section:
  - `kvstore`: configuration for the backend (currently only SQLite)
  - `anonymous_max_requests`: per-period cap for unauthenticated clients
  - `authenticated_max_requests`: per-period cap for authenticated clients
  - `period`: duration of the quota window (currently only `day` is supported)
- Adds full test coverage with FastAPI `TestClient` and custom middleware injection

Behavior changes:
- Quotas are disabled by default unless explicitly configured
- Anonymous users get a conservative default quota; authenticated clients can be given more generous limits

To enable per-client request quotas in `run.yaml`, add:

```yaml
server:
  port: 8321
  auth:
    provider_type: custom
    config:
      endpoint: https://auth.example.com/validate
  quota:
    kvstore:
      type: sqlite
      db_path: ./quotas.db
    anonymous_max_requests: 100
    authenticated_max_requests: 1000
    period: day
```

Signed-off-by: Wen Liang <[email protected]>
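The quota check described in this commit message can be sketched roughly as follows. This is a simplified, standard-library-only illustration, not the actual `QuotaMiddleware`: `InMemoryKV`, `check_quota`, and the `authenticated_client_id` parameter are hypothetical names, the in-memory dictionary stands in for the SQLite KV store, and the sketch assumes that a client ID is only passed in after the token has been validated upstream (it is `None` when auth is disabled, so the anonymous limit applies).

```python
import time

PERIOD_SECONDS = {"day": 24 * 60 * 60}  # only `day` is supported per the commit message


class InMemoryKV:
    """Hypothetical stand-in for the SQLite KV store; counters expire after a TTL."""

    def __init__(self, clock=time.time):
        self._data = {}  # key -> (count, expires_at)
        self._clock = clock

    def incr(self, key, ttl_seconds):
        """Increment the counter for `key`, starting a new window if it expired."""
        now = self._clock()
        count, expires_at = self._data.get(key, (0, 0.0))
        if now >= expires_at:
            count, expires_at = 0, now + ttl_seconds
        self._data[key] = (count + 1, expires_at)
        return count + 1


def check_quota(kv, authenticated_client_id, client_ip,
                anonymous_max=100, authenticated_max=1000, period="day"):
    """Return 200 if the request is within quota, 429 otherwise."""
    if authenticated_client_id is not None:
        # Assumption: the ID comes from a token that auth already validated.
        key, limit = f"quota:client:{authenticated_client_id}", authenticated_max
    else:
        # No validated identity: fall back to the IP address and the anonymous cap.
        key, limit = f"quota:ip:{client_ip}", anonymous_max
    count = kv.incr(key, PERIOD_SECONDS[period])
    return 200 if count <= limit else 429


# Usage with a fake clock so the window expiry is easy to follow.
now = [0.0]
kv = InMemoryKV(clock=lambda: now[0])
statuses = [check_quota(kv, None, "10.0.0.1", anonymous_max=2) for _ in range(3)]
now[0] += PERIOD_SECONDS["day"]  # one full period later, the counter has expired
after_reset = check_quota(kv, None, "10.0.0.1", anonymous_max=2)
authenticated = check_quota(kv, "alice", "10.0.0.1", anonymous_max=2)
```

With an anonymous cap of 2, the third request in `statuses` is rejected with 429; after the period elapses the same IP is allowed again, and an authenticated client is counted under its own key and larger limit.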
What does this PR do?
Closes #2093
Test Plan
[Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.]