Support rate limiting by concurrent requests #1986
Description:
Currently, Envoy AI Gateway supports token-based rate limiting (https://aigateway.envoyproxy.io/docs/capabilities/traffic/usage-based-ratelimiting). This is useful for commercial API endpoints billed per token, but less so for self-hosted API endpoints such as vLLM or SGLang, where the cost driver is occupied serving capacity rather than token spend.
As an alternative, Envoy AI Gateway should support rate limiting by concurrent requests per Envoy AI Gateway API token (i.e., per user). This is much more useful for restricting concurrency per API key in self-hosted multi-tenant environments.
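The requested behavior could be sketched roughly as below: a counting semaphore per API key that rejects requests once a key's in-flight count reaches a configured cap. This is an illustrative sketch only, not Envoy AI Gateway code; the `keyLimiter` type and its `Acquire`/`Release` methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// keyLimiter caps the number of in-flight requests per API key.
type keyLimiter struct {
	mu     sync.Mutex
	max    int            // per-key concurrency cap
	inUse  map[string]int // current in-flight count per key
}

func newKeyLimiter(max int) *keyLimiter {
	return &keyLimiter{max: max, inUse: map[string]int{}}
}

// Acquire reports whether the key may start another request.
// A false return would translate to an HTTP 429 at the gateway.
func (l *keyLimiter) Acquire(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inUse[key] >= l.max {
		return false // cap reached: reject instead of queueing
	}
	l.inUse[key]++
	return true
}

// Release marks one request for the key as finished.
func (l *keyLimiter) Release(key string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inUse[key] > 0 {
		l.inUse[key]--
	}
}

func main() {
	l := newKeyLimiter(2) // at most 2 concurrent requests per key
	fmt.Println(l.Acquire("tenant-a")) // true
	fmt.Println(l.Acquire("tenant-a")) // true
	fmt.Println(l.Acquire("tenant-a")) // false: cap reached
	l.Release("tenant-a")              // one request completes
	fmt.Println(l.Acquire("tenant-a")) // true: capacity freed
}
```

Unlike token-based limits, the count is decremented when a request completes, so a slow model server automatically throttles the tenant that is occupying it.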