Support rate limiting by concurrent requests #1986
Description:
Currently, Envoy AI Gateway supports token-based rate limiting (https://aigateway.envoyproxy.io/docs/capabilities/traffic/usage-based-ratelimiting). This is useful for commercial API endpoints billed per token, but less so for self-hosted API endpoints such as vLLM or SGLang, where the cost driver is occupied serving capacity rather than token spend.
As an alternative, Envoy AI Gateway should support rate limiting by concurrent requests per Envoy AI Gateway API token (i.e., per user). This is much more useful for restricting concurrency per API key in self-hosted multi-tenant environments.
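The requested behavior could be sketched roughly as below: a counting semaphore per API key that rejects requests once a key's in-flight count reaches a configured cap. This is an illustrative sketch only, not Envoy AI Gateway code; the `keyLimiter` type and its `Acquire`/`Release` methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// keyLimiter caps the number of in-flight requests per API key.
type keyLimiter struct {
	mu     sync.Mutex
	max    int            // per-key concurrency cap
	inUse  map[string]int // current in-flight count per key
}

func newKeyLimiter(max int) *keyLimiter {
	return &keyLimiter{max: max, inUse: map[string]int{}}
}

// Acquire reports whether the key may start another request.
// A false return would translate to an HTTP 429 at the gateway.
func (l *keyLimiter) Acquire(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inUse[key] >= l.max {
		return false // cap reached: reject instead of queueing
	}
	l.inUse[key]++
	return true
}

// Release marks one request for the key as finished.
func (l *keyLimiter) Release(key string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inUse[key] > 0 {
		l.inUse[key]--
	}
}

func main() {
	l := newKeyLimiter(2) // at most 2 concurrent requests per key
	fmt.Println(l.Acquire("tenant-a")) // true
	fmt.Println(l.Acquire("tenant-a")) // true
	fmt.Println(l.Acquire("tenant-a")) // false: cap reached
	l.Release("tenant-a")              // one request completes
	fmt.Println(l.Acquire("tenant-a")) // true: capacity freed
}
```

Unlike token-based limits, the count is decremented when a request completes, so a slow model server automatically throttles the tenant that is occupying it.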