
Support rate limiting by concurrent requests #1986

@ehfd

Description

Currently, Envoy AI Gateway supports token-based rate limiting (https://aigateway.envoyproxy.io/docs/capabilities/traffic/usage-based-ratelimiting). This is useful for commercial API endpoints that bill per token, but not for self-hosted inference servers such as vLLM or SGLang.

As an alternative, Envoy AI Gateway should also support limiting the number of concurrent requests per API token (user). This is much more useful for restricting concurrency per API key in self-hosted, multi-tenant environments, where the scarce resource is in-flight requests rather than billed tokens.
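To illustrate the requested behavior, here is a minimal sketch of per-key concurrency limiting. This is not Envoy AI Gateway code; the `ConcurrencyLimiter` class and its method names are hypothetical, and a real implementation would live in the gateway's request filter path. The idea is simply: track in-flight requests per API key, reject (e.g. with HTTP 429 or 503) when a key exceeds its cap, and decrement when a request completes.

```python
import threading
from collections import defaultdict


class ConcurrencyLimiter:
    """Hypothetical per-API-key concurrent-request limiter.

    Unlike token-based rate limiting, this caps the number of
    requests a key may have in flight at the same time.
    """

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self._lock = threading.Lock()
        self._inflight = defaultdict(int)  # api_key -> in-flight count

    def try_acquire(self, api_key: str) -> bool:
        """Admit the request if the key is under its cap, else reject."""
        with self._lock:
            if self._inflight[api_key] >= self.max_concurrent:
                return False
            self._inflight[api_key] += 1
            return True

    def release(self, api_key: str) -> None:
        """Call when the request (including any streamed response) finishes."""
        with self._lock:
            if self._inflight[api_key] > 0:
                self._inflight[api_key] -= 1
```

With a cap of 2, a third simultaneous request for the same key would be rejected while requests for other keys are unaffected; releasing one slot admits the key again. A production version would need to handle distributed state (e.g. a shared rate-limit service) and release slots reliably when long-lived streaming responses end.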
