
feat: token ratelimit based on token bucket algorithm #1974

Open — wants to merge 1 commit into main from feat-token-bucket-ratelimit
Conversation

hzhswyz (Contributor)
@hzhswyz hzhswyz commented Mar 28, 2025

Ⅰ. Describe what this PR did

Token rate limiting based on the token bucket algorithm.
The current fixed-time-window rate-limiting strategy is not well suited to streaming output. On average, each round of dialogue with the LLM takes about 3 minutes; with the token_per_minute strategy, the tokens budgeted for the first two minutes are wasted.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

@codecov-commenter commented Mar 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.74%. Comparing base (ef31e09) to head (64c007d).
Report is 420 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1974      +/-   ##
==========================================
+ Coverage   35.91%   43.74%   +7.83%     
==========================================
  Files          69       79      +10     
  Lines       11576    12728    +1152     
==========================================
+ Hits         4157     5568    +1411     
+ Misses       7104     6814     -290     
- Partials      315      346      +31     

see 75 files with indirect coverage changes


@hzhswyz force-pushed the feat-token-bucket-ratelimit branch 2 times, most recently from 5b55b36 to 079f7b8 on March 28, 2025 13:06
@hzhswyz force-pushed the feat-token-bucket-ratelimit branch from e1418c7 to 64c007d on April 5, 2025 08:18
@cr7258 (Collaborator) left a comment


🐉 LGTM

@johnlanni (Collaborator) left a comment


If we are going to support the token bucket algorithm, it should be offered as a switchable algorithm on top of the existing rate-limit configuration.
The choice of algorithm should not change how the limit is configured: the same qpd setting, for example, could be implemented with either a fixed window or a token bucket.
We should keep the configuration fields well abstracted, so they remain easy for both users and AI to understand.

@cr7258 (Collaborator) commented Apr 12, 2025

> The same qpd setting, for example, could be implemented with either a fixed window or a token bucket

The existing rate-limit fields (token_per_second, token_per_minute) already imply, by name, a fixed-window rule. To let the existing limit windows also support the new token bucket algorithm, I can think of two approaches:

One is to build on the current PR and add a unit field to the token_bucket_strategy configuration, leaving the existing fixed-window fields (token_per_second, token_per_minute, etc.) unchanged. This minimizes configuration changes: users who do not need the token bucket algorithm keep their existing configuration as-is.

token_bucket_strategy:
  rate: 1000
  unit: minute
  capacity: 10000

The other is to also adjust the existing fields: unify token_per_second, token_per_minute, etc. under a unit field, and distinguish fixed-window from token-bucket limiting with a type setting. Users upgrading the plugin would then need to update their configuration accordingly.

# Existing fixed-window rate limit
rate_limit:
  # type: fixed_window  # default, may be omitted
  requests: 1000    # 1000 tokens per minute
  unit: minute

# Token bucket rate limit
rate_limit:
  type: token_bucket
  rate: 1000  # add 1000 tokens to the bucket per minute
  unit: minute
  capacity: 10000  # the bucket holds at most 10000 tokens

@johnlanni WDYT?

@johnlanni (Collaborator)

I lean toward the design here:
https://www.envoyproxy.io/docs/envoy/latest/api-v3/type/v3/ratelimit_strategy.proto.html#envoy-v3-api-msg-type-v3-ratelimitstrategy-requestspertimeunit

Although that design is well thought out, it still ends up with a token_bucket alongside requests_per_time_unit 🐶
But I think defaulting the token bucket parameters would also be fine: for example, 100 qps would mean a max bucket of 100, refilled with 100 tokens every second.

@johnlanni (Collaborator) commented Apr 12, 2025

Consider this configuration style:

token_per_minute: 1000
strategy: 
  type: token_bucket
  interval_factor: 0.5
  fill_factor: 0.5
  capacity_factor: 1

This configuration is equivalent to:

capacity: 1000
tokens_per_fill: 500
fill_interval: 30s

The default value of strategy is:

strategy: 
  type: fixed_window

Since a fixed window covers about 90% of scenarios, this configuration stays easy to understand for 90% of users. The token_bucket form is somewhat more complex to grasp, but expert users who want to switch algorithms will not find it hard. It also extends well; in the future we could support:

strategy: 
  type: leaky_bucket
  capacity_factor: 1
  leak_rate_factor: 1

@hzhswyz (Contributor, Author) commented Apr 14, 2025

> Consider this configuration style:
>
> token_per_minute: 1000
> strategy: 
>   type: token_bucket
>   interval_factor: 0.5
>   fill_factor: 0.5
>   capacity_factor: 1

Most existing token-bucket rate limiters only expose rate (tokens filled per second) and capacity (bucket size). The configuration above effectively lets users customize both the fill interval and the amount added per fill, which makes the Lua-script implementation more complex. For example, with token_per_minute=10 the per-second rate is only about 0.17, which is awkward to implement.

@hzhswyz (Contributor, Author) commented Apr 14, 2025

Could the different strategies be configured separately? That might be easier to understand.

Token bucket strategy:

token_bucket_strategy: 
  rate: 2
  capacity: 300

Fixed time window strategy:

fixed_window_strategy: 
  tokens_per_window: 200
  window_unit: minute

This would remain compatible with the existing token_per_minute and similar configuration fields.
