feat:token ratelimit based on token bucket algorithm #1974
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main    #1974      +/-   ##
    ==========================================
    + Coverage   35.91%   43.74%   +7.83%
    ==========================================
      Files          69       79      +10
      Lines       11576    12728    +1152
    ==========================================
    + Hits         4157     5568    +1411
    + Misses       7104     6814     -290
    - Partials      315      346      +31
5b55b36 to 079f7b8
plugins/wasm-go/extensions/ai-token-ratelimit/TokenBucketScript.lua
e1418c7 to 64c007d
🐉 LGTM
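If we are going to support the token bucket algorithm, it should be offered as a switchable algorithm on top of the existing rate-limit configuration.
The choice of algorithm should not change how limits are configured. For example, the same qpd setting could be enforced by either a fixed window or a token bucket.
We should keep the configuration fields well abstracted so that they are easier for both users and AI to understand.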
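The existing rate-limit fields ( ...

One option builds on the current PR, in ...

    token_bucket_strategy:
      rate: 1000
      unit: minute
      capacity: 10000

Another idea is to adjust the original configuration fields as well, changing the original ...

    # Existing fixed-window rate limit
    rate_limit:
      # type: fixed_window  # default value, can be omitted
      requests: 1000        # 1000 tokens per minute
      unit: minute
    # Token bucket rate limit
    rate_limit:
      type: token_bucket
      rate: 1000            # add 1000 tokens to the bucket every minute
      unit: minute
      capacity: 10000       # the bucket holds at most 10000 tokens

@johnlanni WDYT?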
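To make the second proposal concrete, here is a minimal Go sketch of how a plugin might default and validate such a `rate_limit` block. The field names come from the YAML above; the `RateLimit` struct, the `Normalize` function, and the set of accepted units other than `minute` are assumptions for illustration, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// RateLimit mirrors the proposed rate_limit block. The struct is hypothetical;
// only the field names are taken from the comment above.
type RateLimit struct {
	Type     string // "fixed_window" (default) or "token_bucket"
	Requests int64  // fixed_window: quota per unit
	Rate     int64  // token_bucket: tokens added per unit
	Unit     string // e.g. "minute"
	Capacity int64  // token_bucket: maximum tokens the bucket holds
}

// unitSeconds lists the accepted units. Only "minute" appears in the
// discussion; the other entries are assumed.
var unitSeconds = map[string]int64{"second": 1, "minute": 60, "hour": 3600, "day": 86400}

// Normalize applies the default type and checks that the fields required by
// the selected algorithm are present.
func Normalize(c RateLimit) (RateLimit, error) {
	if c.Type == "" {
		c.Type = "fixed_window" // omitted type keeps the existing behavior
	}
	if _, ok := unitSeconds[c.Unit]; !ok {
		return c, fmt.Errorf("unknown unit %q", c.Unit)
	}
	switch c.Type {
	case "fixed_window":
		if c.Requests <= 0 {
			return c, errors.New("fixed_window requires a positive requests value")
		}
	case "token_bucket":
		if c.Rate <= 0 || c.Capacity <= 0 {
			return c, errors.New("token_bucket requires positive rate and capacity")
		}
	default:
		return c, fmt.Errorf("unknown type %q", c.Type)
	}
	return c, nil
}

func main() {
	fw, err := Normalize(RateLimit{Requests: 1000, Unit: "minute"})
	fmt.Println(fw.Type, err)

	tb, err := Normalize(RateLimit{Type: "token_bucket", Rate: 1000, Unit: "minute", Capacity: 10000})
	fmt.Println(tb.Type, err)
}
```

With this shape, existing fixed-window users never need to mention `type`, and only users who opt into `token_bucket` have to supply `rate` and `capacity`.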
The thinking behind it was good, but it still ended up adding token_bucket side by side with requests_per_time_unit 🐶
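Consider this configuration style:
This configuration is equivalent to:
The default value of strategy is:
Since the fixed window covers 90% of use cases, this style of configuration is easier for 90% of users to understand. token_bucket is somewhat more complex to understand this way, but expert users who want to switch algorithms should still find it manageable. It is also more extensible; in the future it could support: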
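Most token-bucket rate-limiting strategies today expose only rate (tokens refilled per second) and capacity (bucket size). Using this configuration would effectively let users customize both the refill interval and the amount added per refill, which makes the Lua script more complex to implement. For example, with token_per_minute=10 the per-second rate is only about 0.17, which is awkward to implement.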
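One possible way around the fractional rate mentioned above is to keep the rate as an integer count per unit and credit whole tokens from elapsed milliseconds, so 10 tokens per minute never has to be stored as 0.17 per second. The sketch below shows that arithmetic in Go with in-memory state. It is not the PR's Lua script, and `Bucket`, `Refill`, and `TryTake` are hypothetical names.

```go
package main

import "fmt"

// Bucket is a hypothetical in-memory stand-in for the state a Lua script
// would keep in Redis.
type Bucket struct {
	Tokens   int64 // tokens currently available
	Capacity int64 // maximum tokens the bucket can hold
	Rate     int64 // tokens added per UnitMs
	UnitMs   int64 // length of the rate unit in milliseconds
	LastMs   int64 // time up to which tokens have already been credited
}

// Refill credits the whole tokens earned since LastMs using integer math only.
func (b *Bucket) Refill(nowMs int64) {
	elapsed := nowMs - b.LastMs
	if elapsed <= 0 {
		return
	}
	earned := elapsed * b.Rate / b.UnitMs
	if earned == 0 {
		return // leave LastMs alone so the partial token keeps accruing
	}
	b.Tokens += earned
	if b.Tokens >= b.Capacity {
		b.Tokens = b.Capacity
		b.LastMs = nowMs // bucket is full, surplus time is discarded
		return
	}
	// Advance only by the time that paid for whole tokens. The division
	// rounds down, which can favor the caller by less than 1 ms per refill.
	b.LastMs += earned * b.UnitMs / b.Rate
}

// TryTake refills the bucket and then spends n tokens if they are available.
func (b *Bucket) TryTake(nowMs, n int64) bool {
	b.Refill(nowMs)
	if b.Tokens < n {
		return false
	}
	b.Tokens -= n
	return true
}

func main() {
	// 10 tokens per minute: one whole token every 6000 ms.
	b := &Bucket{Capacity: 100, Rate: 10, UnitMs: 60000}
	fmt.Println(b.TryTake(5999, 1))  // false
	fmt.Println(b.TryTake(6000, 1))  // true
	fmt.Println(b.TryTake(60000, 9)) // true
}
```

Because `LastMs` only advances by the time that paid for whole tokens, leftover milliseconds carry into the next call instead of being lost to rounding.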
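Could the different strategies be configured separately? That might be easier to understand.
Token bucket strategy
Fixed time window strategy
At the same time, stay compatible with the earlier settings such as token_per_minute.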
Ⅰ. Describe what this PR did
token ratelimit based on token bucket algorithm.
The current rate limiting strategy, based on a fixed time window, does not suit streaming output. On average, each round of dialogue with the LLM takes about 3 minutes, so with the token_per_minute strategy the token quota of the first two minutes is wasted.
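The arithmetic behind this can be sketched as follows, assuming a limit of 1000 tokens per minute, a bucket capacity of 10000 (the figures used in the review discussion), and no spending during the 3-minute round. `FixedWindow`, `Remaining`, and `BucketRemaining` are hypothetical helpers for illustration, not the plugin's implementation.

```go
package main

import "fmt"

// FixedWindow tracks usage in the current 60-second window only.
type FixedWindow struct {
	Limit  int64 // tokens allowed per window
	window int64 // index of the window last seen
	used   int64 // tokens spent in that window
}

// Remaining reports the quota left at tSec. Crossing a window boundary resets
// usage, so quota left unused in earlier windows is dropped.
func (f *FixedWindow) Remaining(tSec int64) int64 {
	if w := tSec / 60; w != f.window {
		f.window, f.used = w, 0
	}
	return f.Limit - f.used
}

// BucketRemaining reports the tokens held by an initially empty bucket after
// idleSec seconds without spending, refilled at ratePerMin up to capacity.
func BucketRemaining(ratePerMin, capacity, idleSec int64) int64 {
	tokens := ratePerMin * idleSec / 60
	if tokens > capacity {
		tokens = capacity
	}
	return tokens
}

func main() {
	fw := &FixedWindow{Limit: 1000}
	fmt.Println(fw.Remaining(180))                 // 1000
	fmt.Println(BucketRemaining(1000, 10000, 180)) // 3000
}
```

After 180 seconds the fixed window still offers only one minute of quota, while the bucket has accumulated three minutes of it.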
Ⅱ. Does this pull request fix one issue?
Ⅲ. Why don't you add test cases (unit test/integration test)?
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews