-
Notifications
You must be signed in to change notification settings - Fork 43
[processor/ratelimit] Add gubernator behavior config #891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
marclop
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested this in unit and also manually? From my previous testing it had some unintended behavior I saw. At the minimum, we should update our tests to test both BATCHNG and GLOBAL.
No not yet, I'm currently setting up the env to test this manually, and also looking at the existing unit tests. |
|
It seems the unit tests broken by GLOBAL are mainly the dynamic rate limiter ones? The tests check for remaining hits in the event channel after each rate limit request, but GLOBAL behavior does not guarantee that the remaining hits will be accurate so the tests fail. EDIT: It seems GLOBAL request is somehow double counting the hits or something. The below event is supposed to have 940 remaining but has 880 instead. |
marclop
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update the README.md to include the gubernator_behavior field, and link to the Gubernator Architecture.md.
42a77e6 to
c6c0c8f
Compare
simitt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please wait for @marclop 's final approval since he left some questions in his last review.
marclop
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM, thanks for updating all the unit tests. Did you manage to test this manually with GLOBAL behavior to ensure the limits are properly applied as intended?
So, I have tested that the rate limit is applied, by using a low rate limit threshold and invoking benchmark load test on it. But I'm not sure if that counts as |
|
Benchmarked GLOBAL vs BATCHING behavior, with 2 replicas for ingest collector. PerformanceWith BATCHING: With GLOBAL: The GLOBAL behavior seems to be consistently performing better, presumably due to using local cache for rate limit requests. Rate Limit LogsBoth also have rate limit being applied at similar frequencies: With BATCHING: With GLOBAL: |
|
Test failure seems unrelated to our change: |
Can you also link the resource usage across the Ingest collector when you performed the test?
Might be related? Since we are using the value from cache? Would be good to validate that. |
The |
With BATCHING, CPU usage around 95% (of 500m) and memory around 12% of (1Gi). With GLOBAL, CPU usage around 95% (of 500m) and memory around 20% (of 1Gi). |
Do we know the reason for the memory growth? Seems to be a big difference when it comes down to caching the limits locally with the global behaviour? |
Summary
Add config for gubernator behavior.
Testing
Unit testing is done via Go tests.
Manual testing is done as follow:
go.modreplacement to use newratelimitprocessorgubernator_behavior