plugin/decision: check if event is too large after compression #7521
Conversation
…D cache sparingly. Renamed the "soft" limit to "uncompressed limit" throughout the code and documentation for clarity. In the size and event buffers the uncompressed limit was being dropped after each upload; now it is carried over. The event buffer doesn't reset the encoder at all. The check for whether an individual event is too big compared its uncompressed size against the compressed limit, causing events to be dropped or to lose the ND cache unnecessarily. This is now fixed: if the uncompressed limit allows it, the event is compressed and then multiple attempts are made before losing the ND cache or dropping the event. The configurable upload limit is used to calculate the uncompressed limit by growing it exponentially, which could cause an overflow if it was set too high, so a maximum was added. Signed-off-by: sspaink <[email protected]>
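A minimal sketch of the overflow guard mentioned above, assuming hypothetical names (growUncompressedLimit, maxUploadSizeLimitBytes); this is not the actual OPA code, just an illustration of clamping an exponentially grown limit:

```go
// Sketch only: clamp an exponentially grown uncompressed limit so doubling
// it can never overflow, whatever upload size limit was configured.
// All identifiers here are illustrative, not OPA's real ones.
package encoder

const maxUploadSizeLimitBytes = int64(1) << 32 // assumed hard ceiling

// growUncompressedLimit doubles the adaptive uncompressed limit, but never
// past the (capped) configured maximum, so the doubling cannot overflow.
func growUncompressedLimit(current, configuredMax int64) int64 {
	if configuredMax <= 0 || configuredMax > maxUploadSizeLimitBytes {
		configuredMax = maxUploadSizeLimitBytes
	}
	if current >= configuredMax/2 {
		return configuredMax // doubling would overshoot, clamp instead
	}
	return current * 2
}
```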
Some thoughts and questions.
@johanfylling I've updated the logic for finding an event that is too big so that it instead makes use of the recursion that splits chunks when the uncompressed limit grows too large. Now the uncompressed limit is taken into account, and the first event that is written helps adjust the uncompressed limit to a reasonable starting point, as opposed to growing from the upload size limit. Also added a new histogram metric to track the number of events in each chunk. Not sure how useful this is for users 🤔 at the moment I am just using it in Thanks!
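A minimal sketch of how such an events-per-chunk histogram could be recorded with OPA's metrics package; the metric name, the encoder fields, and the v1 import path are assumptions for illustration, not the PR's actual code:

```go
// Sketch: record how many events ended up in each finished chunk.
// The metric name and encoder fields are hypothetical.
package encoder

import "github.com/open-policy-agent/opa/v1/metrics"

const encEventsPerChunk = "enc_events_per_chunk" // assumed metric name

type chunkEncoder struct {
	metrics       metrics.Metrics
	eventsInChunk int64
}

// closeChunk would be called when a chunk is finished: it records the number
// of events the chunk contained and resets the per-chunk counter.
func (e *chunkEncoder) closeChunk() {
	if e.metrics != nil {
		e.metrics.Histogram(encEventsPerChunk).Update(e.eventsInChunk)
	}
	e.eventsInChunk = 0
}
```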
Some questions.
It's been a while since I looked at this last, so sorry if I'm rehashing old stuff 😅.
* track maxEventSize to drop ND cache quicker Signed-off-by: sspaink <[email protected]>
* revert not dropping adaptive uncompressed limit Signed-off-by: sspaink <[email protected]>
@johanfylling I think I put too much into one pull request, so I decided to split up the changes into separate issues/PRs.
I think this should make reviewing a little easier. I also added better documentation describing the specific problem and how to reproduce it in each issue/PR. Sorry for not doing this to begin with; I don't think it affects any of your most recent review comments. Thank you for bearing with me 😄
Some additional comments.
Ran some perf tests of my own; performance looks largely unchanged.

main:
Requests: 200
Requests total: 100000
Duration: 1m18.516058833s
Max concurrency: 500
Average req/s: 1273.6248034646687
timer_rego_query_eval_ns: Min: 5.375µs, Max: 817.916µs, Mean: 21.718µs, P50: 13.479µs, P75: 17.489µs, P90: 39.924µs, P95: 61.302µs, P99: 161.134µs, P99.9: 809.603µs, P99.99: 817.916µs
duration: Min: 180.792µs, Max: 88.047416ms, Mean: 678.158µs, P50: 385.375µs, P75: 477.729µs, P90: 796.262µs, P95: 1.218756ms, P99: 3.42696ms, P99.9: 87.394911ms, P99.99: 88.047416ms
timer_server_handler_ns: Min: 68.542µs, Max: 11.609167ms, Mean: 231.362µs, P50: 170.854µs, P75: 217.093µs, P90: 328.608µs, P95: 526.806µs, P99: 1.156035ms, P99.9: 11.396883ms, P99.99: 11.609167ms
timer_rego_external_resolve_ns: Min: 41ns, Max: 3.958µs, Mean: 115ns, P50: 84ns, P75: 125ns, P90: 167ns, P95: 208ns, P99: 334ns, P99.9: 3.868µs, P99.99: 3.958µs
timer_rego_query_compile_ns: Min: 10.875µs, Max: 385.125µs, Mean: 30.839µs, P50: 25.187µs, P75: 30.656µs, P90: 41.271µs, P95: 58.918µs, P99: 235.747µs, P99.9: 384.389µs, P99.99: 385.125µs
Peaks:
duration: 93.717667ms
timer_server_handler_ns: 93.573541ms
timer_rego_external_resolve_ns: 343.375µs
timer_rego_query_compile_ns: 7.865875ms
timer_rego_query_eval_ns: 4.087875ms
Peak duration: 93.717667ms
PR:
Global metrics:
Requests: 200
Requests total: 100000
Duration: 1m18.222317792s
Max concurrency: 500
Average req/s: 1278.4075289856378
duration: Min: 164.5µs, Max: 63.79725ms, Mean: 698.156µs, P50: 398.854µs, P75: 497.385µs, P90: 905.012µs, P95: 1.423327ms, P99: 4.176799ms, P99.9: 63.694633ms, P99.99: 63.79725ms
timer_rego_query_eval_ns: Min: 5.75µs, Max: 408.417µs, Mean: 20.925µs, P50: 13.833µs, P75: 17.364µs, P90: 39.675µs, P95: 55.189µs, P99: 180.39µs, P99.9: 406.198µs, P99.99: 408.417µs
timer_server_handler_ns: Min: 70.666µs, Max: 33.640792ms, Mean: 247.448µs, P50: 180.375µs, P75: 218.073µs, P90: 293.366µs, P95: 499.251µs, P99: 1.030884ms, P99.9: 32.711279ms, P99.99: 33.640792ms
timer_rego_external_resolve_ns: Min: 41ns, Max: 2.583µs, Mean: 116ns, P50: 125ns, P75: 125ns, P90: 167ns, P95: 167ns, P99: 292ns, P99.9: 2.547µs, P99.99: 2.583µs
timer_rego_query_compile_ns: Min: 11.208µs, Max: 739.959µs, Mean: 32.406µs, P50: 26.145µs, P75: 30.833µs, P90: 39.595µs, P95: 57.502µs, P99: 227.093µs, P99.9: 733.351µs, P99.99: 739.959µs
Peaks:
timer_rego_query_eval_ns: 2.902666ms
timer_server_handler_ns: 91.444459ms
timer_rego_external_resolve_ns: 299.792µs
timer_rego_query_compile_ns: 2.533792ms
duration: 92.222292ms
Peak duration: 92.222292ms
Looks like we're nearing the end of this story 🙂. I think there might be just the one thing left to fix.
Nice! 😃
Let's get this baby in! 🎉
Why the changes in this PR are needed?
resolve: #7526
What are the changes in this PR?
This PR changes what happens when an event is written to the chunk encoder when calling Write and WriteBytes. Originally the incoming event's uncompressed size was compared to the compressed limit, which caused the issue. To fix this, the logic now relies on the adaptive uncompressed limit to prevent large events from sneaking into a chunk. In case the uncompressed limit is wrong, the events are decoded and written recursively into a chunk. The base case is that the incoming event is the first event being written into a chunk; only then is the event compressed and the ND cache, or the entire event, dropped. The benefit is that if the event is too big even after compression, only a single event has to be compressed multiple times.
Moving the logic for when to drop the ND cache into the encoder also has the benefit that the size and event buffers can reuse it.
The variable soft limit has also been renamed to uncompressed limit throughout the code and documentation to help clarify what it is meant to represent.
Notes to assist PR review:
Repeating the reproduction steps outlined in #7526, but using a build with the changes in this PR, no error is logged.
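For reference, a condensed, self-contained sketch of the write flow described above, assuming a gzip-compressed chunk and the nd_builtin_cache event field; the type, field, and helper names are illustrative only and do not match the actual encoder in plugins/logs:

```go
// Sketch of the decision flow: an event that fits the uncompressed limit is
// appended as-is; an oversized event (the first-event-in-chunk base case) is
// checked against the compressed limit, then retried without its ND builtin
// cache, and only then dropped.
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"errors"
	"fmt"
)

type chunkEncoder struct {
	uncompressedLimit int64 // adaptive limit on raw JSON written to a chunk
	compressedLimit   int64 // configured upload size limit
	buf               bytes.Buffer
	events            int
}

var errEventDropped = errors.New("event exceeds upload limit even after compression, dropping")

func (e *chunkEncoder) writeEvent(event map[string]any) error {
	bs, err := json.Marshal(event)
	if err != nil {
		return err
	}
	if int64(len(bs)) <= e.uncompressedLimit {
		e.buf.Write(bs) // fits: append to the current chunk
		e.events++
		return nil
	}
	// Base case: the event alone exceeds the uncompressed limit,
	// so check whether it fits once compressed.
	if compressedSize(bs) <= e.compressedLimit {
		e.buf.Write(bs)
		e.events++
		return nil
	}
	// Second attempt: strip the ND builtin cache and re-check.
	delete(event, "nd_builtin_cache")
	bs, err = json.Marshal(event)
	if err != nil {
		return err
	}
	if compressedSize(bs) <= e.compressedLimit {
		e.buf.Write(bs)
		e.events++
		return nil
	}
	return errEventDropped
}

// compressedSize gzips the payload and reports the resulting size.
func compressedSize(bs []byte) int64 {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	zw.Write(bs)
	zw.Close()
	return int64(out.Len())
}

func main() {
	enc := &chunkEncoder{uncompressedLimit: 1 << 10, compressedLimit: 32 << 10}
	err := enc.writeEvent(map[string]any{"decision_id": "abc", "result": true})
	fmt.Println(err, enc.events)
}
```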