plugin/decision: check if event is too large after compression and don't drop adaptive uncompressed size limit on upload #7521

Open
wants to merge 10 commits into main

Conversation

@sspaink (Contributor) commented Apr 17, 2025

Why are the changes in this PR needed?

resolve: #7526

What are the changes in this PR?

  1. This is the biggest change. I recommend looking at extendedTestChunkMaxUploadSizeLimitNDBCacheDropping to see how ND caches are now less likely to be dropped. The changes are mostly in WriteBytes(), which still uses the uncompressed limit to make an educated guess whether an event can fit, but now also checks whether the event still fits after compression or whether its ND cache could be dropped. Since dropping the ND cache has moved into the chunk encoder, the event and size buffers can reuse that logic (see the sketch after this list).
  2. Two parts to this one:
  • The size buffer copies the current chunk encoder and creates a new one during each upload so that it doesn't hold the encoder lock for too long. The adaptive uncompressed limit and the scale exponents are now carried over to the new chunk encoder so it can keep stabilizing.
  • The event buffer was creating a new chunk encoder on each upload; it is now a struct field and only created once.
  3. Renamed the term "soft limit" to "uncompressed limit" and the term "hard limit" to "compressed limit" throughout the documentation and code. The goal is to make clear that the "uncompressed limit" is dynamically updated as decision events are logged, as an educated guess of the limit without having to compress the events first.
  4. Set a maximum when configuring decision_logs.reporting.upload_size_limit_bytes, with a warning message if it is exceeded.
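For item 1, here is a minimal sketch of the two-stage check described above, with hypothetical, simplified names rather than the actual OPA encoder: the adaptive uncompressed limit is only an educated guess, and the real compressed size decides whether the chunk has to be split or the ND cache dropped.

```go
// Hypothetical sketch: cheap guess with the uncompressed limit first, then a
// check of the actual compressed size. Not the real OPA implementation.
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

type chunkEncoder struct {
	compressedLimit   int64 // configured upload_size_limit_bytes
	uncompressedLimit int64 // adaptive guess, adjusted as events are written
	buf               bytes.Buffer
	w                 *gzip.Writer
	bytesWritten      int
}

func newChunkEncoder(limit int64) *chunkEncoder {
	e := &chunkEncoder{compressedLimit: limit, uncompressedLimit: limit}
	e.w = gzip.NewWriter(&e.buf)
	return e
}

func (e *chunkEncoder) writeBytes(event []byte) error {
	// First stage: cheap guess using the adaptive uncompressed limit.
	if int64(len(event)+e.bytesWritten+1) > e.uncompressedLimit {
		return fmt.Errorf("uncompressed guess exceeded: split the chunk or grow the limit")
	}
	if _, err := e.w.Write(event); err != nil {
		return err
	}
	e.bytesWritten += len(event)
	// Second stage: check the real compressed size. Flush is used here only so
	// buf.Len() is meaningful; the review discussion below explains why the
	// real encoder avoids flushing on every write.
	if err := e.w.Flush(); err != nil {
		return err
	}
	if int64(e.buf.Len()) > e.compressedLimit {
		return fmt.Errorf("compressed chunk too large: drop the ND cache or split")
	}
	return nil
}

func main() {
	enc := newChunkEncoder(32768)
	fmt.Println(enc.writeBytes([]byte(`{"decision_id":"x","result":true}`)))
}
```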

fix: don't drop adaptive uncompressed size limit on upload and drop ND cache sparingly

Renamed the "soft" limit to "uncompressed limit" throughout the code and documentation for clarity.
In the size and event buffers the uncompressed limit was being dropped after each upload; it is now carried over, and the event buffer no longer resets the encoder at all. The check for whether an individual event is too big compared the uncompressed limit against the compressed limit, causing events to be dropped or to lose their ND cache unnecessarily. This is now fixed: if the uncompressed limit allows it, the event is compressed, and multiple attempts are made before losing the ND cache or dropping the event. The configured upload limit is used to calculate the uncompressed limit by growing it exponentially, which could overflow if the limit was set too high, so a maximum was added.

Signed-off-by: sspaink <[email protected]>
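As a rough, hypothetical sketch of the overflow guard mentioned above (the cap value and names are illustrative, not the ones used by OPA):

```go
// Hypothetical sketch of clamping a user-configured upload size limit so that
// later exponential scaling of the uncompressed limit cannot overflow int64.
package main

import (
	"fmt"
	"math"
)

// maxUploadSizeLimitBytes is an illustrative cap, not the value chosen in the PR.
const maxUploadSizeLimitBytes = int64(math.MaxInt32)

func clampUploadSizeLimit(configured int64) int64 {
	if configured > maxUploadSizeLimitBytes {
		fmt.Printf("warning: upload_size_limit_bytes %d exceeds maximum %d, using maximum\n",
			configured, maxUploadSizeLimitBytes)
		return maxUploadSizeLimitBytes
	}
	return configured
}

func main() {
	fmt.Println(clampUploadSizeLimit(int64(math.MaxInt64))) // clamped with a warning
	fmt.Println(clampUploadSizeLimit(32768))                // unchanged
}
```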

netlify bot commented Apr 17, 2025

Deploy Preview for openpolicyagent ready!

🔨 Latest commit: 6e7b98c
🔍 Latest deploy log: https://app.netlify.com/sites/openpolicyagent/deploys/681bb5628e1a6200086f312d
😎 Deploy Preview: https://deploy-preview-7521--openpolicyagent.netlify.app

@johanfylling (Contributor) left a comment:

Some thoughts and questions.

}
}

// if we reach this part of the code it can mean two things:
// * the uncompressed limit has grown too large and the events need to be split up into multiple chunks
// * an event has a large ND cache that could be dropped
Contributor:

It's unfortunate that if the ND cache can be dropped, then it looks like we'll end up marshaling and unmarshaling the event multiple times.

  1. The event is first added through Write(): marshaled
  2. Below here, all events are enumerated: unmarshaled
  3. Each event is added back through Write(): marshaled
  4. In WriteBytes(), enc.buf.Len() > int(enc.limit): all events written to the buffer are unmarshaled again
  5. After which all events are marshaled again (maybe dropping the ND-cache of the last event)

Contributor:

  1. If the last event had its ND-cache dropped, but didn't fit in the current chunk, all events are marshaled again.

if err := enc.writeClose(); err != nil {
return nil, err
}

var err error
result, err = enc.reset()
@johanfylling (Contributor) commented Apr 23, 2025:

Given that reset might recurse back into Write(), could we end up in an infinite recursion loop here?

Contributor Author:

It might have been possible! I've updated the logic to actually take advantage of the recursion, with the base case being a single written event: it is either dropped or written.

// skip the incomingEventBytes but save it in case the ND cache needs to be dropped
if i == len(events)-1 {
incomingEvent = events[i]
continue
Contributor:

Consider using break here, to clearly signal that the intention is to step out of the loop.
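For illustration, a self-contained toy version of the loop shape being discussed (hypothetical names), showing break in place of continue:

```go
package main

import "fmt"

func main() {
	events := [][]byte{[]byte(`{"id":1}`), []byte(`{"id":2}`), []byte(`{"id":3}`)}
	var incomingEvent []byte
	for i := range events {
		// The last element is the event being written; once we reach it there is
		// nothing left to iterate over, so break states the intent directly.
		if i == len(events)-1 {
			incomingEvent = events[i]
			break
		}
		fmt.Println("re-encode earlier event:", string(events[i]))
	}
	fmt.Println("saved incoming event:", string(incomingEvent))
}
```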

// 1. Try to fit the incoming event into the next chunk without losing ND cache
// 2. Try to drop the ND cache and see if the incoming event can fit with the current chunk without it (so we can maximize chunk size)
// 3. Try to add the incoming event without its ND cache to a chunk by itself
// 4. Drop the event, there isn't anything else to be done
Contributor:

Maybe these steps could be broken out into a helper method?
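A possible shape for such a helper, as a hedged sketch with hypothetical helper names rather than the real encoder API:

```go
package main

import (
	"errors"
	"fmt"
)

var errEventDropped = errors.New("event dropped: too large even without its ND cache")

// chunkEncoder is a stand-in for the real encoder; tryInCurrentChunk and
// tryInNextChunk are hypothetical helpers that report whether the event can be
// placed while staying under the configured limit.
type chunkEncoder struct{ limit int }

func (e *chunkEncoder) tryInCurrentChunk(event []byte) ([][]byte, bool) {
	return nil, len(event) <= e.limit
}

func (e *chunkEncoder) tryInNextChunk(event []byte) ([][]byte, bool) {
	return nil, len(event) <= e.limit
}

// handleOversizedEvent walks the four fallback steps in order of least data loss.
func (e *chunkEncoder) handleOversizedEvent(event, eventNoNDCache []byte) ([][]byte, error) {
	if chunks, ok := e.tryInNextChunk(event); ok { // 1. keep the ND cache, use the next chunk
		return chunks, nil
	}
	if chunks, ok := e.tryInCurrentChunk(eventNoNDCache); ok { // 2. drop the ND cache, reuse the current chunk
		return chunks, nil
	}
	if chunks, ok := e.tryInNextChunk(eventNoNDCache); ok { // 3. drop the ND cache, give the event its own chunk
		return chunks, nil
	}
	return nil, errEventDropped // 4. nothing else can be done
}

func main() {
	e := &chunkEncoder{limit: 8}
	_, err := e.handleOversizedEvent([]byte("0123456789"), []byte("0123"))
	fmt.Println(err) // <nil>: step 2 succeeded
}
```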

// 4. Drop the event, there isn't anything else to be done

// 1. Try to fit the incoming event into the next chunk without losing ND cache
tmpEncoder := newChunkEncoder(enc.limit)
Contributor:

Shouldn't we account for uncompressedLimit (and the associated scaling) here?

result = append(result, enc.update()...)

// write event to empty chunk
if err := enc.appendEvent(incomingEventBytes); err != nil {
Contributor:

tmpEncoder already has the compressed event, can't we just swap buffers here?

result = append(result, enc.update()...)

// write event to empty chunk
if err := enc.appendEvent(incomingEventBytesDroppedNDCache); err != nil {
Contributor:

Same question here about reusing tmpEncoder's buffer?

expectedDroppedNDCacheEvents: 50,
// the adaptive uncompressed limit increases until it stabilizes
expectedChunksLengths: []int{197, 197, 197, 197, 214, 214, 214, 214, 214},
expectedEventsInChunk: []int{1, 1, 1, 1, 9, 9, 9, 9, 9}, // 49 in total, plus one in the encoder
Contributor:

What am I missing here? Why do we expect this spread of events, and why does a one-event chunk require 197 bytes while a nine-event chunk requires only slightly more (214)? Is this an artifact of how the test is set up, or of gzip?

Contributor:

Should the uncompressed soft-limit be similarly traced and asserted as it changes?

expectedScaleUpEvents := uint64(8)
expectedScaleDownEvents := uint64(3)
expectedScaleUpEvents := uint64(4)
expectedScaleDownEvents := uint64(0)
expectedEquiEvents := uint64(0)
Contributor:

Should we assert the expected number of chunks too? I'm guessing these aren't exactly the sum of these events.

Contributor Author:

Updated the test to add more cases, it also checks the sum of the events and the expected IDs of the events to make sure nothing is lost.

@sspaink (Contributor Author) commented Apr 29, 2025

@johanfylling I've updated the logic for finding an event that is too big to instead make use of the recursion that splits chunks when the uncompressed limit grows too large. Now the uncompressed limit is taken into account, and the first event that is written helps adjust the uncompressed limit to a reasonable starting point, as opposed to growing it from the upload size limit.

Also added a new histogram metric to track the number of events in each chunk. Not sure how useful this is for users 🤔 at the moment I am just using it in TestChunkEncoderAdaptive to find the maximum.
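As a rough illustration of recording such a metric with OPA's metrics package (the metric name here is made up, and the import path may differ between OPA versions):

```go
package main

import (
	"fmt"

	"github.com/open-policy-agent/opa/metrics"
)

func main() {
	m := metrics.New()
	// Record how many events ended up in each uploaded chunk; the metric name
	// below is illustrative only, not necessarily the one added in the PR.
	for _, eventsInChunk := range []int64{1, 1, 1, 1, 9, 9, 9, 9, 9} {
		m.Histogram("decision_logs_events_per_chunk").Update(eventsInChunk)
	}
	fmt.Println(m.All())
}
```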

Thanks!

@sspaink sspaink changed the title fix: don't drop adaptive uncompressed size limit on upload and drop ND cache sparingly plugin/decision: check if event is too large after compression and don't drop adaptive uncompressed size limit on upload Apr 30, 2025
@sspaink sspaink added the monitoring Issues related to decision log and status plugins label Apr 30, 2025
@johanfylling (Contributor) left a comment:

Some questions.

It's been a while since I looked at this last, so sorry if I'm rehashing old stuff 😅.

return nil, err
}

if int64(len(eventBytes)+enc.bytesWritten+1) <= enc.uncompressedLimit {
Contributor:

This +1 accounts for the [ or , prefixing the event, right? If we want to be sticklers about keeping below the limit, should this be +2? A [ or , will always be written, but if this ends up being the last event added, a closing ] will eventually be written too. Overkill?

Contributor Author:

The +1 accounts for the closing bracket ]; if this event is the last one, then enc.bytesWritten already includes the opening bracket. Maybe I am misunderstanding?
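For concreteness, a stand-alone count of the framing bytes in a chunk's JSON array, independent of the encoder's internal accounting: n events serialized as [e1,e2,...,en] need one '[' or ',' in front of every event plus a single closing ']'.

```go
// Stand-alone illustration of the framing bytes in a decision-log chunk:
// n events need len(e1)+...+len(en) + n + 1 bytes in total.
package main

import "fmt"

func framedSize(events [][]byte) int {
	total := 1 // opening '['
	for i, e := range events {
		if i > 0 {
			total++ // ',' separator
		}
		total += len(e)
	}
	return total + 1 // closing ']'
}

func main() {
	events := [][]byte{[]byte(`{"a":1}`), []byte(`{"b":2}`)}
	fmt.Println(framedSize(events)) // 7 + 7 event bytes + 3 framing bytes = 17
}
```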


currentSize := enc.buf.Len()

// 2. Try to drop the ND cache and see if the incoming event can fit with the current chunk without it (so we can maximize chunk size)
Contributor:

Suggested change
// 2. Try to drop the ND cache and see if the incoming event can fit with the current chunk without it (so we can maximize chunk size)
// 2. Try to drop the ND cache and see if the incoming event can fit within the current chunk without it (so we can maximize chunk size)

Contributor Author:

Fixed 👍

enc.scaleUp()

result, err := enc.reset()
if err := enc.appendEvent(eventBytes); err != nil {
Contributor:

All we know here is that

  1. the event could be marshalled to json,
  2. the event + bytes already written didn't fit within the uncompressed limit,
  3. there are already bytes written to the buffer.

We don't know that the event will actually fit within the new scaled-up buffer, right? Maybe it needs to have its ND-cache dropped? Should this be a recursive call to Write()?

If so, we'd need to concatenate result and any non-nil returns.

Contributor:

If this call to appendEvent() leaves us with enc.bytesWritten >= enc.uncompressedLimit, then the next call to Write() will see an already too-full buffer plus a new incoming event. Worst case, we'll end up in scenario 2) Scale Down, right? And since enc.bytesWritten != 0 we'll close the writer and decode the existing chunk. Eventually, the first event (the one that's too large) will get its ND-cache dropped.

This is equally a request for asserting my understanding is correct, and a note to self 😄.

Contributor Author:

Your understanding is correct! A recursive call to Write() makes sense and avoids the worst case scenario. Great catch, I've updated the code for 1) Scale Up and 3) Equilibrium to both do a recursive call to avoid burying a big event.
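A heavily simplified, self-contained sketch of that recursion (uncompressed sizes instead of gzip, hypothetical names): Write() seals the current chunk and recurses, so the base case is a single event in an empty chunk, which is either written or dropped, and chunks produced along the way are concatenated into the result.

```go
package main

import "fmt"

type encoder struct {
	limit int
	chunk []byte
}

// write appends an event; when the current chunk would overflow, it seals the
// chunk and recurses, so the base case is a single event in an empty chunk.
func (e *encoder) write(event []byte) ([][]byte, error) {
	if len(e.chunk)+len(event) <= e.limit {
		e.chunk = append(e.chunk, event...)
		return nil, nil
	}
	if len(e.chunk) == 0 {
		// base case: a lone event that still does not fit is dropped
		return nil, fmt.Errorf("event of %d bytes dropped", len(event))
	}
	sealed := e.chunk
	e.chunk = nil
	result := [][]byte{sealed}
	more, err := e.write(event) // recursive call with an empty chunk
	if err != nil {
		return result, err
	}
	return append(result, more...), nil
}

func main() {
	e := &encoder{limit: 10}
	for _, ev := range [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cccc")} {
		chunks, err := e.write(ev)
		fmt.Println(len(chunks), err)
	}
}
```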

var result [][]byte
for i := range events {
// recursive call to make sure the chunk created adheres to the uncompressed size limit
chunk, err := enc.Write(events[i])
Contributor:

Write() will only drop the ND-cache on the first event written to a chunk (enc.bytesWritten == 0). This means that there is a worst-case scenario where the ND-cache for any single event is always too large and must always be dropped; in that case we'll always end up in the 1) Scale Up scenario, and each event will be written to an individual chunk. But since each event also has its ND-cache dropped, we probably could have fitted more events in each chunk.

Contributor Author:

Nice find! That isn't ideal at all. A solution I came up with is tracking, with enc.maxEventSize, the size of an event whose ND cache had to be dropped. Then if another event gets written that exceeds that size, it will immediately try to drop the cache, giving it a chance to be added into the same chunk (if the uncompressed size has grown correctly).
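A small sketch of that idea with hypothetical names (one possible reading, not the exact PR code):

```go
package main

import "fmt"

// encoder is a stand-in; maxEventSize remembers the size of an event whose ND
// cache had to be dropped, so later events of at least that size can have their
// cache dropped up front and still share a chunk with other events.
type encoder struct {
	maxEventSize int // 0 means no ND cache has been dropped yet
}

func (e *encoder) recordNDCacheDrop(eventSize int) {
	e.maxEventSize = eventSize
}

func (e *encoder) shouldDropNDCacheEarly(eventSize int) bool {
	return e.maxEventSize > 0 && eventSize >= e.maxEventSize
}

func main() {
	e := &encoder{}
	e.recordNDCacheDrop(900)                    // a 900-byte event lost its ND cache
	fmt.Println(e.shouldDropNDCacheEarly(1200)) // true: drop eagerly, keep sharing chunks
	fmt.Println(e.shouldDropNDCacheEarly(300))  // false: small events keep their cache
}
```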

enc.incrMetric(logNDBDropCounterName)
// success! the incoming chunk lost the ND cache, but it wasn't dropped entirely
// scale up the uncompressed limit using the uncompressed event size as a base
enc.uncompressedLimit = int64(len(eventBytes))
Contributor:

Since we've dropped the ND-cache here, couldn't this effectively be lowering the uncompressed limit? Should we first assert that int64(len(eventBytes)) > enc.uncompressedLimit?

Contributor Author:

good catch! I added the check
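Sketched with hypothetical names, the guard amounts to only ever growing the limit when re-basing it on an ND-cache-stripped event:

```go
package main

import "fmt"

// rebaseUncompressedLimit only grows the adaptive limit; re-basing it on an
// ND-cache-stripped event must never shrink a limit that has already stabilized.
func rebaseUncompressedLimit(current, strippedEventSize int64) int64 {
	if strippedEventSize > current {
		return strippedEventSize
	}
	return current
}

func main() {
	fmt.Println(rebaseUncompressedLimit(32768, 1024))  // 32768: keep the stabilized limit
	fmt.Println(rebaseUncompressedLimit(32768, 65536)) // 65536: the stripped event is still bigger
}
```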

var result [][]byte
for i := range events {
chunk, err := enc.Write(events[i])
chunk, err := enc.scaleDown(events)
Contributor:

We need to do this because the last event we wrote to the buffer was too large? It would be nice if there was a way to make sure we're always below the limit in Write(), i.e. whenever Write() returns, all hoops have been jumped through. Would that be too expensive?

@sspaink (Contributor Author) commented May 7, 2025:

I agree it would be a lot nicer if Write() could guarantee the buffer was always below the limit. So the problem is getting the accurate compressed buffer size. You need to either Close or Flush the buffer to get the accurate size but they both have significant downsides:

  • Calling Flush degrades the compression ratio. The Flush docs say "In the terminology of the zlib library, Flush is equivalent to Z_SYNC_FLUSH", and the zlib documentation says "Flushing may degrade compression". It definitely does degrade: I tried adding enc.w.Flush() after calling enc.appendEvent(eventBytes) to see if we can scale down without pushing in a new event, and there was basically no compression; the buffer size reached the limit way before it should have.
  • Calling Close would require copying the buffer and writer each time to remove the closing bracket.

So allowing the buffer to potentially exceed the compressed limit lets us skip the above costs and deal with it when an event comes in that exceeds the uncompressed limit. Hopefully the uncompressed limit eventually gets adjusted enough to avoid this altogether. The downside is that when you call Flush on the chunkEncoder, it has to fix a buffer that exceeded the limit without an incoming event.

BTW I did not know that the gzip Flush call degraded the compression rate before trying it, definitely a hidden quirk that the Go documentation should do better at highlighting.
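A stand-alone demonstration of that gzip behaviour: flushing after every write (Z_SYNC_FLUSH) costs noticeably more bytes than writing everything and closing once. Exact sizes depend on the input, but the flushed variant is consistently larger for small, frequent writes.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// compress gzips the events into one buffer, optionally flushing after every
// write, and returns the resulting compressed size in bytes.
func compress(events [][]byte, flushEachWrite bool) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	for _, e := range events {
		w.Write(e)
		if flushEachWrite {
			w.Flush() // ends the current deflate block so buf.Len() is accurate
		}
	}
	w.Close()
	return buf.Len()
}

func main() {
	event := []byte(`{"decision_id":"abc","path":"authz/allow","result":true}`)
	events := make([][]byte, 200)
	for i := range events {
		events[i] = event
	}
	fmt.Println("closed once:      ", compress(events, false))
	fmt.Println("flushed per write:", compress(events, true))
}
```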

p.logger.Error("ND builtins cache dropped from this event to fit under maximum upload size limits. Increase upload size limit or change usage of non-deterministic builtins.")
p.incrMetric(logNDBDropCounterName)
}

p.mtx.Lock()
Contributor:

Before, parts of buffering an event, such as json marshalling, happened outside of this global lock. Do we have a sense of how this will affect lock contention and general performance under high load?

@sspaink (Contributor Author) commented May 6, 2025:

I think this will improve performance. Before, calling p.encodeEvent(event) acquired the lock, and if there wasn't an error it would immediately try to get the lock again; dropping the ND cache called p.encodeEvent(event) a second time, so adding one event could need the lock three times. The new approach has less contention.
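As a hypothetical sketch of that locking pattern (names simplified, not the plugin's real fields): encoding, the possible ND-cache drop, and buffering all happen under a single lock acquisition per event.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin is a stand-in for the decision log plugin; the point of the sketch is
// that encoding (including a possible ND-cache drop) and buffering now happen
// under one lock acquisition per event instead of up to three.
type plugin struct {
	mtx    sync.Mutex
	chunks [][]byte
}

func (p *plugin) logEvent(event []byte) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	// encode, retry without the ND cache if needed, and append, all while
	// holding the lock exactly once
	p.chunks = append(p.chunks, event)
}

func main() {
	p := &plugin{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			p.logEvent([]byte(fmt.Sprintf(`{"id":%d}`, i)))
		}(i)
	}
	wg.Wait()
	fmt.Println("buffered events:", len(p.chunks))
}
```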

Labels: monitoring (Issues related to decision log and status plugins)

Successfully merging this pull request may close these issues:
Chunk encoder has multiple issues

2 participants