-
Notifications
You must be signed in to change notification settings - Fork 3.3k
[processor/redactionprocessor] Add URL sanitization feature to redactionprocessor #41774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2098e70 to
2a38c00
Compare
jcreixell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, added a few comments.
I would suggest having a look at grafana/clusterurl#3, which makes significant improvements to the classifier. I think you might not need most of the ad-hoc logic with the improved heuristics (I have also contributed these changes to obi)
In addition, have a look at open-telemetry/opentelemetry-ebpf-instrumentation#417 for ideas on how to speed up the additional chars lookup (although the improvements aren't that drastic according to benchmarks, you can't do much better than using a lookup table in terms of performance)
|
Thank you for your reviews @jcreixell @grcevski
Oh well, actually we can just import that library. In other comments, I saw the library not under |
grcevski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
Thanks for all the iterations and benchmarking @iblancasa ! |
jcreixell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, if you like you can also wait until this is merged and bump the library version for an extra optimization
Thank you for taking the time to implement this!
db60a91 to
b83f9cd
Compare
|
During the SIG call, I was asked to provide some numbers about the impact of this feature in the OpenTelemetry Collector Contrib binary. Before adding the change: $ vmmap 18494 | grep 'Physical footprint'
Physical footprint: 71.8M
Physical footprint (peak): 71.8M
$ du -sk bin/otelcontribcol_darwin_arm64
471484 bin/otelcontribcol_darwin_arm64After the change (not enabling the feature): $ vmmap 26350 | grep 'Physical footprint'
Physical footprint: 72.0M
Physical footprint (peak): 72.0M
$ du -sk bin/otelcontribcol_darwin_arm64
471504 bin/otelcontribcol_darwin_arm64Enabling the feature: $ vmmap 28082 | grep 'Physical footprint'
Physical footprint: 72.8M
Physical footprint (peak): 72.8M/cc @dmitryax |
8b7c21d to
b50f463
Compare
|
@dmitryax is there anything else needed to get this merged? I believe the binary footprint is negligible, especially considering that this is a contrib repo and that OCB exists. The model itself uses 14Kb, could be compressed to reduce it to 7Kb if really needed but not sure it is worth it. If the memory or binary size are a concern, we could re-visit creating a standalone processor as originally proposed in #41100, but we would be going full circle and we already rejected the idea in favor of this approach (after also discarding the OTTL function approach due top its stateful nature). |
80f4bf8 to
c87e83b
Compare
Signed-off-by: Israel Blancas <iblancasa@gmail.com>
Description
Add URL sanitization feature to redactionprocessor
Link to tracking issue
Fixes #41535
Testing