@iblancasa
Contributor

@iblancasa iblancasa commented Aug 5, 2025

Description

Add URL sanitization feature to redactionprocessor

Link to tracking issue

Fixes #41535

Testing

  • Added new tests
  • Tested with this config:
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  debug:
    verbosity: detailed

processors:
  redaction:
    allow_all_keys: true
    url_sanitization:
      enabled: true
      attributes:
        - "url"

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [redaction]
      exporters: [debug]
    traces:
      receivers: [otlp]
      processors: [redaction]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [redaction]
      exporters: [debug]
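
To illustrate the kind of rewrite that sanitizing a `url` attribute implies, here is a minimal Go sketch. It is not the processor's implementation and does not use the clusterurl classifier discussed in the review. It assumes a deliberately naive rule (any path segment containing a digit is treated as an identifier) and assumes `*` as the placeholder. The names `looksLikeID` and `sanitizePath` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// looksLikeID is a hypothetical stand-in for a real classifier: it treats
// any path segment that contains a digit as a high-cardinality identifier.
func looksLikeID(segment string) bool {
	for _, r := range segment {
		if unicode.IsDigit(r) {
			return true
		}
	}
	return false
}

// sanitizePath replaces ID-like segments of a URL path with "*" and leaves
// all other segments untouched.
func sanitizePath(path string) string {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		if looksLikeID(s) {
			segments[i] = "*"
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	fmt.Println(sanitizePath("/users/12345/orders/987")) // /users/*/orders/*
	fmt.Println(sanitizePath("/api/v1/health"))          // /api/*/health
}
```

The second call shows the limit of such a simple rule: it also collapses `v1`, which is one reason a purpose-built classifier may be preferable to ad-hoc logic.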

@iblancasa iblancasa requested review from a team, TylerHelmuth, dmitryax and mx-psi as code owners August 5, 2025 09:24
@iblancasa iblancasa added the processor/redaction Redaction processor label Aug 5, 2025
@iblancasa iblancasa force-pushed the 41535 branch 2 times, most recently from 2098e70 to 2a38c00 Compare August 5, 2025 10:21
@iblancasa iblancasa requested a review from jcreixell August 5, 2025 13:16
Contributor

@jcreixell jcreixell left a comment

Overall LGTM, added a few comments.

I would suggest having a look at grafana/clusterurl#3, which makes significant improvements to the classifier. I think you might not need most of the ad-hoc logic with the improved heuristics (I have also contributed these changes to obi)

In addition, have a look at open-telemetry/opentelemetry-ebpf-instrumentation#417 for ideas on how to speed up the additional chars lookup (although the improvements aren't that drastic according to benchmarks, you can't do much better than using a lookup table in terms of performance)
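
The lookup-table idea mentioned above can be sketched in Go as follows. This is only an illustration of the general technique, not the code from the linked PR: the character set in `allowedExtra` and the names `buildTable` and `isExtra` are hypothetical.

```go
package main

import "fmt"

// allowedExtra is a hypothetical set of extra (non-alphanumeric) characters
// to check for; the real set used by the classifier is not shown here.
const allowedExtra = "-_."

// buildTable returns a 256-entry table with one flag per possible byte value,
// so membership can be tested with a single array index.
func buildTable(chars string) [256]bool {
	var t [256]bool
	for i := 0; i < len(chars); i++ {
		t[chars[i]] = true
	}
	return t
}

// extraTable is built once at package initialization.
var extraTable = buildTable(allowedExtra)

// isExtra reports whether c is one of the extra characters.
func isExtra(c byte) bool {
	return extraTable[c]
}

func main() {
	fmt.Println(isExtra('-'), isExtra('/')) // true false
}
```

Because a `byte` can only take values 0 to 255, indexing a `[256]bool` array never goes out of range, and each check costs one memory read regardless of how many characters are in the set.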

@iblancasa
Contributor Author

Thank you for your reviews @jcreixell @grcevski

I would suggest having a look at grafana/clusterurl#3, [...]

Oh well, actually we can just import that library. In other comments, I saw that the library was not under the grafana org, and I was a bit concerned about support. Let me try that.


@grcevski grcevski left a comment

LGTM!

@grcevski

grcevski commented Aug 7, 2025

Thanks for all the iterations and benchmarking @iblancasa !

Contributor

@jcreixell jcreixell left a comment

LGTM. If you like, you can also wait until this is merged and bump the library version for an extra optimization.

Thank you for taking the time to implement this!

@iblancasa iblancasa force-pushed the 41535 branch 3 times, most recently from db60a91 to b83f9cd Compare August 14, 2025 15:50
@iblancasa
Contributor Author

During the SIG call, I was asked to provide some numbers about the impact of this feature on the OpenTelemetry Collector Contrib binary.

Before adding the change:

$ vmmap 18494 | grep 'Physical footprint'
Physical footprint:         71.8M
Physical footprint (peak):  71.8M
$ du -sk bin/otelcontribcol_darwin_arm64
471484  bin/otelcontribcol_darwin_arm64

After the change (not enabling the feature):

$ vmmap 26350 | grep 'Physical footprint'
Physical footprint:         72.0M
Physical footprint (peak):  72.0M
$ du -sk bin/otelcontribcol_darwin_arm64
471504  bin/otelcontribcol_darwin_arm64

Enabling the feature:

$ vmmap 28082 | grep 'Physical footprint'
Physical footprint:         72.8M
Physical footprint (peak):  72.8M

/cc @dmitryax
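
For readers who want the deltas spelled out, the following Go sketch does the arithmetic on the figures reported above. The constant and function names are hypothetical. The values are copied from the `du -sk` output (in kilobyte blocks) and the `vmmap` output (in the `M` units printed by the tool).

```go
package main

import "fmt"

// Binary sizes from `du -sk`, in kilobyte blocks.
const (
	sizeBeforeKB = 471484
	sizeAfterKB  = 471504
)

// Physical footprint from `vmmap`, in the M units it prints.
const (
	memBefore   = 71.8 // before the change
	memDisabled = 72.0 // after the change, feature not enabled
	memEnabled  = 72.8 // after the change, feature enabled
)

// pctIncrease returns the relative increase from before to after, in percent.
func pctIncrease(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	fmt.Printf("binary: +%d KB (%.4f%%)\n",
		sizeAfterKB-sizeBeforeKB, pctIncrease(sizeBeforeKB, sizeAfterKB))
	fmt.Printf("memory, disabled: +%.1fM (%.1f%%)\n",
		memDisabled-memBefore, pctIncrease(memBefore, memDisabled))
	fmt.Printf("memory, enabled:  +%.1fM (%.1f%%)\n",
		memEnabled-memBefore, pctIncrease(memBefore, memEnabled))
}
```

On these numbers, the binary grows by 20 KB, and the physical footprint grows by roughly 0.2M with the feature disabled and 1.0M with it enabled. These are single measurements on one machine, so they are best read as rough indications.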

@jcreixell
Contributor

@dmitryax is there anything else needed to get this merged? I believe the binary footprint is negligible, especially considering that this is a contrib repo and that OCB exists. The model itself uses 14Kb and could be compressed to 7Kb if really needed, but I'm not sure it is worth it.

If memory or binary size is a concern, we could revisit creating a standalone processor as originally proposed in #41100, but we would be going full circle, and we already rejected that idea in favor of this approach (after also discarding the OTTL function approach due to its stateful nature).

@iblancasa iblancasa force-pushed the 41535 branch 4 times, most recently from 80f4bf8 to c87e83b Compare September 19, 2025 15:48
Signed-off-by: Israel Blancas <iblancasa@gmail.com>
@dmitryax dmitryax merged commit b355d8e into open-telemetry:main Sep 25, 2025
185 of 186 checks passed
@github-actions github-actions bot added this to the next release milestone Sep 25, 2025
Development

Successfully merging this pull request may close these issues.

[processor/redactionprocessor] Add capabilities to sanitize urls

5 participants