-
Notifications
You must be signed in to change notification settings - Fork 961
Implement custom OpenTelemetry sampler to filter uninstrumented traces #8647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement custom OpenTelemetry sampler to filter uninstrumented traces #8647
Conversation
Add AllowedRootSpanSampler to filter out traces that don't originate from known instrumented code paths. This reduces noise from library code and uninstrumented paths when exporting to OTLP backends. Uses the idiomatic OpenTelemetry ParentBased sampler pattern: - Root spans are sampled only if their name is in LH_BN_ROOT_SPAN_NAMES - Child spans automatically inherit their parent's sampling decision - Efficient head-based sampling with no per-span tracking overhead This enables effective trace sampling in production - without filtering, the majority of traces would be noise, making low sample rates ineffective at capturing meaningful instrumented code paths.
eserilev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few small things on my end
| _attributes: &[opentelemetry::KeyValue], | ||
| _links: &[Link], | ||
| ) -> SamplingResult { | ||
| if self.allowed_names.contains(&name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small optimization could be to make allowed_names a set instead of a list. Right now the list of allowed spans is relatively small so it might not matter much until the list gets bigger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had a thought: perhaps we could do prefix instead, so we don't have to maintain this list at all, because longer term this could be quite bad devex - say I add a trace, but forgot to add to this new trace to the allowed list, then i build and deploy BUT couldn't find the trace and had to debug - this wastes some dev cycles.
I'm thinking to add a lh_ prefix (keeping it short so that it doesn't take up a lot of backend storage), what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this - i think this is a better longer term solution than maintaining a list. Let me know your thoughts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this seems much better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub struct AllowedRootSpanSampler { | ||
| allowed_names: &'static [&'static str], | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a few unit tests here could be nice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, added!
Replace the allowlist-based AllowedRootSpanSampler with a generic `PrefixBasedSampler` that filters spans by prefix. Root spans now use the `lh_` prefix to identify Lighthouse instrumented entry points.
Changes:
- Rename `lighthouse_tracing` to `tracing_samplers` and move to `common/`
- Replace `AllowedRootSpanSampler` with `PrefixBasedSampler`
- Remove all `SPAN_*` constants, use inline strings at call sites
- Remove `LH_BN_ROOT_SPAN_NAMES` allowlist
This eliminates the need to maintain an allowlist of span names. New instrumented spans just need the `lh_` prefix to be exported. The sampler is now generic and can be reused by validator_client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
One more thought I had was to maybe add a custom proc macro so that we don't have to do
#[instrument(name = "lh_produce_unaggregated_attestation", skip_all, fields(%request_slot, %request_index), level = "debug")]
fn produce_unaggregated_attestation()and instead do something like this
#[lh_instrument(skip_all, fields(%request_slot, %request_index), level = "debug")]
fn produce_unaggregated_attestation()where the proc macro automatically appends lh_ to the function name
Thanks, I see the convenience with the macro, but IMO the gain is quite trivial to justify adding custom macro for this - it may not be immediately obvious what this macro does without looking at the implementation and might add confusion vs when to use |
|
sorry thought I already left a comment. yes I agree with you, the proc macro on second thought seems pretty useless. LGTM! |
Merge Queue Status✅ The pull request has been merged at d0c0324 This pull request spent 39 minutes 25 seconds in the queue, including 38 minutes 9 seconds running CI. Required conditions to merge
|

Description
Adds a
PrefixBasedSamplerto filter out traces that don't originate from known instrumented code paths. This reduces noise from uninstrumented code paths when exporting to OTLP backends.Root spans use the
lh_prefix to identify Lighthouse instrumented entry points. The sampler is used with OpenTelemetry'sParentBasedsampler:lh_This enables effective trace sampling for #8554 - without filtering, we get spans from uninstrumented code paths (e.g.
fork_choice_write_lockonly traces), making low sample rates ineffective at capturing meaningful instrumented traces.Additional Info
The
lh_prefix approach eliminates the need to maintain an allowlist of span names - new instrumented spans just need the prefix to be exported. The prefix is kept short to minimize storage overhead in tracing backends.