Member

@jimmygchen jimmygchen commented Jan 12, 2026

Description

Adds a PrefixBasedSampler to filter out traces that don't originate from known instrumented code paths. This reduces noise from uninstrumented code paths when exporting to OTLP backends.

Root spans use the lh_ prefix to identify Lighthouse-instrumented entry points. The sampler is used as the root sampler within OpenTelemetry's ParentBased sampler:

  • Root spans are sampled only if their name starts with lh_
  • Child spans automatically inherit their parent's sampling decision

This enables effective trace sampling for #8554. Without filtering, we get traces from uninstrumented code paths (e.g. traces consisting only of a fork_choice_write_lock span), which makes low sample rates ineffective at capturing meaningful instrumented traces.

Additional Info

The lh_ prefix approach eliminates the need to maintain an allowlist of span names: new instrumented spans just need the prefix to be exported. The prefix is kept short to minimize storage overhead in tracing backends.
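To illustrate the maintenance difference, the sketch below compares an allowlist check (the approach of the earlier AllowedRootSpanSampler, which checked `allowed_names.contains(&name)`) with a prefix check. The allowlist contents and the span name `lh_new_instrumented_span` are hypothetical examples, not the real LH_BN_ROOT_SPAN_NAMES.

```rust
/// Allowlist check in the style of the earlier sampler: every root span
/// name must be registered explicitly (hypothetical contents).
const ALLOWED_ROOT_SPANS: &[&str] = &["lh_produce_unaggregated_attestation"];

fn allowlist_sampled(span_name: &str) -> bool {
    ALLOWED_ROOT_SPANS.contains(&span_name)
}

/// Prefix check: any root span that follows the naming convention is kept,
/// with no list to update when a new span is added.
fn prefix_sampled(span_name: &str) -> bool {
    span_name.starts_with("lh_")
}
```

A newly added span that nobody registered in the allowlist is silently dropped by the first check but exported by the second, which is the failure mode discussed in the review below.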

Add AllowedRootSpanSampler to filter out traces that don't originate
from known instrumented code paths. This reduces noise from library
code and uninstrumented paths when exporting to OTLP backends.

Uses the idiomatic OpenTelemetry ParentBased sampler pattern:
- Root spans are sampled only if their name is in LH_BN_ROOT_SPAN_NAMES
- Child spans automatically inherit their parent's sampling decision
- Efficient head-based sampling with no per-span tracking overhead

This enables effective trace sampling in production. Without filtering,
the majority of traces would be noise, making low sample rates ineffective
at capturing meaningful instrumented code paths.
@jimmygchen jimmygchen requested a review from eserilev January 12, 2026 05:11
@jimmygchen jimmygchen added ready-for-review The code is ready for review tracing labels Jan 12, 2026
Member

@eserilev eserilev left a comment

just a few small things on my end

    _attributes: &[opentelemetry::KeyValue],
    _links: &[Link],
) -> SamplingResult {
    if self.allowed_names.contains(&name) {
Member

A small optimization could be to make allowed_names a set instead of a list. Right now the list of allowed spans is relatively small, so it might not matter much until the list gets bigger.
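A minimal sketch of the set-based suggestion, assuming the allowlist stays a compile-time list of `&'static str`: build a `HashSet` once so each lookup is O(1) on average instead of a linear scan. `AllowlistSampler` and its methods are hypothetical names; the thread ultimately replaced the allowlist with a prefix check instead.

```rust
use std::collections::HashSet;

/// Hypothetical allowlist-based root sampler backed by a `HashSet`.
pub struct AllowlistSampler {
    allowed_names: HashSet<&'static str>,
}

impl AllowlistSampler {
    /// Builds the set once (e.g. at startup) from a static list of names.
    pub fn new(names: &'static [&'static str]) -> Self {
        Self {
            allowed_names: names.iter().copied().collect(),
        }
    }

    /// Average O(1) hash lookup; `&'static str: Borrow<str>` lets us
    /// query with a borrowed `&str` directly.
    pub fn is_allowed(&self, span_name: &str) -> bool {
        self.allowed_names.contains(span_name)
    }
}
```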

Member Author

I just had a thought: perhaps we could match on a prefix instead, so we don't have to maintain this list at all. Longer term, the list could be quite bad devex: say I add a trace but forget to add it to the allowed list; I build and deploy, but can't find the trace and have to debug. This wastes dev cycles.

I'm thinking of adding an lh_ prefix (kept short so that it doesn't take up much backend storage). What do you think?

Member Author

I've added this. I think it's a better long-term solution than maintaining a list. Let me know your thoughts!

Member

I agree, this seems much better

Member Author

Looking much better without the noise \o/

Comment on lines 98 to 100
pub struct AllowedRootSpanSampler {
    allowed_names: &'static [&'static str],
}
Member

Maybe a few unit tests here could be nice?

Member Author

Yes, added!
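Unit tests for the root-span rule might look like the sketch below. It declares a minimal, hypothetical `is_root_sampled` helper so the block compiles on its own; the tests actually added in the PR target the real sampler and may differ.

```rust
/// Hypothetical helper mirroring the root-span rule: keep a root span
/// only if its name starts with the given prefix.
pub fn is_root_sampled(prefix: &str, span_name: &str) -> bool {
    span_name.starts_with(prefix)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn samples_root_span_with_prefix() {
        assert!(is_root_sampled("lh_", "lh_produce_unaggregated_attestation"));
    }

    #[test]
    fn drops_root_span_without_prefix() {
        assert!(!is_root_sampled("lh_", "fork_choice_write_lock"));
    }

    #[test]
    fn prefix_match_is_case_sensitive() {
        assert!(!is_root_sampled("lh_", "LH_produce_block"));
    }

    #[test]
    fn prefix_must_be_at_the_start() {
        assert!(!is_root_sampled("lh_", "produce_lh_block"));
    }
}
```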

    Replace the allowlist-based AllowedRootSpanSampler with a generic `PrefixBasedSampler` that filters spans by prefix. Root spans now use the `lh_` prefix to identify Lighthouse instrumented entry points.

    Changes:
    - Rename `lighthouse_tracing` to `tracing_samplers` and move to `common/`
    - Replace `AllowedRootSpanSampler` with `PrefixBasedSampler`
    - Remove all `SPAN_*` constants, use inline strings at call sites
    - Remove `LH_BN_ROOT_SPAN_NAMES` allowlist

    This eliminates the need to maintain an allowlist of span names. New instrumented spans just need the `lh_` prefix to be exported. The sampler is now generic and can be reused by validator_client.
@jimmygchen jimmygchen requested a review from jxs as a code owner January 13, 2026 06:39
@jimmygchen jimmygchen requested a review from eserilev January 13, 2026 07:06
Member

@eserilev eserilev left a comment

LGTM!

One more thought I had was to maybe add a custom proc macro so that we don't have to do

#[instrument(name = "lh_produce_unaggregated_attestation", skip_all, fields(%request_slot, %request_index), level = "debug")]
fn produce_unaggregated_attestation()

and instead do something like this

#[lh_instrument(skip_all, fields(%request_slot, %request_index), level = "debug")]
fn produce_unaggregated_attestation()

where the proc macro automatically prefixes the function name with lh_
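The name transformation such a macro would perform can be sketched with a declarative macro. `lh_span_name!` is hypothetical: it only builds the prefixed name string at compile time and is not a replacement for `#[instrument]`. A real attribute macro like the proposed `#[lh_instrument]` would need a separate proc-macro crate.

```rust
/// Hypothetical helper: turns an identifier into an `lh_`-prefixed span
/// name at compile time, e.g. `produce_block` -> "lh_produce_block".
macro_rules! lh_span_name {
    ($name:ident) => {
        concat!("lh_", stringify!($name))
    };
}

/// The macro expands to a string literal, so it can initialise a constant.
const PRODUCE_ATTESTATION_SPAN: &str = lh_span_name!(produce_unaggregated_attestation);
```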

@jimmygchen
Member Author

Thanks, I see the convenience of the macro, but IMO the gain is too small to justify adding a custom macro for this. It may not be immediately obvious what the macro does without looking at its implementation, and it might add confusion about when to use it versus the #[instrument] macro, which is still what we will use most of the time when creating spans. I have a slight preference for being explicit here for readability, but I'm open to adding it if you and others think it's useful.

@eserilev
Member

Sorry, I thought I had already left a comment. Yes, I agree with you: on second thought, the proc macro seems pretty useless. LGTM!

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Jan 22, 2026
@mergify mergify bot added the queued label Jan 22, 2026
@mergify
mergify bot commented Jan 22, 2026

Merge Queue Status

✅ The pull request has been merged at d0c0324

This pull request spent 39 minutes 25 seconds in the queue, including 38 minutes 9 seconds running CI.
The checks were run on draft #8693.

Required conditions to merge
  • check-success=local-testnet-success
  • check-success=test-suite-success

mergify bot added a commit that referenced this pull request Jan 22, 2026
@mergify mergify bot merged commit 7f06500 into sigp:unstable Jan 22, 2026
36 checks passed
@mergify mergify bot removed the queued label Jan 22, 2026