57 changes: 57 additions & 0 deletions oteps/profiles/4855-profiles-obi-correlation.md
@@ -0,0 +1,57 @@
# Correlating Profiles to OBI Traces
Member

Would it be possible to provide additional context (e.g. links, if the context exists elsewhere) for correlation in general:

- Why are OBI traces targeted here and not SDK traces?
- How (if at all) are OBI traces correlated to SDK traces?
- How are SDK traces correlated to profiles?

Also, what happens when the application gets instrumented, and which mechanism would win?

Author

> Why are OBI traces targeted here and not SDK traces?

There are other proposals that target SDK traces specifically, see #4719 and https://docs.google.com/document/d/1eatbHpEXXhWZEPrXZpfR58-5RIx-81mUgF69Zpn3Rz4/edit?tab=t.0#heading=h.fvztn3xtjxxm

> How (if at all) are OBI traces correlated to SDK traces?

I don't think they are. They use different ways of doing the same thing (context propagation), and if a trace exists, ideally both OBI and the SDKs will inherit it.

> How are SDK traces correlated to profiles?

This OTEP is OBI-specific. For SDKs, I believe the work/proposal is here: https://docs.google.com/document/d/1eatbHpEXXhWZEPrXZpfR58-5RIx-81mUgF69Zpn3Rz4/edit?tab=t.0#heading=h.fvztn3xtjxxm

> Also, what happens when the application gets instrumented, and which mechanism would win?

I think it should be something like:

    trace_ctx = try_sdk()
    if not trace_ctx:
        trace_ctx = try_obi()

or vice versa. If both sources have a trace context and they differ, it may indicate a bug.

👋 I'm one of the authors of the gdoc mentioned above; we plan to turn it into an OTEP soon. The TL;DR of why we need OBI specifics is that, due to the way OBI works, it's awkward to use the same mechanism we're planning for regular SDKs, and vice versa.

See also #4719 (comment) for more on this discussion.

Member

christos68k, Feb 5, 2026

@lmolkova We discussed the priority/fidelity (who wins?) issue at today's Profiling SIG (also previously raised by me in the proposed implementation here)

Given that this OTEP focuses on a specific technical solution (fast data exchange between two OTel components), should we attempt to answer the priority/fidelity question here or somewhere more general? In today's SIG, we discussed that the same clash can be present in the rest of OTel, e.g. if one has multiple layers of instrumentation.

Member

I think having more context in the OTEP is helpful and increases the chances of people reviewing it.
It's important to understand the implications of this OTEP on general correlation between OBI, SDKs, and profiling.

Author

> It's important to understand the implications of this OTEP on general correlation between OBI, SDKs, and profiling.

In my opinion, it belongs in a more general document, as it would dictate the correlation, priority, etc. between generic readers (e.g. the profiler) and writers (e.g. OBI, SDKs). Let me know if I should add (or clarify) anything OBI-specific.


This OTEP introduces a standard communication channel and a specification for correlating profiles to [opentelemetry-ebpf-instrumentation (OBI)](https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation) traces.

<!-- toc -->
* [Motivation](#motivation)
* [Design Notes](#design-notes)
  * [Communication Channel](#communication-channel)
  * [Data Model](#data-model)
<!-- toc -->

## Motivation

Currently, OBI traces and profiles operate independently, making it difficult to attribute profiling data to specific traces or spans. By establishing a standard kernel-resident communication channel, this OTEP enables:

- Correlating profiles with their corresponding traces or spans
- End-to-end observability workflows without requiring application-level instrumentation

## Design Notes

### Communication Channel

The communication channel between OBI and the profiler is implemented via an eBPF map pinned at `$PINPATH/otel_traces_ctx_v1`.

`$PINPATH` will default to bpffs (`/sys/fs/bpf`), but both components must provide an option to specify an alternative location. If set, the user-configured location in OBI must match the one set in the profiler.

On startup, both OBI and the profiler will create the map and pin it if it doesn't already exist. OBI will expose a helper function that the profiler can use to do so.

### Data Model

As described in the [Profiles Data Model](./0239-profiles-data-model.md), the shared eBPF map uses a minimal structure to store correlation data.

#### eBPF Map Specification

```c
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __type(key, u64);
    __type(value, struct trace_context);
    __uint(max_entries, 1 << 14);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} otel_traces_ctx_v1 SEC(".maps");
```

- **Key:** `(u64)pid_tgid`

Member

What about Go, where goroutines are multiplexed onto OS threads? Will this hold a goroutine ID instead?

Author

The map will still be indexed by `pid_tgid`. OBI hooks the Go runtime (see: https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/blob/main/bpf/gotracer/go_runtime.c#L197) and keeps track of goroutine statuses. Whenever a goroutine starts doing work, the map is updated to hold the context of the goroutine that is currently executing.

Something similar happens for other languages with different async primitives.

- **Value:**

```c
struct trace_context {
    u8 trace_id[16];
    u8 span_id[8];
};
```

Comment on lines +52 to +53

Member

Is there a reason we don't type these as follows?

Suggested change: replace `u8 trace_id[16];` and `u8 span_id[8];` with `u64 trace_id[2];` and `u64 span_id;`.

Yes. These are opaque 16- and 8-byte identifiers, not integers. The canonical form (including `traceparent`) is defined over raw bytes, and we manipulate them as bytes both in eBPF and in Go (`[16]byte` / `[8]byte`). Using `u8[16]` / `u8[8]` keeps the in-memory layout identical to the wire format and to the userspace representation.

If we type them as `u64` / `u64[2]`, we introduce endianness concerns. We'd have to define hi/lo ordering and a canonical byte order, and convert consistently at every kernel <-> userspace boundary and during header encoding. That's unnecessary complexity and an easy source of subtle cross-architecture bugs, especially since the protocol is byte-ordered, not integer-ordered.

We don't perform arithmetic on these values, so there's no real benefit to representing them as integers. As a side note, keeping them as byte arrays also avoids type-punning/strict-aliasing issues from casting between `u64*` and byte buffers.