
feat: Propagate W3C trace context headers from clients #2153


Merged
merged 1 commit into meta-llama:main from traceparent-headers on May 20, 2025

Conversation

bbrowning
Collaborator

What does this PR do?

This extracts the W3C trace context headers (`traceparent` and `tracestate`) from incoming requests, stuffs them as attributes on the spans we create, and uses them within the tracing provider implementation to actually wrap our spans in the proper context.

What this means in practice is that when a client (such as an OpenAI client) is instrumented to create these traces, we'll continue that distributed trace within Llama Stack as opposed to creating our own root span that breaks the distributed trace between client and server.
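For concreteness, here's a hedged client-side sketch (not code from this PR; the endpoint, model name, and payload are placeholders) of how an instrumented client ends up sending these headers:

```python
# Client-side sketch: inject the current span's W3C trace context into the outgoing
# request headers. HTTP auto-instrumentation normally does this for you; it is spelled
# out manually here for clarity, and assumes a TracerProvider has been configured.
import httpx
from opentelemetry import trace
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

tracer = trace.get_tracer("example-client")

with tracer.start_as_current_span("client-request"):
    headers: dict[str, str] = {}
    TraceContextTextMapPropagator().inject(headers)  # adds traceparent (and tracestate)
    # Llama Stack can now continue this trace instead of starting its own root span.
    httpx.post(
        "http://localhost:8321/v1/chat/completions",  # placeholder endpoint
        headers=headers,
        json={"model": "example-model", "messages": []},
    )
```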

It's slightly awkward to do this in Llama Stack because our Tracing API knows nothing about OpenTelemetry, W3C trace headers, etc. - that knowledge lives only in the specific provider implementation. That's why the trace headers get extracted in the server code but aren't actually used to form the proper context until the provider implementation.
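A rough sketch of that split, with illustrative names rather than the exact ones in this PR:

```python
from opentelemetry.context import Context
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Server side: grab the raw W3C headers off the incoming request and stash them as
# span attributes. Nothing OpenTelemetry-specific happens here.
def trace_context_attributes(request_headers: dict[str, str]) -> dict[str, str]:
    return {
        key: request_headers[key]
        for key in ("traceparent", "tracestate")
        if key in request_headers
    }

# Provider side: only the telemetry provider knows OpenTelemetry, so it turns those
# attributes back into a parent context and starts its spans inside it.
def parent_context_from_attributes(attrs: dict[str, str]) -> Context | None:
    carrier = {k: attrs[k] for k in ("traceparent", "tracestate") if k in attrs}
    return TraceContextTextMapPropagator().extract(carrier) if carrier else None
```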

This also centralizes how we were adding the `__root__` and `__root_span__` attributes, as those two were being added in different parts of the code instead of from a single place.

Closes #2097

Test Plan

This was tested manually using the helpful scripts from #2097. I verified that Llama Stack properly joined the client's span when the client was instrumented for distributed tracing, and that Llama Stack properly started its own root span when the incoming request was not part of an existing trace.
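A minimal version of that manual check (the endpoint is illustrative, and the `traceparent` value is the W3C spec's example) looks roughly like this:

```python
import httpx

# Request carrying an existing trace context: Llama Stack should join this trace.
joined = httpx.get(
    "http://localhost:8321/v1/models",
    headers={"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"},
)

# Request without trace headers: Llama Stack should start its own root span.
fresh = httpx.get("http://localhost:8321/v1/models")
```

Whether the spans actually joined is then confirmed in the trace backend.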

Here's an example of the joined spans:

[Screenshot (2025-05-13): joined client and server spans in the trace backend]

@ashwinb
Contributor

ashwinb commented May 14, 2025

What minimal bit would you add to our Telemetry API so it is natively aware of these things? Would the server still not need to extract these somehow and stash them into the in-memory session somewhere to be used by the providers?

@bbrowning
Collaborator Author

So, hypothetically, if we supported distributed tracing natively instead of via an API/provider abstraction, we'd just pull in the OpenTelemetry libraries directly to extract the headers in the server and inject the headers into outgoing requests from our own client calls, such as when we use OpenAI or other clients to speak to the backend inference providers.
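As a sketch of that hypothetical direct-OpenTelemetry approach (not what this PR does), the outgoing half would look roughly like:

```python
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Outgoing side: inject the active context into the headers of requests the server
# itself makes to backend inference providers, so the distributed trace continues
# downstream as well.
def headers_with_trace_context(extra: dict[str, str] | None = None) -> dict[str, str]:
    headers = dict(extra or {})
    TraceContextTextMapPropagator().inject(headers)  # adds traceparent/tracestate
    return headers
```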

I think the solution in this PR is a fine compromise for ensuring we don't break the trace started by the client. But if we want to solve #2154, we're going to have to figure something else out to bridge the worlds of OpenTelemetry context tracking and our Telemetry API, with its own notion of spans and contexts.

Until we do some prototyping or map out how (or whether) our Telemetry API can merge with OpenTelemetry's native span/context tracking for both incoming and outgoing requests, I don't have any concrete changes to suggest for our Telemetry API. It may be that we can find a clever way to bridge the two worlds without any API changes, which would be a good outcome. I was planning to dig into that side of things as part of adding tracing for outgoing requests, just to see what's possible today and what may need deeper changes.

@ashwinb ashwinb merged commit 6d20b72 into meta-llama:main May 20, 2025
23 checks passed
@bbrowning bbrowning deleted the traceparent-headers branch May 20, 2025 10:28