
feat: Propagate W3C trace context headers from clients #2153


Merged
merged 1 commit into meta-llama:main from traceparent-headers on May 20, 2025

Conversation

bbrowning
Collaborator

What does this PR do?

This extracts the W3C trace context headers (`traceparent` and `tracestate`) from incoming requests, stuffs them as attributes on the spans we create, and uses them within the tracing provider implementation to actually wrap our spans in the proper context.

What this means in practice is that when a client (such as an OpenAI client) is instrumented to create these traces, we'll continue that distributed trace within Llama Stack as opposed to creating our own root span that breaks the distributed trace between client and server.
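For concreteness, here's a hedged client-side sketch (not code from this PR; the endpoint, model name, and payload are placeholders) of how an instrumented client ends up sending these headers:

```python
# Client-side sketch: inject the current span's W3C trace context into the outgoing
# request headers. HTTP auto-instrumentation normally does this for you; it is spelled
# out manually here for clarity, and assumes a TracerProvider has been configured.
import httpx
from opentelemetry import trace
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

tracer = trace.get_tracer("example-client")

with tracer.start_as_current_span("client-request"):
    headers: dict[str, str] = {}
    TraceContextTextMapPropagator().inject(headers)  # adds traceparent (and tracestate)
    # Llama Stack can now continue this trace instead of starting its own root span.
    httpx.post(
        "http://localhost:8321/v1/chat/completions",  # placeholder endpoint
        headers=headers,
        json={"model": "example-model", "messages": []},
    )
```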

It's slightly awkward to do this in Llama Stack because our Tracing API knows nothing about OpenTelemetry, W3C trace headers, etc. - that knowledge lives only in the specific provider implementation. That's why the trace headers get extracted in the server code but aren't actually used to form the proper context until the provider implementation.
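A rough sketch of that split, with illustrative names rather than the exact ones in this PR:

```python
from opentelemetry.context import Context
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Server side: grab the raw W3C headers off the incoming request and stash them as
# span attributes. Nothing OpenTelemetry-specific happens here.
def trace_context_attributes(request_headers: dict[str, str]) -> dict[str, str]:
    return {
        key: request_headers[key]
        for key in ("traceparent", "tracestate")
        if key in request_headers
    }

# Provider side: only the telemetry provider knows OpenTelemetry, so it turns those
# attributes back into a parent context and starts its spans inside it.
def parent_context_from_attributes(attrs: dict[str, str]) -> Context | None:
    carrier = {k: attrs[k] for k in ("traceparent", "tracestate") if k in attrs}
    return TraceContextTextMapPropagator().extract(carrier) if carrier else None
```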

This also centralizes how we were adding the `__root__` and `__root_span__` attributes, as those two were being added in different parts of the code instead of from a single place.

Closes #2097

Test Plan

This was tested manually using the helpful scripts from #2097. I verified that Llama Stack properly joined the client's span when the client was instrumented for distributed tracing, and that Llama Stack properly started its own root span when the incoming request was not part of an existing trace.
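A minimal version of that manual check (the endpoint is illustrative, and the `traceparent` value is the W3C spec's example) looks roughly like this:

```python
import httpx

# Request carrying an existing trace context: Llama Stack should join this trace.
joined = httpx.get(
    "http://localhost:8321/v1/models",
    headers={"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"},
)

# Request without trace headers: Llama Stack should start its own root span.
fresh = httpx.get("http://localhost:8321/v1/models")
```

Whether the spans actually joined is then confirmed in the trace backend.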

Here's an example of the joined spans:

[Screenshot (2025-05-13): joined client and server spans in the trace backend]

@ashwinb
Contributor

ashwinb commented May 14, 2025

What minimal bit would you add to our Telemetry API so it is natively aware of these things? Would the server still not need to extract these somehow and stash them into the in-memory session somewhere to be used by the providers?

@bbrowning
Collaborator Author

So, hypothetically, if we supported distributed tracing natively instead of via an API/provider abstraction, we'd just pull in the OpenTelemetry libraries directly to extract the headers in the server and inject the headers into outgoing requests from our own client calls, such as when we use OpenAI or other clients to speak to the backend inference providers.
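As a sketch of that hypothetical direct-OpenTelemetry approach (not what this PR does), the outgoing half would look roughly like:

```python
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Outgoing side: inject the active context into the headers of requests the server
# itself makes to backend inference providers, so the distributed trace continues
# downstream as well.
def headers_with_trace_context(extra: dict[str, str] | None = None) -> dict[str, str]:
    headers = dict(extra or {})
    TraceContextTextMapPropagator().inject(headers)  # adds traceparent/tracestate
    return headers
```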

I think the solution in this PR is a fine compromise for ensuring we don't break the trace started by the client. But if we want to solve #2154, we're going to have to figure something else out to bridge the worlds of OpenTelemetry context tracking and our Telemetry API, with its own notion of spans and contexts.

Until we do some prototyping or map out how (or whether) our Telemetry API can merge with OpenTelemetry's native span/context tracking for both incoming and outgoing requests, I don't have any concrete changes to suggest for our Telemetry API. It may be that we can find a clever way to bridge the two worlds without any API changes, which would be a good outcome. I was planning to dig into that side of things as part of adding tracing for outgoing requests, just to see what's possible today and what may need deeper changes.

@ashwinb ashwinb merged commit 6d20b72 into meta-llama:main May 20, 2025
23 checks passed
@bbrowning bbrowning deleted the traceparent-headers branch May 20, 2025 10:28