
Support http header trace propagation #2097

Closed
@codefromthecrypt

Description

🚀 Describe the new functionality needed

Right now, if the openai client is instrumented with OpenTelemetry, its spans don't join the server-side trace. Instead, you get two disconnected traces like this:

[Image: trace from the instrumented openai client]

[Image: separate trace from llama-stack with trace sinks]

💡 Why is this needed? What if we don't build it?

The most typical use case of distributed tracing is to see traces that cross RPC boundaries. Until trace context propagates (e.g. via W3C traceparent and/or B3 headers), an application cannot see its side effects on the server, or understand latency and failures that are only knowable on the server side.
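
For illustration (not part of the issue), this is what W3C propagation adds to an outgoing request once a real tracer provider is configured; the client-side httpx instrumentation is expected to inject the same traceparent header:

from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("client-request"):
    carrier = {}
    inject(carrier)  # default tracecontext propagator adds the W3C header
    print(carrier)   # e.g. {'traceparent': '00-<32-hex trace id>-<16-hex span id>-01'}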

Other thoughts

Here's a test script I use. Running it with uv applies the env file, e.g. uv run -q --env-file .env main.py

# /// script
# dependencies = [
#     "openai",
#     "elastic-opentelemetry",
#     "elastic-opentelemetry-instrumentation-openai",
#     "opentelemetry-instrumentation-httpx"
# ]
# ///
import argparse
import os

import openai
from opentelemetry.instrumentation import auto_instrumentation

# Configure the OpenTelemetry SDK and activate all installed instrumentations (openai, httpx)
auto_instrumentation.initialize()

CHAT_MODEL = os.environ["CHAT_MODEL"]


def main():
    parser = argparse.ArgumentParser(description="OpenTelemetry-Enabled OpenAI Test Client")
    parser.add_argument(
        "--use-responses-api", action="store_true", help="Use the responses API instead of chat completions."
    )
    args = parser.parse_args()

    client = openai.Client()

    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ]
    extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
    if args.use_responses_api:
        response = client.responses.create(
            model=CHAT_MODEL, input=messages[0]["content"], temperature=0, extra_body=extra_body
        )
        print(response.output[0].content[0].text)
    else:
        chat_completion = client.chat.completions.create(
            model=CHAT_MODEL, messages=messages, temperature=0, extra_body=extra_body
        )
        print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()

Here's the .env file:

# Variables used by the test script (all but CHAT_MODEL are standard OpenAI client variables)
OPENAI_BASE_URL=http://localhost:8321/v1/openai/v1
OPENAI_API_KEY=unused
CHAT_MODEL=llama3.2:1b

# Variable name used by llama-stack
INFERENCE_MODEL=llama3.2:1b

# OpenTelemetry configuration
TELEMETRY_SINKS=otel_trace,otel_metric
OTEL_SERVICE_NAME=llama-stack
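
For illustration only, here is one way the server side could pick up an incoming traceparent and join the caller's trace, sketched as a FastAPI middleware (the middleware and span names are hypothetical; in practice opentelemetry-instrumentation-fastapi or the generic ASGI OpenTelemetryMiddleware performs this extraction):

from fastapi import FastAPI, Request
from opentelemetry import trace
from opentelemetry.propagate import extract

app = FastAPI()
tracer = trace.get_tracer(__name__)

@app.middleware("http")
async def trace_context_middleware(request: Request, call_next):
    # Parse traceparent/tracestate (or b3, depending on OTEL_PROPAGATORS) from the incoming headers
    ctx = extract(dict(request.headers))
    # Start the server span as a child of the caller's context so both sides share one trace
    with tracer.start_as_current_span("llama-stack.request", context=ctx, kind=trace.SpanKind.SERVER):
        return await call_next(request)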
