
Support http header trace propagation #2097

Closed
@codefromthecrypt

Description

🚀 Describe the new functionality needed

Right now, if the openai client is instrumented with OpenTelemetry, its spans don't join the server-side trace. Instead, you get two disconnected traces like this:

[Image: trace from the instrumented openai client]

[Image: separate trace from llama-stack with trace sinks]

💡 Why is this needed? What if we don't build it?

The most typical use case of distributed tracing is to see traces that cross RPC boundaries. Until trace context propagates (e.g. via W3C traceparent and/or B3 headers), an application cannot see its side effects on the server, or understand latency and failures that are only knowable on the server side.
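
For illustration (not part of the issue), this is what W3C propagation adds to an outgoing request once a real tracer provider is configured; the client-side httpx instrumentation is expected to inject the same traceparent header:

from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("client-request"):
    carrier = {}
    inject(carrier)  # default tracecontext propagator adds the W3C header
    print(carrier)   # e.g. {'traceparent': '00-<32-hex trace id>-<16-hex span id>-01'}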

Other thoughts

Here's a test script I use. Running it with uv applies the env file, e.g. uv run -q --env-file .env main.py

# /// script
# dependencies = [
#     "openai",
#     "elastic-opentelemetry",
#     "elastic-opentelemetry-instrumentation-openai",
#     "opentelemetry-instrumentation-httpx"
# ]
# ///
import argparse
import os

import openai
from opentelemetry.instrumentation import auto_instrumentation

# Configure the OpenTelemetry SDK and activate all installed instrumentations (openai, httpx)
auto_instrumentation.initialize()

CHAT_MODEL = os.environ["CHAT_MODEL"]


def main():
    parser = argparse.ArgumentParser(description="OpenTelemetry-Enabled OpenAI Test Client")
    parser.add_argument(
        "--use-responses-api", action="store_true", help="Use the responses API instead of chat completions."
    )
    args = parser.parse_args()

    client = openai.Client()

    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ]
    extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
    if args.use_responses_api:
        response = client.responses.create(
            model=CHAT_MODEL, input=messages[0]["content"], temperature=0, extra_body=extra_body
        )
        print(response.output[0].content[0].text)
    else:
        chat_completion = client.chat.completions.create(
            model=CHAT_MODEL, messages=messages, temperature=0, extra_body=extra_body
        )
        print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()

Here's the .env file:

# Variables used by the test script (all but CHAT_MODEL are standard OpenAI client variables)
OPENAI_BASE_URL=http://localhost:8321/v1/openai/v1
OPENAI_API_KEY=unused
CHAT_MODEL=llama3.2:1b

# Variable name used by llama-stack
INFERENCE_MODEL=llama3.2:1b

# OpenTelemetry configuration
TELEMETRY_SINKS=otel_trace,otel_metric
OTEL_SERVICE_NAME=llama-stack
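
For illustration only, here is one way the server side could pick up an incoming traceparent and join the caller's trace, sketched as a FastAPI middleware (the middleware and span names are hypothetical; in practice opentelemetry-instrumentation-fastapi or the generic ASGI OpenTelemetryMiddleware performs this extraction):

from fastapi import FastAPI, Request
from opentelemetry import trace
from opentelemetry.propagate import extract

app = FastAPI()
tracer = trace.get_tracer(__name__)

@app.middleware("http")
async def trace_context_middleware(request: Request, call_next):
    # Parse traceparent/tracestate (or b3, depending on OTEL_PROPAGATORS) from the incoming headers
    ctx = extract(dict(request.headers))
    # Start the server span as a child of the caller's context so both sides share one trace
    with tracer.start_as_current_span("llama-stack.request", context=ctx, kind=trace.SpanKind.SERVER):
        return await call_next(request)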
