🚀 Describe the new functionality needed
Right now, if the openai client is instrumented with opentelemetry, its spans don't join the server-side trace. Instead, you get two disconnected traces: one from the client and one from the server.
💡 Why is this needed? What if we don't build it?
The most typical use case of distributed tracing is seeing traces that cross RPC boundaries. Until trace context propagates (e.g. via W3C traceparent and/or B3 headers), an application cannot see its side effects on the server, or understand latency and failures that are only knowable on the server side.
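To make "propagation" concrete, here is a minimal sketch (using the standard OpenTelemetry propagation API, not the proposed fix) of what carrying a W3C `traceparent` header on the openai client could look like. `default_headers` is a stock `openai.Client` parameter; the span name is made up:

```python
# Minimal sketch of client-side context propagation, assuming manual wiring:
# the active span's context is serialized into a W3C `traceparent` header so
# the server can join the same trace. The feature request is for this to
# happen automatically on every outgoing request, not via default_headers.
import openai
from opentelemetry import propagate, trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("ask-bouvet"):  # illustrative span name
    carrier: dict[str, str] = {}
    propagate.inject(carrier)  # e.g. {"traceparent": "00-<trace-id>-<span-id>-01"}
    client = openai.Client(default_headers=carrier)
    # ... issue requests as in the test script below
```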
Other thoughts
Here's a test script I use; running it with uv applies the env, e.g. `uv run -q --env-file .env main.py`:
```python
# /// script
# dependencies = [
#   "openai",
#   "elastic-opentelemetry",
#   "elastic-opentelemetry-instrumentation-openai",
#   "opentelemetry-instrumentation-httpx",
# ]
# ///
import argparse
import os

import openai
from opentelemetry.instrumentation import auto_instrumentation

auto_instrumentation.initialize()

CHAT_MODEL = os.environ["CHAT_MODEL"]


def main():
    parser = argparse.ArgumentParser(description="OpenTelemetry-Enabled OpenAI Test Client")
    parser.add_argument(
        "--use-responses-api", action="store_true", help="Use the responses API instead of chat completions."
    )
    args = parser.parse_args()

    client = openai.Client()
    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ]
    extra_body = {"chat_template_kwargs": {"enable_thinking": False}}

    if args.use_responses_api:
        response = client.responses.create(
            model=CHAT_MODEL, input=messages[0]["content"], temperature=0, extra_body=extra_body
        )
        print(response.output[0].content[0].text)
    else:
        chat_completion = client.chat.completions.create(
            model=CHAT_MODEL, messages=messages, temperature=0, extra_body=extra_body
        )
        print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()
```
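One hedged way to check whether a `traceparent` header actually leaves the client is to hand the openai client a custom `httpx.Client` with a request event hook and print the outgoing headers (the hook below is illustrative and not part of the script above):

```python
# Print the traceparent header (if any) on every outgoing request so you can
# see whether the client is propagating context at all.
import httpx
import openai

def log_trace_headers(request: httpx.Request) -> None:
    print("traceparent:", request.headers.get("traceparent", "<absent>"))

client = openai.Client(
    http_client=httpx.Client(event_hooks={"request": [log_trace_headers]})
)
```

If the header is absent, the server has nothing to join on and the two traces stay separate.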
Here's the `.env` file:
```
# Variables used in test script (except CHAT_MODEL these are OpenAI defaults)
OPENAI_BASE_URL=http://localhost:8321/v1/openai/v1
OPENAI_API_KEY=unused
CHAT_MODEL=llama3.2:1b

# Variable name used by llama-stack
INFERENCE_MODEL=llama3.2:1b

# OpenTelemetry configuration
TELEMETRY_SINKS=otel_trace,otel_metric
OTEL_SERVICE_NAME=llama-stack
```
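For completeness, the server-side half of propagation generically looks like the sketch below. This uses the plain OpenTelemetry API; the handler and span names are illustrative, not llama-stack code:

```python
# Generic sketch of the server-side half: extract the incoming traceparent/b3
# headers and start the server span as a child of the client's span.
from opentelemetry import propagate, trace

tracer = trace.get_tracer(__name__)

def handle_request(headers: dict) -> None:
    ctx = propagate.extract(headers)  # reads traceparent/b3 if present
    with tracer.start_as_current_span("chat.completions", context=ctx):
        ...  # server-side work now joins the client's trace
```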