Commit c2e3a79: update to add vLLM tracing guide
Signed-off-by: sallyom <somalley@redhat.com>
Parent: 352fcab

3 files changed: 121 additions & 56 deletions

**kubernetes/observability/README.md** (11 additions & 56 deletions)
@@ -1,53 +1,8 @@
# Monitor Llamastack & vLLM in OpenShift

Follow this README to configure an observability stack in OpenShift to visualize Llamastack telemetry and vLLM metrics.
First, ensure Llamastack and vLLM are configured to generate telemetry by following this [configuration guide](./run-configuration.md).
## Generate telemetry from Llamastack and vLLM

### vLLM

For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics,
you can `curl localhost:8000/metrics` from within a vLLM container.

### Llamastack

With Llamastack, you need to enable telemetry collection with an OpenTelemetry receiver by configuring `run-config.yaml`.
Here's how to do that:
#### Updated manifests for telemetry trace collection with an opentelemetry receiver endpoint

This is for traces only. There are a similar `otel_metric` sink and `otel_metric_endpoint`; however, Llamastack
currently generates only four metrics, and these duplicate what vLLM provides.

[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)

```yaml
---
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
      sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # add otel_trace and/or otel_metric
      otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # add ONLY if an opentelemetry receiver endpoint is available
---
```
35-
And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml):

```yaml
---
env:
  - name: OTEL_SERVICE_NAME
    value: llamastack
  - name: OTEL_TRACE_ENDPOINT
    value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
  # - name: OTEL_METRIC_ENDPOINT
  #   value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
---
```

The OTel endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces` (or `/v1/metrics`)
if exporting to a central otel-collector. If using an otel-collector sidecar, this would be
`http://localhost:4318/v1/traces` (or `/v1/metrics`).

## OpenShift Observability Operators

@@ -134,6 +89,16 @@
oc patch deployment <deployment-name> \
  -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"vllm-otelsidecar"}}}}}'
```

### Cluster Observability Operator Tracing UIPlugin

The Jaeger frontend feature of TempoStack is no longer supported by Red Hat; it has been replaced by the COO UIPlugin.
To create the UIPlugin for Tracing, first ensure the TempoStack described above is created. This is a prerequisite.
Then, all that's necessary to view traces from the OpenShift console at `Observe -> Traces` is to create the following
[Tracing UIPlugin resource](./tracing-ui-plugin.yaml):

```bash
oc apply -f ./tracing-ui-plugin.yaml
```

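For reference, a Tracing UIPlugin for the Cluster Observability Operator generally looks like the sketch below; the
fields in this repo's [tracing-ui-plugin.yaml](./tracing-ui-plugin.yaml) are authoritative, and the resource name here
is an assumption:

```yaml
# Sketch only; see tracing-ui-plugin.yaml for the actual manifest used
# in this repo. metadata.name is an assumption.
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing
spec:
  type: DistributedTracing
```
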
### Grafana

This will deploy a Grafana instance, and Prometheus & Tempo DataSources.
@@ -157,13 +122,3 @@ The dashboard is slightly modified from https://github.com/kevchu3/openshift4-gr
```bash
oc apply -n observability-hub -f cluster-metrics-dashboard/cluster-metrics.yaml
```

**kubernetes/observability/run-configuration.md** (101 additions & 0 deletions)

@@ -0,0 +1,101 @@
## Generate telemetry from Llamastack and vLLM

### vLLM

#### metrics

For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics,
you can `curl localhost:8000/metrics` from within a vLLM container.
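
The exposition format is plain text. As an illustration (the sample metric names below are invented, not guaranteed
vLLM output), metric names can be extracted from the payload like this:

```python
# Illustrative sketch: extract metric names from Prometheus exposition
# text such as that returned by `curl localhost:8000/metrics`.
# The sample payload below is invented for the example.
sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="granite"} 1.0
vllm:prompt_tokens_total{model_name="granite"} 512.0
"""

def metric_names(text: str) -> list[str]:
    names = set()
    for line in text.splitlines():
        if line and not line.startswith("#"):
            # strip the labels ({...}) and the sample value
            names.add(line.split("{")[0].split(" ")[0])
    return sorted(names)

print(metric_names(sample))
# -> ['vllm:num_requests_running', 'vllm:prompt_tokens_total']
```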

#### traces

It's possible to generate vLLM distributed trace data by updating the vLLM image and start command. This
[Containerfile](./vllm-Containerfile) shows the necessary packages to generate vLLM traces.

Here is how you would build vLLM with the tracing packages:
```bash
podman build --platform x86_64 -t quay.io/[your-quay-username]/vllm:otlp-tracing -f vllm-Containerfile .
podman push quay.io/[your-quay-username]/vllm:otlp-tracing
```

Then, add the following updates to the vLLM `deployment.yaml`. We'll use the
[granite-8b deployment](../llama-serve/granite-8b/vllm.yaml). This example assumes there is an OpenTelemetryCollector
in sidecar mode in the same namespace; see
[OpenTelemetryCollector Sidecars Deployment](./README.md#opentelemetrycollector-sidecars-deployment).

```yaml
---
template:
  metadata:
    labels:
      app: granite-8b
    annotations:
      sidecar.opentelemetry.io/inject: vllm-otelsidecar
  spec:
    containers:
      - args:
          - --model
          - ibm-granite/granite-3.2-8b-instruct
          - --max-model-len
          - "128000"
          - --enable-auto-tool-choice
          - --chat-template
          - /app/tool_chat_template_granite.jinja
          - --tool-call-parser=granite
          - --otlp-traces-endpoint=grpc://localhost:4317
          - --port
          - "8000"
        env:
          - name: OTEL_SERVICE_NAME
            value: "vllm-granite8b"
          - name: OTEL_EXPORTER_OTLP_TRACES_INSECURE
            value: "true"
---
```

With the updated vLLM image and the updated deployment, distributed trace data will be generated, collected by the
opentelemetry-collector sidecar container, and exported to the central observability-hub as outlined in the
[README.md](./README.md), with a `TempoStack` as the tracing backend.

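The `sidecar.opentelemetry.io/inject: vllm-otelsidecar` annotation assumes a matching sidecar-mode collector exists in
the namespace. A minimal sketch of such an OpenTelemetryCollector follows; the names and exporter endpoint here are
assumptions, and the README's sidecar section has the actual manifest:

```yaml
# Hypothetical sketch only; see the README's sidecar section for the
# actual manifest. Names and endpoint are assumptions.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: vllm-otelsidecar
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      otlphttp:
        endpoint: http://otel-collector-collector.observability-hub.svc.cluster.local:4318
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlphttp]
```
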
### Llamastack

With Llamastack, you need to enable telemetry collection with an OpenTelemetry receiver by configuring `run-config.yaml`.
Here's how to do that:

#### Updated manifests for telemetry trace collection with an opentelemetry receiver endpoint

This is for traces only. There are a similar `otel_metric` sink and `otel_metric_endpoint`; however, Llamastack
currently generates only four metrics, and these duplicate what vLLM provides.

[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)

```yaml
---
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
      sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # add otel_trace and/or otel_metric
      otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # add ONLY if an opentelemetry receiver endpoint is available
---
```

And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml):

```yaml
---
env:
  - name: OTEL_SERVICE_NAME
    value: llamastack
  - name: OTEL_TRACE_ENDPOINT
    value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
  # - name: OTEL_METRIC_ENDPOINT
  #   value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
---
```

The OTel endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces` (or `/v1/metrics`)
if exporting to a central otel-collector. If using an otel-collector sidecar, this would be
`http://localhost:4318/v1/traces` (or `/v1/metrics`).

Now that vLLM and Llamastack are configured to generate and export telemetry, follow the
[observability-hub guide](./README.md) to view and analyze the data.
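
The manifests above rely on Llamastack's `${env.NAME:default}` placeholders. As a rough illustration of how that
substitution behaves (this sketch is not Llamastack's actual implementation), consider:

```python
import os
import re

# Rough illustration of ${env.NAME:default} placeholder substitution as
# used in run-config.yaml. Not Llamastack's actual implementation.
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")

def substitute_env(value: str, environ=os.environ) -> str:
    # Use the environment variable when set, else the inline default.
    return _PLACEHOLDER.sub(lambda m: environ.get(m.group(1), m.group(2)), value)

print(substitute_env("${env.OTEL_SERVICE_NAME:llama-stack}", environ={}))
# -> llama-stack
print(substitute_env("${env.OTEL_SERVICE_NAME:llama-stack}",
                     environ={"OTEL_SERVICE_NAME": "llamastack"}))
# -> llamastack
```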

**kubernetes/observability/vllm-Containerfile** (9 additions & 0 deletions)

@@ -0,0 +1,9 @@
```Dockerfile
# Use the vllm-openai image as the base
FROM docker.io/vllm/vllm-openai:latest

# Install OpenTelemetry packages
RUN pip install \
    "opentelemetry-sdk>=1.26.0,<1.27.0" \
    "opentelemetry-api>=1.26.0,<1.27.0" \
    "opentelemetry-exporter-otlp>=1.26.0,<1.27.0" \
    "opentelemetry-semantic-conventions-ai>=0.4.1,<0.5.0"
```
