Commit 5e45f76

update to add vLLM tracing guide
Signed-off-by: sallyom <somalley@redhat.com>
1 parent 352fcab commit 5e45f76

5 files changed: 152 additions & 66 deletions

kubernetes/observability/README.md

Lines changed: 24 additions & 58 deletions
@@ -1,53 +1,8 @@
 # Monitor Llamastack & vLLM in OpenShift
 
 Follow this README to configure an observability stack in OpenShift to visualize Llamastack telemetry and vLLM metrics.
+First, ensure Llamastack and vLLM are configured to generate telemetry by following this [configuration guide](./run-configuration.md).
 
-## Generate telemetry from Llamastack and vLLM
-
-### vLLM
-
-For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics,
-you can `curl localhost:8000/metrics` from within a vLLM container.
-
-### Llamastack
-
-With Llamastack, you need to specify in the run-config.yaml to enable telemetry collection with an opentelemetry receiver.
-Here's how to do that:
-
-#### Updated manifests for telemetry trace collection with opentelemetry receiver endpoint
-
-This is for traces only. There is a similar `otel_metric` sink and `otel_metric_endpoint`, however, there are currently
-only 4 metrics generated within Llamastack, and these are duplicates of what vLLM provides.
-
-[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)
-
-```yaml
----
-telemetry:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
-      sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # <- add otel_trace and/or otel_metric
-      otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # <- add ONLY if an opentelemetry receiver endpoint is available
----
-```
-And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml)
-
-```yaml
----
-env:
-  - name: OTEL_SERVICE_NAME
-    value: llamastack
-  - name: OTEL_TRACE_ENDPOINT
-    value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
-  #- name: OTEL_METRIC_ENDPOINT
-  #  value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
----
-```
-
-The otel-endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces,metrics` if exporting to a
-central otel-collector. If using an otel-collector sidecar, this would be `http://localhost:4318/v1/traces,metrics`.
 
 ## OpenShift Observability Operators
 
@@ -88,6 +43,12 @@ oc create ns observability-hub
 
 ### Tracing Backend (Tempo with Minio for S3 storage)
 
+In order to view distributed tracing data from Llamastack and/or vLLM, you must deploy a tracing backend. The supported tracing backend in OpenShift
+is Tempo. See the OpenShift Tempo
+[documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/distributed_tracing/distributed-tracing-platform-tempo#distr-tracing-tempo-install-tempostack-web-console_dist-tracing-tempo-installing)
+for further details. Tempo must be paired with a storage solution. For this example, `MinIO` is used. The necessary resources can be created by
+applying the `./tempo` manifests, as in the sketch that follows.
+
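The `./tempo` manifests themselves are not shown in this commit. As a rough sketch under stated assumptions, a minimal TempoStack backed by a MinIO S3 secret might look like this (the resource name and secret name are assumptions, not taken from the repo):

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack            # assumed name
  namespace: observability-hub
spec:
  storageSize: 10Gi           # small size, suitable for the testing setup below
  storage:
    secret:
      name: minio             # assumed secret holding the MinIO S3 credentials
      type: s3
```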
 ```bash
 # edit storageclassName & secret as necessary
 # secret and storage for testing only
@@ -97,7 +58,7 @@ oc apply --kustomize ./tempo -n observability-hub
 ### OpenTelemetryCollector deployment
 
 OpenTelemetry Collector is used to aggregate telemetry from various workloads, process individual signals, and export
-to various backends. This is used to collect traces from various workloads and export all as a single
+to various backends. This example will collect traces from various workloads and export all as a single
 authenticated stream to the in-cluster TempoStack. For in-cluster only, opentelemetry-collector is not necessary to collect
 metrics. Metrics are sent to the in-cluster user-workload-monitoring prometheus by creating the podmonitors and servicemonitors.
 However, if exporting off-cluster to a 3rd party observability vendor, the collector is necessary for all signals,
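The podmonitor approach mentioned above can be sketched as follows (a hedged example; the name, label selector, and port name are assumptions, not taken from this repo's manifests):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vllm-podmonitor       # assumed name
spec:
  selector:
    matchLabels:
      app: granite-8b         # assumed pod label on the vLLM deployment
  podMetricsEndpoints:
    - port: http              # assumed container port name exposing :8000
      path: /metrics
```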
@@ -134,9 +95,24 @@ oc patch deployment <deployment-name> \
   -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"vllm-otelsidecar"}}}}}'
 ```
 
+### Cluster Observability Operator Tracing UIPlugin
+
+The Jaeger frontend feature of TempoStack is no longer supported by Red Hat. It has been replaced by the COO UIPlugin. To create the UIPlugin for
+Tracing, first ensure the TempoStack described above has been created; it is a prerequisite. Then, all that's necessary to view traces from
+the OpenShift console at `Observe -> Traces` is to create the following [Tracing UIPlugin resource](./tracing-ui-plugin.yaml).
+
+```bash
+oc apply -f ./tracing-ui-plugin.yaml
+```
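The referenced `tracing-ui-plugin.yaml` is not shown in this commit; as a hedged sketch, a COO tracing UIPlugin generally looks like this (the resource name is assumed, not taken from the repo's file):

```yaml
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing   # assumed name
spec:
  type: DistributedTracing    # enables Observe -> Traces in the console
```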
+
+You should now see traces and metrics in the OpenShift console, under the `Observe` tab.
+
 ### Grafana
 
-This will deploy a Grafana instance, and Prometheus & Tempo DataSources
+Most users are familiar with Grafana for visualizing and analyzing telemetry. To create the Grafana resources necessary to view
+Llamastack and vLLM telemetry, follow the example below.
+
+This example will deploy a Grafana instance, and Prometheus & Tempo DataSources.
 The prometheus datasource is the user-workload-monitoring prometheus running in `openshift-user-workload-monitoring` namespace.
 The Grafana console is configured with `username: rhel, password: rhel`
 
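The Grafana manifests are not part of this hunk; as a hedged illustration only, a Tempo datasource for a grafana-operator-managed instance might look like the sketch below (the instance selector label, service name, and port are assumptions):

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: tempo-datasource
  namespace: observability-hub
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana     # assumed label on the Grafana CR
  datasource:
    name: Tempo
    type: tempo
    access: proxy
    # assumed TempoStack query-frontend service and port
    url: http://tempo-tempostack-query-frontend.observability-hub.svc.cluster.local:3200
```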
@@ -157,13 +133,3 @@ The dashboard is slightly modified from https://github.com/kevchu3/openshift4-gr
 ```bash
 oc apply -n observability-hub -f cluster-metrics-dashboard/cluster-metrics.yaml
 ```
-
-### Cluster Observability Operator Tracing UIPlugin
-
-The Jaeger frontend feature of TempoStack is no longer supported by Red Hat. This has been replaced by the COO UIPlugin. To create the UIPlugin for
-Tracing, first ensure the TempoStack described above is created. This is a prerequisite. Then, all that's necessary to view traces from
-the OpenShift console at `Observe -> Traces` is to create the following [Tracing UIPlugin resource](./tracing-ui-plugin.yaml).
-
-```bash
-oc apply ./tracing-ui-plugin.yaml
-```

kubernetes/observability/otel-collector/otel-collector-vllm-sidecar.yaml

Lines changed: 12 additions & 1 deletion
@@ -21,22 +21,33 @@ spec:
         insecure: true
     processors: {}
     receivers:
+      otlp:
+        protocols:
+          grpc: {}
+          http: {}
       prometheus:
         config:
           scrape_configs:
             - job_name: vllm-sidecar
-              scrape_interval: 5s
+              scrape_interval: 15s
              static_configs:
                - targets:
                    - 'localhost:8000'
     service:
       pipelines:
+        traces:
+          exporters:
+            - debug
+            - otlphttp
+          receivers:
+            - otlp
         metrics:
           exporters:
             - debug
             - otlphttp
           receivers:
             - prometheus
+            - otlp
       telemetry:
         metrics:
           address: '0.0.0.0:8888'

kubernetes/observability/otel-collector/otel-collector.yaml

Lines changed: 0 additions & 7 deletions
@@ -28,13 +28,6 @@ spec:
           authenticator: bearertokenauth
         headers:
           X-Scope-OrgID: "dev"
-      # cluster user-workload monitoring prometheus backend
-      #prometheus/ocp-uwm:
-      #  add_metric_suffixes: false
-      #  endpoint: 0.0.0.0:8889
-      #  metric_expiration: 180m
-      #  resource_to_telemetry_conversion:
-      #    enabled: true
 
     receivers:
       prometheus:
kubernetes/observability/run-configuration.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
## Generate telemetry from Llamastack and vLLM

### vLLM

#### metrics

For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics,
you can `curl localhost:8000/metrics` from within a vLLM container.
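As a quick sanity check (a sketch; assumes the deployment's `--port 8000` and vLLM's standard `vllm:` metric prefix):

```bash
# list a few vLLM metric samples from inside the container
curl -s localhost:8000/metrics | grep '^vllm:' | head
```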
#### traces

It's possible to generate vLLM distributed trace data by updating the vLLM image and start command. This [Containerfile](./vllm-Containerfile)
shows the necessary packages to generate vLLM traces.

Here is how you would build vLLM with the tracing packages:

```bash
podman build --platform x86_64 -t quay.io/[your-quay-username]/vllm:otlp-tracing -f vllm-Containerfile .
podman push quay.io/[your-quay-username]/vllm:otlp-tracing
```

Then, add the following updates to the vLLM deployment.yaml. We'll use the [granite-8b deployment](../llama-serve/granite-8b/vllm.yaml).
This example assumes there is an OpenTelemetryCollector with sidecar mode in the same namespace.
See [OpenTelemetryCollector Sidecars Deployment](./README.md#opentelemetrycollector-sidecars-deployment).

```yaml
---
template:
  metadata:
    labels:
      app: granite-8b
    annotations:
      sidecar.opentelemetry.io/inject: vllm-otelsidecar
  spec:
    containers:
      - args:
          - --model
          - ibm-granite/granite-3.2-8b-instruct
          - --max-model-len
          - "128000"
          - --enable-auto-tool-choice
          - --chat-template
          - /app/tool_chat_template_granite.jinja
          - --tool-call-parser=granite
          - --otlp-traces-endpoint
          - 127.0.0.1:4317
          - --collect-detailed-traces
          - "all"
          - --port
          - "8000"
        image: 'quay.io/sallyom/vllm:otlp-tracing'
        env:
          - name: OTEL_SERVICE_NAME
            value: "vllm-granite8b"
          - name: OTEL_EXPORTER_OTLP_TRACES_INSECURE
            value: "true"
---
```

With the updated vLLM image and the updated deployment, distributed trace data will be generated, collected by the opentelemetry-collector
sidecar container, and exported to the central observability-hub as outlined in the [README.md](./README.md), with a `TempoStack` as the tracing backend.
Enabling tracing in vLLM has a performance impact, so it's recommended to enable it only when debugging. A complete list of vLLM engine
arguments can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).

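One hedged way to confirm spans are flowing, assuming the operator-injected sidecar keeps its default container name `otc-container` and the `debug` exporter from the sidecar config in this commit:

```bash
# send an inference request first, then check the sidecar's debug exporter output
oc logs deployment/granite-8b -c otc-container --tail=100 | grep -i traces
```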
### Llamastack

With Llamastack, telemetry collection must be enabled in the run-config.yaml by configuring an opentelemetry sink and receiver endpoint.
Here's how to do that:

#### Updated manifests for telemetry trace collection with opentelemetry receiver endpoint

This is for traces only. There is a similar `otel_metric` sink and `otel_metric_endpoint`; however, there are currently
only 4 metrics generated within Llamastack, and these are duplicates of what vLLM provides.

[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)

```yaml
---
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
      sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # <- add otel_trace and/or otel_metric
      otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # <- add ONLY if an opentelemetry receiver endpoint is available
---
```

And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml):

```yaml
---
env:
  - name: OTEL_SERVICE_NAME
    value: llamastack
  - name: OTEL_TRACE_ENDPOINT
    value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
  #- name: OTEL_METRIC_ENDPOINT
  #  value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
---
```

The otel endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces` (or `/v1/metrics`) if exporting to a
central otel-collector. If using an otel-collector sidecar, it is `http://localhost:4318/v1/traces` (or `/v1/metrics`).

Now that vLLM and Llamastack are configured to generate and export telemetry, follow the [observability-hub guide](./README.md) to view and analyze
the data.
kubernetes/observability/vllm-Containerfile

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# Use the vllm-openai image as the base
FROM docker.io/vllm/vllm-openai:v0.7.3

# Install OpenTelemetry packages
RUN pip install \
    "opentelemetry-sdk>=1.26.0,<1.27.0" \
    "opentelemetry-api>=1.26.0,<1.27.0" \
    "opentelemetry-exporter-otlp>=1.26.0,<1.27.0" \
    "opentelemetry-semantic-conventions-ai>=0.4.1,<0.5.0"
