# Monitor Llamastack & vLLM in OpenShift

Follow this README to configure an observability stack in OpenShift to visualize Llamastack telemetry and vLLM metrics.
First, ensure Llamastack and vLLM are configured to generate telemetry by following this [configuration guide](./run-configuration.md).


## OpenShift Observability Operators

The following operators, all available from OperatorHub, must be installed in order to proceed with this example.

### Operator descriptions

1. **Red Hat Build of OpenTelemetry**: Provides the OpenTelemetry Collector (OTC), which distributes metrics and traces to various backends. In this example, Tempo is deployed as the tracing backend.

2. **Tempo Operator**: Provides the `TempoStack` Custom Resource, which serves as the backend for distributed tracing. Tempo is paired with S3-compatible storage (MinIO).

3. **Cluster Observability Operator**: Provides the `PodMonitor` and `ServiceMonitor` Custom Resources, which are necessary for
user-workload monitoring's Prometheus to scrape workload metrics. The COO also provides `UIPlugin` resources for viewing telemetry.

4. **(optional) Grafana Operator**: Provides Grafana APIs, including `GrafanaDashboard`, `Grafana`, and `GrafanaDatasource`, that will be used to visualize telemetry.

## Create a PodMonitor or ServiceMonitor for any AI workload that exposes a metrics endpoint

To enable collection of user-workload metrics for any workload within OpenShift, create a `PodMonitor` or a `ServiceMonitor`.
A PodMonitor ensures that metrics from pods with matching selectors are scraped by the user-workload-monitoring Prometheus, while a ServiceMonitor
scrapes metrics from the pods backing a particular Service.

* [Example PodMonitor](./podmonitor-example-0.yaml)
* [Example ServiceMonitor](./servicemonitor-example.yaml)

Upon creation of either, metrics will be scraped and visible in the console under `Observe -> Metrics`.
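
As a rough sketch, a PodMonitor for a vLLM deployment might look like the following. The `app: vllm` label and the `metrics` port name are assumptions here; match them to your deployment's actual pod labels and container port names (the linked example files are authoritative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vllm-podmonitor
  namespace: llama-serve
spec:
  selector:
    matchLabels:
      app: vllm           # assumed label; match your pod labels
  podMetricsEndpoints:
    - port: metrics       # assumed name of the container port serving /metrics
      path: /metrics
```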

## Create custom resources and configurations for a central observability hub

Create the observability hub namespace `observability-hub`. If you use a different namespace, be sure to update the resource YAMLs accordingly.

```bash
oc create ns observability-hub
```

### Tracing Backend (Tempo with MinIO for S3 storage)

To view distributed tracing data from Llamastack and/or vLLM, you must deploy a tracing backend. The supported tracing backend in OpenShift
is Tempo. See the OpenShift Tempo
[documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/distributed_tracing/distributed-tracing-platform-tempo#distr-tracing-tempo-install-tempostack-web-console_dist-tracing-tempo-installing)
for further details. Tempo must be paired with a storage solution; for this example, MinIO is used. The necessary resources can be created by
applying the `./tempo` manifests.

```bash
# edit storageClassName & secret as necessary
# secret and storage are for testing only
oc apply --kustomize ./tempo -n observability-hub
```
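
For orientation, the core of a `TempoStack` resource is sketched below. The names and sizes here are assumptions for illustration; the actual resource applied by the `./tempo` manifests is authoritative:

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack          # assumed name
  namespace: observability-hub
spec:
  storage:
    secret:
      name: minio           # assumed secret holding the MinIO S3 credentials
      type: s3
  storageSize: 10Gi         # adjust for your retention needs
```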

### OpenTelemetryCollector deployment

The OpenTelemetry Collector is used to aggregate telemetry from various workloads, process individual signals, and export them
to various backends. This example collects traces from various workloads and exports them all as a single
authenticated stream to the in-cluster TempoStack. For in-cluster use only, an OpenTelemetry Collector is not necessary to collect
metrics: metrics reach the in-cluster user-workload-monitoring Prometheus through the PodMonitors and ServiceMonitors created above.
However, if exporting off-cluster to a third-party observability vendor, the collector is necessary for all signals,
and provides a single place to receive telemetry from various workloads and export it as a single authenticated and
secure OTLP stream.

#### Central OpenTelemetry Collector

To create a central OpenTelemetry Collector, update
[otel-collector/otel-collector.yaml](./otel-collector/otel-collector.yaml) to match your requirements, then apply:

```bash
oc apply --kustomize ./otel-collector -n observability-hub
```
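
A minimal sketch of the shape of such a collector follows: an OTLP receiver feeding a traces pipeline that exports to Tempo. The Tempo service endpoint and the `insecure` TLS setting are assumptions for illustration; the checked-in `otel-collector.yaml` (with its authentication settings) is authoritative:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: observability-hub
spec:
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      otlp/tempo:
        # assumed TempoStack distributor service name; check your cluster
        endpoint: tempo-tempostack-distributor.observability-hub.svc:4317
        tls:
          insecure: true    # for illustration only; use TLS/auth in practice
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/tempo]
```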

#### OpenTelemetryCollector sidecar deployment

You can add individual metrics endpoints to the central otel-collector in `observability-hub`, but
another approach is to add otel-collector sidecar containers to individual deployments throughout the
cluster. Paired with an annotation on the deployment, telemetry will be exported as configured.

Any deployment with the `template.metadata.annotations` entry `sidecar.opentelemetry.io/inject: vllm-otelsidecar`
will receive and export telemetry as configured in
[otel-collector-vllm-sidecar.yaml](./otel-collector/otel-collector-vllm-sidecar.yaml).

Any deployment with the `template.metadata.annotations` entry `sidecar.opentelemetry.io/inject: llamastack-otelsidecar`
will receive and export telemetry as configured in
[otel-collector-llamastack-sidecar.yaml](./otel-collector/otel-collector-llamastack-sidecar.yaml).

The example below adds the sidecar otel-collector custom resources to the `llama-serve` namespace.
After scaling the annotated deployments down and back up, sidecar otel-collector
containers will be added to the pods.

```bash
oc apply -f ./otel-collector/otel-collector-vllm-sidecar.yaml -n llama-serve
oc apply -f ./otel-collector/otel-collector-llamastack-sidecar.yaml -n llama-serve

# Then, annotate whichever deployment you'd like to collect telemetry from.
# Add the annotation to the deployment's `template.metadata.annotations` from the console,
# OR patch the llamastack and vLLM deployments with the appropriate annotation.
# Replace `deployment-name`, `namespace`, and `name-of-otelsidecar` in the command below.

oc patch deployment deployment-name \
  -n namespace \
  --type='merge' \
  -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"name-of-otelsidecar"}}}}}'
```
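
After patching, the relevant portion of the Deployment should look like the following fragment (only the annotation is added; everything else in the spec is unchanged):

```yaml
# Deployment fragment: the injected annotation on the pod template
spec:
  template:
    metadata:
      annotations:
        sidecar.opentelemetry.io/inject: vllm-otelsidecar
```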

### Cluster Observability Operator Tracing UIPlugin

The Jaeger frontend feature of TempoStack is no longer supported by Red Hat; it has been replaced by the COO `UIPlugin`. Before creating the UIPlugin for
tracing, ensure the TempoStack described above has been created. Then, to view traces from
the OpenShift console at `Observe -> Traces`, create the [Tracing UIPlugin resource](./tracing-ui-plugin.yaml).
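
For reference, the linked file should be roughly of this shape (the resource name is an assumption; the checked-in `tracing-ui-plugin.yaml` is authoritative):

```yaml
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing   # assumed name
spec:
  type: DistributedTracing
```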

```bash
oc apply -f ./tracing-ui-plugin.yaml
```

You should now see traces and metrics in the OpenShift console under the `Observe` tab.

### Grafana

Many users are familiar with Grafana for visualizing and analyzing telemetry. To create the Grafana resources necessary to view
Llamastack and vLLM telemetry, follow the example below.

This example deploys a Grafana instance along with Prometheus and Tempo data sources.
The Prometheus datasource points to the user-workload-monitoring Prometheus running in the `openshift-user-workload-monitoring` namespace.
The Grafana console is configured with `username: rhel, password: rhel`.
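
As a sketch of what the kustomization creates, a Grafana Operator (v5 API) datasource for user-workload metrics looks roughly like this. The instance label and query URL are assumptions; the definitions in `./grafana/instance-with-prom-tempo-ds` are authoritative:

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: prometheus-ds
  namespace: observability-hub
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana   # assumed label on the Grafana instance
  datasource:
    name: Prometheus
    type: prometheus
    access: proxy
    # assumed query endpoint for user-workload metrics
    url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
```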

```bash
oc apply -k ./grafana/instance-with-prom-tempo-ds
```

Upon success, you can explore metrics and traces from the Grafana route.

#### GrafanaDashboard to visualize cluster metrics and traces

Check out [github.com/kevchu3/openshift4-grafana](https://github.com/kevchu3/openshift4-grafana/tree/master/dashboards/crds) for a list of
dashboards to deploy on OpenShift.

Here's an example that deploys a GrafanaDashboard for OpenShift 4.16 cluster metrics.
The dashboard is slightly modified from https://github.com/kevchu3/openshift4-grafana/blob/master/dashboards/json_raw/cluster_metrics.ocp416.json

```bash
oc apply -n observability-hub -f cluster-metrics-dashboard/cluster-metrics.yaml
```
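
The general shape of a `GrafanaDashboard` resource is sketched below; the selector label is an assumption and must match the labels on your Grafana instance, and the real dashboard JSON comes from the file referenced above:

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: cluster-metrics
  namespace: observability-hub
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana   # assumed; must match the Grafana instance labels
  json: >
    {
      "title": "Cluster Metrics",
      "panels": []
    }
```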