# Monitor Llamastack & vLLM in OpenShift

Follow this README to configure an observability stack in OpenShift to visualize Llamastack telemetry and vLLM metrics.
First, ensure Llamastack and vLLM are configured to generate telemetry by following this [configuration guide](./run-configuration.md).


## OpenShift Observability Operators

The following operators, all available from OperatorHub, must be installed in order to proceed with this example.

### Operator descriptions

1. **Red Hat Build of OpenTelemetry**: Provides the OpenTelemetry Collector (OTC), which distributes metrics and traces to various backends. In this example, Tempo is deployed as the tracing backend.

2. **Tempo Operator**: Provides the `TempoStack` Custom Resource, which serves as the backend for distributed tracing. Tempo is paired with S3-compatible storage (MinIO).

3. **Cluster Observability Operator**: Provides the `PodMonitor` and `ServiceMonitor` Custom Resources, which are necessary for
user-workload monitoring's Prometheus to scrape workload metrics. The COO also provides `UIPlugin` resources for viewing telemetry.

4. **(optional) Grafana Operator**: Provides Grafana APIs, including `GrafanaDashboard`, `Grafana`, and `GrafanaDatasource`, that will be used to visualize telemetry.

## Create a PodMonitor or ServiceMonitor for any AI workload that exposes a metrics endpoint

To enable collection of user-workload metrics for any workload within OpenShift, create a `PodMonitor` or a `ServiceMonitor`.
A PodMonitor ensures that metrics from pods with matching selectors are scraped by the user-workload-monitoring Prometheus, while a ServiceMonitor
scrapes metrics from the pods backing a particular Service.

* [Example PodMonitor](./podmonitor-example-0.yaml)
* [Example ServiceMonitor](./servicemonitor-example.yaml)

Upon creation of either, metrics will be scraped and visible in the console under `Observe -> Metrics`.
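
As a rough sketch, a PodMonitor for a vLLM deployment might look like the following. The `app: vllm` label and the `metrics` port name are assumptions here; match them to your deployment's actual pod labels and container port names (the linked example files are authoritative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vllm-podmonitor
  namespace: llama-serve
spec:
  selector:
    matchLabels:
      app: vllm           # assumed label; match your pod labels
  podMetricsEndpoints:
    - port: metrics       # assumed name of the container port serving /metrics
      path: /metrics
```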

## Create custom resources and configurations for a central observability hub

Create the observability hub namespace `observability-hub`. If you use a different namespace, be sure to update the resource YAMLs accordingly.

```bash
oc create ns observability-hub
```

### Tracing Backend (Tempo with MinIO for S3 storage)

To view distributed tracing data from Llamastack and/or vLLM, you must deploy a tracing backend. The supported tracing backend in OpenShift
is Tempo. See the OpenShift Tempo
[documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/distributed_tracing/distributed-tracing-platform-tempo#distr-tracing-tempo-install-tempostack-web-console_dist-tracing-tempo-installing)
for further details. Tempo must be paired with a storage solution; for this example, MinIO is used. The necessary resources can be created by
applying the `./tempo` manifests.

```bash
# edit storageClassName & secret as necessary
# secret and storage are for testing only
oc apply --kustomize ./tempo -n observability-hub
```
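
For orientation, the core of a `TempoStack` resource is sketched below. The names and sizes here are assumptions for illustration; the actual resource applied by the `./tempo` manifests is authoritative:

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack          # assumed name
  namespace: observability-hub
spec:
  storage:
    secret:
      name: minio           # assumed secret holding the MinIO S3 credentials
      type: s3
  storageSize: 10Gi         # adjust for your retention needs
```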

### OpenTelemetryCollector deployment

The OpenTelemetry Collector is used to aggregate telemetry from various workloads, process individual signals, and export them
to various backends. This example collects traces from various workloads and exports them all as a single
authenticated stream to the in-cluster TempoStack. For in-cluster use only, an OpenTelemetry Collector is not necessary to collect
metrics: metrics reach the in-cluster user-workload-monitoring Prometheus through the PodMonitors and ServiceMonitors created above.
However, if exporting off-cluster to a third-party observability vendor, the collector is necessary for all signals,
and provides a single place to receive telemetry from various workloads and export it as a single authenticated and
secure OTLP stream.

#### Central OpenTelemetry Collector

To create a central OpenTelemetry Collector, update
[otel-collector/otel-collector.yaml](./otel-collector/otel-collector.yaml) to match your requirements, then apply:

```bash
oc apply --kustomize ./otel-collector -n observability-hub
```
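
A minimal sketch of the shape of such a collector follows: an OTLP receiver feeding a traces pipeline that exports to Tempo. The Tempo service endpoint and the `insecure` TLS setting are assumptions for illustration; the checked-in `otel-collector.yaml` (with its authentication settings) is authoritative:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: observability-hub
spec:
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      otlp/tempo:
        # assumed TempoStack distributor service name; check your cluster
        endpoint: tempo-tempostack-distributor.observability-hub.svc:4317
        tls:
          insecure: true    # for illustration only; use TLS/auth in practice
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/tempo]
```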

#### OpenTelemetryCollector sidecar deployment

You can add individual metrics endpoints to the central otel-collector in `observability-hub`, but
another approach is to add otel-collector sidecar containers to individual deployments throughout the
cluster. Paired with an annotation on the deployment, telemetry will be exported as configured.

Any deployment with the `template.metadata.annotations` entry `sidecar.opentelemetry.io/inject: vllm-otelsidecar`
will receive and export telemetry as configured in
[otel-collector-vllm-sidecar.yaml](./otel-collector/otel-collector-vllm-sidecar.yaml).

Any deployment with the `template.metadata.annotations` entry `sidecar.opentelemetry.io/inject: llamastack-otelsidecar`
will receive and export telemetry as configured in
[otel-collector-llamastack-sidecar.yaml](./otel-collector/otel-collector-llamastack-sidecar.yaml).

The example below adds the sidecar otel-collector custom resources to the `llama-serve` namespace.
After scaling the annotated deployments down and back up, sidecar otel-collector
containers will be added to the pods.

```bash
oc apply -f ./otel-collector/otel-collector-vllm-sidecar.yaml -n llama-serve
oc apply -f ./otel-collector/otel-collector-llamastack-sidecar.yaml -n llama-serve

# Then, annotate whichever deployment you'd like to collect telemetry from.
# Add the annotation to the deployment's `template.metadata.annotations` from the console,
# OR patch the llamastack and vLLM deployments with the appropriate annotation.
# Replace `deployment-name`, `namespace`, and `name-of-otelsidecar` in the command below.

oc patch deployment deployment-name \
  -n namespace \
  --type='merge' \
  -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"name-of-otelsidecar"}}}}}'
```
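
After patching, the relevant portion of the Deployment should look like the following fragment (only the annotation is added; everything else in the spec is unchanged):

```yaml
# Deployment fragment: the injected annotation on the pod template
spec:
  template:
    metadata:
      annotations:
        sidecar.opentelemetry.io/inject: vllm-otelsidecar
```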

### Cluster Observability Operator Tracing UIPlugin

The Jaeger frontend feature of TempoStack is no longer supported by Red Hat; it has been replaced by the COO `UIPlugin`. Before creating the UIPlugin for
tracing, ensure the TempoStack described above has been created. Then, to view traces from
the OpenShift console at `Observe -> Traces`, create the [Tracing UIPlugin resource](./tracing-ui-plugin.yaml).
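
For reference, the linked file should be roughly of this shape (the resource name is an assumption; the checked-in `tracing-ui-plugin.yaml` is authoritative):

```yaml
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing   # assumed name
spec:
  type: DistributedTracing
```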

```bash
oc apply -f ./tracing-ui-plugin.yaml
```

You should now see traces and metrics in the OpenShift console under the `Observe` tab.

### Grafana

Many users are familiar with Grafana for visualizing and analyzing telemetry. To create the Grafana resources necessary to view
Llamastack and vLLM telemetry, follow the example below.

This example deploys a Grafana instance along with Prometheus and Tempo data sources.
The Prometheus datasource points to the user-workload-monitoring Prometheus running in the `openshift-user-workload-monitoring` namespace.
The Grafana console is configured with `username: rhel, password: rhel`.
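
As a sketch of what the kustomization creates, a Grafana Operator (v5 API) datasource for user-workload metrics looks roughly like this. The instance label and query URL are assumptions; the definitions in `./grafana/instance-with-prom-tempo-ds` are authoritative:

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: prometheus-ds
  namespace: observability-hub
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana   # assumed label on the Grafana instance
  datasource:
    name: Prometheus
    type: prometheus
    access: proxy
    # assumed query endpoint for user-workload metrics
    url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
```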

```bash
oc apply -k ./grafana/instance-with-prom-tempo-ds
```

Upon success, you can explore metrics and traces from the Grafana route.

#### GrafanaDashboard to visualize cluster metrics and traces

Check out [github.com/kevchu3/openshift4-grafana](https://github.com/kevchu3/openshift4-grafana/tree/master/dashboards/crds) for a list of
dashboards to deploy on OpenShift.

Here's an example that deploys a GrafanaDashboard for OpenShift 4.16 cluster metrics.
The dashboard is slightly modified from https://github.com/kevchu3/openshift4-grafana/blob/master/dashboards/json_raw/cluster_metrics.ocp416.json

```bash
oc apply -n observability-hub -f cluster-metrics-dashboard/cluster-metrics.yaml
```
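
The general shape of a `GrafanaDashboard` resource is sketched below; the selector label is an assumption and must match the labels on your Grafana instance, and the real dashboard JSON comes from the file referenced above:

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: cluster-metrics
  namespace: observability-hub
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana   # assumed; must match the Grafana instance labels
  json: >
    {
      "title": "Cluster Metrics",
      "panels": []
    }
```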