Commit 2bc9b22
Merge pull request #57 from sallyom/observability-hub
add manifests and documentation for observability
2 parents c9f101b + 4cca9b0 commit 2bc9b22

32 files changed: 3,218 additions & 3 deletions

kubernetes/llama-stack/deployment.yaml

Lines changed: 3 additions & 3 deletions
```diff
@@ -8,8 +8,8 @@ spec:
       app: llamastack
   template:
     metadata:
-      #annotations:
-      #  sidecar.opentelemetry.io/inject: otelsidecar
+      annotations:
+        sidecar.opentelemetry.io/inject: llamastack-otelsidecar
       labels:
         app: llamastack
     spec:
@@ -47,7 +47,7 @@ spec:
         - name: OTEL_SERVICE_NAME
           value: om-llamastack
         - name: OTEL_TRACE_ENDPOINT
-          value: 'http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces'
+          value: 'http://localhost:4318/v1/traces'
         - name: SAFETY_MODEL
           value: meta-llama/Llama-Guard-3-8B
         - name: SAFETY_VLLM_URL
```
Lines changed: 58 additions & 0 deletions
```yaml
# Once this exists, any pod with the template.metadata.annotation below will send telemetry
# to observability-hub:
#   sidecar.opentelemetry.io/inject: llamastack-otelsidecar
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: llamastack-otelsidecar
spec:
  observability:
    metrics: {}
  deploymentUpdateStrategy: {}
  config:
    exporters:
      debug: {}
      otlphttp:
        # all sidecars export to the central observability-hub otel-collector; telemetry is
        # then exported to various backends from there (in-cluster, external 3rd party).
        # that collector is deployed with the ../observability/otel-collector manifests;
        # see ../observability/README.md for how to deploy it
        endpoint: 'http://otel-collector-collector.observability-hub.svc.cluster.local:4318'
        tls:
          insecure: true
    processors: {}
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    service:
      pipelines:
        traces:
          exporters:
            - debug
            - otlphttp
          receivers:
            - otlp
      telemetry:
        metrics:
          address: '0.0.0.0:8888'
  mode: sidecar
  resources: {}
  podDnsConfig: {}
  managementState: managed
  upgradeStrategy: automatic
  ingress:
    route: {}
  daemonSetUpdateStrategy: {}
  targetAllocator:
    allocationStrategy: consistent-hashing
    filterStrategy: relabel-config
    observability:
      metrics: {}
    prometheusCR:
      scrapeInterval: 30s
    resources: {}
  replicas: 1
  ipFamilyPolicy: SingleStack
```

kubernetes/observability/README.md

Lines changed: 148 additions & 0 deletions
# Monitor Llamastack & vLLM in OpenShift

Follow this README to configure an observability stack in OpenShift to visualize Llamastack telemetry and vLLM metrics.
First, ensure Llamastack and vLLM are configured to generate telemetry by following this [configuration guide](./run-configuration.md).

## OpenShift Observability Operators

The following operators are available from OperatorHub and must be installed before proceeding with this example.

### Operator descriptions

1. **Red Hat Build of OpenTelemetry**: Provides the OpenTelemetry Collector (OTC). Metrics and traces are distributed from the OTC to various backends. Tempo is deployed as the tracing backend.

2. **Tempo Operator**: Provides the `TempoStack` Custom Resource, the backend for distributed tracing. S3-compatible storage (MinIO) is paired with Tempo.

3. **Cluster Observability Operator**: Provides the PodMonitor and ServiceMonitor Custom Resources that user-workload monitoring's Prometheus needs in order to scrape workload metrics. The COO also provides UIPlugins for viewing telemetry.

4. **(optional) Grafana Operator**: Provides Grafana APIs, including `GrafanaDashboard`, `Grafana`, and `GrafanaDataSource`, used to visualize telemetry.
## Create a PodMonitor or ServiceMonitor for any AI workload that exposes a metrics endpoint

To enable collection of user-workload metrics for any workload within OpenShift, create a `PodMonitor` or a `ServiceMonitor`.
A PodMonitor ensures that metrics from pods with matching selectors are scraped by the user-workload-monitoring Prometheus, while a ServiceMonitor scrapes from any pod that runs under a particular service.

* [Example PodMonitor](./podmonitor-example-0.yaml)
* [Example ServiceMonitor](./servicemonitor-example.yaml)

Upon creation of either, metrics will be scraped and visible from the console `Observe -> Metrics` dashboards.
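The linked examples follow the standard `monitoring.coreos.com` API. As a minimal sketch (the name, labels, and port below are illustrative assumptions, not the repo's actual values), a PodMonitor looks like:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vllm-podmonitor        # hypothetical name
  namespace: llama-serve
spec:
  selector:
    matchLabels:
      app: vllm                # must match your pod's labels
  podMetricsEndpoints:
    - port: metrics            # named container port that serves /metrics
      path: /metrics
```

The selector is what ties the monitor to your workload: any pod in the namespace carrying the matching labels is scraped.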
## Create custom resources and configurations for a central observability hub

Create the observability hub namespace `observability-hub`. If you create a different namespace, be sure to update the resource YAMLs accordingly.

```bash
oc create ns observability-hub
```

### Tracing backend (Tempo with MinIO for S3 storage)

To view distributed tracing data from Llamastack and/or vLLM, you must deploy a tracing backend. The supported tracing backend in OpenShift is Tempo; see the OpenShift Tempo
[documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/distributed_tracing/distributed-tracing-platform-tempo#distr-tracing-tempo-install-tempostack-web-console_dist-tracing-tempo-installing)
for further details. Tempo must be paired with a storage solution; this example uses `MinIO`. Create the necessary resources by applying the `./tempo` manifests.

```bash
# edit storageClassName & secret as necessary
# secret and storage are for testing only
oc apply --kustomize ./tempo -n observability-hub
```
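For orientation, a TempoStack paired with an S3 credentials secret generally has the following shape (the resource and secret names here are illustrative; use the values from the `./tempo` manifests):

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack             # illustrative name
  namespace: observability-hub
spec:
  storage:
    secret:
      name: minio              # secret holding the S3-compatible credentials
      type: s3
  storageSize: 10Gi
```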
### OpenTelemetryCollector deployment

The OpenTelemetry Collector aggregates telemetry from various workloads, processes individual signals, and exports them to various backends. This example collects traces from various workloads and exports them all as a single authenticated stream to the in-cluster TempoStack. For in-cluster use only, an opentelemetry-collector is not necessary to collect metrics: metrics are sent to the in-cluster user-workload-monitoring Prometheus by creating the PodMonitors and ServiceMonitors. However, if you export off-cluster to a 3rd-party observability vendor, the collector is necessary for all signals, and it provides a single place to receive telemetry from various workloads and export it as a single authenticated and secure OTLP stream.

#### Central OpenTelemetry Collector

To create a central opentelemetry-collector, update
[otel-collector/otel-collector.yaml](./otel-collector/otel-collector.yaml) to match your requirements and then apply it.

```bash
oc apply --kustomize ./otel-collector -n observability-hub
```
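As a rough sketch of what such a central collector involves (the TempoStack-derived service name below is an assumption; the real configuration lives in `otel-collector.yaml`), a deployment-mode collector that receives OTLP and forwards traces to Tempo might look like:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  mode: deployment
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      otlp:
        # assumes a TempoStack named `tempostack` in this namespace
        endpoint: tempo-tempostack-distributor.observability-hub.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]
```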
#### OpenTelemetryCollector sidecars deployment

You can add individual metrics endpoints to the central otel-collector in observability-hub, but another option is to add otel-collector sidecar containers to individual deployments throughout the cluster. Paired with an annotation on the deployment, telemetry is exported as configured.

Any deployment with the `template.metadata.annotations` entry `sidecar.opentelemetry.io/inject: vllm-otelsidecar`
will receive and export telemetry as configured in
[otel-collector-vllm-sidecar.yaml](./otel-collector/otel-collector-vllm-sidecar.yaml).

Any deployment with the `template.metadata.annotations` entry `sidecar.opentelemetry.io/inject: llamastack-otelsidecar`
will receive and export telemetry as configured in
[otel-collector-llamastack-sidecar.yaml](./otel-collector/otel-collector-llamastack-sidecar.yaml).

The example below adds the otel-collector sidecar custom resources to the `llama-serve` namespace; after a scale-down and scale-up of the deployments carrying the added annotations, sidecar otel-collector containers are added to the pods.

```bash
oc apply -f ./otel-collector/otel-collector-vllm-sidecar.yaml -n llama-serve
oc apply -f ./otel-collector/otel-collector-llamastack-sidecar.yaml -n llama-serve

# Then, annotate whatever deployment you'd like to collect telemetry from.
# Add the annotation to the deployment's `template.metadata.annotations` from the console,
# OR patch/modify the llamastack and vLLM deployments with the appropriate annotation.
# Replace `deployment-name`, `namespace`, and `name-of-otelsidecar` in the command below.

oc patch deployment deployment-name \
  -n namespace \
  --type='merge' \
  -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"name-of-otelsidecar"}}}}}'
```
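The `-p` payload above is plain JSON. If you generate it programmatically (for example, to patch several deployments), a small Python sketch shows how it is assembled for a given sidecar name:

```python
import json

def sidecar_inject_patch(sidecar_name: str) -> str:
    """Build the merge-patch body that adds the sidecar-injection
    annotation to a Deployment's pod template."""
    patch = {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "sidecar.opentelemetry.io/inject": sidecar_name
                    }
                }
            }
        }
    }
    return json.dumps(patch)

# e.g. pass this string as the -p argument to `oc patch`
print(sidecar_inject_patch("llamastack-otelsidecar"))
```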
### Cluster Observability Operator Tracing UIPlugin

The Jaeger frontend feature of TempoStack is no longer supported by Red Hat; it has been replaced by the COO UIPlugin. To create the UIPlugin for tracing, first ensure the TempoStack described above has been created (this is a prerequisite). Then, all that's necessary to view traces from the OpenShift console at `Observe -> Traces` is to create the following [Tracing UIPlugin resource](./tracing-ui-plugin.yaml).

```bash
oc apply -f ./tracing-ui-plugin.yaml
```

You should now see traces and metrics in the OpenShift console, under the `Observe` tab.
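If you need to author the UIPlugin yourself rather than use the linked file, the resource is small; a sketch, assuming the conventional plugin name:

```yaml
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing
spec:
  type: DistributedTracing
```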
### Grafana

Most users are familiar with Grafana for visualizing and analyzing telemetry. To create the Grafana resources necessary to view Llamastack and vLLM telemetry, follow the example below.

This example deploys a Grafana instance plus Prometheus and Tempo data sources.
The Prometheus datasource is the user-workload-monitoring Prometheus running in the `openshift-user-workload-monitoring` namespace.
The Grafana console is configured with `username: rhel, password: rhel`.

```bash
oc apply -k ./grafana/instance-with-prom-tempo-ds
```

Upon success, you can explore metrics and traces from the Grafana route.

#### GrafanaDashboard to visualize cluster metrics and traces

Check out [github.com/kevchu3/openshift4-grafana](https://github.com/kevchu3/openshift4-grafana/tree/master/dashboards/crds) for a list of dashboards to deploy on OpenShift.

Here's an example that downloads and deploys a GrafanaDashboard for OpenShift 4.16 cluster metrics.
The dashboard is slightly modified from https://github.com/kevchu3/openshift4-grafana/blob/master/dashboards/json_raw/cluster_metrics.ocp416.json.

```bash
oc apply -n observability-hub -f cluster-metrics-dashboard/cluster-metrics.yaml
```
Lines changed: 13 additions & 0 deletions
```yaml
kind: GrafanaDashboard
apiVersion: grafana.integreatly.org/v1beta1
metadata:
  name: cluster-metrics
  labels:
    app: grafana
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana # This label matches the grafana Grafana instance
  # This json was copied and modified from https://github.com/kevchu3/openshift4-grafana/blob/master/dashboards/json_raw/cluster_metrics.ocp416.json
  url: https://raw.githubusercontent.com/redhat-et/edge-ocp-observability/refs/heads/main/observability-hub/grafana/cluster-metrics-dashboard/cluster_metrics_ocp.json
```
