Commit 2b11b5c

update to add vLLM tracing guide
Signed-off-by: sallyom <somalley@redhat.com>
1 parent 352fcab commit 2b11b5c

5 files changed: 146 additions & 66 deletions

File: kubernetes/observability/README.md
Lines changed: 24 additions & 58 deletions

@@ -1,53 +1,8 @@
# Monitor Llamastack & vLLM in OpenShift

Follow this README to configure an observability stack in OpenShift to visualize Llamastack telemetry and vLLM metrics.
First, ensure Llamastack and vLLM are configured to generate telemetry by following this [configuration guide](./run-configuration.md).
Removed (moved to the configuration guide):

## Generate telemetry from Llamastack and vLLM

### vLLM

For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics, you can `curl localhost:8000/metrics` from within a vLLM container.

### Llamastack

With Llamastack, you need to update `run-config.yaml` to enable telemetry collection with an OpenTelemetry receiver. Here's how to do that:

#### Updated manifests for telemetry trace collection with an OpenTelemetry receiver endpoint

This is for traces only. There is a similar `otel_metric` sink and `otel_metric_endpoint`; however, there are currently only 4 metrics generated within Llamastack, and these are duplicates of what vLLM provides.

[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)

```yaml
telemetry:
- provider_id: meta-reference
  provider_type: inline::meta-reference
  config:
    service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
    sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # add otel_trace and/or otel_metric
    otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # add ONLY if an OpenTelemetry receiver endpoint is available
```

And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml):

```yaml
env:
- name: OTEL_SERVICE_NAME
  value: llamastack
- name: OTEL_TRACE_ENDPOINT
  value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
#- name: OTEL_METRIC_ENDPOINT
#  value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
```

The OTLP endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces` (or `/v1/metrics`) if exporting to a central otel-collector. If using an otel-collector sidecar, it is `http://localhost:4318/v1/traces` (or `/v1/metrics`).

## OpenShift Observability Operators

@@ -88,6 +43,12 @@ oc create ns observability-hub

### Tracing Backend (Tempo with Minio for S3 storage)

In order to view distributed tracing data from Llamastack and/or vLLM, you must deploy a tracing backend. The supported tracing backend in OpenShift is Tempo; see the OpenShift Tempo [documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/distributed_tracing/distributed-tracing-platform-tempo#distr-tracing-tempo-install-tempostack-web-console_dist-tracing-tempo-installing) for further details. Tempo must be paired with a storage solution; for this example, `MinIO` is used. The necessary resources can be created by applying the `./tempo` manifests.
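For orientation, the pairing of a TempoStack with an S3-type storage secret looks roughly like the sketch below. This is an assumption of the shape of the `./tempo` manifests, not a copy of them; the resource name, secret name, and storage size are placeholders.

```yaml
# Sketch of a TempoStack backed by an S3-compatible (MinIO) secret.
# The real resources are in ./tempo; names and sizes below are assumptions.
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack
  namespace: observability-hub
spec:
  storage:
    secret:
      name: minio      # secret holding the MinIO/S3 credentials and endpoint
      type: s3
  storageSize: 10Gi
```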
```bash
# edit storageclassName & secret as necessary
# secret and storage for testing only
oc apply --kustomize ./tempo -n observability-hub
```

@@ -97,7 +58,7 @@ oc apply --kustomize ./tempo -n observability-hub

### OpenTelemetryCollector deployment

OpenTelemetry Collector is used to aggregate telemetry from various workloads, process individual signals, and export to various backends. This example will collect traces from various workloads and export them all as a single authenticated stream to the in-cluster TempoStack. For in-cluster use only, the opentelemetry-collector is not necessary to collect metrics: metrics are sent to the in-cluster user-workload-monitoring Prometheus by creating the PodMonitors and ServiceMonitors. However, if exporting off-cluster to a 3rd-party observability vendor, the collector is necessary for all signals.

@@ -134,9 +95,24 @@ oc patch deployment <deployment-name> \

```bash
oc patch deployment <deployment-name> \
  -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"vllm-otelsidecar"}}}}}'
```
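The patch payload is plain JSON, so it can be sanity-checked locally before it is applied. A small sketch (the `python3 -m json.tool` validation step is an addition for illustration, not part of the repo):

```shell
# The patch targets the *pod template* annotations (spec.template.metadata),
# which is why `oc patch` is used rather than `oc annotate` (the latter would
# annotate the Deployment object itself, not the pods it creates).
patch='{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"vllm-otelsidecar"}}}}}'

# Validate the JSON before handing it to `oc patch deployment ... -p "$patch"`:
echo "$patch" | python3 -m json.tool > /dev/null && echo "patch is valid JSON"
```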
### Cluster Observability Operator Tracing UIPlugin

The Jaeger frontend feature of TempoStack is no longer supported by Red Hat; it has been replaced by the COO UIPlugin. To create the UIPlugin for tracing, first ensure the TempoStack described above is created (this is a prerequisite). Then, all that's necessary to view traces from the OpenShift console at `Observe -> Traces` is to create the following [Tracing UIPlugin resource](./tracing-ui-plugin.yaml):

```bash
oc apply -f ./tracing-ui-plugin.yaml
```

You should now see traces and metrics in the OpenShift console, under the `Observe` tab.
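For reference, a Tracing UIPlugin resource is small. The actual resource used here is [tracing-ui-plugin.yaml](./tracing-ui-plugin.yaml); the sketch below is an assumption of its shape based on the Cluster Observability Operator API:

```yaml
# Sketch of a COO Tracing UIPlugin; see tracing-ui-plugin.yaml for the real one.
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing
spec:
  type: DistributedTracing
```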
### Grafana

Most users are familiar with Grafana for visualizing and analyzing telemetry. To create the Grafana resources necessary to view Llamastack and vLLM telemetry, follow the example below.

This example will deploy a Grafana instance, and Prometheus & Tempo DataSources.
The Prometheus datasource is the user-workload-monitoring Prometheus running in the `openshift-user-workload-monitoring` namespace.
The Grafana console is configured with `username: rhel, password: rhel`.

@@ -157,13 +133,3 @@ The dashboard is slightly modified from https://github.com/kevchu3/openshift4-gr

```bash
oc apply -n observability-hub -f cluster-metrics-dashboard/cluster-metrics.yaml
```
Removed: the duplicate "Cluster Observability Operator Tracing UIPlugin" section at the end of the file (moved earlier in the README).

File: kubernetes/observability/otel-collector/otel-collector-vllm-sidecar.yaml
Lines changed: 12 additions & 1 deletion
```diff
@@ -21,22 +21,33 @@ spec:
         insecure: true
     processors: {}
     receivers:
+      otlp:
+        protocols:
+          grpc: {}
+          http: {}
       prometheus:
         config:
           scrape_configs:
           - job_name: vllm-sidecar
-            scrape_interval: 5s
+            scrape_interval: 15s
             static_configs:
             - targets:
               - 'localhost:8000'
     service:
       pipelines:
+        traces:
+          exporters:
+          - debug
+          - otlphttp
+          receivers:
+          - otlp
         metrics:
           exporters:
           - debug
           - otlphttp
           receivers:
           - prometheus
+          - otlp
       telemetry:
         metrics:
           address: '0.0.0.0:8888'
```

File: kubernetes/observability/otel-collector/otel-collector.yaml
Lines changed: 0 additions & 7 deletions
```diff
@@ -28,13 +28,6 @@ spec:
           authenticator: bearertokenauth
         headers:
           X-Scope-OrgID: "dev"
-      # cluster user-workload monitoring prometheus backend
-      #prometheus/ocp-uwm:
-      #  add_metric_suffixes: false
-      #  endpoint: 0.0.0.0:8889
-      #  metric_expiration: 180m
-      #  resource_to_telemetry_conversion:
-      #    enabled: true

     receivers:
       prometheus:
```
New file (the configuration guide referenced in the README above)
Lines changed: 101 additions & 0 deletions

@@ -0,0 +1,101 @@
## Generate telemetry from Llamastack and vLLM

### vLLM

#### Metrics

For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics, you can `curl localhost:8000/metrics` from within a vLLM container.
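The `/metrics` output is standard Prometheus text format, so it can be filtered with ordinary shell tools. A small sketch (the sample payload below is illustrative, not the full vLLM metric list):

```shell
# Illustrative sample of the Prometheus text format served at /metrics.
sample='# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="granite"} 2
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="granite"} 12345'

# List metric names and types, e.g. when deciding what to scrape. Against a
# live server this would be: curl -s localhost:8000/metrics | awk '/^# TYPE/ {print $3, $4}'
printf '%s\n' "$sample" | awk '/^# TYPE/ {print $3, $4}'
```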
#### Traces

It's possible to generate vLLM distributed trace data by updating the vLLM image and start command. This [Containerfile](./vllm-Containerfile) shows the necessary packages to generate vLLM traces.

Here is how you would build vLLM with the tracing packages:

```bash
podman build --platform x86_64 -t quay.io/[your-quay-username]/vllm:otlp-tracing -f vllm-Containerfile .
podman push quay.io/[your-quay-username]/vllm:otlp-tracing
```
Then, add the following updates to the vLLM deployment.yaml. We'll use the [granite-8b deployment](../llama-serve/granite-8b/vllm.yaml). This example assumes there is an OpenTelemetryCollector in sidecar mode in the same namespace; see [OpenTelemetryCollector Sidecars Deployment](./README.md#opentelemetrycollector-sidecars-deployment).

```yaml
template:
  metadata:
    labels:
      app: granite-8b
    annotations:
      sidecar.opentelemetry.io/inject: vllm-otelsidecar
  spec:
    containers:
    - args:
      - --model
      - ibm-granite/granite-3.2-8b-instruct
      - --max-model-len
      - "128000"
      - --enable-auto-tool-choice
      - --chat-template
      - /app/tool_chat_template_granite.jinja
      - --tool-call-parser=granite
      - --otlp-traces-endpoint=grpc://localhost:4317
      - --port
      - "8000"
      env:
      - name: OTEL_SERVICE_NAME
        value: "vllm-granite8b"
      - name: OTEL_EXPORTER_OTLP_TRACES_INSECURE
        value: "true"
```

With the updated vLLM image and the updated deployment, distributed trace data will be generated, collected by the opentelemetry-collector sidecar container, and exported to the central observability-hub as outlined in the [README.md](./README.md), with a `TempoStack` as the tracing backend.
### Llamastack

With Llamastack, you need to update `run-config.yaml` to enable telemetry collection with an OpenTelemetry receiver. Here's how to do that:

#### Updated manifests for telemetry trace collection with an OpenTelemetry receiver endpoint

This is for traces only. There is a similar `otel_metric` sink and `otel_metric_endpoint`; however, there are currently only 4 metrics generated within Llamastack, and these are duplicates of what vLLM provides.

[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)

```yaml
telemetry:
- provider_id: meta-reference
  provider_type: inline::meta-reference
  config:
    service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
    sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # add otel_trace and/or otel_metric
    otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # add ONLY if an OpenTelemetry receiver endpoint is available
```

And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml):

```yaml
env:
- name: OTEL_SERVICE_NAME
  value: llamastack
- name: OTEL_TRACE_ENDPOINT
  value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
#- name: OTEL_METRIC_ENDPOINT
#  value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
```

The OTLP endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces` (or `/v1/metrics`) if exporting to a central otel-collector. If using an otel-collector sidecar, it is `http://localhost:4318/v1/traces` (or `/v1/metrics`).
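The endpoint convention above can be captured in a tiny helper. This function is hypothetical, shown only to illustrate how the Service name, namespace, and signal combine; 4318 is the standard OTLP/HTTP port.

```shell
# Hypothetical helper: build an OTLP/HTTP endpoint for a collector Service.
# signal is "traces" or "metrics".
otlp_endpoint() {
  svc="$1"; ns="$2"; signal="$3"
  echo "http://${svc}.${ns}.svc.cluster.local:4318/v1/${signal}"
}

# The central-collector trace endpoint used in deployment.yaml above:
otlp_endpoint otel-collector-collector observability-hub traces
```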
Now that vLLM and Llamastack are configured to generate and export telemetry, follow the [observability-hub guide](./README.md) to view and analyze the data.
File: kubernetes/observability/vllm-Containerfile (new file)
Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@

```dockerfile
# Use the vllm-openai image as the base
FROM docker.io/vllm/vllm-openai:v0.7.3

# Install OpenTelemetry packages
RUN pip install \
    "opentelemetry-sdk>=1.26.0,<1.27.0" \
    "opentelemetry-api>=1.26.0,<1.27.0" \
    "opentelemetry-exporter-otlp>=1.26.0,<1.27.0" \
    "opentelemetry-semantic-conventions-ai>=0.4.1,<0.5.0"
```
