Commit 5e45f76

update to add vLLM tracing guide
Signed-off-by: sallyom <somalley@redhat.com>
1 parent 352fcab commit 5e45f76

5 files changed: 152 additions & 66 deletions

kubernetes/observability/README.md

Lines changed: 24 additions & 58 deletions
@@ -1,53 +1,8 @@
 # Monitor Llamastack & vLLM in OpenShift
 
 Follow this README to configure an observability stack in OpenShift to visualize Llamastack telemetry and vLLM metrics.
+First, ensure Llamastack and vLLM are configured to generate telemetry by following this [configuration guide](./run-configuration.md).
 
-## Generate telemetry from Llamastack and vLLM
-
-### vLLM
-
-For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics,
-you can `curl localhost:8000/metrics` from within a vLLM container.
-
-### Llamastack
-
-With Llamastack, you need to specify in the run-config.yaml to enable telemetry collection with an opentelemetry receiver.
-Here's how to do that:
-
-#### Updated manifests for telemetry trace collection with opentelemetry receiver endpoint
-
-This is for traces only. There is a similar `otel_metric` sink and `otel_metric_endpoint`, however, there are currently
-only 4 metrics generated within Llamastack, and these are duplicates of what vLLM provides.
-
-[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)
-
-```yaml
----
-telemetry:
-  - provider_id: meta-reference
-    provider_type: inline::meta-reference
-    config:
-      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
-      sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # <- add otel_trace and/or otel_metric
-      otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # <- add ONLY if an opentelemetry receiver endpoint is available
----
-```
-And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml)
-
-```yaml
----
-env:
-  - name: OTEL_SERVICE_NAME
-    value: llamastack
-  - name: OTEL_TRACE_ENDPOINT
-    value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
-  #- name: OTEL_METRIC_ENDPOINT
-  #  value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
----
-```
-
-The otel-endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces,metrics` if exporting to a
-central otel-collector. If using an otel-collector sidecar, this would be `http://localhost:4318/v1/traces,metrics`.
 
 ## OpenShift Observability Operators
 
@@ -88,6 +43,12 @@ oc create ns observability-hub
 
 ### Tracing Backend (Tempo with Minio for S3 storage)
 
+In order to view distributed tracing data from Llamastack and/or vLLM, you must deploy a tracing backend. The supported tracing backend in OpenShift
+is Tempo. See the OpenShift Tempo
+[documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/distributed_tracing/distributed-tracing-platform-tempo#distr-tracing-tempo-install-tempostack-web-console_dist-tracing-tempo-installing)
+for further details. Tempo must be paired with a storage solution. For this example, `MinIO` is used. The necessary resources can be created by
+applying the `./tempo` manifests, as in the sketch that follows.
+
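The `./tempo` manifests themselves are not shown in this commit. As a rough sketch under stated assumptions, a minimal TempoStack backed by a MinIO S3 secret might look like this (the resource name and secret name are assumptions, not taken from the repo):

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tempostack            # assumed name
  namespace: observability-hub
spec:
  storageSize: 10Gi           # small size, suitable for the testing setup below
  storage:
    secret:
      name: minio             # assumed secret holding the MinIO S3 credentials
      type: s3
```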
 ```bash
 # edit storageclassName & secret as necessary
 # secret and storage for testing only
@@ -97,7 +58,7 @@ oc apply --kustomize ./tempo -n observability-hub
 ### OpenTelemetryCollector deployment
 
 OpenTelemetry Collector is used to aggregate telemetry from various workloads, process individual signals, and export
-to various backends. This is used to collect traces from various workloads and export all as a single
+to various backends. This example will collect traces from various workloads and export all as a single
 authenticated stream to the in-cluster TempoStack. For in-cluster only, opentelemetry-collector is not necessary to collect
 metrics. Metrics are sent to the in-cluster user-workload-monitoring prometheus by creating the podmonitors and servicemonitors.
 However, if exporting off-cluster to a 3rd party observability vendor, the collector is necessary for all signals,
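The podmonitor approach mentioned above can be sketched as follows (a hedged example; the name, label selector, and port name are assumptions, not taken from this repo's manifests):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vllm-podmonitor       # assumed name
spec:
  selector:
    matchLabels:
      app: granite-8b         # assumed pod label on the vLLM deployment
  podMetricsEndpoints:
    - port: http              # assumed container port name exposing :8000
      path: /metrics
```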
@@ -134,9 +95,24 @@ oc patch deployment <deployment-name> \
   -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"vllm-otelsidecar"}}}}}'
 ```
 
+### Cluster Observability Operator Tracing UIPlugin
+
+The Jaeger frontend feature of TempoStack is no longer supported by Red Hat. It has been replaced by the COO UIPlugin. To create the UIPlugin for
+Tracing, first ensure the TempoStack described above has been created; it is a prerequisite. Then, all that's necessary to view traces from
+the OpenShift console at `Observe -> Traces` is to create the following [Tracing UIPlugin resource](./tracing-ui-plugin.yaml).
+
+```bash
+oc apply -f ./tracing-ui-plugin.yaml
+```
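The referenced `tracing-ui-plugin.yaml` is not shown in this commit; as a hedged sketch, a COO tracing UIPlugin generally looks like this (the resource name is assumed, not taken from the repo's file):

```yaml
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing   # assumed name
spec:
  type: DistributedTracing    # enables Observe -> Traces in the console
```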
+
+You should now see traces and metrics in the OpenShift console, under the `Observe` tab.
+
 ### Grafana
 
-This will deploy a Grafana instance, and Prometheus & Tempo DataSources
+Most users are familiar with Grafana for visualizing and analyzing telemetry. To create the Grafana resources necessary to view
+Llamastack and vLLM telemetry, follow the example below.
+
+This example will deploy a Grafana instance, and Prometheus & Tempo DataSources.
 The prometheus datasource is the user-workload-monitoring prometheus running in `openshift-user-workload-monitoring` namespace.
 The Grafana console is configured with `username: rhel, password: rhel`
 
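The Grafana manifests are not part of this hunk; as a hedged illustration only, a Tempo datasource for a grafana-operator-managed instance might look like the sketch below (the instance selector label, service name, and port are assumptions):

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: tempo-datasource
  namespace: observability-hub
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana     # assumed label on the Grafana CR
  datasource:
    name: Tempo
    type: tempo
    access: proxy
    # assumed TempoStack query-frontend service and port
    url: http://tempo-tempostack-query-frontend.observability-hub.svc.cluster.local:3200
```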
@@ -157,13 +133,3 @@ The dashboard is slightly modified from https://github.com/kevchu3/openshift4-gr
 ```bash
 oc apply -n observability-hub -f cluster-metrics-dashboard/cluster-metrics.yaml
 ```
-
-### Cluster Observability Operator Tracing UIPlugin
-
-The Jaeger frontend feature of TempoStack is no longer supported by Red Hat. This has been replaced by the COO UIPlugin. To create the UIPlugin for
-Tracing, first ensure the TempoStack described above is created. This is a prerequisite. Then, all that's necessary to view traces from
-the OpenShift console at `Observe -> Traces` is to create the following [Tracing UIPlugin resource](./tracing-ui-plugin.yaml).
-
-```bash
-oc apply ./tracing-ui-plugin.yaml
-```

kubernetes/observability/otel-collector/otel-collector-vllm-sidecar.yaml

Lines changed: 12 additions & 1 deletion
@@ -21,22 +21,33 @@ spec:
         insecure: true
     processors: {}
     receivers:
+      otlp:
+        protocols:
+          grpc: {}
+          http: {}
       prometheus:
         config:
           scrape_configs:
             - job_name: vllm-sidecar
-              scrape_interval: 5s
+              scrape_interval: 15s
              static_configs:
                - targets:
                    - 'localhost:8000'
     service:
       pipelines:
+        traces:
+          exporters:
+            - debug
+            - otlphttp
+          receivers:
+            - otlp
         metrics:
           exporters:
             - debug
             - otlphttp
           receivers:
             - prometheus
+            - otlp
       telemetry:
         metrics:
           address: '0.0.0.0:8888'

kubernetes/observability/otel-collector/otel-collector.yaml

Lines changed: 0 additions & 7 deletions
@@ -28,13 +28,6 @@ spec:
           authenticator: bearertokenauth
         headers:
           X-Scope-OrgID: "dev"
-      # cluster user-workload monitoring prometheus backend
-      #prometheus/ocp-uwm:
-      #  add_metric_suffixes: false
-      #  endpoint: 0.0.0.0:8889
-      #  metric_expiration: 180m
-      #  resource_to_telemetry_conversion:
-      #    enabled: true
 
     receivers:
       prometheus:
kubernetes/observability/run-configuration.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
## Generate telemetry from Llamastack and vLLM

### vLLM

#### metrics

For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics,
you can `curl localhost:8000/metrics` from within a vLLM container.
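As a quick sanity check (a sketch; assumes the deployment's `--port 8000` and vLLM's standard `vllm:` metric prefix):

```bash
# list a few vLLM metric samples from inside the container
curl -s localhost:8000/metrics | grep '^vllm:' | head
```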
#### traces

It's possible to generate vLLM distributed trace data by updating the vLLM image and start command. This [Containerfile](./vllm-Containerfile)
shows the necessary packages to generate vLLM traces.

Here is how you would build vLLM with the tracing packages:

```bash
podman build --platform x86_64 -t quay.io/[your-quay-username]/vllm:otlp-tracing -f vllm-Containerfile .
podman push quay.io/[your-quay-username]/vllm:otlp-tracing
```

Then, add the following updates to the vLLM deployment.yaml. We'll use the [granite-8b deployment](../llama-serve/granite-8b/vllm.yaml).
This example assumes there is an OpenTelemetryCollector with sidecar mode in the same namespace.
See [OpenTelemetryCollector Sidecars Deployment](./README.md#opentelemetrycollector-sidecars-deployment).

```yaml
---
template:
  metadata:
    labels:
      app: granite-8b
    annotations:
      sidecar.opentelemetry.io/inject: vllm-otelsidecar
  spec:
    containers:
      - args:
          - --model
          - ibm-granite/granite-3.2-8b-instruct
          - --max-model-len
          - "128000"
          - --enable-auto-tool-choice
          - --chat-template
          - /app/tool_chat_template_granite.jinja
          - --tool-call-parser=granite
          - --otlp-traces-endpoint
          - 127.0.0.1:4317
          - --collect-detailed-traces
          - "all"
          - --port
          - "8000"
        image: 'quay.io/sallyom/vllm:otlp-tracing'
        env:
          - name: OTEL_SERVICE_NAME
            value: "vllm-granite8b"
          - name: OTEL_EXPORTER_OTLP_TRACES_INSECURE
            value: "true"
---
```

With the updated vLLM image and the updated deployment, distributed trace data will be generated, collected by the opentelemetry-collector
sidecar container, and exported to the central observability-hub as outlined in the [README.md](./README.md), with a `TempoStack` as the tracing backend.
Enabling tracing in vLLM has a performance impact, so it's recommended to enable it only when debugging. A complete list of vLLM engine
arguments can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).

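One hedged way to confirm spans are flowing, assuming the operator-injected sidecar keeps its default container name `otc-container` and the `debug` exporter from the sidecar config in this commit:

```bash
# send an inference request first, then check the sidecar's debug exporter output
oc logs deployment/granite-8b -c otc-container --tail=100 | grep -i traces
```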
### Llamastack

With Llamastack, telemetry collection must be enabled in the run-config.yaml by configuring an opentelemetry sink and receiver endpoint.
Here's how to do that:

#### Updated manifests for telemetry trace collection with opentelemetry receiver endpoint

This is for traces only. There is a similar `otel_metric` sink and `otel_metric_endpoint`; however, there are currently
only 4 metrics generated within Llamastack, and these are duplicates of what vLLM provides.

[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)

```yaml
---
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
      sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # <- add otel_trace and/or otel_metric
      otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # <- add ONLY if an opentelemetry receiver endpoint is available
---
```

And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml):

```yaml
---
env:
  - name: OTEL_SERVICE_NAME
    value: llamastack
  - name: OTEL_TRACE_ENDPOINT
    value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
  #- name: OTEL_METRIC_ENDPOINT
  #  value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
---
```

The otel endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces` (or `/v1/metrics`) if exporting to a
central otel-collector. If using an otel-collector sidecar, it is `http://localhost:4318/v1/traces` (or `/v1/metrics`).

Now that vLLM and Llamastack are configured to generate and export telemetry, follow the [observability-hub guide](./README.md) to view and analyze
the data.
kubernetes/observability/vllm-Containerfile

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# Use the vllm-openai image as the base
FROM docker.io/vllm/vllm-openai:v0.7.3

# Install OpenTelemetry packages
RUN pip install \
    "opentelemetry-sdk>=1.26.0,<1.27.0" \
    "opentelemetry-api>=1.26.0,<1.27.0" \
    "opentelemetry-exporter-otlp>=1.26.0,<1.27.0" \
    "opentelemetry-semantic-conventions-ai>=0.4.1,<0.5.0"
