Commit c2e3a79: update to add vLLM tracing guide
Signed-off-by: sallyom <somalley@redhat.com>
Parent: 352fcab

3 files changed: 121 additions & 56 deletions

**kubernetes/observability/README.md** (11 additions & 56 deletions)
@@ -1,53 +1,8 @@
# Monitor Llamastack & vLLM in OpenShift

Follow this README to configure an observability stack in OpenShift to visualize Llamastack telemetry and vLLM metrics.
First, ensure Llamastack and vLLM are configured to generate telemetry by following this [configuration guide](./run-configuration.md).
## Generate telemetry from Llamastack and vLLM

### vLLM

For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics,
you can `curl localhost:8000/metrics` from within a vLLM container.

### Llamastack

With Llamastack, you need to enable telemetry collection with an OpenTelemetry receiver by configuring `run-config.yaml`.
Here's how to do that:
#### Updated manifests for telemetry trace collection with an opentelemetry receiver endpoint

This is for traces only. There are a similar `otel_metric` sink and `otel_metric_endpoint`; however, Llamastack
currently generates only four metrics, and these duplicate what vLLM provides.

[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)

```yaml
---
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
      sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # add otel_trace and/or otel_metric
      otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # add ONLY if an opentelemetry receiver endpoint is available
---
```
35-
And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml):

```yaml
---
env:
  - name: OTEL_SERVICE_NAME
    value: llamastack
  - name: OTEL_TRACE_ENDPOINT
    value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
  # - name: OTEL_METRIC_ENDPOINT
  #   value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
---
```

The OTel endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces` (or `/v1/metrics`)
if exporting to a central otel-collector. If using an otel-collector sidecar, this would be
`http://localhost:4318/v1/traces` (or `/v1/metrics`).

## OpenShift Observability Operators

@@ -134,6 +89,16 @@
oc patch deployment <deployment-name> \
  -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"vllm-otelsidecar"}}}}}'
```

### Cluster Observability Operator Tracing UIPlugin

The Jaeger frontend feature of TempoStack is no longer supported by Red Hat; it has been replaced by the COO UIPlugin.
To create the UIPlugin for Tracing, first ensure the TempoStack described above is created. This is a prerequisite.
Then, all that's necessary to view traces from the OpenShift console at `Observe -> Traces` is to create the following
[Tracing UIPlugin resource](./tracing-ui-plugin.yaml):

```bash
oc apply -f ./tracing-ui-plugin.yaml
```

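For reference, a Tracing UIPlugin for the Cluster Observability Operator generally looks like the sketch below; the
fields in this repo's [tracing-ui-plugin.yaml](./tracing-ui-plugin.yaml) are authoritative, and the resource name here
is an assumption:

```yaml
# Sketch only; see tracing-ui-plugin.yaml for the actual manifest used
# in this repo. metadata.name is an assumption.
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
  name: distributed-tracing
spec:
  type: DistributedTracing
```
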
### Grafana

This will deploy a Grafana instance, and Prometheus & Tempo DataSources.
@@ -157,13 +122,3 @@ The dashboard is slightly modified from https://github.com/kevchu3/openshift4-gr
```bash
oc apply -n observability-hub -f cluster-metrics-dashboard/cluster-metrics.yaml
```

**kubernetes/observability/run-configuration.md** (101 additions & 0 deletions)

@@ -0,0 +1,101 @@
## Generate telemetry from Llamastack and vLLM

### vLLM

#### metrics

For vLLM, metrics are generated by default and are exposed at `vllm-endpoint:port/metrics`. For a list of metrics,
you can `curl localhost:8000/metrics` from within a vLLM container.
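
The exposition format is plain text. As an illustration (the sample metric names below are invented, not guaranteed
vLLM output), metric names can be extracted from the payload like this:

```python
# Illustrative sketch: extract metric names from Prometheus exposition
# text such as that returned by `curl localhost:8000/metrics`.
# The sample payload below is invented for the example.
sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="granite"} 1.0
vllm:prompt_tokens_total{model_name="granite"} 512.0
"""

def metric_names(text: str) -> list[str]:
    names = set()
    for line in text.splitlines():
        if line and not line.startswith("#"):
            # strip the labels ({...}) and the sample value
            names.add(line.split("{")[0].split(" ")[0])
    return sorted(names)

print(metric_names(sample))
# -> ['vllm:num_requests_running', 'vllm:prompt_tokens_total']
```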

#### traces

It's possible to generate vLLM distributed trace data by updating the vLLM image and start command. This
[Containerfile](./vllm-Containerfile) shows the necessary packages to generate vLLM traces.

Here is how you would build vLLM with the tracing packages:
```bash
podman build --platform x86_64 -t quay.io/[your-quay-username]/vllm:otlp-tracing -f vllm-Containerfile .
podman push quay.io/[your-quay-username]/vllm:otlp-tracing
```

Then, add the following updates to the vLLM `deployment.yaml`. We'll use the
[granite-8b deployment](../llama-serve/granite-8b/vllm.yaml). This example assumes there is an OpenTelemetryCollector
in sidecar mode in the same namespace; see
[OpenTelemetryCollector Sidecars Deployment](./README.md#opentelemetrycollector-sidecars-deployment).

```yaml
---
template:
  metadata:
    labels:
      app: granite-8b
    annotations:
      sidecar.opentelemetry.io/inject: vllm-otelsidecar
  spec:
    containers:
      - args:
          - --model
          - ibm-granite/granite-3.2-8b-instruct
          - --max-model-len
          - "128000"
          - --enable-auto-tool-choice
          - --chat-template
          - /app/tool_chat_template_granite.jinja
          - --tool-call-parser=granite
          - --otlp-traces-endpoint=grpc://localhost:4317
          - --port
          - "8000"
        env:
          - name: OTEL_SERVICE_NAME
            value: "vllm-granite8b"
          - name: OTEL_EXPORTER_OTLP_TRACES_INSECURE
            value: "true"
---
```

With the updated vLLM image and the updated deployment, distributed trace data will be generated, collected by the
opentelemetry-collector sidecar container, and exported to the central observability-hub as outlined in the
[README.md](./README.md), with a `TempoStack` as the tracing backend.

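The `sidecar.opentelemetry.io/inject: vllm-otelsidecar` annotation assumes a matching sidecar-mode collector exists in
the namespace. A minimal sketch of such an OpenTelemetryCollector follows; the names and exporter endpoint here are
assumptions, and the README's sidecar section has the actual manifest:

```yaml
# Hypothetical sketch only; see the README's sidecar section for the
# actual manifest. Names and endpoint are assumptions.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: vllm-otelsidecar
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      otlphttp:
        endpoint: http://otel-collector-collector.observability-hub.svc.cluster.local:4318
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlphttp]
```
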
### Llamastack

With Llamastack, you need to enable telemetry collection with an OpenTelemetry receiver by configuring `run-config.yaml`.
Here's how to do that:

#### Updated manifests for telemetry trace collection with an opentelemetry receiver endpoint

This is for traces only. There are a similar `otel_metric` sink and `otel_metric_endpoint`; however, Llamastack
currently generates only four metrics, and these duplicate what vLLM provides.

[kubernetes/llama-stack/configmap.yaml](../llama-stack/configmap.yaml)

```yaml
---
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
      sinks: ${env.TELEMETRY_SINKS:console, otel_trace, sqlite} # add otel_trace and/or otel_metric
      otel_trace_endpoint: ${env.OTEL_TRACE_ENDPOINT:} # add ONLY if an opentelemetry receiver endpoint is available
---
```

And, in [kubernetes/llama-stack/deployment.yaml](../llama-stack/deployment.yaml):

```yaml
---
env:
  - name: OTEL_SERVICE_NAME
    value: llamastack
  - name: OTEL_TRACE_ENDPOINT
    value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/traces
  # - name: OTEL_METRIC_ENDPOINT
  #   value: http://otel-collector-collector.observability-hub.svc.cluster.local:4318/v1/metrics
---
```

The OTel endpoint is `http://service-name-otc.namespace-of-otc.svc.cluster.local:4318/v1/traces` (or `/v1/metrics`)
if exporting to a central otel-collector. If using an otel-collector sidecar, this would be
`http://localhost:4318/v1/traces` (or `/v1/metrics`).

Now that vLLM and Llamastack are configured to generate and export telemetry, follow the
[observability-hub guide](./README.md) to view and analyze the data.
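
The manifests above rely on Llamastack's `${env.NAME:default}` placeholders. As a rough illustration of how that
substitution behaves (this sketch is not Llamastack's actual implementation), consider:

```python
import os
import re

# Rough illustration of ${env.NAME:default} placeholder substitution as
# used in run-config.yaml. Not Llamastack's actual implementation.
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")

def substitute_env(value: str, environ=os.environ) -> str:
    # Use the environment variable when set, else the inline default.
    return _PLACEHOLDER.sub(lambda m: environ.get(m.group(1), m.group(2)), value)

print(substitute_env("${env.OTEL_SERVICE_NAME:llama-stack}", environ={}))
# -> llama-stack
print(substitute_env("${env.OTEL_SERVICE_NAME:llama-stack}",
                     environ={"OTEL_SERVICE_NAME": "llamastack"}))
# -> llamastack
```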

**kubernetes/observability/vllm-Containerfile** (9 additions & 0 deletions)

@@ -0,0 +1,9 @@
```Dockerfile
# Use the vllm-openai image as the base
FROM docker.io/vllm/vllm-openai:latest

# Install OpenTelemetry packages
RUN pip install \
    "opentelemetry-sdk>=1.26.0,<1.27.0" \
    "opentelemetry-api>=1.26.0,<1.27.0" \
    "opentelemetry-exporter-otlp>=1.26.0,<1.27.0" \
    "opentelemetry-semantic-conventions-ai>=0.4.1,<0.5.0"
```
