Commit 0a1673e

feat: Add metrics support to fs backend
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
1 parent 610bbd5 commit 0a1673e

14 files changed: +835 −32 lines changed

kv_connectors/llmd_fs_backend/README.md

Lines changed: 19 additions & 5 deletions
@@ -68,11 +68,29 @@ pip install -e .
 - `max_staging_memory_gb`: total staging memory limit
 
 ### Environment variables
-- `STORAGE_LOG_LEVEL`: set the C++ storage log level (`trace`, `debug`, `info`, `warn`, `error`). Default: `info`
+- `STORAGE_LOG_LEVEL`: set the log level for both C++ and Python (`trace`, `debug`, `info`, `warn`, `error`). Default: `info`
 - `STORAGE_CONNECTOR_DEBUG`: legacy flag — setting to `1` enables debug-level logging (equivalent to `STORAGE_LOG_LEVEL=debug`)
 - `USE_KERNEL_COPY_WRITE`: enable GPU-kernel-based writes using GPU SMs (default 0 - uses DMA copy).
 - `USE_KERNEL_COPY_READ`: enable GPU-kernel-based reads using GPU SMs (default 0 - uses DMA copy).
 
+## Metrics
+
+The fs backend populates vLLM's built-in offloading metrics. When Prometheus metrics are enabled in vLLM, the following metrics are automatically exported:
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `vllm:kv_offload_total_bytes` | Counter | Total bytes transferred, labeled by `transfer_type` |
+| `vllm:kv_offload_total_time` | Counter | Total time spent on transfers (seconds), labeled by `transfer_type` |
+| `vllm:kv_offload_size` | Histogram | Distribution of transfer sizes in bytes, labeled by `transfer_type` |
+
+The `transfer_type` label distinguishes transfer directions:
+- `GPU_to_SHARED_STORAGE` — GPU to storage (PUT)
+- `SHARED_STORAGE_to_GPU` — storage to GPU (GET)
+
+These metrics are also available through vLLM's internal StatLogger.
+
+For a complete monitoring setup (Prometheus, Grafana, port-forwarding, and benchmarking), see the [Monitoring Guide](./docs/monitoring.md).
+
 ## Example vLLM YAML
 
 To load the fs backend:
@@ -126,10 +144,6 @@ Then apply the full vLLM deployment (including the offloading connector with a f
 kubectl apply -f ./docs/deployment/vllm-storage.yaml
 ```
 
-## Storage Cleanup
-
-TBD
-
 ## Troubleshooting
 
 ### Missing `numa.h`
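Taken together, the two offload counters yield an average throughput (bytes transferred per second of transfer time). A minimal sketch of that arithmetic, using hypothetical counter readings rather than a live scrape:

```python
# Sketch: deriving average offload throughput from the two vLLM counters.
# The readings below are hypothetical, not taken from a live scrape.

def offload_throughput(bytes_total: float, seconds_total: float) -> float:
    """Average throughput in bytes/second for one transfer_type label."""
    if seconds_total <= 0:
        return 0.0
    return bytes_total / seconds_total

# Hypothetical samples for transfer_type="GPU_to_SHARED_STORAGE":
put_bytes = 8 * 1024**3    # vllm:kv_offload_total_bytes -> 8 GiB
put_seconds = 4.0          # vllm:kv_offload_total_time  -> 4 s

print(offload_throughput(put_bytes, put_seconds) / 1024**3)  # 2.0 (GiB/s)
```

On a live deployment, the PromQL analogue would be a ratio of rates, e.g. `rate(vllm:kv_offload_total_bytes[1m]) / rate(vllm:kv_offload_total_time[1m])`.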

kv_connectors/llmd_fs_backend/csrc/storage/logger.hpp

Lines changed: 1 addition & 1 deletion
@@ -131,6 +131,6 @@ inline bool get_env_flag(const char* name, bool default_val) {
     __VA_OPT__(__fs_time_oss << " | "; [&]<typename... Args>(Args&&... args) { \
       ((__fs_time_oss << args), ...); \
     }(__VA_ARGS__);) \
-    FS_LOG_DEBUG(__fs_time_oss.str()); \
+    FS_LOG_TRACE(__fs_time_oss.str()); \
     return __ret; \
   })()
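The README change in this commit makes `STORAGE_LOG_LEVEL` govern both the C++ and Python sides, with `STORAGE_CONNECTOR_DEBUG=1` kept as a legacy equivalent of debug. A hypothetical sketch of that resolution (`resolve_log_level` is not the backend's actual code, and the precedence given to the legacy flag is an assumption):

```python
# Hypothetical sketch of STORAGE_LOG_LEVEL / STORAGE_CONNECTOR_DEBUG
# resolution; not the backend's actual implementation.
_LEVELS = ("trace", "debug", "info", "warn", "error")

def resolve_log_level(env: dict) -> str:
    # Legacy flag: STORAGE_CONNECTOR_DEBUG=1 is equivalent to
    # STORAGE_LOG_LEVEL=debug (precedence here is an assumption).
    if env.get("STORAGE_CONNECTOR_DEBUG") == "1":
        return "debug"
    level = env.get("STORAGE_LOG_LEVEL", "info").lower()
    return level if level in _LEVELS else "info"  # default: info

print(resolve_log_level({"STORAGE_LOG_LEVEL": "TRACE"}))  # trace
```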

kv_connectors/llmd_fs_backend/csrc/storage/thread_pool.hpp

Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@ auto ThreadPool::enqueue(F&& f, TaskPriority priority)
       (priority == TaskPriority::kHigh) ? m_high_tasks : m_normal_tasks;
   target_queue.emplace([task]() { (*task)(); });
 
-  FS_LOG_DEBUG("Enqueued task with priority "
+  FS_LOG_TRACE("Enqueued task with priority "
               << (priority == TaskPriority::kHigh ? "HIGH" : "NORMAL")
               << " | high_queue=" << m_high_tasks.size()
               << " normal_queue=" << m_normal_tasks.size());

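The `ThreadPool::enqueue` change above sits in a two-queue scheme where high-priority tasks are preferred over normal ones. A minimal Python analogue of that scheme, offered as a sketch rather than a port of the C++ class:

```python
import threading
from collections import deque

class TwoQueuePool:
    """Sketch of a two-priority task store: the high queue drains first."""

    def __init__(self) -> None:
        self._high = deque()
        self._normal = deque()
        self._lock = threading.Lock()

    def enqueue(self, task, high_priority: bool = False) -> None:
        with self._lock:
            (self._high if high_priority else self._normal).append(task)

    def run_next(self):
        # Mirror the C++ worker's choice: prefer high, fall back to normal.
        with self._lock:
            queue = self._high if self._high else self._normal
            task = queue.popleft() if queue else None
        return task() if task is not None else None

pool = TwoQueuePool()
pool.enqueue(lambda: "normal")
pool.enqueue(lambda: "high", high_priority=True)
print(pool.run_next())  # high
print(pool.run_next())  # normal
```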
kv_connectors/llmd_fs_backend/docs/deployment/monitoring/grafana-dashboard-configmap.yaml

Lines changed: 372 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-datasources
+data:
+  datasources.yaml: |
+    apiVersion: 1
+    datasources:
+      - name: Prometheus
+        type: prometheus
+        access: proxy
+        url: http://prometheus-svc:9090
+        isDefault: true
+        editable: true
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-dashboard-provider
+data:
+  dashboards.yaml: |
+    apiVersion: 1
+    providers:
+      - name: default
+        folder: ""
+        type: file
+        options:
+          path: /var/lib/grafana/dashboards
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: grafana
+  labels:
+    app: grafana
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: grafana
+  template:
+    metadata:
+      labels:
+        app: grafana
+    spec:
+      containers:
+        - name: grafana
+          image: grafana/grafana:11.2.0
+          env:
+            - name: GF_AUTH_ANONYMOUS_ENABLED
+              value: "true"
+            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
+              value: "Admin"
+            - name: GF_SERVER_ROOT_URL
+              value: "http://localhost:3000"
+            - name: GF_SERVER_SERVE_FROM_SUB_PATH
+              value: "false"
+            - name: GF_SECURITY_ADMIN_PASSWORD
+              value: "admin"
+          ports:
+            - containerPort: 3000
+          volumeMounts:
+            - name: datasources
+              mountPath: /etc/grafana/provisioning/datasources
+            - name: dashboard-provider
+              mountPath: /etc/grafana/provisioning/dashboards
+            - name: dashboards
+              mountPath: /var/lib/grafana/dashboards
+      volumes:
+        - name: datasources
+          configMap:
+            name: grafana-datasources
+        - name: dashboard-provider
+          configMap:
+            name: grafana-dashboard-provider
+        - name: dashboards
+          configMap:
+            name: vllm-kv-offload-dashboard
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: grafana-svc
+  labels:
+    app: grafana
+spec:
+  type: ClusterIP
+  ports:
+    - port: 3000
+      targetPort: 3000
+  selector:
+    app: grafana
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# ServiceMonitor for Prometheus Operator to scrape vLLM metrics.
+# Requires: prometheus-operator CRDs installed in the cluster.
+#
+# If not using the Prometheus Operator, add a scrape config to your
+# prometheus.yml instead:
+#
+#   scrape_configs:
+#     - job_name: vllm
+#       kubernetes_sd_configs:
+#         - role: endpoints
+#       relabel_configs:
+#         - source_labels: [__meta_kubernetes_service_label_app]
+#           action: keep
+#           regex: vllm-storage
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: vllm-storage-monitor
+  labels:
+    app: vllm-storage
+spec:
+  selector:
+    matchLabels:
+      app: vllm-storage
+  endpoints:
+    - port: default
+      path: /metrics
+      interval: 15s
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: prometheus-config
+data:
+  prometheus.yml: |
+    global:
+      scrape_interval: 15s
+    scrape_configs:
+      - job_name: vllm
+        static_configs:
+          - targets: ["vllm-storage-svc:8000"]
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: prometheus
+  labels:
+    app: prometheus
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: prometheus
+  template:
+    metadata:
+      labels:
+        app: prometheus
+    spec:
+      containers:
+        - name: prometheus
+          image: prom/prometheus:v2.53.0
+          args:
+            - "--config.file=/etc/prometheus/prometheus.yml"
+            - "--storage.tsdb.retention.time=7d"
+          ports:
+            - containerPort: 9090
+          volumeMounts:
+            - name: config
+              mountPath: /etc/prometheus
+            - name: data
+              mountPath: /prometheus
+      volumes:
+        - name: config
+          configMap:
+            name: prometheus-config
+        - name: data
+          emptyDir: {}
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: prometheus-svc
+  labels:
+    app: prometheus
+spec:
+  type: ClusterIP
+  ports:
+    - port: 9090
+      targetPort: 9090
+  selector:
+    app: prometheus
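With Prometheus scraping `vllm-storage-svc:8000`, the offload series can also be spot-checked directly from the `/metrics` exposition text. A small filtering sketch; the sample payload is hypothetical but shaped like the metrics this commit adds:

```python
def kv_offload_samples(metrics_text: str) -> dict:
    """Extract vllm:kv_offload_* samples from Prometheus exposition text."""
    samples = {}
    for line in metrics_text.splitlines():
        # Comment lines (# HELP / # TYPE) and unrelated series are skipped.
        if line.startswith("vllm:kv_offload"):
            name, value = line.rsplit(" ", 1)
            samples[name] = float(value)
    return samples

# Hypothetical scrape payload:
text = """\
# TYPE vllm:kv_offload_total_bytes counter
vllm:kv_offload_total_bytes{transfer_type="GPU_to_SHARED_STORAGE"} 1048576
vllm:kv_offload_total_bytes{transfer_type="SHARED_STORAGE_to_GPU"} 524288
"""
print(kv_offload_samples(text))
```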

kv_connectors/llmd_fs_backend/docs/deployment/vllm-pvc.yaml

Lines changed: 0 additions & 2 deletions
@@ -8,7 +8,6 @@ spec:
   resources:
     requests:
       storage: 300Gi
-  storageClassName: ocs-storagecluster-cephfs
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
@@ -20,4 +19,3 @@ spec:
   resources:
     requests:
       storage: 2000Gi
-  storageClassName: ocs-storagecluster-cephfs

kv_connectors/llmd_fs_backend/docs/deployment/vllm-storage.yaml

Lines changed: 6 additions & 4 deletions
@@ -34,14 +34,14 @@ spec:
         fsGroup: 1001060000
       containers:
         - name: vllm
-          image: vllm/vllm-openai:v0.16.0
+          image: vllm/vllm-openai:v0.18.0
           imagePullPolicy: IfNotPresent
           command: ["/bin/sh", "-c"]
           args:
             - |
-              pip install https://raw.githubusercontent.com/llm-d/llm-d-kv-cache/main/kv_connectors/llmd_fs_backend/wheels/llmd_fs_connector-0.16.0-cp312-cp312-linux_x86_64.whl
-              vllm serve meta-llama/Meta-Llama-3.1-70B \
-                --tensor-parallel-size 4 \
+              pip install https://raw.githubusercontent.com/llm-d/llm-d-kv-cache/main/kv_connectors/llmd_fs_backend/wheels/llmd_fs_connector-0.18.0-cp312-cp312-linux_x86_64.whl
+              vllm serve Qwen/Qwen3-32B \
+                --tensor-parallel-size 2 \
                 --trust-remote-code \
                 --enable-chunked-prefill \
                 --gpu-memory-utilization 0.85 \
@@ -66,6 +66,8 @@ spec:
             value: /mnt/pvc/hf
           - name: VLLM_LOGGING_LEVEL
             value: "INFO"
+          - name: STORAGE_LOG_LEVEL
+            value: "INFO"
           ports:
             - containerPort: 8000
           resources:
