
feat: Add metrics support to fs backend #460

Merged
kfirtoledo merged 1 commit into llm-d:main from kfirtoledo:metrics
Mar 26, 2026

@kfirtoledo
Collaborator

Summary

Connect the offloading connector metrics to fs_backend. Add monitoring.md with instructions for deploying Prometheus and Grafana.

Changes

  • __init__.py: Update Python logger to use STORAGE_LOG_LEVEL for debugging
  • worker.py: Populate TransferResult fields (size, time, type) for metrics export. Add per-request trace logging for transfer operations.
  • C++ (logger.hpp, thread_pool.hpp): Move per-request debug prints to trace level to reduce log noise
  • Monitoring:
    • Prometheus + Grafana stack with pre-configured KV offload dashboard
    • 9 panels with per-request/aggregate throughput, transfer counts, averages
    • docs/monitoring.md: Full deployment guide and KV cache offload benchmark
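The worker.py change above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: `TransferRecord`, `timed_transfer`, the logger name, and the `"CPU_to_storage"` label are hypothetical stand-ins for vLLM's `TransferResult` and the fs backend's transfer path. Only the three populated fields (size, time, type) come from the description.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("fs_backend_sketch")  # hypothetical logger name


@dataclass
class TransferRecord:
    """Hypothetical stand-in for the fields populated on TransferResult."""
    job_id: int
    success: bool
    transfer_size: int    # bytes moved by this transfer
    transfer_time: float  # seconds spent on this transfer
    transfer_type: str    # label used to split the exported metrics


def timed_transfer(job_id: int, transfer_type: str,
                   do_transfer: Callable[[], int]) -> TransferRecord:
    """Run one transfer and record its size, duration, and type.

    `do_transfer` is an assumed callable that performs the I/O and
    returns the number of bytes it moved.
    """
    start = time.perf_counter()
    try:
        size = do_transfer()
        success = True
    except OSError:
        # A failed transfer contributes no bytes to the metrics.
        size, success = 0, False
    elapsed = time.perf_counter() - start

    # Per-request logging stays at a low level to keep normal logs quiet.
    logger.debug("job=%d type=%s bytes=%d seconds=%.6f ok=%s",
                 job_id, transfer_type, size, elapsed, success)
    return TransferRecord(job_id, success, size, elapsed, transfer_type)
```

For example, `timed_transfer(1, "CPU_to_storage", lambda: 4096)` returns a record with `transfer_size == 4096` and a non-negative `transfer_time`.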

Metrics

Populate vLLM's built-in offloading metrics via TransferResult fields (size, time, type):

| Metric | Type | Description |
|---|---|---|
| vllm:kv_offload_total_bytes | Counter | Total bytes transferred, labeled by transfer_type |
| vllm:kv_offload_total_time | Counter | Total time spent on transfers (seconds), labeled by transfer_type |
| vllm:kv_offload_size | Histogram | Distribution of transfer sizes in bytes, labeled by transfer_type |
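To illustrate how per-transfer fields roll up into these three metrics, here is a rough pure-Python sketch. It does not use vLLM or a Prometheus client; `OffloadMetricsSketch` and the `SIZE_BUCKETS` bounds are assumptions made for illustration, and the real histogram buckets are not stated in this PR.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in bytes (4 KiB, 64 KiB, 1 MiB, 16 MiB).
SIZE_BUCKETS = (4096, 65536, 1048576, 16777216)


class OffloadMetricsSketch:
    """Toy aggregation keyed by transfer_type, mirroring the table above."""

    def __init__(self) -> None:
        self.total_bytes = defaultdict(int)    # like vllm:kv_offload_total_bytes
        self.total_time = defaultdict(float)   # like vllm:kv_offload_total_time
        # Like vllm:kv_offload_size: one slot per bucket plus a final +Inf slot.
        # Counts here are per bucket, not cumulative as in Prometheus exposition.
        self.size_buckets = defaultdict(lambda: [0] * (len(SIZE_BUCKETS) + 1))

    def observe(self, transfer_type: str, size: int, seconds: float) -> None:
        """Fold one finished transfer into the counters and the histogram."""
        self.total_bytes[transfer_type] += size
        self.total_time[transfer_type] += seconds
        # Index of the first bucket whose upper bound is >= size.
        self.size_buckets[transfer_type][bisect_left(SIZE_BUCKETS, size)] += 1

    def size_count(self, transfer_type: str) -> int:
        """Number of observed transfers, analogous to the histogram _count."""
        return sum(self.size_buckets[transfer_type])
```

The two counters only ever grow, so throughput has to be derived from differences between scrapes, while the histogram count doubles as a transfer counter.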

Grafana Dashboard

  • Per-request throughput: increase(bytes) / increase(time)
  • Aggregate throughput: rate(bytes) (30s default interval)
  • Transfer count: increase(size_count)
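The arithmetic behind these three panels can be sketched in plain Python over two scrapes of the counters. This is an approximation under stated assumptions: `Sample` and the function names are hypothetical, `bytes` and `time` are read as the two counters from the metrics table, and PromQL's `increase()` and `rate()` additionally handle counter resets and extrapolation, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical scrape of the offload counters for a transfer_type."""
    timestamp: float    # wall-clock time of the scrape, in seconds
    total_bytes: float  # cumulative bytes transferred
    total_time: float   # cumulative seconds spent inside transfers
    size_count: int     # cumulative number of observed transfers


def per_request_throughput(old: Sample, new: Sample) -> float:
    """increase(bytes) / increase(time): bytes per second of transfer time."""
    busy = new.total_time - old.total_time
    if busy <= 0:
        return 0.0  # no transfer time accumulated in this window
    return (new.total_bytes - old.total_bytes) / busy


def aggregate_throughput(old: Sample, new: Sample) -> float:
    """rate(bytes): bytes per second of wall-clock time in the window."""
    wall = new.timestamp - old.timestamp
    if wall <= 0:
        return 0.0
    return (new.total_bytes - old.total_bytes) / wall


def transfer_count(old: Sample, new: Sample) -> int:
    """increase(size_count): transfers completed in the window."""
    return new.size_count - old.size_count
```

With 3,000,000 bytes moved in 1.5 s of transfer time across a 30 s window, per-request throughput is 2,000,000 B/s while aggregate throughput is 100,000 B/s. The gap between the two shows how much of the window was idle.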

orozery previously approved these changes Mar 25, 2026
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
@kfirtoledo kfirtoledo merged commit e118d07 into llm-d:main Mar 26, 2026
10 checks passed
@kfirtoledo kfirtoledo deleted the metrics branch March 26, 2026 08:21