[Feature] metrics support #3534

Open · wants to merge 93 commits into base: main
Commits (93)
f8b4000
metrics support prototype
CUHKSZzxy May 9, 2025
3e4fca9
Merge branch 'main' into metrics-support
CUHKSZzxy May 9, 2025
02c46ec
Merge branch 'main' into metrics-support
CUHKSZzxy May 12, 2025
9ae6a1b
fix wrong conflict resolve
CUHKSZzxy May 12, 2025
7904d3a
add GPU KV cache usage
CUHKSZzxy May 12, 2025
4a339c8
independent logger for each DP
CUHKSZzxy May 12, 2025
8c3ede1
fix gpu cache usage
CUHKSZzxy May 13, 2025
ddeec2e
Merge branch 'main' into metrics-support
CUHKSZzxy May 13, 2025
9229aa1
rename log stats
CUHKSZzxy May 13, 2025
862a708
fix
CUHKSZzxy May 13, 2025
74dc69a
update perf_counter and comments, some bug fix
CUHKSZzxy May 15, 2025
19d81d4
Merge branch 'main' into metrics-support
CUHKSZzxy May 15, 2025
b87f099
overwrite with main branch
CUHKSZzxy May 22, 2025
d9f8e5a
Merge branch 'main' into metrics-support
CUHKSZzxy May 22, 2025
0168eed
refactor
CUHKSZzxy May 22, 2025
d774cc3
cleanup
CUHKSZzxy May 22, 2025
08200e1
fix
CUHKSZzxy May 22, 2025
a4d0ac9
add runtime cuda prometheus_client
CUHKSZzxy May 22, 2025
150d562
fix
CUHKSZzxy May 23, 2025
1f80a8e
cleanup
CUHKSZzxy May 23, 2025
aed3eea
async log
CUHKSZzxy May 23, 2025
0931746
fix gen throughput calculation
CUHKSZzxy May 26, 2025
57f3f91
update max_model_len
CUHKSZzxy May 26, 2025
4bdf89f
Merge branch 'main' into metrics-support
CUHKSZzxy May 26, 2025
83b7c60
fix running/waiting reqs calculations
CUHKSZzxy May 26, 2025
67366b1
Merge branch 'main' into metrics-support
CUHKSZzxy May 26, 2025
9729f0d
fix pr test
CUHKSZzxy May 27, 2025
9c194ac
fix
CUHKSZzxy May 27, 2025
97ccdf3
fix pr test
CUHKSZzxy May 27, 2025
72d4274
update log level
CUHKSZzxy May 27, 2025
382c500
fix
CUHKSZzxy May 27, 2025
e224bc6
Merge branch 'main' into metrics-support
CUHKSZzxy May 29, 2025
0df0473
update
CUHKSZzxy May 29, 2025
47a07b6
add grafana support
CUHKSZzxy May 30, 2025
c354a7d
fix
CUHKSZzxy May 30, 2025
4bc27e0
update
CUHKSZzxy May 30, 2025
a132cc6
update
CUHKSZzxy May 30, 2025
2c1588d
simplify some logics
CUHKSZzxy May 30, 2025
22a1dc6
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 3, 2025
a59d9c3
fix lint
CUHKSZzxy Jun 3, 2025
d5f1bfe
fix lint
CUHKSZzxy Jun 3, 2025
8738bb2
refactor
CUHKSZzxy Jun 4, 2025
7bbb544
fix module init
CUHKSZzxy Jun 4, 2025
c4f0799
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 4, 2025
f13dae1
fix
CUHKSZzxy Jun 4, 2025
5c92c24
reuse status logger
CUHKSZzxy Jun 4, 2025
7974365
cleanup
CUHKSZzxy Jun 4, 2025
0f66854
rename
CUHKSZzxy Jun 4, 2025
1d19ccc
add docs
CUHKSZzxy Jun 5, 2025
91319c3
update docs
CUHKSZzxy Jun 5, 2025
c976d3d
update docs
CUHKSZzxy Jun 5, 2025
aab614a
update docs
CUHKSZzxy Jun 5, 2025
eb8971b
fix typo
CUHKSZzxy Jun 5, 2025
9e20aa7
decouple prometheus_client
CUHKSZzxy Jun 5, 2025
ec31b12
update docs
CUHKSZzxy Jun 5, 2025
8cee584
change log interval
CUHKSZzxy Jun 5, 2025
3099c8f
mp router
grimoire Jun 8, 2025
2ecadfa
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 9, 2025
b0f2087
minor fix
CUHKSZzxy Jun 9, 2025
1bdacbf
optimize
grimoire Jun 9, 2025
4691d74
better streaming
grimoire Jun 9, 2025
b8b5b3b
optimize streaming
grimoire Jun 9, 2025
a7476c3
Merge branch 'main' into mp-engine
grimoire Jun 9, 2025
362240a
close engine
grimoire Jun 9, 2025
899da60
safe exit
grimoire Jun 9, 2025
d49146a
support pd
grimoire Jun 10, 2025
a1e92a1
merge main
grimoire Jun 12, 2025
0be4fc8
fix loader
grimoire Jun 12, 2025
5f2939e
optimize
grimoire Jun 12, 2025
f428506
Merge branch 'main' into mp-engine
grimoire Jun 12, 2025
3285cc6
safe exit
grimoire Jun 12, 2025
3ff8d28
safe exit
grimoire Jun 12, 2025
8f32f52
refactor
CUHKSZzxy Jun 17, 2025
4838003
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 17, 2025
b92b7cc
clean
CUHKSZzxy Jun 17, 2025
63865bc
fix
CUHKSZzxy Jun 18, 2025
81f6653
optimize
CUHKSZzxy Jun 18, 2025
309880f
optimize
CUHKSZzxy Jun 18, 2025
252d7ed
rename
CUHKSZzxy Jun 18, 2025
e689f3b
remove unused metrics
CUHKSZzxy Jun 19, 2025
e241c30
inplace update
CUHKSZzxy Jun 19, 2025
a678a94
clean
CUHKSZzxy Jun 19, 2025
535fa98
async update
CUHKSZzxy Jun 20, 2025
95fd4a5
update
CUHKSZzxy Jun 20, 2025
4470eb3
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 20, 2025
da61d89
Merge branch 'pr-3627' into metrics-support
CUHKSZzxy Jun 23, 2025
892d5f0
optimize
CUHKSZzxy Jun 25, 2025
002e7cf
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 30, 2025
c121921
fix merge
CUHKSZzxy Jun 30, 2025
5daae5f
refactor for MP engine
CUHKSZzxy Jul 2, 2025
ab8b57a
optimize
CUHKSZzxy Jul 2, 2025
b660df6
Merge branch 'main' into metrics-support
CUHKSZzxy Jul 2, 2025
b7b86e1
fix prometheus, grafana
CUHKSZzxy Jul 2, 2025
2 changes: 2 additions & 0 deletions .github/scripts/check_lmdeploy.py
@@ -18,6 +18,8 @@ def check_module_init(root: str):
continue
elif d.startswith('lmdeploy/lib'):
continue
elif d.startswith('lmdeploy/monitoring'):
continue
elif d.startswith('lmdeploy/serve/turbomind/triton_models'):
continue
elif d.startswith('lmdeploy/serve/turbomind/triton_python_backend'):
123 changes: 123 additions & 0 deletions docs/en/advance/metrics.md
@@ -0,0 +1,123 @@
# Production Metrics

LMDeploy exposes a set of metrics via Prometheus, and provides visualization via Grafana.

## Setup Guide

This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `lmdeploy/monitoring` directory.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) installed

- LMDeploy server running with metrics system enabled

## Usage

1. **Start your LMDeploy server with metrics enabled**

```
lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
```

Replace the model path according to your needs.
By default, the metrics endpoint will be available at `http://<lmdeploy_server_host>:23333/metrics`.
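
To confirm the endpoint is reachable before wiring up Prometheus, you can fetch it directly. The snippet below is only a quick sanity check (not part of the monitoring stack) and assumes the server from step 1 is running locally on the default port:

```python
# Fetch the Prometheus text exposition from the metrics endpoint and
# print the first few lines. Assumes a local server on port 23333.
from urllib.request import urlopen

with urlopen('http://localhost:23333/metrics') as resp:
    text = resp.read().decode('utf-8')

print('\n'.join(text.splitlines()[:10]))
```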

2. **Navigate to the monitoring directory**

```
cd lmdeploy/monitoring
```

3. **Start the monitoring stack**

```
docker compose up
```

This command starts Prometheus and Grafana (add the `-d` flag to run them in the background).

4. **Access the monitoring interfaces**

- Prometheus: Open your web browser and go to http://localhost:9090.

- Grafana: Open your web browser and go to http://localhost:3000.

5. **Log in to Grafana**

- Default Username: `admin`

- Default Password: `admin`. You will be prompted to change the password upon your first login.

6. **View the Dashboard**

The LMDeploy dashboard is pre-configured and should be available automatically.

## Troubleshooting

1. **Port conflicts**

Check whether any services are already using ports `23333` (LMDeploy server), `9090` (Prometheus), or `3000` (Grafana). Either stop the conflicting services or modify the config files as follows:

- Modify LMDeploy server port for Prometheus scrape

In `lmdeploy/monitoring/prometheus.yaml`

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '127.0.0.1:23333' # <= Modify this LMDeploy server port (23333); it must match the port of the running server
```

- Modify Prometheus port

In `lmdeploy/monitoring/grafana/datasources/datasource.yaml`

```
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://localhost:9090 # <= Modify this Prometheus port (9090)
isDefault: true
editable: false
```

- Modify Grafana port:

In `lmdeploy/monitoring/docker-compose.yaml`, for example, to change the port to `3090`:

Option 1: Add `GF_SERVER_HTTP_PORT` to the environment section.

```
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_SERVER_HTTP_PORT=3090 # <= Add this line
```

Option 2: Use port mapping.

```
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3090:3000" # <= Host:Container port mapping
```

2. **No data on the dashboard**

Send some requests to the LMDeploy server to generate traffic:

```
python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
```

After refreshing, you should be able to see data on the dashboard.
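
If the dashboard still shows nothing, check whether Prometheus is scraping the server at all. The sketch below queries the standard `up` metric through the Prometheus HTTP API; it assumes the default Prometheus port `9090` and the `lmdeploy` job name from `prometheus.yaml` above:

```python
# Ask Prometheus whether the 'lmdeploy' scrape target is up (1 = up, 0 = down).
import json
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({'query': 'up{job="lmdeploy"}'})
with urlopen(f'http://localhost:9090/api/v1/query?{params}') as resp:
    result = json.load(resp)

for item in result['data']['result']:
    print(item['metric'].get('instance'), '->', item['value'][1])
```

A value of `0` (or an empty result) means Prometheus cannot reach the metrics endpoint; re-check the target in `prometheus.yaml`.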
1 change: 1 addition & 0 deletions docs/en/index.rst
@@ -104,6 +104,7 @@ Documentation
advance/structed_output.md
advance/pytorch_multinodes.md
advance/pytorch_profiling.md
advance/metrics.md

.. toctree::
:maxdepth: 1
122 changes: 122 additions & 0 deletions docs/zh_cn/advance/metrics.md
@@ -0,0 +1,122 @@
# Production Metrics

LMDeploy exposes metrics via Prometheus and provides visualization via Grafana.

## Setup Guide

This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `lmdeploy/monitoring` directory.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) installed

- An LMDeploy server running with the metrics system enabled

## Usage

1. **Start your LMDeploy server with metrics enabled**

```
lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
```

Replace the model path as needed. By default, the metrics endpoint is available at `http://<lmdeploy_server_host>:23333/metrics`.
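
To verify that the endpoint is reachable, a minimal sketch (assuming a local server on the default port):

```python
# Quick check: fetch the metrics endpoint and print the first lines.
from urllib.request import urlopen

with urlopen('http://localhost:23333/metrics') as resp:
    print(resp.read().decode('utf-8').splitlines()[:10])
```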

2. **Navigate to the monitoring directory**

```
cd lmdeploy/monitoring
```

3. **Start the monitoring stack**

```
docker compose up
```

This command starts Prometheus and Grafana (add the `-d` flag to run them in the background).

4. **Access the monitoring interfaces**

- Prometheus: Open your web browser and go to http://localhost:9090.

- Grafana: Open your web browser and go to http://localhost:3000.

5. **Log in to Grafana**

- Default Username: `admin`

- Default Password: `admin` (you will be prompted to change the password upon your first login)

6. **View the Dashboard**

The pre-configured LMDeploy dashboard loads automatically.

## Troubleshooting

1. **Port conflicts**

Check whether ports `23333` (LMDeploy server), `9090` (Prometheus), or `3000` (Grafana) are already in use. Either stop the conflicting services or modify the config files as follows:

- Modify the LMDeploy server port scraped by Prometheus

In `lmdeploy/monitoring/prometheus.yaml`

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '127.0.0.1:23333' # <= Modify this LMDeploy server port (23333); it must match the port of the running server
```

- Modify the Prometheus port

In `lmdeploy/monitoring/grafana/datasources/datasource.yaml`

```
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://localhost:9090 # <= Modify this Prometheus port (9090)
isDefault: true
editable: false
```

- Modify the Grafana port

In `lmdeploy/monitoring/docker-compose.yaml`, for example, to change the port to `3090`:

Option 1: Add `GF_SERVER_HTTP_PORT` to the environment section.

```
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_SERVER_HTTP_PORT=3090 # <= Add this line
```

Option 2: Use port mapping.

```
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3090:3000" # <= 主机端口:容器端口映射
```

2. **No data on the dashboard**

Send some requests to the LMDeploy server to generate traffic:

```
python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
```

After refreshing, you should see data on the dashboard.
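
To check whether Prometheus is scraping the server at all, the standard `up` metric can be queried; a minimal sketch assuming the default Prometheus port and the `lmdeploy` job name configured above:

```python
# 1 = target up, 0 = target down / unreachable.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({'query': 'up{job="lmdeploy"}'})
with urlopen(f'http://localhost:9090/api/v1/query?{params}') as resp:
    print(json.load(resp)['data']['result'])
```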
1 change: 1 addition & 0 deletions docs/zh_cn/index.rst
@@ -105,6 +105,7 @@ LMDeploy 工具箱提供以下核心功能:
advance/structed_output.md
advance/pytorch_multinodes.md
advance/pytorch_profiling.md
advance/metrics.md

.. toctree::
:maxdepth: 1
2 changes: 2 additions & 0 deletions lmdeploy/cli/serve.py
@@ -169,6 +169,7 @@ def add_parser_api_server():
ArgumentHelper.ep(pt_group)
ArgumentHelper.enable_microbatch(pt_group)
ArgumentHelper.enable_eplb(pt_group)
ArgumentHelper.enable_metrics(pt_group)
ArgumentHelper.role(pt_group)
ArgumentHelper.migration_backend(pt_group)
# multi-node serving args
@@ -333,6 +334,7 @@ def api_server(args):
max_prefill_token_num=args.max_prefill_token_num,
enable_microbatch=args.enable_microbatch,
enable_eplb=args.enable_eplb,
enable_metrics=args.enable_metrics,
role=EngineRole[args.role],
migration_backend=MigrationBackend[args.migration_backend],
model_format=args.model_format)
5 changes: 5 additions & 0 deletions lmdeploy/cli/utils.py
@@ -557,6 +557,11 @@ def enable_eplb(parser):

return parser.add_argument('--enable-eplb', action='store_true', help='enable eplb for specified model')

@staticmethod
def enable_metrics(parser):
"""Add argument enable_metrics to parser."""
parser.add_argument('--enable-metrics', action='store_true', default=False, help='enable metrics system')

# For Disaggregation
@staticmethod
def role(parser):
44 changes: 44 additions & 0 deletions lmdeploy/messages.py
@@ -1,5 +1,6 @@
# Copyright (c) OpenMMLab. All rights reserved.
import enum
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Literal, Optional

@@ -314,6 +315,7 @@ class PytorchEngineConfig:
it to True if you want to update weights after creating the pipeline
enable_microbatch (bool): enable microbatch for specified model
enable_eplb (bool): enable eplb for specified model
enable_metrics (bool): enable metrics system
role (EngineRole): role of engine, options: ['Hybrid', 'Prefill',
'Decode']. Default to `EngineRole.Hybrid`.
migration_backend: migration backend. options: ['DLSlime'].
@@ -349,6 +351,7 @@ class PytorchEngineConfig:
enable_eplb: bool = False
enable_mp_engine: bool = False
model_format: str = None
enable_metrics: bool = False

role: EngineRole = EngineRole.Hybrid
migration_backend: MigrationBackend = MigrationBackend.DLSlime
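
For reference, the new flag can also be set when building the engine config from the Python API rather than the CLI. A minimal sketch (the model path is just an example; how much of the metrics system applies outside `api_server` is not covered by this PR's docs):

```python
# Sketch: enabling the metrics system via PytorchEngineConfig.
from lmdeploy import pipeline, PytorchEngineConfig

backend_config = PytorchEngineConfig(enable_metrics=True)
pipe = pipeline('Qwen/Qwen2.5-7B-Instruct', backend_config=backend_config)
print(pipe('Hello, world!'))
```
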
@@ -422,6 +425,45 @@ class Response:
index: int = 0


# copy from https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/__init__.py
class EngineCoreEventType(enum.IntEnum):
"""The type of engine core request event.

QUEUED - when the request was received by the engine core and added to the scheduler queue
SCHEDULED - when the request was first scheduled for execution
PREEMPTED - the request has been put back in the waiting queue in order to make room for other requests to complete.
It will be re-scheduled in the future and will restart its prefill phase.
"""
QUEUED = 1
SCHEDULED = 2
PREEMPTED = 3 # FIXME, currently ignored for simplicity


# copy from https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/__init__.py
@dataclass
class EngineCoreEvent():
"""A timestamped engine core event associated with a request.

The timestamp is a monotonic timestamp and is used by the engine frontend to calculate intervals between engine
core events. These timestamps should not be compared with timestamps from other processes.
"""
type: EngineCoreEventType
timestamp: float

@classmethod
def new_event(cls, event_type: EngineCoreEventType, timestamp: Optional[float] = None) -> 'EngineCoreEvent':
timestamp = time.perf_counter() if timestamp is None else timestamp
return cls(event_type, timestamp)


@dataclass
class MetricsInfo:
"""Metrics info from the inference engine."""
engine_core_timestamp: float = 0.0
engine_core_events: List[EngineCoreEvent] = field(default_factory=list)
scheduler_raw_info: dict = field(default_factory=dict)
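
To illustrate how the frontend can consume these timestamps, here is a small sketch (not part of the PR) that derives the time a request spent queued from its engine core events:

```python
# Sketch: interval between a request's QUEUED and SCHEDULED events.
# Assumes the classes above are importable from lmdeploy.messages.
from typing import List, Optional

from lmdeploy.messages import EngineCoreEvent, EngineCoreEventType


def queued_interval(events: List[EngineCoreEvent]) -> Optional[float]:
    queued = next((e.timestamp for e in events if e.type == EngineCoreEventType.QUEUED), None)
    scheduled = next((e.timestamp for e in events if e.type == EngineCoreEventType.SCHEDULED), None)
    if queued is None or scheduled is None:
        return None
    return scheduled - queued


events = [EngineCoreEvent.new_event(EngineCoreEventType.QUEUED),
          EngineCoreEvent.new_event(EngineCoreEventType.SCHEDULED)]
print(queued_interval(events))  # small positive float (seconds)
```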


@dataclass
class EngineOutput:
"""Engine output for turbomind/pytorch engine.
@@ -435,6 +477,7 @@ class EngineOutput:
position.
cache_block_ids (List[int]): send cache blocks back for migration in
Disaggregated LLM Serving when Prefill Engine is Done.
metrics_info (MetricsInfo): metrics info from the inference engine.
"""
status: ResponseType
token_ids: List[int]
@@ -444,6 +487,7 @@ class EngineOutput:
last_hidden_state: torch.Tensor = None

cache_block_ids: Optional[List[int]] = None
metrics_info: Optional[MetricsInfo] = None


@dataclass
1 change: 1 addition & 0 deletions lmdeploy/metrics/__init__.py
@@ -0,0 +1 @@
# Copyright (c) OpenMMLab. All rights reserved.