Add metrics endpoint #1423
base: dev
Conversation
I wonder whether GPU and CPU metrics need to be included here at all; you can get those metrics by running NVIDIA's exporter on its own port, e.g. docker run -d --gpus all --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.1.7-3.1.4-ubuntu20.04
Could token-related performance metrics be added?
Can we add first token time as well, so the difference between scheduling time and first token time can be used to estimate prefill time?
Conflicts: lmdeploy/serve/async_engine.py
documentation='Number of total requests.',
labelnames=labelnames)

# latency metrics
For latency metrics, consider using the Histogram/Summary metric types; they make it easy to compute distributions and averages over different time windows, and also simplify the calculation logic.
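A minimal sketch of this suggestion with prometheus_client; the metric name, documentation string, and bucket boundaries below are assumptions for illustration, not the actual lmdeploy definitions:

```python
# Illustrative sketch: a Histogram for queue duration instead of a Gauge.
from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()

duration_queue_seconds = Histogram(
    name='lmdeploy:duration_queue_seconds',
    documentation='Time a request spent in the scheduling queue.',
    buckets=(0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0),
    registry=registry)

# Observe each request's queue time; the _sum, _count, and _bucket
# series are exported automatically.
for queue_time in (0.02, 0.07, 0.4):
    duration_queue_seconds.observe(queue_time)

# PromQL can then derive averages over any window, e.g.
#   rate(lmdeploy:duration_queue_seconds_sum[5m])
#     / rate(lmdeploy:duration_queue_seconds_count[5m])
```

This moves the averaging and distribution math out of the server and into PromQL, which is the simplification the reviewer is pointing at.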
lmdeploy/serve/metrics.py
# latency metrics
self.gauge_duration_queue = Gauge(
    name='lmdeploy:duration_queue',
For metrics that carry a unit, declare the unit in the metric name (e.g. duration_queue_seconds); this makes PromQL queries more intuitive.
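Applied to the Gauge above, the rename would look roughly like this (the documentation string is invented for the example):

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

# The `_seconds` suffix states the unit directly in the metric name,
# so PromQL expressions are self-describing.
gauge_duration_queue = Gauge(
    name='lmdeploy:duration_queue_seconds',
    documentation='Time spent in the scheduling queue, in seconds.',
    registry=registry)

gauge_duration_queue.set(0.25)
```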
documentation='CPU memory used bytes.',
labelnames=labelnames)

# requests
For request-count metrics, the Counter type is recommended, since it is monotonically increasing.
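A sketch of the Counter suggestion; the metric name and label set are assumptions for this example:

```python
# Illustrative sketch: a monotonically increasing Counter for request totals.
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

request_counter = Counter(
    name='lmdeploy:request',  # exported as lmdeploy:request_total
    documentation='Number of total requests.',
    labelnames=('status',),
    registry=registry)

request_counter.labels(status='success').inc()
request_counter.labels(status='success').inc()
request_counter.labels(status='failure').inc()

# Because the value never decreases, PromQL can compute request rates with
# rate(lmdeploy:request_total[5m]) without extra bookkeeping in the server.
```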
handle = pynvml.nvmlDeviceGetHandleByIndex(int(i))
mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
self.gpu_memory_used_bytes[str(i)] = str(mem_info.used)
The GPU index can be specified with a label. Using Info turns the metric value into a string, which cannot be used in PromQL calculations.
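A sketch of the labeled-Gauge alternative; the byte counts below are stand-ins for what pynvml.nvmlDeviceGetMemoryInfo(handle).used returns in the PR, and the metric name is illustrative:

```python
# Illustrative sketch: export per-GPU memory as a numeric Gauge with a
# `gpu` label instead of an Info metric with string values.
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

gpu_memory_used_bytes = Gauge(
    name='lmdeploy:gpu_memory_used_bytes',
    documentation='GPU memory used in bytes.',
    labelnames=('gpu',),
    registry=registry)

# Stand-in readings; the real code would obtain these from pynvml.
fake_readings = {0: 8 * 2**30, 1: 6 * 2**30}
for index, used in fake_readings.items():
    gpu_memory_used_bytes.labels(gpu=str(index)).set(used)

# As numbers these can be aggregated in PromQL, e.g.
#   sum(lmdeploy:gpu_memory_used_bytes)
```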
@uzuku
Thanks for the great work. Metrics are important for production observability. Can we expect this feature to be merged in the next release? @AllentDan
If our metrics can be compatible with vllm's, it will greatly facilitate comparing deployment performance between lmdeploy and vllm.
OK. We will try to align them.
When will this PR be merged?
Conflicts: lmdeploy/cli/serve.py lmdeploy/serve/async_engine.py lmdeploy/serve/openai/api_server.py
Enabling --metrics reduces the throughput of small models; the smaller the model, the more pronounced the effect.
Please update the api_server.md guide to describe how to enable and retrieve the metrics.
Please resolve the conflicts.
Conflicts: lmdeploy/serve/async_engine.py
# Figure out a graceful way to handle the invalid input
prompt_input = dict(input_ids=input_ids)

async def get_inputs_genconfig(gen_config):
Do we really need to wrap it into a function?
Open http://xxxx:23333/metrics/ to view the metrics.